News

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. We are happy to receive feedback and contributions. Deequ depends on ...
This repository contains two AWS Glue PySpark ETL jobs designed to process telecom and customer subscription data from the AWS Glue Data Catalog. The jobs demonstrate how to load, transform, and ...
The tech conglomerate announced on Wednesday that it will pour more than $4 billion into building an AWS infrastructure region of data centers in Chile by the end of 2026. The investment will go ...