# great-expectations/great_expectations

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/great-expectations-great-expectations).**

11,558 stars · 1,762 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/great-expectations/great_expectations
- Homepage: https://docs.greatexpectations.io/
- awesome-repositories: https://awesome-repositories.com/repository/great-expectations-great-expectations.md

## Topics

`cleandata` `data-engineering` `data-profilers` `data-profiling` `data-quality` `data-science` `data-unit-tests` `datacleaner` `datacleaning` `dataquality` `dataunittest` `eda` `exploratory-analysis` `exploratory-data-analysis` `exploratorydataanalysis` `mlops` `pipeline` `pipeline-debt` `pipeline-testing` `pipeline-tests`

## Description

Great Expectations is a data quality testing framework and observability platform designed to monitor the reliability of data pipelines. It provides a structured environment for defining, documenting, and automating data quality assertions, allowing teams to validate datasets against expected structure and content before they move through downstream processes.

The project's distinguishing feature is a declarative domain-specific language that stores quality rules as version-controlled configuration files. An execution engine abstraction translates these high-level assertions into native queries for each supported data processing framework, and a rendering engine turns the rules and their validation outcomes into human-readable documentation for stakeholders.
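
To make the idea of declarative rules plus a separate execution layer concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations API. The expectation names match ones the library documents, but the JSON layout, the `HANDLERS` table, and the `validate` function are simplified assumptions made for illustration only.

```python
import json

# Hypothetical rule file: a simplified stand-in for a version-controlled suite.
SUITE_JSON = """
{
  "name": "orders_suite",
  "expectations": [
    {"type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "order_id"}},
    {"type": "expect_column_values_to_be_between",
     "kwargs": {"column": "amount", "min_value": 0, "max_value": 1000}}
  ]
}
"""


def _not_null(rows, column):
    """Count rows whose value in `column` is missing."""
    return sum(1 for row in rows if row.get(column) is None)


def _between(rows, column, min_value, max_value):
    """Count non-null values in `column` that fall outside the closed range."""
    return sum(
        1
        for row in rows
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    )


# Toy "execution engine": maps each declarative rule name to in-memory logic.
# A different backend could map the same names to SQL or Spark operations.
HANDLERS = {
    "expect_column_values_to_not_be_null": _not_null,
    "expect_column_values_to_be_between": _between,
}


def validate(rows, suite):
    """Evaluate every expectation in the suite and return a JSON-friendly report."""
    results = []
    for expectation in suite["expectations"]:
        handler = HANDLERS[expectation["type"]]
        unexpected = handler(rows, **expectation["kwargs"])
        results.append(
            {
                "type": expectation["type"],
                "kwargs": expectation["kwargs"],
                "success": unexpected == 0,
                "unexpected_count": unexpected,
            }
        )
    return {
        "suite": suite["name"],
        "success": all(r["success"] for r in results),
        "results": results,
    }


rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": 1500.0},
    {"order_id": None, "amount": 10.0},
]
report = validate(rows, json.loads(SUITE_JSON))
print(json.dumps(report, indent=2))
```

Because the rules live in data rather than code, the same suite can be reviewed in version control and handed to a different handler table without being rewritten, which is the separation the paragraph above describes.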

The platform connects to diverse data sources and persists metadata and validation results across distributed environments. It integrates directly into existing orchestration pipelines to automate recurring quality checks, track data health trends over time, and trigger notifications when datasets deviate from established benchmarks.
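
The persistence and notification flow can be sketched with the standard library alone. This is an illustrative assumption, not the project's storage or alerting implementation: the JSON Lines file, the `record_run`, `load_history`, and `check_latest` helpers, and the 95% threshold are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def record_run(store: Path, run_id: str, success_percent: float) -> None:
    """Append one validation outcome to a JSON Lines history file."""
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"run_id": run_id, "success_percent": success_percent}) + "\n")


def load_history(store: Path) -> list:
    """Read every recorded run back, oldest first."""
    with store.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def check_latest(history: list, threshold: float, notify) -> bool:
    """Call `notify` when the most recent run falls below `threshold`."""
    latest = history[-1]
    if latest["success_percent"] < threshold:
        notify(
            f"Run {latest['run_id']} scored {latest['success_percent']:.1f}%, "
            f"below the {threshold:.1f}% threshold"
        )
        return True
    return False


alerts = []  # stand-in for an email, chat, or pager integration
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "validation_history.jsonl"
    # Made-up sample runs, as a scheduler might record them once per day.
    for run_id, pct in [("2024-01-01", 100.0), ("2024-01-02", 98.5), ("2024-01-03", 72.0)]:
        record_run(store, run_id, pct)
    history = load_history(store)
    alerted = check_latest(history, threshold=95.0, notify=alerts.append)

print(alerted, alerts)
```

Keeping the history append-only makes trend tracking a matter of reading the file back, and passing `notify` as a callable keeps the alert channel swappable.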

## Tags

### Data & Databases

- [Data Quality Frameworks](https://awesome-repositories.com/f/data-databases/data-quality-frameworks.md) — Provides human-readable methods for defining declarative rules to validate data structure and content. ([source](https://docs.greatexpectations.io/docs/0.18/core/introduction/introduction))
- [Data Validation Libraries](https://awesome-repositories.com/f/data-databases/data-validation-libraries.md) — Compares datasets against predefined rules to identify anomalies and schema deviations. ([source](https://docs.greatexpectations.io/docs/reference))
- [Data Validation Tools](https://awesome-repositories.com/f/data-databases/data-validation-tools.md) — Integrates into data processing workflows to enforce quality standards and monitor reliability across diverse storage environments.
- [Query Abstraction Layers](https://awesome-repositories.com/f/data-databases/query-abstraction-layers.md) — Provides a modular backend layer that translates high-level validation rules into native queries for various data processing frameworks.
- [Validation Rule Applications](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-management-governance/data-integrity-validation/data-validation/validation-rule-applications.md) — Executes predefined quality assertions against datasets to identify discrepancies and validate compliance with internal standards. ([source](https://docs.greatexpectations.io/docs/reference/))
- [Data Ingestion Sources](https://awesome-repositories.com/f/data-databases/data-ingestion-sources.md) — Executes validation logic natively across diverse storage formats including local files, dataframes, and remote databases. ([source](https://docs.greatexpectations.io/docs/0.18/core/introduction/introduction))
- [Data Source Connections](https://awesome-repositories.com/f/data-databases/data-integration-synchronization/data-integration/data-source-connections.md) — Connects to various cloud storage and database platforms to validate data consistency across environments. ([source](https://docs.greatexpectations.io/docs/help/compatibility_reference))
- [Data Pipeline Orchestration](https://awesome-repositories.com/f/data-databases/data-pipeline-orchestration.md) — Integrates validation tasks into automated workflows to ensure data reliability during scheduled processing jobs. ([source](https://docs.greatexpectations.io/docs/help/compatibility_reference))
- [Data Quality Monitors](https://awesome-repositories.com/f/data-databases/data-pipelines/data-quality-monitors.md) — Monitors data pipeline reliability by tracking validation results and alerting teams to quality regressions.
- [Pluggable Connector Frameworks](https://awesome-repositories.com/f/data-databases/pluggable-storage-drivers/pluggable-connector-frameworks.md) — Implements a standardized interface layer to read and validate data from diverse storage systems without modifying core logic.
- [Data Processing Workflows](https://awesome-repositories.com/f/data-databases/data-processing-workflows.md) — Integrates validation steps directly into data processing workflows to ensure reliability during scheduled jobs.
- [Data Reporting](https://awesome-repositories.com/f/data-databases/data-reporting.md) — Renders validation results and rule definitions into human-readable documentation for stakeholders. ([source](https://docs.greatexpectations.io/docs/0.18/core/introduction/introduction))
- [Validation Result Serializers](https://awesome-repositories.com/f/data-databases/data-structures/structured-return-objects/json-response-serializers/validation-result-serializers.md) — Captures validation outcomes as structured JSON objects to provide a machine-readable audit trail of data health.
- [Result Persistence Layers](https://awesome-repositories.com/f/data-databases/result-persistence-layers.md) — Persists validation metadata and historical test outcomes to configurable backends for long-term tracking. ([source](https://docs.greatexpectations.io/docs/0.18/core/introduction/introduction))
- [Database Connection Configurations](https://awesome-repositories.com/f/data-databases/database-management-systems/database-systems-management/connection-transaction-management/database-connection-configurations.md) — Provides configuration mechanisms for establishing and managing connections to diverse data storage systems and databases. ([source](https://docs.greatexpectations.io/docs/reference/))

### Software Engineering & Architecture

- [Declarative Configuration Languages](https://awesome-repositories.com/f/software-engineering-architecture/declarative-configuration-languages.md) — Uses a declarative domain-specific language to define data quality rules as version-controlled configuration files.
- [Expectation Documentation Generators](https://awesome-repositories.com/f/software-engineering-architecture/code-documentation-standards/documentation-validators/expectation-documentation-generators.md) — Generates human-readable documentation from validation rules to provide transparency into expected data quality. ([source](https://docs.greatexpectations.io/))

### System Administration & Monitoring

- [Observability Platforms](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms.md) — Tracks data health, generates reports, and alerts on pipeline anomalies.
- [Application Quality Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/application-quality-monitoring.md) — Tracks data quality metrics over time to proactively identify regressions and alert teams to pipeline deviations. ([source](https://docs.greatexpectations.io/))

### Content Management & Publishing

- [Data Quality Reports](https://awesome-repositories.com/f/content-management-publishing/static-site-document-generators/static-site-generators/static-documentation-generation/data-quality-reports.md) — Transforms validation rules and results into human-readable documentation for data transparency.

### Development Tools & Productivity

- [Automated Workflow Schedulers](https://awesome-repositories.com/f/development-tools-productivity/build-tooling/build-orchestration-logic/build-orchestration-configuration/build-automation-systems/automation/automated-workflow-schedulers.md) — Schedules recurring data quality checks to ensure continuous monitoring of data health throughout the pipeline lifecycle. ([source](https://docs.greatexpectations.io/docs/reference/))
- [Workflow Schedulers](https://awesome-repositories.com/f/development-tools-productivity/workflow-schedulers.md) — Executes automated validation tasks on schedules or triggers to maintain quality standards across environments. ([source](https://docs.greatexpectations.io/docs/reference))
- [Automated Documentation Generators](https://awesome-repositories.com/f/development-tools-productivity/code-quality-analysis/static-analysis-engines/static-analysis-tools/code-quality-tools/automated-documentation-generators.md) — Automatically generates human-readable documentation from data quality rules to provide transparency into data assets.

### DevOps & Infrastructure

- [Data Context Managers](https://awesome-repositories.com/f/devops-infrastructure/configuration-management/declarative-configuration-frameworks/metadata-driven-configurations/data-context-managers.md) — Centralizes project configuration and metadata to ensure consistent validation across distributed environments.
- [Project Configuration Managers](https://awesome-repositories.com/f/devops-infrastructure/configuration-management/configuration-resolution-engines/project-configuration-managers.md) — Organizes project configuration and metadata in a centralized environment to maintain consistency across shared data projects. ([source](https://docs.greatexpectations.io/docs/reference))
