3 Repos
Utilities for verifying, cleaning, and ensuring the integrity of data during import or processing operations.
Distinguishing note: Focuses on error handling and validation logic for data pipelines, distinct from general database management.
Explore 3 awesome GitHub repositories matching data & databases · Data Validation Tools. Refine with filters or upvote what's useful.
Twenty is a headless customer relationship management framework that enables developers to build, version, and deploy custom business applications using code. By utilizing a declarative approach to data modeling, the platform allows for the definition of custom objects, fields, and complex relationships directly within the source code. This schema-driven architecture automatically generates corresponding REST and GraphQL APIs, ensuring that data structures and interface components remain synchronized across development and production environments. The platform distinguishes itself through a m
Resolve data import errors by identifying, editing, or removing problematic records directly within the import interface before finalizing the data upload.
This project is a comprehensive framework for engineering financial data pipelines, designed to automate the collection, cleaning, and synchronization of large-scale market datasets. It functions as a quantitative trading data engine, providing the infrastructure necessary to manage historical and real-time asset pricing information for research and machine learning workflows. The system distinguishes itself through a configuration-driven approach to orchestration, allowing users to manage complex data acquisition tasks across multiple financial providers. It features resilient middleware tha
Compares results from multiple providers to verify data accuracy and consistency before analytical processing.
Great Expectations is a data quality testing framework and observability platform designed to monitor the reliability of data pipelines. It provides a structured environment for defining, documenting, and automating data quality assertions, allowing teams to validate datasets against expected structure and content before they move through downstream processes. The project distinguishes itself through a declarative domain-specific language that stores quality rules as version-controlled configuration files. It utilizes an execution engine abstraction to translate these high-level assertions in
Integrates into data processing workflows to enforce quality standards and monitor reliability across diverse storage environments.