6 repositories
Processes that merge disparate data sources while detecting contradictions between different representations of the same information.
Distinguishing note: None of the candidates cover the specific act of merging multi-format sources with conflict detection.
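The category definition above can be illustrated with a minimal Python sketch. It is not taken from any of the listed repositories: the function name `merge_sources`, the flat record shape, and the "first source wins" rule are all assumptions chosen for brevity.

```python
from typing import Any


def merge_sources(
    sources: dict[str, dict[str, Any]],
) -> tuple[dict[str, Any], dict[str, dict[str, Any]]]:
    """Merge flat records from named sources and report disagreeing keys.

    Hypothetical helper: returns (merged_record, conflicts), where conflicts
    maps each contested key to the value every source supplied for it.
    """
    merged: dict[str, Any] = {}
    seen: dict[str, dict[str, Any]] = {}  # key -> {source name: value}

    for name, record in sources.items():
        for key, value in record.items():
            seen.setdefault(key, {})[name] = value
            # Assumption: the first source to define a key wins in the merged view.
            merged.setdefault(key, value)

    conflicts: dict[str, dict[str, Any]] = {}
    for key, by_source in seen.items():
        values = list(by_source.values())
        if any(v != values[0] for v in values[1:]):
            conflicts[key] = by_source

    return merged, conflicts


sources = {
    "docs": {"timeout": 30, "retries": 3},
    "code": {"timeout": 60, "retries": 3, "verbose": False},
}
merged, conflicts = merge_sources(sources)
print(merged)     # {'timeout': 30, 'retries': 3, 'verbose': False}
print(conflicts)  # {'timeout': {'docs': 30, 'code': 60}}
```

Keeping the conflict report separate from the merged record lets a caller decide whether a contradiction is fatal or only worth logging.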
Explore 6 awesome GitHub repositories matching data & databases · Data Source Unification.
Falcor is a JavaScript library that models remote data as a single virtual JSON graph, providing a path-based query engine for efficient client-side data retrieval and updates. It represents multiple remote data sources as a unified document where entities are accessed via globally unique identity paths. The system distinguishes itself by treating the remote data model as a virtual JSON resource, allowing the client to query specific paths without managing individual endpoints. It uses a reference-aware graph model to handle many-to-many relationships and prevents data duplication. Network ef
Represents multiple remote data sources as a single virtual JSON model for consistent access.
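To make the idea of a reference-aware graph with identity paths concrete, here is a small Python sketch. It does not use Falcor or reproduce its API: `ref`, `get_path`, the `{"$type": "ref"}` marker shape, and the sample data are illustrative assumptions.

```python
from typing import Any


def ref(path: list[str]) -> dict[str, Any]:
    """Build a reference node pointing at an entity's identity path (assumed shape)."""
    return {"$type": "ref", "value": path}


def is_ref(node: Any) -> bool:
    return isinstance(node, dict) and node.get("$type") == "ref"


def get_path(graph: dict[str, Any], path: list[str], max_hops: int = 16) -> Any:
    """Walk `path` from the graph root, following references as they appear."""
    remaining = list(path)
    node: Any = graph
    hops = 0
    while remaining:
        node = node[remaining.pop(0)]
        if is_ref(node):
            hops += 1
            if hops > max_hops:
                raise ValueError("too many reference hops (possible cycle)")
            # Restart from the root at the referenced identity path,
            # then continue with whatever keys were still pending.
            remaining = list(node["value"]) + remaining
            node = graph
    return node


# Each user is stored once; other parts of the graph only point at it.
graph = {
    "usersById": {"1": {"name": "Ada"}},
    "teams": {"core": {"lead": ref(["usersById", "1"])}},
    "recent": {"0": ref(["usersById", "1"])},
}

print(get_path(graph, ["teams", "core", "lead", "name"]))  # Ada
print(get_path(graph, ["recent", "0"]) is get_path(graph, ["teams", "core", "lead"]))  # True
```

Because both references resolve to the same stored object, an update made through one path is visible through the other, which is how a reference-based graph avoids duplicated data.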
Skill Seekers is a toolset for generating large language model knowledge bases, featuring a multi-source content scraper and a dedicated RAG data pipeline. It extracts technical data from documentation, code, and video to create structured assets and configuration files for AI-powered IDE extensions. The project distinguishes itself through the ability to transform raw data into polished tutorials and specialized skills for AI plugin marketplaces. It utilizes abstract syntax tree parsing and optical character recognition to analyze GitHub repositories, PDFs, and video frames, converting these
Merges content from docs, code, and documents while detecting conflicts between documentation and implementation.
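One way to picture conflict detection between documentation and implementation is to compare documented function signatures against those recovered by abstract syntax tree parsing. The sketch below uses Python's standard `ast` module; it is not Skill Seekers' code, and `implemented_signatures`, `find_conflicts`, and the sample inputs are hypothetical.

```python
import ast


def implemented_signatures(source: str) -> dict[str, list[str]]:
    """Map each function defined in `source` to its parameter names.

    Simplification: only positional-or-keyword parameters are collected;
    *args, **kwargs, and keyword-only parameters are ignored.
    """
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


def find_conflicts(documented: dict[str, list[str]], source: str) -> list[str]:
    """Compare documented signatures (assumed already extracted) with the code."""
    actual = implemented_signatures(source)
    conflicts = []
    for name, doc_params in documented.items():
        if name not in actual:
            conflicts.append(f"{name}: documented but not implemented")
        elif actual[name] != doc_params:
            conflicts.append(f"{name}: docs say {doc_params}, code has {actual[name]}")
    for name in actual:
        if name not in documented:
            conflicts.append(f"{name}: implemented but not documented")
    return conflicts


source = "def connect(host, port, timeout=30):\n    pass\n\ndef close():\n    pass\n"
documented = {"connect": ["host", "port"], "ping": []}

for line in find_conflicts(documented, source):
    print(line)
```

A real pipeline would also need to extract the documented signatures from prose or API reference pages, which this sketch takes as given.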
CUE is a constraint-based configuration language designed for data validation, schema definition, and code generation. At its core, it unifies types and values into a single concept, enabling compile-time validation that catches structural and value errors before runtime. The language treats data and constraints as the same thing, allowing a single definition to serve as both a schema and concrete configuration data. CUE distinguishes itself through its constraint-based unification engine, which combines multiple configuration sources into a single coherent result by merging their constraints
Combines data from different sources by merging constraints into a single consistent result.
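The notion of treating types and values as the same kind of thing, then unifying them, can be approximated in a few lines of Python. This is a heavily simplified sketch and not CUE's actual semantics: it only supports Python classes as type constraints, concrete values, and nested dicts, and the names `unify` and `UnificationError` are assumptions.

```python
from typing import Any


class UnificationError(ValueError):
    """Raised when two sources cannot both be satisfied."""


def unify(a: Any, b: Any, path: str = "") -> Any:
    """Return the most specific value satisfying both `a` and `b`, or raise."""
    where = path or "<root>"

    # Structs unify field by field; a field present on one side is kept as-is.
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            child = f"{path}.{key}" if path else key
            result[key] = unify(a[key], b_val, child) if key in a else b_val
        return result

    # Two type constraints must be the same type in this sketch.
    if isinstance(a, type) and isinstance(b, type):
        if a is b:
            return a
        raise UnificationError(f"conflict at {where}: {a.__name__} vs {b.__name__}")

    # A type constraint and a concrete value: the value must have exactly that type.
    if isinstance(a, type) or isinstance(b, type):
        constraint, value = (a, b) if isinstance(a, type) else (b, a)
        if type(value) is not constraint:
            raise UnificationError(
                f"conflict at {where}: {value!r} is not {constraint.__name__}"
            )
        return value

    # Two concrete values must agree exactly.
    if type(a) is type(b) and a == b:
        return a
    raise UnificationError(f"conflict at {where}: {a!r} vs {b!r}")


schema = {"host": str, "port": int}
base = {"host": "db.internal"}
override = {"port": 5432}

config = unify(unify(schema, base), override)
print(config)  # {'host': 'db.internal', 'port': 5432}
```

In this sketch the order of unification does not change the result, and a disagreement such as `unify({"port": 5432}, {"port": 5433})` raises instead of silently picking a winner, which is the property that distinguishes unification from override-style merging.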
Blitzar is a verifiable SQL proof engine and cryptographic library designed for verifiable SQL computation. It enables off-chain execution of database queries while generating zero-knowledge proofs that certify the correctness of the results for on-chain verification. The project distinguishes itself through a GPU-accelerated prover that offloads heavy cryptographic workloads to graphics processors, reducing the time required to generate succinct proofs. It provides high-performance cryptographic primitives for C++ and Rust applications, focusing on elliptic curve operations and multi-scalar multiplication. The system covers a broad surface of data management and security, including trustless data integration that combines blockchain indexing with off-chain datasets in tamper-proof relational tables. It uses BFT consensus and threshold signatures to maintain state integrity, alongside mechanisms for quorum-based data synchronization and delivery of verified results through smart contract callbacks. The codebase provides native bindings for C++ and Rust to expose its cryptographic toolkits and proof computation libraries.
Unifies real-time indexed blockchain data and off-chain datasets into a single verifiable source.
Plunk is an SMTP email marketing platform and contact relationship manager used for sending bulk broadcasts and transactional emails. It provides a transactional email API for delivering personalized messages using templates and variable substitution, supported by built-in analytics and custom domain authentication. The platform features an email automation workflow engine with a visual builder for creating multi-step sequences triggered by user events and conditional logic. It includes a dynamic audience segmentation tool that groups contacts based on real-time data attributes and behavioral
Merges transactional, campaign, and workflow interactions into a single comprehensive user record.
Gravitino is a federated metadata lake and unified data catalog designed to manage tables, files, and AI models across diverse data sources and cloud storage. It serves as a centralized interface for governing schemas, access controls, and tagging across relational databases, messaging queues, and object stores. The project distinguishes itself by unifying the management of AI assets, such as machine learning models and their version lineages, alongside traditional tabular data. It also implements the Iceberg REST specification to provide a standardized metadata server and proxy for lakehouse
Organizes metadata from diverse sources into a hierarchical structure of metalakes, catalogs, and schemas.