6 repositories
Processes that merge disparate data sources while detecting contradictions between different representations of the same information.
Distinguishing note: None of the candidates cover the specific act of merging multi-format sources with conflict detection.
Explore 6 GitHub repositories matching Data & Databases · Data Source Unification.
Falcor is a JavaScript library that models remote data as a single virtual JSON graph, providing a path-based query engine for efficient client-side data retrieval and updates. It represents multiple remote data sources as a unified document where entities are accessed via globally unique identity paths. The system distinguishes itself by treating the remote data model as a virtual JSON resource, allowing the client to query specific paths without managing individual endpoints. It uses a reference-aware graph model to handle many-to-many relationships and prevents data duplication. Network ef
Represents multiple remote data sources as a single virtual JSON model for consistent access.
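To make the idea of a reference-aware graph with identity paths concrete, here is a minimal Python sketch. It is not Falcor's API (Falcor is a JavaScript library); the `GRAPH` layout, the `"$ref"` key, and the `get_path` helper are hypothetical names chosen for illustration. It only shows how storing each entity once under a canonical path, and pointing at it from elsewhere, lets several paths resolve to the same data without duplication.

```python
# Toy JSON graph: each entity lives once under a canonical path, and other
# places point at it through a "$ref" entry (a hypothetical convention here).
GRAPH = {
    "usersById": {
        "1": {"name": "Ada"},
        "2": {"name": "Grace"},
    },
    "teams": {
        "core": {"lead": {"$ref": ["usersById", "1"]}},
        "infra": {"lead": {"$ref": ["usersById", "1"]}},
    },
}


def get_path(graph, path):
    """Walk `path` from the root, following references along the way."""
    node = graph
    for key in path:
        node = node[key]
        # A reference restarts the walk at the entity's canonical path.
        while isinstance(node, dict) and "$ref" in node:
            node = get_path(graph, node["$ref"])
    return node


lead_name = get_path(GRAPH, ["teams", "core", "lead", "name"])
```

Both `teams.core.lead` and `teams.infra.lead` resolve to the single record stored at `usersById.1`, so an update to that record is visible through every path that references it.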
Skill Seekers is a toolset for generating large language model knowledge bases, featuring a multi-source content scraper and a dedicated RAG data pipeline. It extracts technical data from documentation, code, and video to create structured assets and configuration files for AI-powered IDE extensions. The project distinguishes itself through the ability to transform raw data into polished tutorials and specialized skills for AI plugin marketplaces. It utilizes abstract syntax tree parsing and optical character recognition to analyze GitHub repositories, PDFs, and video frames, converting these
Merges content from docs, code, and documents while detecting conflicts between documentation and implementation.
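Merging with conflict detection can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not Skill Seekers' implementation: the `merge_sources` function, the "first source wins" rule, and the sample `docs` and `code` records are all hypothetical. The point is that a disagreement between sources is reported rather than silently overwritten.

```python
def merge_sources(sources):
    """Merge {source_name: {field: value}} into one record.

    Returns (merged, conflicts). The first value seen for a field is kept;
    any later source that disagrees is recorded as a conflict.
    """
    merged = {}
    origin = {}     # field -> name of the source that supplied the kept value
    conflicts = []  # (field, kept_source, kept_value, other_source, other_value)
    for name, record in sources.items():
        for field, value in record.items():
            if field not in merged:
                merged[field] = value
                origin[field] = name
            elif merged[field] != value:
                conflicts.append((field, origin[field], merged[field], name, value))
    return merged, conflicts


# Hypothetical facts extracted from documentation and from source code.
docs = {"function": "connect", "default_timeout": 30, "returns": "Session"}
code = {"function": "connect", "default_timeout": 10}

merged, conflicts = merge_sources({"docs": docs, "code": code})
```

Here `conflicts` holds one entry for `default_timeout`, where the documentation and the code disagree, while the fields that agree or appear in only one source are merged without complaint.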
CUE is a constraint-based configuration language designed for data validation, schema definition, and code generation. At its core, it unifies types and values into a single concept, enabling compile-time validation that catches structural and value errors before runtime. The language treats data and constraints as the same thing, allowing a single definition to serve as both a schema and concrete configuration data. CUE distinguishes itself through its constraint-based unification engine, which combines multiple configuration sources into a single coherent result by merging their constraints
Combines data from different sources by merging constraints into a single consistent result.
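The unification idea can be approximated in Python as follows. This is a loose sketch, not CUE's implementation or syntax: it assumes that a Python type such as `int` stands in for a type constraint, that plain dicts stand in for structs, and that the names `unify` and `Conflict` are hypothetical. It shows the two properties described above: a schema and concrete data are combined by the same operation, and contradictory values raise an error instead of one overriding the other.

```python
class Conflict(Exception):
    """Raised when two constraints cannot both hold."""


def unify(a, b, path="root"):
    """Combine two constraints; the result does not depend on argument order."""
    # Two structs: unify shared fields and keep fields found in only one side.
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = unify(a[key], value, f"{path}.{key}") if key in a else value
        return out
    if isinstance(a, type) and isinstance(b, type):
        if a is b:              # the same type constraint twice
            return a
    elif isinstance(a, type):
        if isinstance(b, a):    # a concrete value satisfies the type
            return b
    elif isinstance(b, type):
        if isinstance(a, b):
            return a
    elif a == b:                # two identical concrete values
        return a
    raise Conflict(f"{path}: {a!r} conflicts with {b!r}")


schema = {"service": str, "port": int, "replicas": int}
base = {"service": "api", "port": 8080}
override = {"replicas": 3}

config = unify(unify(schema, base), override)
```

With these inputs `config` is fully concrete. Unifying it with `{"port": 9090}` raises `Conflict`, because two different concrete values for the same field cannot both hold.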
Blitzar is a verifiable SQL proof engine and cryptographic library designed for verifiable SQL computation. It enables off-chain execution of database queries while generating zero-knowledge proofs that certify the correctness of the results for on-chain verification. The project distinguishes itself with a GPU-based proof accelerator that offloads heavy cryptographic workloads to graphics processors, reducing the time required to generate succinct proofs. It provides high-performance cryptographic primitives for C++ and Rust applications, focusing on elliptic curve operations and multi-scalar multiplication. The system covers a broad surface of data management and security, including trustless data integration that combines blockchain indexing with off-chain datasets in tamper-proof relational tables. It uses BFT consensus and threshold signatures to maintain state integrity, along with mechanisms for quorum-based data synchronization and delivery of verified results through smart contract callbacks. The codebase provides native bindings for C++ and Rust to expose its cryptographic toolkits and proof computation libraries.
Unifies real-time indexed blockchain data and off-chain datasets into a single verifiable source.
Plunk is an SMTP email marketing platform and contact relationship manager used for sending bulk broadcasts and transactional emails. It provides a transactional email API for delivering personalized messages using templates and variable substitution, supported by built-in analytics and custom domain authentication. The platform features an email automation workflow engine with a visual builder for creating multi-step sequences triggered by user events and conditional logic. It includes a dynamic audience segmentation tool that groups contacts based on real-time data attributes and behavioral
Merges transactional, campaign, and workflow interactions into a single comprehensive user record.
Gravitino is a federated metadata lake and unified data catalog designed to manage tables, files, and AI models across diverse data sources and cloud storage. It serves as a centralized interface for governing schemas, access controls, and tagging across relational databases, messaging queues, and object stores. The project distinguishes itself by unifying the management of AI assets, such as machine learning models and their version lineages, alongside traditional tabular data. It also implements the Iceberg REST specification to provide a standardized metadata server and proxy for lakehouse
Organizes metadata from diverse sources into a hierarchical structure of metalakes, catalogs, and schemas.