8 repositories
Tools for parsing and transforming database transaction logs into structured data streams.
Distinguishing note: Focuses on high-throughput log parsing rather than general database management.
Explore 8 GitHub repositories matching data & databases · Log Processing Engines.
Canal is a database replication middleware that performs change data capture by simulating a database replica. It monitors transaction logs to stream incremental data modifications to downstream systems in real time, acting as an event streaming infrastructure that transforms low-level binary logs into structured, consumable message streams. The project distinguishes itself through a high-throughput architecture that utilizes concurrent multi-threaded parsing and stateful log position tracking to ensure reliable data delivery. It employs a pluggable sink architecture that decouples data extra
Processes high-volume database logs using parallel worker threads to maximize throughput and minimize latency.
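As a rough illustration of parallel parsing combined with log position tracking, the sketch below parses a batch of raw entries on a thread pool and records the last position handled. It is not Canal's code (Canal is written in Java); the `position|table|op` entry format and the names `parse_entry` and `parse_batch` are hypothetical, chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_entry(raw: bytes) -> dict:
    """Parse one raw entry in the hypothetical format b'<position>|<table>|<op>'."""
    position, table, op = raw.decode("utf-8").split("|")
    return {"position": int(position), "table": table, "op": op}


def parse_batch(raw_entries, workers=4):
    """Parse entries on worker threads and return (events, checkpoint)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so log order is preserved
        # even though individual entries are parsed concurrently.
        events = list(pool.map(parse_entry, raw_entries))
    # The checkpoint is the position of the last parsed event; a consumer
    # could persist it and resume from there after a restart.
    checkpoint = events[-1]["position"] if events else None
    return events, checkpoint


raw = [b"100|users|INSERT", b"101|orders|UPDATE", b"102|users|DELETE"]
events, checkpoint = parse_batch(raw)
```

In CPython, threads mainly help when parsing is I/O-bound or releases the GIL, so this shows the structure of the approach rather than a throughput benchmark.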
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Reduces storage requirements by keeping only the latest value for each key in a log, enabling efficient state reconstruction and snapshotting of historical data streams.
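Key-based log compaction of this kind can be sketched in a few lines. The example assumes an in-memory log of `(key, value)` records and treats a value of `None` as a deletion marker (tombstone); both conventions, and the name `compact`, are assumptions made for illustration rather than details taken from any listed repository.

```python
from typing import Iterable, Optional

# (key, value); a value of None is a hypothetical tombstone meaning "deleted".
Record = tuple[str, Optional[str]]


def compact(log: Iterable[Record]) -> list[Record]:
    """Keep only the most recent value for each key and drop deleted keys."""
    latest: dict[str, Optional[str]] = {}
    for key, value in log:
        # Remove and reinsert so the output order follows the latest write.
        latest.pop(key, None)
        latest[key] = value
    return [(key, value) for key, value in latest.items() if value is not None]


log = [("a", "1"), ("b", "2"), ("a", "3"), ("b", None), ("c", "4")]
compacted = compact(log)
```

Replaying `compacted` produces the same final state as replaying the full log, which is why the older records can be discarded.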
NeDB is a JavaScript embedded NoSQL document store designed for Node.js and the browser. It functions as an in-memory data store with the option to persist documents to a local file system, ensuring data survives application restarts. The project utilizes a MongoDB-compatible API to perform data operations, allowing it to serve as a lightweight document indexing system and a persistent file database without requiring a separate database server. Capabilities include querying, inserting, updating, and deleting documents, as well as the ability to create indexes on specific fields to accelerate
Provides log compaction to remove obsolete records and reclaim disk space by rewriting the data log.
ToyDB is a distributed SQL database that provides a system for storing and querying data across multiple nodes. It focuses on maintaining strong consistency and fault tolerance through the implementation of a distributed consensus algorithm. The project distinguishes itself by supporting historical data versioning, enabling time-travel queries to retrieve the state of the database from a specific point in the past. It utilizes multi-version concurrency control to manage ACID transactions and ensure data integrity during concurrent operations. The system covers relational data modeling with t
Implements log compaction to remove obsolete data markers and keep only the most recent version of each key.
This project is a Chinese translation of the original research paper describing the Raft consensus protocol. It serves as both a technical research translation and a guide to the protocol, making the Raft algorithm's specification accessible to Chinese speakers. The documentation covers the core mechanisms of distributed systems, including leader election, log replication, and safety protocols. It explains in detail how to maintain a single source of truth across multiple servers to achieve fault-tolerant cluster management. The material addresses distributed state machine replication and log management. It covers technical concepts such as heartbeat-driven liveness tracking, randomized-timeout elections, term-based consistency versioning, and snapshot-based log compaction.
Describes replacing log sequences with state snapshots to save space and accelerate recovery.
m3 is a distributed time series database designed for high-resolution metrics and high-cardinality data management. It functions as a scalable storage system and a multi-cluster query engine, providing a distributed metrics aggregator that can downsample and summarize data before it is committed to storage. The project distinguishes itself through a coordinated cluster model that uses etcd for node membership and shard placement. It supports multiple ingestion protocols, including the Prometheus remote write protocol, the InfluxDB line protocol, and the Graphite Carbon plaintext protocol, and provides compatible query interfaces for PromQL and Graphite. The system covers broad capability areas, including columnar time series storage, synchronous data replication, and distributed query fan-out. It incorporates data lifecycle automation, quorum-based consistency tuning, and tag-based series indexing to maintain data integrity and retrieval speed across isolated namespaces. Cluster orchestration and component placement are managed through automated tools and operators to ensure high availability and balanced data distribution.
Reclaims disk space and improves startup times by transforming raw commit logs into compressed snapshot files.
braft is an embeddable C++ library that implements the Raft consensus algorithm, providing a distributed consensus engine for building fault-tolerant, replicated state machines. At its core, it manages leader election, log replication, cluster membership changes, and state machine synchronization across a cluster of nodes, ensuring strong consistency and data durability even in the face of node failures. The library distinguishes itself through a comprehensive set of mechanisms for reliable distributed coordination. It uses a randomized timeout-based leader election process with term manageme
Periodically snapshots the state machine and truncates the log to prevent unbounded log growth.
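The snapshot-and-truncate idea can be illustrated with a small in-memory state machine. This is a simplified sketch, not braft's C++ API: the class `SnapshottingLog` and its methods are hypothetical, and it triggers a snapshot once the log reaches a fixed number of entries instead of on a timer.

```python
class SnapshottingLog:
    """Key-value state machine that snapshots its state and truncates its log."""

    def __init__(self, snapshot_threshold=3):
        self.state = {}            # live state machine
        self.log = []              # (index, key, value) entries newer than the snapshot
        self.snapshot = {}         # copy of the state at snapshot_index
        self.snapshot_index = 0    # index of the last entry covered by the snapshot
        self.last_index = 0
        self.snapshot_threshold = snapshot_threshold

    def apply(self, key, value):
        """Append an entry, apply it, and snapshot once the log is long enough."""
        self.last_index += 1
        self.log.append((self.last_index, key, value))
        self.state[key] = value
        if len(self.log) >= self.snapshot_threshold:
            self.take_snapshot()

    def take_snapshot(self):
        """Capture the current state, then drop the log entries it covers."""
        self.snapshot = dict(self.state)
        self.snapshot_index = self.last_index
        self.log.clear()

    def recover(self):
        """Rebuild state from the snapshot plus the remaining log entries."""
        state = dict(self.snapshot)
        for _, key, value in self.log:
            state[key] = value
        return state


node = SnapshottingLog(snapshot_threshold=3)
for key, value in [("x", 1), ("y", 2), ("x", 3), ("z", 4)]:
    node.apply(key, value)
recovered = node.recover()
```

Recovery only replays entries written after the snapshot, so the retained log stays bounded by the threshold no matter how many writes have occurred.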
NutsDB is an ACID-compliant, embedded transactional storage engine that functions as both a disk-backed key-value store and an in-memory data structure store. It provides atomic and serializable transactions with commit and rollback capabilities to ensure strict data consistency for applications requiring a lightweight persistence layer. The engine distinguishes itself by supporting a variety of complex data types, including lists, sets, and sorted sets, alongside standard byte-slice storage. It implements a transactional storage model featuring hot backups and a compaction algorithm to maint
Implements log compaction to retain only the latest value for each key, reclaiming space and optimizing reads.