27 repositories
Systems for maintaining consistent data copies across multiple distributed environments.
Distinguishing note: Focuses on the distributed nature of data synchronization.
Explore 27 GitHub repositories matching data & databases · Distributed Data Synchronization Systems.
Canal is a database replication middleware that performs change data capture by simulating a database replica. It monitors transaction logs to stream incremental data modifications to downstream systems in real time, acting as an event streaming infrastructure that transforms low-level binary logs into structured, consumable message streams. The project distinguishes itself through a high-throughput architecture that utilizes concurrent multi-threaded parsing and stateful log position tracking to ensure reliable data delivery. It employs a pluggable sink architecture that decouples data extra
Maintains consistent data copies across multiple environments by reliably replicating records.
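The stateful log position tracking and pluggable sink mentioned for Canal can be illustrated with a small in-memory sketch. This is not Canal's API: `ChangeEvent`, `LogTailer`, and the list standing in for the transaction log are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ChangeEvent:
    """Hypothetical event shape; real binary log events carry far more detail."""
    position: int           # offset of the event in the source log
    table: str
    op: str                 # "insert", "update", or "delete"
    row: Dict[str, object]


@dataclass
class LogTailer:
    """Hands every event past the last delivered position to a sink."""
    sink: Callable[[ChangeEvent], None]   # pluggable: any callable works
    position: int = 0                     # last position delivered so far

    def poll(self, log: List[ChangeEvent]) -> int:
        delivered = 0
        for event in log:
            if event.position <= self.position:
                continue  # already delivered on an earlier poll
            self.sink(event)
            # Advance only after the sink returned without raising.
            self.position = event.position
            delivered += 1
        return delivered


log = [
    ChangeEvent(1, "users", "insert", {"id": 1, "name": "a"}),
    ChangeEvent(2, "users", "update", {"id": 1, "name": "b"}),
]
received: List[ChangeEvent] = []
tailer = LogTailer(sink=received.append)
tailer.poll(log)
log.append(ChangeEvent(3, "users", "delete", {"id": 1}))
tailer.poll(log)  # only the event at position 3 is new
```

Because the position advances only after the sink accepts an event, a tailer restarted with a saved position would resume from the first undelivered event rather than replaying the whole log.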
Redisson is a Java client library for Redis and Valkey that provides a distributed data structure library, a distributed lock manager, and a distributed MapReduce framework. It enables application instances in a cluster to share state through thread-safe collections and objects. The project implements a JCache compliant caching layer for standardized data storage and retrieval. It also functions as a probabilistic data store, providing memory-efficient structures such as Bloom filters and HyperLogLog for high-volume data membership testing. The library covers distributed state management usi
Synchronizes application state using distributed maps, sets, queues, and locks stored remotely.
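To show what a Bloom filter for membership testing does, here is a minimal local sketch. It does not use Redisson or Redis; the class name, the bit-array size, and the SHA-256 based hashing are assumptions made for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Local, in-process Bloom filter (illustrative, not a Redisson class)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _indexes(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for i in self._indexes(item):
            self.bits[i // 8] |= 1 << (i % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(item))


seen = BloomFilter()
seen.add("user:42")
```

The structure never reports an added item as absent, but it can report an item that was never added as present; larger bit arrays lower that false-positive rate.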
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
Maintains consistent data across local replicas and remote primary instances for high availability.
TiKV is a cloud-native distributed transactional key-value store and storage engine. It provides a distributed database designed for horizontal scalability and strong consistency across a cluster of physical nodes. The system uses a Raft-based consensus mechanism to maintain data availability and state synchronization. It ensures ACID compliance for distributed transactions through a two-phase commit workflow and manages data distribution via multi-Raft sharding. The engine handles massive datasets using automated range splitting and cluster load balancing to distribute data across different
Spreads data copies across different physical locations to ensure high availability and protection against regional disasters.
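The two-phase commit workflow named in the TiKV description can be sketched in its generic textbook form. This is not TiKV's actual protocol or code; `Participant` and `two_phase_commit` are hypothetical names, and the sketch ignores timeouts, durability, and coordinator failure.

```python
from typing import Dict, List, Tuple


class Participant:
    """One shard taking part in a transaction (hypothetical, in-memory)."""

    def __init__(self, name: str, fail_prepare: bool = False) -> None:
        self.name = name
        self.fail_prepare = fail_prepare
        self.staged: Dict[str, str] = {}
        self.committed: Dict[str, str] = {}

    def prepare(self, writes: Dict[str, str]) -> bool:
        if self.fail_prepare:
            return False          # vote "no"
        self.staged = dict(writes)
        return True               # vote "yes"

    def commit(self) -> None:
        self.committed.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        self.staged = {}


def two_phase_commit(plan: List[Tuple[Participant, Dict[str, str]]]) -> bool:
    prepared: List[Participant] = []
    # Phase 1: ask every participant to stage its writes.
    for participant, writes in plan:
        if not participant.prepare(writes):
            for p in prepared:
                p.abort()         # roll back everyone who already staged
            return False
        prepared.append(participant)
    # Phase 2: all votes were "yes", so make the writes visible.
    for participant in prepared:
        participant.commit()
    return True
```

A single "no" vote in the first phase aborts the whole transaction, which is what keeps the participants from ending up with a partial write.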
Apache Pulsar is a cloud-native distributed pub-sub messaging system designed for high-performance data ingestion. It functions as a geo-replicated data streamer and a multi-tenant event streaming platform, providing a serverless stream processing engine and a tiered storage messaging broker. The system distinguishes itself by separating serving layers from storage layers to allow independent scaling of compute and data retention. It features native geo-replication to synchronize messages across different geographical regions and employs a multi-layered tenant isolation model using authentica
Synchronizes message sequences across geographically distinct regions to ensure high availability and disaster recovery.
Apache Pulsar is a cloud-native message queue and distributed publish-subscribe messaging system. It serves as a multi-tenant event streaming platform designed to route data streams for asynchronous communication between producers and consumers. The system distinguishes itself through geo-replication, synchronizing data across multiple geographic regions to ensure high availability and low latency. It implements a multi-tenant architecture that provides isolation and resource management for millions of independent topics. The platform covers high-throughput data streaming and event-driven da
Synchronizes message data across geographically distinct locations to ensure global availability and disaster recovery.
NodeBB is a real-time, self-hosted community forum platform built on Node.js. It is designed to support scalable discussion environments by utilizing a document-oriented database for content storage and an in-memory engine for high-speed data retrieval and session management. The platform provides a comprehensive administrative interface for managing user groups, forum settings, and system health. What distinguishes the platform is its native support for federated social networking via the ActivityPub protocol, allowing forums to exchange content, synchronize discussions, and interact with de
Uses a message broker to enable communication and session sharing between multiple application processes running in a cluster.
Azure Docs is the official technical documentation repository for Microsoft Azure, the cloud computing platform. It provides comprehensive guidance on the full spectrum of Azure services, covering everything from core infrastructure components like virtual machines, Kubernetes clusters, and serverless computing to platform services for AI, machine learning, data analytics, and storage. The documentation details how to provision, manage, and govern cloud resources at scale, including policy enforcement, identity management, and cost optimization. The documentation distinguishes Azure through i
Distributes data across multiple regions using synchronous replication within a region and asynchronous replication between regions for durability.
Mosquitto is a message broker that implements the MQTT protocol to route messages between connected devices and applications. It functions as a central hub for event-driven communication, supporting message exchange over both raw TCP and WebSockets. The software provides a persistent messaging infrastructure by writing message queues and client subscription states to disk, ensuring data recovery following service interruptions. The broker distinguishes itself through its support for distributed system synchronization, allowing for the federation of multiple remote brokers to share data across
Synchronizes data streams across geographic locations by connecting multiple remote brokers.
YugabyteDB is a distributed SQL database and relational data store designed for horizontal scalability and high availability across multiple nodes or regions. It functions as a cloud-native system that ensures continuous availability and supports PostgreSQL compatible query languages and drivers. The system includes specialized capabilities as a vector database for AI, utilizing high-dimensional indexing to perform similarity searches. It is engineered as a multi-region cloud database that synchronizes data across different geographic locations to maintain global availability. The project co
Implements regional replication to synchronize data across geographically distant locations for disaster recovery and local read performance.
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
Utilizes CRDT technology to synchronize data across geographically distinct regions for low-latency global access.
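As a rough illustration of how a CRDT lets regions converge without coordination, the sketch below implements a grow-only counter, one of the simplest CRDTs. It is not Redis code and makes no claim about which CRDT types Redis uses; `GCounter` and the region names are assumptions for the example.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Taking the per-replica maximum makes merge commutative,
        # associative, and idempotent, so merge order does not matter.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


eu, us = GCounter("eu"), GCounter("us")
eu.increment(2)   # write accepted locally in one region
us.increment(3)   # concurrent write accepted in another region
eu.merge(us)
us.merge(eu)
```

After the two merges both replicas hold the same per-region counts, so they report the same total even though neither write waited for the other region.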
Boto3 is the AWS SDK for Python, providing a programmatic interface for managing and automating AWS cloud infrastructure and services. It serves as a cloud management API client and resource manager for provisioning, configuring, and scaling virtual servers, databases, and storage. The library enables the implementation of infrastructure-as-code through declarative templates and scripts, allowing for the deployment of identical resource stacks across multiple accounts and geographic regions. It also provides a framework for coordinating distributed workflows, serverless functions, and contain
Synchronizes data across geographically distinct regions to improve local read/write performance.
This project provides educational materials and courseware focused on the theoretical and practical foundations of distributed systems design. It serves as a comprehensive curriculum covering the disciplines of consensus, data consistency, reliability engineering, and scalability. The instructional content focuses on achieving cluster agreement through consensus algorithms and managing system-wide state via coordination frameworks. It includes a dedicated guide to data theory, exploring replication strategies, consistency models, and data convergence. The courseware covers a broad capability
Teaches the use of regional replication across geographically distinct locations for disaster recovery and latency reduction.
OrbitDB is a decentralized data storage system that enables the creation of serverless databases residing across a network of peers. It functions as a peer-to-peer database that integrates with a content-addressed storage layer to distribute and replicate data without a central server. The system utilizes conflict-free replicated data types to ensure eventual consistency and state convergence across distributed nodes. It maintains an immutable record of updates using a directed acyclic graph to preserve causal ordering and cryptographic integrity. Access is managed through a decentralized ide
Maintains various data models, including immutable logs and document stores, across a network of peers.
Otter is a distributed database synchronization system and change data capture tool designed to replicate data between databases across multiple geographic regions. It functions as a synchronization orchestrator and ETL data pipeline that mirrors records and associated files in real time. The system employs incremental log parsing to capture database changes and utilizes a consistency-based convergence algorithm and loop-avoidance logic to manage bi-directional replication. It processes data through a pipeline of selection, extraction, transformation, and loading to handle joins and format co
Replicates data between databases across multiple geographic regions using incremental log parsing for near real-time consistency.
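Loop avoidance in bi-directional replication can be sketched by tagging every logged change with the database where it originated. This is a simplified illustration, not Otter's implementation; `Change`, `Database`, and `replicate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Change:
    key: str
    value: str
    origin: str   # database where the write first happened


@dataclass
class Database:
    name: str
    rows: Dict[str, str] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)

    def write(self, key: str, value: str, origin: Optional[str] = None) -> None:
        # Local and replicated writes both land in the log,
        # as they would in a real transaction log.
        self.rows[key] = value
        self.log.append(Change(key, value, origin or self.name))


def replicate(source: Database, target: Database, start: int = 0) -> int:
    """Ship log entries from source to target; return the new log position."""
    for change in source.log[start:]:
        if change.origin == target.name:
            continue  # came from the target originally; sending it back would loop
        target.write(change.key, change.value, origin=change.origin)
    return len(source.log)


a, b = Database("a"), Database("b")
a.write("k", "1")
pos_a = replicate(a, b)   # b receives the change, tagged origin="a"
pos_b = replicate(b, a)   # skipped: the entry in b's log originated at a
```

Without the origin check, the change replayed into `a` would be logged again and shipped back to `b` on the next pass, repeating indefinitely.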
Noms is a distributed version control database and content-addressable data store. It identifies data by cryptographic hashes to ensure integrity and deduplication, while tracking dataset state changes through a sequence of immutable commits to enable branching, forking, and historical recovery. The system functions as a peer-to-peer data synchronizer, reconciling state between disconnected database instances to ensure all nodes converge on the same data. It distinguishes itself as a schema-flexible document store that supports self-describing types, allowing schemas to evolve and widen as ne
Provides mechanisms to reconcile state and ensure convergence across distributed database instances.
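Content addressing with a chain of immutable commits, as described for Noms, can be sketched as follows. SHA-256 and JSON serialization are stand-ins chosen for the example, not the hash or encoding Noms uses, and `ContentStore` and `commit` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, Optional


class ContentStore:
    """Stores each value under the hash of its serialized bytes."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, value: Any) -> str:
        data = json.dumps(value, sort_keys=True).encode()
        digest = hashlib.sha256(data).hexdigest()
        self.chunks[digest] = data   # identical content maps to one entry
        return digest

    def get(self, digest: str) -> Any:
        return json.loads(self.chunks[digest])


def commit(store: ContentStore, value: Any, parent: Optional[str]) -> str:
    """Record a new dataset state that points back to its parent commit."""
    value_hash = store.put(value)
    return store.put({"value": value_hash, "parent": parent})


store = ContentStore()
first = commit(store, {"n": 1}, parent=None)
second = commit(store, {"n": 2}, parent=first)
```

Because a commit's key depends on its content and on its parent's key, two instances that hold the same commit hash also agree on the full history behind it.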
Perkeep is a personal content storage system designed for storing, syncing, and backing up digital assets. It functions as a distributed data synchronization engine and an S3 compatible backup tool, allowing users to persist data objects to cloud services for long-term preservation. The system utilizes a key-value content indexer to track data blobs for efficient retrieval and enumeration. It supports custom data modeling to define structures and relationships between stored information, moving beyond simple file storage. The platform includes capabilities for self-hosted content storage, pr
Implements a synchronization mechanism to keep data blobs consistent across multiple distributed devices and servers.
NetAlertX is a distributed network scanner and asset discovery tool designed to identify connected devices and track unauthorized hardware. It aggregates discovery results from multiple remote monitoring nodes into a single centralized inventory hub to provide unified network visibility. The project distinguishes itself by integrating as a bridge to MQTT brokers for smart home automation and providing a dedicated interface for AI agents to query system data. It employs multi-protocol identity resolution using DNS, mDNS, and NetBIOS to identify hardware and generates synthetic identifiers to e
Maintains a unified network inventory by synchronizing discovery results from distributed remote monitoring nodes.
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Synchronizes shared data structures across multiple application instances to maintain consistent state.
This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and ensuring exactly-once delivery and processin
Moves data from source databases to target systems in real-time or batch mode using a distributed engine.