Why is alibaba/canal a recommended repository for Distributed Data Synchronization Systems?

Maintains consistent data copies across multiple environments by reliably replicating records.

Why is mrniko/redisson a recommended repository for Distributed Data Synchronization Systems?

Synchronizes application state using distributed maps, sets, queues, and locks stored remotely.

Why is tursodatabase/libsql a recommended repository for Distributed Data Synchronization Systems?

Maintains consistent data across local replicas and remote primary instances for high availability.

Why is pingcap/tikv a recommended repository for Distributed Data Synchronization Systems?

Spreads data copies across different physical locations to ensure high availability and protection against regional disasters.

Why is apache/pulsar a recommended repository for Distributed Data Synchronization Systems?

Synchronizes message sequences across geographically distinct regions to ensure high availability and disaster recovery.

Why is apache/incubator-pulsar a recommended repository for Distributed Data Synchronization Systems?

Implements synchronization of message data across geographically distinct locations to ensure global availability and disaster recovery.

Why is NodeBB/NodeBB a recommended repository for Distributed Data Synchronization Systems?

Uses a message broker to enable communication and session sharing between multiple application processes running in a cluster.

Why is MicrosoftDocs/azure-docs a recommended repository for Distributed Data Synchronization Systems?

Distributes data across multiple regions using synchronous replication within a region and asynchronous replication between regions for durability.

Why is eclipse-mosquitto/mosquitto a recommended repository for Distributed Data Synchronization Systems?

Synchronizes data streams across geographic locations by connecting multiple remote brokers.

Why is yugabyte/yugabyte-db a recommended repository for Distributed Data Synchronization Systems?

Implements regional replication to synchronize data across geographically distant locations for disaster recovery and local read performance.

27 repositories

Awesome GitHub Repositories: Distributed Data Synchronization Systems

Systems for maintaining consistent data copies across multiple distributed environments.

Distinguishing note: Focuses on the distributed nature of data synchronization.

Explore 27 awesome GitHub repositories matching data & databases · Distributed Data Synchronization Systems. Refine with filters or upvote what's useful.


alibaba/canal
29,697 stars
Canal is a database replication middleware that performs change data capture by simulating a database replica. It monitors transaction logs to stream incremental data modifications to downstream systems in real time, acting as an event streaming infrastructure that transforms low-level binary logs into structured, consumable message streams. The project distinguishes itself through a high-throughput architecture that utilizes concurrent multi-threaded parsing and stateful log position tracking to ensure reliable data delivery. It employs a pluggable sink architecture that decouples data extra
Maintains consistent data copies across multiple environments by reliably replicating records.
Java
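The log-position tracking mentioned in the Canal description can be illustrated with a toy change-data-capture loop. This is a minimal sketch in plain Python and not Canal's API: `LogEvent`, `PositionStore`, and `stream_changes` are hypothetical names, and an in-memory list stands in for a database's binary log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One row-level change in a toy transaction log (hypothetical format)."""
    position: int  # monotonically increasing log offset
    table: str
    op: str        # "INSERT", "UPDATE", or "DELETE"
    row: dict


@dataclass
class PositionStore:
    """Remembers the last log position that was delivered downstream."""
    last_position: int = -1

    def commit(self, position: int) -> None:
        self.last_position = position


def stream_changes(log, store, sink):
    """Deliver every event after the stored position, then record progress."""
    delivered = 0
    for event in log:
        if event.position <= store.last_position:
            continue  # already delivered before a restart
        sink.append({"table": event.table, "op": event.op, "row": event.row})
        store.commit(event.position)  # advance only after the sink has the event
        delivered += 1
    return delivered


log = [
    LogEvent(0, "users", "INSERT", {"id": 1, "name": "ada"}),
    LogEvent(1, "users", "UPDATE", {"id": 1, "name": "ada l."}),
]
store, sink = PositionStore(), []
first_run = stream_changes(log, store, sink)

# Simulate a restart after one more change was appended to the log.
log.append(LogEvent(2, "users", "DELETE", {"id": 1}))
second_run = stream_changes(log, store, sink)
```

Because the position is stored after each delivery, the second run skips the two events it already sent and forwards only the new one.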
mrniko/redisson
24,355 stars
Redisson is a Java client library for Redis and Valkey that provides a distributed data structure library, a distributed lock manager, and a distributed MapReduce framework. It enables application instances in a cluster to share state through thread-safe collections and objects. The project implements a JCache compliant caching layer for standardized data storage and retrieval. It also functions as a probabilistic data store, providing memory-efficient structures such as Bloom filters and HyperLogLog for high-volume data membership testing. The library covers distributed state management usi
Synchronizes application state using distributed maps, sets, queues, and locks stored remotely.
Java
tursodatabase/libsql
16,887 stars
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
Maintains consistent data across local replicas and remote primary instances for high availability.
C · database · embedded-database · rust
pingcap/tikv
16,724 stars
TiKV is a cloud-native distributed transactional key-value store and storage engine. It provides a distributed database designed for horizontal scalability and strong consistency across a cluster of physical nodes. The system uses a Raft-based consensus mechanism to maintain data availability and state synchronization. It ensures ACID compliance for distributed transactions through a two-phase commit workflow and manages data distribution via multi-Raft sharding. The engine handles massive datasets using automated range splitting and cluster load balancing to distribute data across different
Spreads data copies across different physical locations to ensure high availability and protection against regional disasters.
Rust
apache/pulsar
15,276 stars
Apache Pulsar is a cloud-native distributed pub-sub messaging system designed for high-performance data ingestion. It functions as a geo-replicated data streamer and a multi-tenant event streaming platform, providing a serverless stream processing engine and a tiered storage messaging broker. The system distinguishes itself by separating serving layers from storage layers to allow independent scaling of compute and data retention. It features native geo-replication to synchronize messages across different geographical regions and employs a multi-layered tenant isolation model using authentica
Synchronizes message sequences across geographically distinct regions to ensure high availability and disaster recovery.
Java
apache/incubator-pulsar
15,270 stars
Apache Pulsar is a cloud-native message queue and distributed publish-subscribe messaging system. It serves as a multi-tenant event streaming platform designed to route data streams for asynchronous communication between producers and consumers. The system distinguishes itself through geo-replication, synchronizing data across multiple geographic regions to ensure high availability and low latency. It implements a multi-tenant architecture that provides isolation and resource management for millions of independent topics. The platform covers high-throughput data streaming and event-driven da
Implements synchronization of message data across geographically distinct locations to ensure global availability and disaster recovery.
Java
NodeBB/NodeBB
15,144 stars
NodeBB is a real-time, self-hosted community forum platform built on Node.js. It is designed to support scalable discussion environments by utilizing a document-oriented database for content storage and an in-memory engine for high-speed data retrieval and session management. The platform provides a comprehensive administrative interface for managing user groups, forum settings, and system health. What distinguishes the platform is its native support for federated social networking via the ActivityPub protocol, allowing forums to exchange content, synchronize discussions, and interact with de
Uses a message broker to enable communication and session sharing between multiple application processes running in a cluster.
JavaScript · community · forum · javascript
MicrosoftDocs/azure-docs
10,894 stars
Azure Docs is the official technical documentation repository for Microsoft Azure, the cloud computing platform. It provides comprehensive guidance on the full spectrum of Azure services, covering everything from core infrastructure components like virtual machines, Kubernetes clusters, and serverless computing to platform services for AI, machine learning, data analytics, and storage. The documentation details how to provision, manage, and govern cloud resources at scale, including policy enforcement, identity management, and cost optimization. The documentation distinguishes Azure through i
Distributes data across multiple regions using synchronous replication within a region and asynchronous replication between regions for durability.
Markdown · skilling
eclipse-mosquitto/mosquitto
10,644 stars
Mosquitto is a message broker that implements the MQTT protocol to route messages between connected devices and applications. It functions as a central hub for event-driven communication, supporting message exchange over both raw TCP and WebSockets. The software provides a persistent messaging infrastructure by writing message queues and client subscription states to disk, ensuring data recovery following service interruptions. The broker distinguishes itself through its support for distributed system synchronization, allowing for the federation of multiple remote brokers to share data across
Synchronizes data streams across geographic locations by connecting multiple remote brokers.
C · broker · eclipse-iot · mosquitto
yugabyte/yugabyte-db
10,349 stars
YugabyteDB is a distributed SQL database and relational data store designed for horizontal scalability and high availability across multiple nodes or regions. It functions as a cloud-native system that ensures continuous availability and supports PostgreSQL compatible query languages and drivers. The system includes specialized capabilities as a vector database for AI, utilizing high-dimensional indexing to perform similarity searches. It is engineered as a multi-region cloud database that synchronizes data across different geographic locations to maintain global availability. The project co
Implements regional replication to synchronize data across geographically distant locations for disaster recovery and local read performance.
C · cloud-native · cpp · database
tporadowski/redis
9,987 stars
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
Utilizes CRDT technology to synchronize data across geographically distinct regions for low-latency global access.
C · redis · redis-for-windows · redis-msi-installer
boto/boto3
9,834 stars
Boto3 is the AWS SDK for Python, providing a programmatic interface for managing and automating AWS cloud infrastructure and services. It serves as a cloud management API client and resource manager for provisioning, configuring, and scaling virtual servers, databases, and storage. The library enables the implementation of infrastructure-as-code through declarative templates and scripts, allowing for the deployment of identical resource stacks across multiple accounts and geographic regions. It also provides a framework for coordinating distributed workflows, serverless functions, and contain
Synchronizes data across geographically distinct regions to improve local read/write performance.
Python · aws · aws-sdk · cloud
aphyr/distsys-class
9,717 stars
This project provides educational materials and courseware focused on the theoretical and practical foundations of distributed systems design. It serves as a comprehensive curriculum covering the disciplines of consensus, data consistency, reliability engineering, and scalability. The instructional content focuses on achieving cluster agreement through consensus algorithms and managing system-wide state via coordination frameworks. It includes a dedicated guide to data theory, exploring replication strategies, consistency models, and data convergence. The courseware covers a broad capability
Teaches the use of regional replication across geographically distinct locations for disaster recovery and latency reduction.
orbitdb/orbitdb
8,737 stars
OrbitDB is a decentralized data storage system that enables the creation of serverless databases residing across a network of peers. It functions as a peer-to-peer database that integrates with a content-addressed storage layer to distribute and replicate data without a central server. The system utilizes conflict-free replicated data types to ensure eventual consistency and state convergence across distributed nodes. It maintains an immutable record of updates using a directed acyclic graph to preserve causal ordering and cryptographic integrity. Access is managed through a decentralized ide
Maintains various data models, including immutable logs and document stores, across a network of peers.
JavaScript · crdt · database · decentralized
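The conflict-free replicated data types mentioned in the OrbitDB description converge because their merge operation is commutative, associative, and idempotent. The sketch below shows that property with a grow-only counter, one of the simplest CRDTs. It is a generic illustration in Python, not OrbitDB's implementation, and the `GCounter` class is a hypothetical name.

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> count contributed by that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Take the per-replica maximum, so merge order and repeats do not matter."""
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)


# Two replicas accept writes independently, then exchange state.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
```

After the exchange both replicas hold the same per-replica counts, so they report the same total without any coordination step.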
alibaba/otter
8,127 stars
Otter is a distributed database synchronization system and change data capture tool designed to replicate data between databases across multiple geographic regions. It functions as a synchronization orchestrator and ETL data pipeline that mirrors records and associated files in real time. The system employs incremental log parsing to capture database changes and utilizes a consistency-based convergence algorithm and loop-avoidance logic to manage bi-directional replication. It processes data through a pipeline of selection, extraction, transformation, and loading to handle joins and format co
Replicates data between databases across multiple geographic regions using incremental log parsing for near real-time consistency.
Java
attic-labs/noms
7,422 stars
Noms is a distributed version control database and content-addressable data store. It identifies data by cryptographic hashes to ensure integrity and deduplication, while tracking dataset state changes through a sequence of immutable commits to enable branching, forking, and historical recovery. The system functions as a peer-to-peer data synchronizer, reconciling state between disconnected database instances to ensure all nodes converge on the same data. It distinguishes itself as a schema-flexible document store that supports self-describing types, allowing schemas to evolve and widen as ne
Provides mechanisms to reconcile state and ensure convergence across distributed database instances.
Go
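The content-addressing idea in the Noms description, where data is identified by a cryptographic hash and history is a chain of immutable commits, can be sketched in a few lines. This is an illustration only: Noms uses its own hashing and chunk format, while this example assumes SHA-256 over canonical JSON, and `ContentStore` and `commit` are hypothetical names.

```python
import hashlib
import json


class ContentStore:
    """Toy content-addressed store: the key of a value is the hash of its bytes."""

    def __init__(self):
        self.blobs = {}

    def put(self, value):
        # Canonical JSON (sorted keys) so equal content always hashes the same.
        data = json.dumps(value, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data  # identical content maps to the same key
        return key

    def get(self, key):
        return json.loads(self.blobs[key])


def commit(store, value, parent=None):
    """Store a commit that points at its value and its parent commit by hash."""
    return store.put({"value": store.put(value), "parent": parent})


store = ContentStore()
k1 = store.put({"name": "ada", "age": 36})
k2 = store.put({"age": 36, "name": "ada"})  # same content, different key order

c1 = commit(store, {"count": 1})
c2 = commit(store, {"count": 2}, parent=c1)
```

Storing the same content twice yields one blob, and each commit can be traced back through its `parent` hash to recover earlier states.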
perkeep/perkeep
7,117 stars
Perkeep is a personal content storage system designed for storing, syncing, and backing up digital assets. It functions as a distributed data synchronization engine and an S3 compatible backup tool, allowing users to persist data objects to cloud services for long-term preservation. The system utilizes a key-value content indexer to track data blobs for efficient retrieval and enumeration. It supports custom data modeling to define structures and relationships between stored information, moving beyond simple file storage. The platform includes capabilities for self-hosted content storage, pr
Implements a synchronization mechanism to keep data blobs consistent across multiple distributed devices and servers.
Go
netalertx/NetAlertX
6,604 stars
NetAlertX is a distributed network scanner and asset discovery tool designed to identify connected devices and track unauthorized hardware. It aggregates discovery results from multiple remote monitoring nodes into a single centralized inventory hub to provide unified network visibility. The project distinguishes itself by integrating as a bridge to MQTT brokers for smart home automation and providing a dedicated interface for AI agents to query system data. It employs multi-protocol identity resolution using DNS, mDNS, and NetBIOS to identify hardware and generates synthetic identifiers to e
Maintains a unified network inventory by synchronizing discovery results from distributed remote monitoring nodes.
Python · arp-scan · asset-management · dcim
hazelcast/hazelcast
6,570 stars
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Synchronizes shared data structures across multiple application instances to maintain consistent state.
Java · big-data · caching · data-in-motion
apache/flink-cdc
6,430 stars
This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and ensuring exactly-once delivery and processin
Moves data from source databases to target systems in real-time or batch mode using a distributed engine.
Java · batch · cdc · change-data-capture

