67 مستودعات
Mechanisms for synchronizing data across distributed database nodes or clusters to ensure consistency and availability.
Distinguishing note: Focuses on the architectural synchronization of data between clusters, distinct from general database management.
Explore 67 awesome GitHub repositories matching data & databases · Data Replication. Refine with filters or upvote what's useful.
هذا المشروع عبارة عن مورد تعليمي شامل ودليل دراسي يركز على بنية الأنظمة الموزعة وتصميم البنية التحتية للـ backend. يوفر منهجاً منظماً لإتقان مبادئ القابلية للتوسع، والموثوقية، والأداء المطلوبة لتصميم أنظمة برمجية معقدة. يتميز المستودع بتقديم نهج منهجي للتحضير للمقابلات التقنية، حيث يدمج أنماط التصميم، والمقايضات المعمارية، وأدوات التكرار المتباعد لمساعدة المستخدمين على الاحتفاظ بالمفاهيم المعقدة. ويؤكد على التحليل القائم على القيود، حيث يعلم المستخدمين كيفية تقييم المتطلبات المتنافسة مثل زمن الوصول (latency)، والاتساق، والتوافر عند صياغة التصاميم المعمارية. يغطي المحتوى طيفاً واسعاً من قدرات تصميم النظام، بما في ذلك استراتيجيات توسيع قواعد البيانات، وإدارة حركة المرور، وتحسين البنية التحتية. ويفصل تقنيات التوسع الأفقي، والتخزين المؤقت متعدد الطبقات، والتواصل غير المتزامن، واكتشاف الخدمات، مع توفير أطر عمل لإجراء تقديرات الموارد وتخطيط السعة. يتم تنظيم التوثيق كدليل دراسي، مما يوفر مساراً منهجياً عبر أساسيات هندسة الـ backend وتصميم الأنظمة واسعة النطاق.
Teaches mechanisms for synchronizing data across distributed database nodes or clusters to ensure consistency and availability.
This project is a comprehensive Java backend engineering guide and technical reference focused on high-concurrency design, distributed systems, and microservices architecture. It provides detailed strategies for decomposing monolithic applications, managing service discovery, and implementing the architectural patterns required for scalable backend environments. The repository distinguishes itself through an extensive collection of big data algorithmic references and database scaling strategies. It covers memory-efficient techniques for analyzing massive datasets, such as Top-K element extrac
The system creates redundant replicas of data shards across multiple servers to prevent data loss during hardware failures.
etcd is a distributed key-value store and configuration store designed to maintain a consistent set of data across a cluster of nodes. It functions as a reliable registry for storing and synchronizing critical settings and metadata used by distributed applications. The system implements the Raft consensus algorithm to ensure data consistency and leader election across servers. To protect data transfers and verify node identities, it utilizes a network security layer based on mutual TLS and client certificates. Its capabilities cover distributed configuration management, cluster state synchro
Uses quorum-based write validation to ensure data durability and consistency across replicas.
TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions. The platform distinguishes itself through its hybrid transactional and analytical proc
TiDB implements disaster recovery by synchronizing data between a primary and a secondary cluster using change data capture to ensure business continuity.
Ente is a privacy-focused platform for end-to-end encrypted storage and two-factor authentication management. It functions as a zero-knowledge identity provider, ensuring that all cryptographic operations, key derivation, and data encryption occur locally on the user's device. By maintaining this architecture, the service provider remains unable to access or decrypt any stored personal information or authentication credentials. The platform distinguishes itself through a combination of on-device intelligence and resilient data distribution. It utilizes a local machine learning engine to perfo
Distributes encrypted data across independent cloud storage services and geographic regions to prevent data loss.
Typesense is a distributed search engine designed to provide sub-millisecond query latency across massive datasets. It functions as both a high-performance indexing and retrieval engine and a comprehensive search experience platform, offering built-in typo tolerance and tools for managing relevance through synonym configuration, result curation, and complex filtering. The platform distinguishes itself by utilizing in-memory indexing to maintain high-throughput data retrieval and integrating vector database capabilities to support semantic similarity searches. It ensures data consistency and h
Synchronizes data across distributed nodes by periodically streaming state snapshots to ensure fault tolerance.
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
Synchronizes local data with remote backends using a flexible pull-push model supporting GraphQL, WebSockets, and peer-to-peer protocols.
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Synchronizes data across distributed nodes to ensure availability and fault tolerance.
RocketMQ is a cloud-native distributed messaging platform and streaming engine. It functions as a distributed transactional queue that ensures atomicity between local transactions and message delivery, and serves as an MQTT IoT message broker to bridge lightweight device traffic into high-performance data streams. The system is distinguished by a Kubernetes-native architecture that decouples compute from storage to allow independent scaling of traffic and data retention. It utilizes a tiered storage model to offload older data to remote storage and employs quorum-based replication and automat
Requires a majority of replicas to acknowledge write operations to ensure data durability across a distributed cluster.
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
Manages data synchronization across distributed clusters to ensure high availability.
This project provides a framework for managing multi-agent systems, designed to automate complex software development, infrastructure, and business workflows. It functions as a multi-agent workflow orchestrator that routes tasks to domain-specific workers while maintaining state persistence and infrastructure automation. By leveraging large language models, the system decomposes high-level objectives into actionable plans, ensuring that complex operations are executed with consistency and reliability. The framework distinguishes itself through its hierarchical agent registry and policy-driven
Configures replication and failover strategies to ensure high availability and data consistency.
Vitess is a database clustering system for horizontal scaling of MySQL. It functions as a middleware layer that abstracts complex sharding and physical topology, allowing applications to interact with a distributed database environment through a unified interface. By intercepting and routing SQL queries across multiple shards, it enables large-scale data management while maintaining the appearance of a single database instance. The platform distinguishes itself through its ability to perform online schema migrations and distributed transaction coordination without requiring application downti
Coordinates data synchronization across distributed database nodes to ensure consistency and high availability within a clustered environment.
Excelize is a library for reading and writing spreadsheet files in the Office Open XML format. It provides a comprehensive suite of tools for programmatically creating, modifying, and analyzing workbooks, worksheets, and cell data, ensuring compatibility across various office software suites through structured XML serialization. The library distinguishes itself with a built-in formula calculation engine that evaluates complex mathematical and logical expressions directly against workbook data. It also features a memory-mapped streaming architecture, which allows for the efficient processing o
Supports distributing data copies across multiple geographic regions for high availability.
NATS Server is a high-performance, lightweight messaging system designed for cloud-native applications, edge computing, and distributed microservices. It functions as a distributed publish-subscribe broker that routes messages using hierarchical, dot-separated subject strings, enabling decoupled communication between services without requiring centralized broker lookups. The system supports core messaging patterns including asynchronous publish-subscribe, request-reply, and load-balanced queue processing. The platform distinguishes itself through a decentralized architecture that eliminates t
Synchronizes message streams and consumer states across multiple server nodes using consensus groups to ensure high availability.
This project provides a comprehensive implementation of the AT Protocol, serving as a framework for building decentralized social networking applications. It enables the creation of distributed data repositories where users maintain cryptographic ownership of their identity and content, allowing for portable accounts that can be migrated between independent servers without central authority intervention. The platform distinguishes itself by decoupling content hosting from discovery through modular algorithmic curation. Users can select third-party services to filter and organize their feeds,
Synchronizes and maintains local copies of network records for infrastructure and analysis.
rqlite is a distributed relational database that replicates SQLite data across a cluster using the Raft consensus algorithm. It functions as a fault-tolerant storage system that provides high availability and a web API for executing SQL queries and managing relational data without requiring native database drivers. The system distinguishes itself by using an HTTP SQL interface to expose database operations and cluster management. It features a real-time change data capture stream that pushes database mutations to external HTTP endpoints via webhooks and supports the scaling of read throughput
Distributes data across a cluster to ensure continuous service and availability during individual node outages.
QuestDB is a high-performance, distributed time-series database designed for the ingestion, storage, and analysis of massive datasets. It functions as a real-time analytics platform that utilizes a columnar storage engine to optimize disk input and output, enabling efficient analytical scans and complex windowing operations on streaming data. The platform distinguishes itself through specialized capabilities for handling asynchronous time-series streams, including advanced join algorithms that align disparate data sets based on precise timestamp lookups. It supports high-volume ingestion thro
Uses shared object storage as a consistent source of truth to synchronize data across standby nodes and read-only replicas.
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
Distributes database content across multiple geographic locations to reduce latency and improve availability.
FoundationDB is an ACID-compliant distributed transactional key-value store. It functions as a scalable database engine that ensures strict serializability and data consistency across a cluster of servers using a shared-nothing architecture. The system is distinguished by its multi-region replication capabilities, allowing data to be synchronized across different datacenters for high availability and disaster recovery. It utilizes optimistic concurrency control to manage distributed transactions and employs a majority-based coordination system to maintain cluster state. The platform provides
Implements robust data replication across distributed nodes to ensure consistency and availability during hardware failures.
VictoriaMetrics is a high-performance, scalable time series database and observability platform designed for long-term storage and analysis of metric, log, and trace data. It functions as a unified backend for monitoring ecosystems, offering full compatibility with industry-standard protocols and query languages. The system is built to handle massive data volumes through a distributed architecture that supports horizontal scaling and efficient data lifecycle management. The platform distinguishes itself through a storage engine that utilizes consistent hashing for data sharding and log-struct
Enables high availability by replicating incoming metric streams to multiple independent database instances across different datacenters.