25 repositories
Systems that store and manage data across multiple networked nodes to provide scalability and fault tolerance.
Explore 25 awesome GitHub repositories matching data & databases · Distributed Databases. Refine with filters or upvote what's useful.
TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions. The platform distinguishes itself through its hybrid transactional and analytical proc
Compared with traditional standalone databases, TiDB has the following advantages: - Has a distributed architecture with flexible and elasti
SurrealDB is a multi-model database engine designed to store and query document, graph, relational, and vector data within a single ACID-compliant platform. It functions as an AI-native data store, integrating vector search, graph traversal, and machine learning model execution directly into its query layer. By providing a unified declarative query language, the platform eliminates the need for external middleware to synchronize data across different storage models. The platform distinguishes itself through its ability to manage agent memory and complex workflows natively. It allows developer
Operates across embedded, edge, and cloud environments using a consistent binary and API.
This project is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automated sharding and maintains high availability across global clusters using a multi-node replication protocol. By executing multi-document operations as atomic units, the system ensures data integrity and consistency across distributed environments. The platform distinguishes itself by integrating advanced vector-based indexing, which enables semantic similarity searches alongside traditional geospatial and lexical quer
Distributes data across multiple nodes and regions to ensure horizontal scalability and high availability.
Valkey is an in-memory, NoSQL database server designed for high-performance data storage and real-time state management. It operates as a distributed key-value store, maintaining datasets entirely within system memory to facilitate sub-millisecond response times for read and write operations. The system distinguishes itself through a single-threaded event loop that utilizes asynchronous I/O multiplexing to ensure high throughput. It supports high availability via master-replica replication and provides a decoupled communication model through a built-in publish-subscribe messaging pattern. To
Provides distributed key-value storage for horizontal scalability.
TDengine is a distributed time-series database designed for the high-speed ingestion, compression, and retrieval of timestamped metrics and sensor data. It functions as a SQL-compatible analytics engine, allowing users to perform complex operations on massive volumes of time-ordered information using standard relational syntax. The platform is built to serve as a backend foundation for industrial IoT environments, managing real-time data streams and device metadata through a cluster-based architecture. The system distinguishes itself through a distributed sharding architecture that uses consi
Provides a cluster-based architecture for high availability and fault tolerance.
Dolt is a relational database engine that integrates version control directly into the database management layer. It functions as a version-controlled SQL database that tracks every row and schema change using a commit-based history, allowing users to branch, merge, and audit data modifications. By implementing a wire-protocol-compatible server, the system enables standard SQL clients and tools to interact with versioned data as if they were connecting to a traditional relational database. The platform distinguishes itself by applying repository-style workflows to data management, including s
Facilitates distributed collaboration and data replication by pushing and pulling database states between remote instances.
Dgraph is a distributed graph database designed to store and query highly connected data. It organizes information as nodes and edges to represent complex relationships between entities, providing a platform for managing and analyzing deeply linked datasets. The system functions as a horizontally scalable cluster that partitions data across multiple nodes to maintain performance and availability as information volume increases. It utilizes a specialized query language built for low-latency navigation of interconnected data points, allowing for the execution of complex queries across large-sca
Distributes data across a cluster to maintain performance and availability as information volume and query load grow.
This project is a pure JavaScript database driver for Node.js that implements the native MySQL binary protocol. It serves as a comprehensive connector for managing persistent network links to MySQL servers, enabling applications to execute queries, manage transactions, and handle complex data operations without requiring external middleware. The driver distinguishes itself through its integrated support for connection pooling and distributed database routing. It maintains managed sets of reusable network sockets to optimize resource usage under high request volumes, while simultaneously provi
Distributes database requests across multiple server nodes to improve performance and ensure high availability through automated connection clustering.
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
Distributes database content across multiple geographic regions to reduce latency and ensure high availability.
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large-scale datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that distributes data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous event-driven engine to maximize hardwa
Manages data across multiple networked nodes to provide scalability and fault tolerance.
Citus is a PostgreSQL extension that transforms a standard database into a distributed system. It functions as a sharding framework and distributed SQL engine, enabling horizontal scaling by partitioning tables across a cluster of nodes. By utilizing a coordinator-worker topology, the system manages metadata and routes queries to the appropriate nodes, allowing for parallel execution of complex operations across distributed data shards. The platform distinguishes itself through its specialized support for multi-tenant architectures and real-time analytical processing. It enables tenant-based
Converts existing local tables into distributed ones without blocking application reads or writes during the migration.
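The coordinator-worker routing mentioned in the Citus entry can be pictured with a toy sketch: a coordinator hashes a distribution key to a shard, then looks up which worker holds that shard. Everything here is an assumption made for illustration (the shard count, the worker names, the use of SHA-256, and the round-robin placement); it is not Citus's actual hashing or metadata scheme.

```python
import hashlib

# Hypothetical cluster layout, chosen only for this illustration.
SHARD_COUNT = 8
WORKERS = ["worker-a", "worker-b"]


def shard_for(distribution_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution key (e.g. a tenant id) to a shard number."""
    digest = hashlib.sha256(distribution_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count


def worker_for(shard_id: int, workers=WORKERS) -> str:
    """Look up the worker that holds a shard (round-robin placement)."""
    return workers[shard_id % len(workers)]


def route(distribution_key: str):
    """Return the (shard, worker) pair a coordinator would send a query to."""
    shard = shard_for(distribution_key)
    return shard, worker_for(shard)


if __name__ == "__main__":
    for tenant in ("tenant-1", "tenant-2", "tenant-3"):
        print(tenant, route(tenant))
```

Because the mapping depends only on the key, every query for the same tenant lands on the same shard, which is what makes tenant-based co-location possible in a design like this.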
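Talent Plan provides guided training programs and courses centered on distributed database design, systems programming, and open-source contribution workflows. The project offers a distributed systems education program made up of curated courses and labs focused on database internals. The curriculum emphasizes building high-performance networked applications and implementing distributed algorithms in Rust. It integrates educational material on version control, community governance, and the specific processes required to contribute to public software projects. The project covers a broad range of technical and organizational areas, including distributed database engineering, open-source community management, and the coordination of technical mentoring. It puts this into practice by building a fault-tolerant key-value store and studying professional distributed database architectures. Additional material covers open-source fundamentals, including project governance, software licensing, and the use of collaborative platforms such as Git and GitHub.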
Analyzes the architecture and implementation details of professional distributed database systems.
Cassandra is a distributed NoSQL database and wide-column store designed for high availability and linear scalability. It functions as a fault-tolerant distributed system that utilizes an LSM-tree storage engine to optimize write throughput and manage massive datasets. The system is a CQL-compliant database, using a structured query language to manage and retrieve tabular data stored across multiple nodes. It organizes information into rows and columns based on a flexible schema and primary keys. The project provides capabilities for horizontal database scaling, distributed data partitioning
Uses a structured query language (CQL) to manage and retrieve data from distributed tables.
Orbit DB is a decentralized NoSQL database that utilizes conflict-free replicated data types to ensure eventual consistency across a network of nodes. It functions as a peer-to-peer data store that uses IPFS for content-addressing and synchronization, allowing for the maintenance of application state without a central server or authority. The system is built upon a cryptographically verifiable, immutable operation log, which serves as the foundation for custom decentralized data models. This architecture enables the implementation of various data storage patterns, including JSON document stor
Implements a decentralized NoSQL database utilizing CRDTs to ensure eventual consistency across nodes.
OrbitDB is a decentralized data storage system that enables the creation of serverless databases residing across a network of peers. It functions as a peer-to-peer database that integrates with a content-addressed storage layer to distribute and replicate data without a central server. The system utilizes conflict-free replicated data types to ensure eventual consistency and state convergence across distributed nodes. It maintains an immutable record of updates using a directed acyclic graph to preserve causal ordering and cryptographic integrity. Access is managed through a decentralized ide
Provides a decentralized database using conflict-free replicated data types to ensure eventual consistency across all nodes.
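The conflict-free replicated data types (CRDTs) named in the OrbitDB entries can be illustrated with one of the simplest examples, a grow-only counter. This is a generic textbook sketch, not OrbitDB's data model or API; the class name `GCounter` and the node ids are hypothetical.

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per node."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # node id -> highest count seen for that node

    def increment(self, amount: int = 1) -> None:
        """Record local increments; only this node's own slot is touched."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        """The counter's value is the sum over all nodes' slots."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Take the per-node maximum; commutative, associative, idempotent."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


if __name__ == "__main__":
    a, b = GCounter("node-a"), GCounter("node-b")
    a.increment(2)   # updates happen independently on each peer
    b.increment(3)
    a.merge(b)       # peers exchange state in any order
    b.merge(a)
    print(a.value(), b.value())
```

Because `merge` takes a per-node maximum, replicas can exchange state in any order, any number of times, and still converge on the same value, which is the property eventual consistency relies on.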
Iroh is a peer-to-peer networking stack and distributed system designed for secure direct connections, content-addressed storage, and synchronized data sharing. It provides a foundation for decentralized applications by combining a QUIC-based networking layer with primitives for distributed state and data transfer. The project distinguishes itself through a comprehensive suite of decentralized capabilities, including a distributed data store using conflict-free replicated data types for collaborative synchronization and a content-addressed storage system for verifiable, resumable transfers of
Implements an eventually consistent distributed store using conflict-free replicated data types for synchronized state.
pgloader is a command-line tool that automates the migration of data and schema from various source databases and file formats into PostgreSQL. It combines schema discovery, parallel data pipelines, and type casting into a single, declarative workflow, using PostgreSQL's COPY protocol for high-throughput bulk loading. The tool distinguishes itself by compiling a dedicated command language into concurrent reader-writer pipelines that handle schema introspection, data transformation, and error-resilient batch processing. It supports migrating entire databases from MySQL, MS SQL, SQLite, and Pos
Migrates data into Citus distributed PostgreSQL clusters with automatic shard distribution.
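LiteFS is a FUSE-based distributed file system designed to replicate SQLite databases across a cluster of machines. It acts as a high-availability layer that synchronizes data by intercepting write operations, ensuring consistency across multiple server nodes. The system manages distributed database storage by mapping file operations to network requests through a user-space driver. This allows cross-region data synchronization and distribution of database content to edge nodes, enabling local reads with synchronized global writes. Replication uses write-ahead log (WAL) shipping and transaction-aware interception to stream committed changes from the primary node to standby replicas. New replicas bootstrap through snapshot-based initialization before transitioning to incremental log replication.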
Manages database storage across multiple networked nodes to provide scalability and fault tolerance.
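The replication flow described for LiteFS (snapshot bootstrap, then incremental log replication from a primary) can be sketched in memory as follows. This is a simplified model under stated assumptions: the `Primary` and `Replica` classes, the dictionary-based state, and the pull-style `catch_up` call are all hypothetical, and the sketch does not reflect LiteFS's file formats or network protocol.

```python
class Primary:
    """Holds the authoritative state and an ordered log of committed changes."""

    def __init__(self):
        self.state = {}
        self.log = []  # one entry per committed transaction, in commit order

    def commit(self, changes: dict) -> None:
        """Apply a transaction locally and append it to the log."""
        self.state.update(changes)
        self.log.append(dict(changes))

    def snapshot(self):
        """Return a copy of the state plus the log position it corresponds to."""
        return dict(self.state), len(self.log)

    def entries_since(self, position: int):
        """Return the committed transactions after the given log position."""
        return self.log[position:]


class Replica:
    """Bootstraps from a snapshot, then applies log entries incrementally."""

    def __init__(self, primary: Primary):
        self.state, self.position = primary.snapshot()

    def catch_up(self, primary: Primary) -> None:
        for changes in primary.entries_since(self.position):
            self.state.update(changes)
            self.position += 1


if __name__ == "__main__":
    primary = Primary()
    primary.commit({"user:1": "ada"})
    replica = Replica(primary)           # snapshot-based initialization
    primary.commit({"user:2": "grace"})  # committed after the snapshot
    replica.catch_up(primary)            # incremental log replication
    print(replica.state == primary.state)
```

Tracking a log position alongside the snapshot is what lets a new replica switch from a full copy to incremental updates without missing or re-applying a transaction.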
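Synapse is a Matrix homeserver implementation that provides infrastructure for decentralized, real-time communication and messaging. It acts as a federated chat server, synchronizing room data and event streams between independent server instances to achieve cross-domain interoperability. The server uses a hybrid core that implements performance-critical logic in Rust alongside a Python orchestration layer. It uses a PostgreSQL relational database to persist user accounts and conversation history, and a Redis-based messaging system to distribute tasks across horizontal worker nodes. The project covers a broad range of features, including secure identity management through SAML and OpenID Connect integration, comprehensive administrative tools for content moderation and room management, and automatic media processing. It also includes systems for decentralized federation, asynchronous database schema migrations, and telemetry export for performance monitoring.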
Splits the datastore across multiple physical database nodes to improve horizontal scalability and performance.
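InternetArchitect is a collection of educational documentation and source code intended as a course on high-concurrency architecture. It serves as a distributed systems implementation guide, offering technical patterns and practical examples for designing scalable internet architectures that remain stable under heavy load. The project focuses on high-performance database optimization and microservice design patterns. It covers strategies for reducing latency and increasing throughput through database sharding and proxy layers, as well as methods for coordinating global state in distributed clusters. The architectural scope includes multi-level caching strategies for accelerating data retrieval and service discovery frameworks for managing communication between decoupled microservices. It also addresses distributed state coordination and uses a load-balancing mesh to distribute network traffic across backend servers.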
Implements systems that store and manage data across multiple networked nodes for scalability and fault tolerance.