25 repositories
Systems that store and manage data across multiple networked nodes to provide scalability and fault tolerance.
Explore 25 awesome GitHub repositories matching data & databases · Distributed Databases. Refine with filters or upvote what's useful.
TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions. The platform distinguishes itself through its hybrid transactional and analytical proc
Compared with the traditional standalone databases, TiDB has the following advantages: - Has a distributed architecture with flexible and elasti
SurrealDB is a multi-model database engine designed to store and query document, graph, relational, and vector data within a single ACID-compliant platform. It functions as an AI-native data store, integrating vector search, graph traversal, and machine learning model execution directly into its query layer. By providing a unified declarative query language, the platform eliminates the need for external middleware to synchronize data across different storage models. The platform distinguishes itself through its ability to manage agent memory and complex workflows natively. It allows developer
Operates across embedded, edge, and cloud environments using a consistent binary and API.
This project is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automated sharding and maintains high availability across global clusters using a multi-node replication protocol. By executing multi-document operations as atomic units, the system ensures data integrity and consistency across distributed environments. The platform distinguishes itself by integrating advanced vector-based indexing, which enables semantic similarity searches alongside traditional geospatial and lexical quer
Distributes data across multiple nodes and regions to ensure horizontal scalability and high availability.
Valkey is an in-memory, NoSQL database server designed for high-performance data storage and real-time state management. It operates as a distributed key-value store, maintaining datasets entirely within system memory to facilitate sub-millisecond response times for read and write operations. The system distinguishes itself through a single-threaded event loop that utilizes asynchronous I/O multiplexing to ensure high throughput. It supports high availability via master-replica replication and provides a decoupled communication model through a built-in publish-subscribe messaging pattern. To
Provides distributed key-value storage for horizontal scalability.
TDengine is a distributed time-series database designed for the high-speed ingestion, compression, and retrieval of timestamped metrics and sensor data. It functions as a SQL-compatible analytics engine, allowing users to perform complex operations on massive volumes of time-ordered information using standard relational syntax. The platform is built to serve as a backend foundation for industrial IoT environments, managing real-time data streams and device metadata through a cluster-based architecture. The system distinguishes itself through a distributed sharding architecture that uses consi
Provides a cluster-based architecture for high availability and fault tolerance.
Dolt is a relational database engine that integrates version control directly into the database management layer. It functions as a version-controlled SQL database that tracks every row and schema change using a commit-based history, allowing users to branch, merge, and audit data modifications. By implementing a wire-protocol-compatible server, the system enables standard SQL clients and tools to interact with versioned data as if they were connecting to a traditional relational database. The platform distinguishes itself by applying repository-style workflows to data management, including s
Facilitates distributed collaboration and data replication by pushing and pulling database states between remote instances.
Dgraph is a distributed graph database designed to store and query highly connected data. It organizes information as nodes and edges to represent complex relationships between entities, providing a platform for managing and analyzing deeply linked datasets. The system functions as a horizontally scalable cluster that partitions data across multiple nodes to maintain performance and availability as information volume increases. It utilizes a specialized query language built for low-latency navigation of interconnected data points, allowing for the execution of complex queries across large-sca
Distributes data across a cluster to maintain performance and availability as information volume and query load grow.
This project is a pure JavaScript database driver for Node.js that implements the native MySQL binary protocol. It serves as a comprehensive connector for managing persistent network links to MySQL servers, enabling applications to execute queries, manage transactions, and handle complex data operations without requiring external middleware. The driver distinguishes itself through its integrated support for connection pooling and distributed database routing. It maintains managed sets of reusable network sockets to optimize resource usage under high request volumes, while simultaneously provi
Distributes database requests across multiple server nodes to improve performance and ensure high availability through automated connection clustering.
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
Distributes database content across multiple geographic regions to reduce latency and ensure high availability.
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large-scale datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that distributes data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous event-driven engine to maximize hardwa
Manages data across multiple networked nodes to provide scalability and fault tolerance.
Citus is a PostgreSQL extension that transforms a standard database into a distributed system. It functions as a sharding framework and distributed SQL engine, enabling horizontal scaling by partitioning tables across a cluster of nodes. By utilizing a coordinator-worker topology, the system manages metadata and routes queries to the appropriate nodes, allowing for parallel execution of complex operations across distributed data shards. The platform distinguishes itself through its specialized support for multi-tenant architectures and real-time analytical processing. It enables tenant-based
Converts existing local tables into distributed ones without blocking application reads or writes during the migration.
Talent Plan provides guided training programs and curricula centered on distributed database design, systems programming, and open source contribution workflows. The project offers a distributed systems education program consisting of curated courses and labs focused on database internals. The curriculum emphasizes the use of the Rust language for building high-performance networked applications and implementing distributed algorithms. It integrates educational materials on version control, community governance, and the specific processes required to contribute to public software projects. T
Analyzes the architecture and implementation details of professional distributed database systems.
Cassandra is a distributed NoSQL database and wide-column store designed for high availability and linear scalability. It functions as a fault-tolerant distributed system that utilizes an LSM-tree storage engine to optimize write throughput and manage massive datasets. The system is a CQL-compliant database, using a structured query language to manage and retrieve tabular data stored across multiple nodes. It organizes information into rows and columns based on a flexible schema and primary keys. The project provides capabilities for horizontal database scaling, distributed data partitioning
Uses a structured query language (CQL) to manage and retrieve data from distributed tables.
Orbit DB is a decentralized NoSQL database that utilizes conflict-free replicated data types to ensure eventual consistency across a network of nodes. It functions as a peer-to-peer data store that uses IPFS for content-addressing and synchronization, allowing for the maintenance of application state without a central server or authority. The system is built upon a cryptographically verifiable, immutable operation log, which serves as the foundation for custom decentralized data models. This architecture enables the implementation of various data storage patterns, including JSON document stor
Implements a decentralized NoSQL database utilizing CRDTs to ensure eventual consistency across nodes.
OrbitDB is a decentralized data storage system that enables the creation of serverless databases residing across a network of peers. It functions as a peer-to-peer database that integrates with a content-addressed storage layer to distribute and replicate data without a central server. The system utilizes conflict-free replicated data types to ensure eventual consistency and state convergence across distributed nodes. It maintains an immutable record of updates using a directed acyclic graph to preserve causal ordering and cryptographic integrity. Access is managed through a decentralized ide
Provides a decentralized database using conflict-free replicated data types to ensure eventual consistency across all nodes.
Iroh is a peer-to-peer networking stack and distributed system designed for secure direct connections, content-addressed storage, and synchronized data sharing. It provides a foundation for decentralized applications by combining a QUIC-based networking layer with primitives for distributed state and data transfer. The project distinguishes itself through a comprehensive suite of decentralized capabilities, including a distributed data store using conflict-free replicated data types for collaborative synchronization and a content-addressed storage system for verifiable, resumable transfers of
Implements an eventually consistent distributed store using conflict-free replicated data types for synchronized state.
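Several of the entries above rely on conflict-free replicated data types (CRDTs) for eventual consistency. As a rough illustration of the idea, the sketch below shows a grow-only counter, one of the simplest CRDTs. It is a generic example and not the API of OrbitDB or Iroh; the class name `GCounter` and its methods are hypothetical.

```python
class GCounter:
    """Grow-only counter CRDT (illustrative, not any project's real API).

    Each replica increments only its own slot. Replicas converge by
    merging with an element-wise maximum, which is commutative,
    associative, and idempotent.
    """

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> highest count seen for that replica

    def increment(self, amount=1):
        # A replica only ever advances its own slot.
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def merge(self, other):
        # Take the per-replica maximum; applying the same merge twice
        # leaves the state unchanged.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


# Two replicas update independently, then exchange state in any order.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

Because the merge is order-independent, both replicas end up with the same value no matter how many times or in which order they exchange state.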
pgloader is a command-line tool that automates the migration of data and schema from various source databases and file formats into PostgreSQL. It combines schema discovery, parallel data pipelines, and type casting into a single, declarative workflow, using PostgreSQL's COPY protocol for high-throughput bulk loading. The tool distinguishes itself by compiling a dedicated command language into concurrent reader-writer pipelines that handle schema introspection, data transformation, and error-resilient batch processing. It supports migrating entire databases from MySQL, MS SQL, SQLite, and Pos
Migrates data into Citus distributed PostgreSQL clusters with automatic shard distribution.
LiteFS is a FUSE-based distributed file system designed to replicate SQLite databases across a cluster of machines. It works as a high-availability layer that synchronizes data by intercepting write operations to keep multiple server nodes consistent. The system manages distributed database storage by mapping file operations to network requests through a user-space driver. This enables multi-region data synchronization and the distribution of database content to edge nodes, allowing local reads with globally synchronized writes. The replication process uses write-ahead log shipping and transaction-aware interception to propagate committed changes from a primary node to standby replicas. New replicas are initialized from snapshots before switching to incremental log replication.
Manages database storage across multiple networked nodes to provide scalability and fault tolerance.
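The replication flow described for LiteFS (bootstrap a replica from a snapshot, then apply committed transactions incrementally) can be sketched in a few lines. This is a toy in-memory model under simplifying assumptions, not LiteFS code: the `Primary` and `Replica` classes, the dictionary "database", and the integer transaction IDs are all hypothetical.

```python
import copy


class Primary:
    """Toy primary node: applies commits and keeps an ordered commit log."""

    def __init__(self):
        self.data = {}
        self.log = []  # entry at index i holds the commit with txid i + 1

    def commit(self, changes):
        txid = len(self.log) + 1
        self.data.update(changes)
        self.log.append((txid, dict(changes)))
        return txid

    def snapshot(self):
        # Full copy of the state plus the txid it corresponds to.
        return len(self.log), copy.deepcopy(self.data)

    def entries_after(self, txid):
        # Commits with an id strictly greater than txid.
        return self.log[txid:]


class Replica:
    """Toy replica: starts from a snapshot, then replays newer commits."""

    def __init__(self, primary):
        self.txid, self.data = primary.snapshot()

    def catch_up(self, primary):
        for txid, changes in primary.entries_after(self.txid):
            self.data.update(changes)
            self.txid = txid


primary = Primary()
primary.commit({"a": 1})
replica = Replica(primary)      # initialized from a snapshot at txid 1
primary.commit({"b": 2})
primary.commit({"a": 3})
replica.catch_up(primary)       # incremental replay of txid 2 and 3
```

Tracking the last applied transaction ID is what lets the replica request only the log entries it is missing instead of a fresh snapshot.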
Synapse is a Matrix homeserver implementation that provides the infrastructure for decentralized, real-time communication and messaging. It works as a federated chat server that synchronizes room data and event streams across independent server instances to enable cross-domain interoperability. The server uses a hybrid core that combines performance-critical logic in Rust with an orchestration layer in Python. It relies on a PostgreSQL relational database to persist user accounts and conversation history, and uses a Redis-based messaging system to distribute tasks across horizontal workers. The project covers a broad range of capabilities, including secure identity management with SAML and OpenID Connect integration, comprehensive administrative tools for content moderation and room management, and automated media handling. It also includes systems for decentralized federation, asynchronous database schema migration, and telemetry export for performance monitoring.
Splits the datastore across multiple physical database nodes to improve horizontal scalability and performance.
InternetArchitect is an educational collection of documents and source code designed as a course on high-concurrency architecture. It serves as an implementation guide for distributed systems, providing technical patterns and practical examples for designing scalable internet architectures that remain stable under heavy traffic loads. The project focuses on high-performance database optimization and microservice design patterns. It covers strategies for reducing latency and increasing throughput through database sharding and proxy layers, as well as coordinating global state across distributed clusters. The architectural scope includes multi-level caching strategies to speed up data retrieval and the implementation of service discovery frameworks to manage communication between decoupled microservices. It also addresses distributed state coordination and the use of load-balancing meshes to distribute network traffic across backend servers.
Implements systems that store and manage data across multiple networked nodes for scalability and fault tolerance.