Which open-source GitHub repositories match “Specialized and distributed databases”?

pingcap/tidb is the closest match. TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions.


Specialized and distributed databases

Explore high-performance distributed database systems, specialized storage engines, and scalable data management solutions for complex architectures.


pingcap/tidb
Stars: 40,166
TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions. The platform distinguishes itself through its hybrid transactional and analytical proc
Go · Analytical Query Engines · Data Manipulation Interfaces · Database Lifecycle Management
cockroachdb/cockroach
Stars: 32,207
Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures. The system distinguishes itself through
Go · Distributed Relational Databases · Distributed SQL Databases · Distributed SQL Engines
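Both the TiDB and CockroachDB entries describe a keyspace split into ranges that the cluster rebalances. The sketch below shows only the routing idea, under simplifying assumptions: a sorted list of split keys, a binary search that maps a key to its range, and a split at the median once a range exceeds a key-count threshold. The `RangeMap` class and `max_keys` parameter are hypothetical; the real systems split on size and load and replicate each range through consensus, none of which is modeled here.

```python
import bisect


class RangeMap:
    """Toy range-partitioned keyspace (hypothetical, illustration only)."""

    def __init__(self, max_keys=4):
        self.max_keys = max_keys
        self.split_keys = []   # sorted start keys of every range except the first
        self.ranges = [dict()]  # ranges[i] holds keys in [split_keys[i-1], split_keys[i])

    def _range_index(self, key):
        # Number of split keys <= key is exactly the index of the owning range.
        return bisect.bisect_right(self.split_keys, key)

    def put(self, key, value):
        i = self._range_index(key)
        self.ranges[i][key] = value
        if len(self.ranges[i]) > self.max_keys:
            self._split(i)

    def get(self, key):
        return self.ranges[self._range_index(key)].get(key)

    def _split(self, i):
        # Split range i at its median key; keys >= median move to a new range.
        keys = sorted(self.ranges[i])
        mid = keys[len(keys) // 2]
        old = self.ranges[i]
        self.ranges[i] = {k: v for k, v in old.items() if k < mid}
        self.ranges.insert(i + 1, {k: v for k, v in old.items() if k >= mid})
        self.split_keys.insert(i, mid)
```

Inserting the keys `"a"` through `"f"` with `max_keys=4` triggers one split when the fifth key arrives, leaving `split_keys == ["c"]`: keys below `"c"` stay in the first range and the rest live in the second.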
taosdata/TDengine
Stars: 24,734
TDengine is a distributed time-series database designed for the high-speed ingestion, compression, and retrieval of timestamped metrics and sensor data. It functions as a SQL-compatible analytics engine, allowing users to perform complex operations on massive volumes of time-ordered information using standard relational syntax. The platform is built to serve as a backend foundation for industrial IoT environments, managing real-time data streams and device metadata through a cluster-based architecture. The system distinguishes itself through a distributed sharding architecture that uses consi
C · Analytics Engines · Columnar Storage Engines · Time Series Databases
valkey-io/valkey
Stars: 24,875
Valkey is an in-memory, NoSQL database server designed for high-performance data storage and real-time state management. It operates as a distributed key-value store, maintaining datasets entirely within system memory to facilitate sub-millisecond response times for read and write operations. The system distinguishes itself through a single-threaded event loop that utilizes asynchronous I/O multiplexing to ensure high throughput. It supports high availability via master-replica replication and provides a decoupled communication model through a built-in publish-subscribe messaging pattern. To
C · In-Memory Data Stores · In-Memory Databases · Key-Value Stores
surrealdb/surrealdb
Stars: 32,397
SurrealDB is a multi-model database engine designed to store and query document, graph, relational, and vector data within a single ACID-compliant platform. It functions as an AI-native data store, integrating vector search, graph traversal, and machine learning model execution directly into its query layer. By providing a unified declarative query language, the platform eliminates the need for external middleware to synchronize data across different storage models. The platform distinguishes itself through its ability to manage agent memory and complex workflows natively. It allows developer
Rust · Multi-Model Databases · Access Control Systems · ACID Transactional Cores
tursodatabase/libsql
Stars: 16,887
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
C · Distributed Databases · Distributed SQL Databases · Edge Databases
orbitdb/orbit-db
Stars: 8,791
OrbitDB is a decentralized NoSQL database that uses conflict-free replicated data types (CRDTs) to ensure eventual consistency across a network of nodes. It functions as a peer-to-peer data store that uses IPFS for content addressing and synchronization, maintaining application state without a central server or authority. The system is built on a cryptographically verifiable, immutable operation log, which serves as the foundation for custom decentralized data models. This architecture enables the implementation of various data storage patterns, including JSON document stor
JavaScript · Decentralized Storage · Conflict-Free Replicated Data Types · Content-Addressable Storage
rethinkdb/rethinkdb
Stars: 26,996
RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations. A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data update
C++ · Document Databases · Change Data Capture · Query Builders
dolthub/dolt
Stars: 23,592
Dolt is a relational database engine that integrates version control directly into the database management layer. It functions as a version-controlled SQL database that tracks every row and schema change using a commit-based history, allowing users to branch, merge, and audit data modifications. By implementing a wire-protocol-compatible server, the system enables standard SQL clients and tools to interact with versioned data as if they were connecting to a traditional relational database. The platform distinguishes itself by applying repository-style workflows to data management, including s
Go · Data Versioning · Distributed SQL Databases · Database Drivers
seaweedfs/seaweedfs
Stars: 32,937
SeaweedFS is a distributed object store and high-performance file system designed to manage massive volumes of unstructured data. It utilizes a decoupled architecture that separates metadata management from raw data storage, allowing for independent scalability and the efficient handling of billions of files. By providing a POSIX-compliant interface, it enables applications to interact with a unified namespace while maintaining the performance characteristics of a distributed object store. The system distinguishes itself through a multi-region data fabric that supports active-active replicati
Go · Object Storage · Distributed Object Stores · High-Performance File Systems
mongodb/mongo
Stars: 28,158
MongoDB is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automated sharding and maintains high availability across global clusters using a multi-node replication protocol. By executing multi-document operations as atomic units, the system ensures data integrity and consistency across distributed environments. The platform distinguishes itself by integrating advanced vector-based indexing, which enables semantic similarity searches alongside traditional geospatial and lexical quer
C++ · Distributed Databases · Document Databases · ACID Transactional Cores
ClickHouse/ClickHouse
Stars: 48,229
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
C++ · Access Control Systems · Agent Analytics · Agentic Architectures
orbitdb/orbitdb
Stars: 8,737
OrbitDB is a decentralized data storage system that enables the creation of serverless databases residing across a network of peers. It functions as a peer-to-peer database that integrates with a content-addressed storage layer to distribute and replicate data without a central server. The system utilizes conflict-free replicated data types to ensure eventual consistency and state convergence across distributed nodes. It maintains an immutable record of updates using a directed acyclic graph to preserve causal ordering and cryptographic integrity. Access is managed through a decentralized ide
JavaScript · CRDT Databases · Decentralized Storage · Append-Only Log Storage
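The OrbitDB entries rely on CRDTs so that peers can apply updates in any order and still converge. As a generic illustration (not OrbitDB's log-based design), the sketch below is a state-based grow-only counter (G-Counter): each replica increments only its own slot, and merging takes the per-replica maximum, an operation that is commutative, associative, and idempotent. The `GCounter` class and the replica IDs are hypothetical.

```python
class GCounter:
    """State-based grow-only counter CRDT (generic illustration, not OrbitDB code)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> increments originated by that replica

    def increment(self, n=1):
        # A replica only ever advances its own entry; n must be non-negative.
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Element-wise max: safe to apply repeatedly and in any order.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())
```

Two replicas that increment independently (`a` by 2, `b` by 3) and then merge in either direction both report a value of 5, and merging the same state again changes nothing.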
dragonflydb/dragonfly
Stars: 30,688
Dragonfly is a high-performance, multi-model in-memory data store designed to serve as a drop-in replacement for Redis and Memcached. By utilizing a multi-threaded, shared-nothing architecture and a fiber-based concurrency model, it maximizes CPU utilization and minimizes latency for read and write operations. The system supports a wide range of data structures, including strings, hashes, lists, sets, sorted sets, and JSON documents, while maintaining compatibility with the Redis and Memcached wire protocols and their client libraries. What distinguishes Dragonfly is its focus on effic
C++ · Access Control Systems · Cluster Management · Concurrency Models
dgraph-io/dgraph
Stars: 21,700
Dgraph is a distributed graph database designed to store and query highly connected data. It organizes information as nodes and edges to represent complex relationships between entities, providing a platform for managing and analyzing deeply linked datasets. The system functions as a horizontally scalable cluster that partitions data across multiple nodes to maintain performance and availability as information volume increases. It utilizes a specialized query language built for low-latency navigation of interconnected data points, allowing for the execution of complex queries across large-sca
Go · Graph Databases · Distributed Databases
apache/kafka
Stars: 32,846
Kafka is a distributed event streaming platform designed for capturing, storing, and processing real-time data streams across interconnected nodes. It functions as a distributed commit log, providing a fault-tolerant storage mechanism that records state changes sequentially to ensure data consistency and durability across distributed environments. The platform distinguishes itself through a partitioned commit log architecture that enables horizontal scaling and parallel processing of data streams. It integrates a stream processing engine for continuous transformations and aggregations, while
Java · Distributed Event Streaming Platforms · Distributed Commit Logs · Data Streaming Platforms
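The Kafka entry attributes its horizontal scaling to a partitioned commit log. The sketch below shows the shape of that idea: records are routed by key so that all records for one key land in the same partition (preserving per-key order), and each partition is an append-only list whose positions serve as offsets. It uses `zlib.crc32` purely for illustration; Kafka's default partitioner hashes keys with murmur2, and the `Topic` class is hypothetical, not Kafka's client API.

```python
import zlib


class Topic:
    """Toy partitioned, append-only log (hypothetical; not Kafka's API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition, so records for one key stay ordered.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key: bytes, value: bytes):
        p = self.partition_for(key)
        log = self.partitions[p]
        offset = len(log)  # offsets are positions within a single partition
        log.append((key, value))
        return p, offset

    def read(self, partition: int, offset: int):
        # Consumers track their own offset and read forward from it.
        return self.partitions[partition][offset:]
```

Ordering is only guaranteed within a partition; records with different keys may land in different partitions and be consumed in parallel.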
duckdb/duckdb
Stars: 38,805
DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation. The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adapti
C++ · Analytical Databases · Columnar Engines · Embedded Databases
scylladb/scylladb
Stars: 15,355
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large-scale datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that distributes data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous event-driven engine to maximize hardwa
C++ · NoSQL Databases · Database Compatibility Layers · Distributed Databases
grafana/loki
Stars: 27,640
Loki is a horizontally scalable, highly available log aggregation engine designed to store and query massive volumes of unstructured log data. It functions as a distributed observability platform that correlates logs, metrics, and traces to provide comprehensive visibility into the health and performance of complex infrastructure. The system distinguishes itself through a distributed query execution model that processes large datasets in parallel across cluster nodes. It utilizes label-based stream indexing and a distributed index to map log data to specific chunks, enabling rapid retrieval w
Go · Distributed Observability Systems · Log Storage Engines · Observability Platforms
milvus-io/milvus
Stars: 44,804
Milvus is a specialized vector database engine designed for the indexing, management, and high-speed similarity retrieval of high-dimensional vector embeddings. It functions as a similarity search engine capable of identifying nearest neighbors within large-scale vector spaces, supporting the storage and retrieval of billions of data points while maintaining consistent performance. The system utilizes a distributed architecture that decouples storage, query, and coordination into independent services, allowing for horizontal scaling across clusters. It employs a global indexing mechanism that
Go · Similarity Search Engines · Vector Databases · Vector Search Engines
citusdata/citus
Stars: 12,562
Citus is a PostgreSQL extension that transforms a standard database into a distributed system. It functions as a sharding framework and distributed SQL engine, enabling horizontal scaling by partitioning tables across a cluster of nodes. By utilizing a coordinator-worker topology, the system manages metadata and routes queries to the appropriate nodes, allowing for parallel execution of complex operations across distributed data shards. The platform distinguishes itself through its specialized support for multi-tenant architectures and real-time analytical processing. It enables tenant-based
C · Distributed Extensions · Distributed Relational Databases · Distributed SQL Engines
pola-rs/polars
Stars: 38,855
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
Rust · Analytical Data Engines · Columnar Data Processors · Distributed Query Engines
rqlite/rqlite
Stars: 17,586
rqlite is a distributed relational database that replicates SQLite data across a cluster using the Raft consensus algorithm. It functions as a fault-tolerant storage system that provides high availability and a web API for executing SQL queries and managing relational data without requiring native database drivers. The system distinguishes itself by using an HTTP SQL interface to expose database operations and cluster management. It features a real-time change data capture stream that pushes database mutations to external HTTP endpoints via webhooks and supports the scaling of read throughput
Go · Distributed Relational Databases · Raft Consensus Implementations · Change Data Capture Streams
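rqlite replicates SQLite through the Raft consensus algorithm. One core Raft rule is that the leader may treat a log entry as committed once it is stored on a majority of the cluster. The function below computes that majority-replicated index from each node's highest replicated log index (the leader's own last index included). It is a simplified sketch: real Raft additionally requires the entry at that index to belong to the leader's current term before advancing the commit index, and the function name is hypothetical.

```python
def majority_match_index(match_indexes):
    """Highest log index stored on a majority of nodes.

    match_indexes: highest replicated index on each node, leader included.
    Simplification: the current-term check from Raft is omitted here.
    """
    ordered = sorted(match_indexes, reverse=True)
    majority = len(ordered) // 2 + 1
    # The majority-th largest value is present on at least `majority` nodes.
    return ordered[majority - 1]
```

For a five-node cluster reporting `[7, 4, 4, 2, 1]`, three nodes hold index 4 or later, so 4 is the highest index a majority has stored.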
boto/boto3
Stars: 9,834
Boto3 is the AWS SDK for Python, providing a programmatic interface for managing and automating AWS cloud infrastructure and services. It serves as a cloud management API client and resource manager for provisioning, configuring, and scaling virtual servers, databases, and storage. The library can drive infrastructure-as-code workflows, for example by deploying CloudFormation templates from scripts, so that identical resource stacks can be rolled out across multiple accounts and geographic regions. It also provides a framework for coordinating distributed workflows, serverless functions, and contain
Python · AWS Provisioners · Cloud Provisioning Templates · Cloud Service SDKs
redis/redis
Stars: 74,906
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to process requests efficiently, while maintaining data durability through append-only persistence logs and asynchronous snapshotting mechanisms. What distinguishes Redis is its ability to handle complex data structures—including strings, hashes, lists, sets, and sorted sets—alongsid
C · Active-Active Database Clusters · Distributed Caches · Distributed State Management
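The Redis and Valkey entries describe in-memory key-value stores; per-key expiry is a typical feature of such stores. The sketch below shows lazy (on-access) expiration only. Redis also expires keys proactively in the background, which is not modeled here. The `TTLStore` class and its method names are hypothetical and do not mirror the Redis command set; the injectable clock exists only to make the example testable.

```python
import time


class TTLStore:
    """In-memory key-value store with lazy expiration (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}     # key -> value
        self.expires = {}  # key -> absolute expiry time on self.clock

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is None:
            self.expires.pop(key, None)  # in this sketch, overwriting clears any old TTL
        else:
            self.expires[key] = self.clock() + ttl

    def get(self, key):
        deadline = self.expires.get(key)
        if deadline is not None and self.clock() >= deadline:
            # Lazy expiration: an expired key is removed when it is next read.
            del self.data[key]
            del self.expires[key]
            return None
        return self.data.get(key)
```

The trade-off of purely lazy expiry is memory: keys that are never read again linger until something touches them, which is why a background sweep is usually added.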
yugabyte/yugabyte-db
Stars: 10,349
YugabyteDB is a distributed SQL database and relational data store designed for horizontal scalability and high availability across multiple nodes or regions. It functions as a cloud-native system that ensures continuous availability and supports PostgreSQL-compatible query languages and drivers. The system includes specialized capabilities as a vector database for AI, utilizing high-dimensional indexing to perform similarity searches. It is engineered as a multi-region cloud database that synchronizes data across different geographic locations to maintain global availability. The project co
C · Distributed Relational Databases · Relational · Distributed Data Management
alibaba/canal
Stars: 29,697
Canal is a database replication middleware that performs change data capture by simulating a database replica. It monitors transaction logs to stream incremental data modifications to downstream systems in real time, acting as an event streaming infrastructure that transforms low-level binary logs into structured, consumable message streams. The project distinguishes itself through a high-throughput architecture that utilizes concurrent multi-threaded parsing and stateful log position tracking to ensure reliable data delivery. It employs a pluggable sink architecture that decouples data extra
Java · Change Data Capture Services · Change Data Capture Tools · Database Change Subscriptions
druid-io/druid
Stars: 14,020
Druid is a distributed columnar store and online analytical processing (OLAP) database designed for real-time analytics. It functions as a SQL analytics platform and a streaming data ingestion engine, allowing for the analysis of large datasets with low latency to support interactive dashboards and high-concurrency operational workloads. Data can be loaded through batch or streaming ingestion, so newly arriving events become available for analysis immediately. It provides high-performance analytical processing to execute slice-and-dice queries on massive data volume
Java · Real-time Analytics Platforms · Analytical Databases · Columnar Databases
meilisearch/meilisearch
Stars: 58,118
Meilisearch is a Rust-based search engine providing typo-tolerant full-text and vector-based semantic search with real-time conversational capabilities.
Rust · Developer-Focused Search Tools · Document Indexing Engines · Finite State Transducers
neo4j/neo4j
Stars: 15,928
Neo4j is a native graph database management system designed to store and query highly connected data using a property-graph model. It provides an ACID-compliant transaction engine that ensures data integrity, supported by a distributed cluster architecture that maintains causal consistency across nodes. Users interact with the system through a declarative query language, which allows for complex pattern matching and path traversal without requiring manual traversal logic. The platform distinguishes itself through its hybrid approach to data retrieval, combining traditional graph-based queries
Java · Graph Databases · ACID Transactional Cores · Atomic Transactions
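The Neo4j entry notes that its query language expresses path patterns without manual traversal logic. For intuition about the work such a query stands in for, an unweighted shortest-path question can be answered with breadth-first search. The sketch below runs BFS over a plain adjacency dict; it illustrates the traversal only and says nothing about how Neo4j stores graphs or plans queries. The sample graph is made up.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Unweighted shortest path via BFS; returns a list of nodes or None."""
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None
```

Because BFS visits nodes in order of hop count, the first time it reaches `goal` the recorded parent chain is a shortest path.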
pathwaycom/llm-app
Stars: 59,341
This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transformation workflows. The framework distinguishes itself through differential dataflow execution, which propagates only changes through a pipeline rather than recomputing entire datasets. It supports distributed state management across worker nodes and utilizes incremental stream p
Jupyter Notebook · Data Processing Frameworks · Differential Dataflow Engines · Distributed State Management
slatedb/slatedb
Stars: 2,730
SlateDB is a cloud-native key-value store and distributed database engine that utilizes a log-structured merge-tree architecture. It serves as a transactional storage layer designed to persist data directly to cloud object storage. The engine differentiates itself by optimizing read performance for remote storage through the use of bloom filters and multi-level block caching. It employs a single-writer multi-reader model and provides the ability to create zero-copy clones via copy-on-write checkpointing. The system supports atomic transactions, range queries, and snapshot-based concurrency c
Rust · Log-Structured Merge-Trees · LSM-Tree Key-Value Stores · Atomic Transactions
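The SlateDB entry mentions Bloom filters for cutting reads from remote object storage. A Bloom filter answers either "definitely absent" or "possibly present": if any of a key's bit positions is unset, the key was never added, so a costly fetch can be skipped. The sketch below derives its bit positions from SHA-256 using double hashing; the sizes are arbitrary and the `BloomFilter` class is hypothetical, not SlateDB's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (illustration; sizes chosen arbitrarily)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        # Odd step keeps the k positions distinct when num_bits is a power of two.
        h2 = int.from_bytes(digest[8:16], "little") | 1
        # Double hashing: position_i = h1 + i * h2 (mod num_bits).
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

A storage engine would build one filter per immutable file and consult it before reading that file, accepting an occasional false positive in exchange for never missing a key that is present.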
prisma/prisma
Stars: 46,366
Prisma is a database toolkit that provides a unified access layer for interacting with relational and document databases. It centers on a declarative schema modeling approach, where developers define their data structures in a human-readable language. This schema serves as the single source of truth, from which the toolkit automatically generates type-safe database clients that provide compile-time validation and editor autocomplete for all data operations. The project distinguishes itself through a high-performance, Rust-based query engine that handles query planning and connection pooling o
TypeScript · Object-Relational Mappers · Schema Modeling Tools · Type-Safe Client Generators
tikv/tikv
Stars: 16,535
TiKV is a distributed transactional key-value store designed for horizontal scalability and high availability. It functions as a storage engine that maintains massive datasets across a cluster of physical nodes, ensuring that information remains accessible and consistent even when individual hardware components fail. The system utilizes a consensus-based replication model to synchronize data across nodes, ensuring that all replicas agree on the order of operations. It manages data distribution through a sharding mechanism that partitions large datasets into smaller groups, each governed by in
Rust · Distributed Key-Value Stores · Distributed Databases · Key-Value
influxdata/influxdb
Stars: 31,556
InfluxDB is a specialized time series database platform engineered for the high-speed ingestion, compression, and retrieval of timestamped data at scale. It functions as a distributed metrics platform, providing the infrastructure necessary to organize and analyze massive volumes of time-stamped information to identify trends, patterns, and anomalies within complex data streams. The platform distinguishes itself through a functional dataflow engine that utilizes a specialized programming language for complex analytical transformations and automated tasks. This architecture is supported by a p
Rust · Time Series Databases · Domain Specific Languages · Data Ingestion Plugins
golang-migrate/migrate
Stars: 18,118
This project is a command-line utility designed to manage database schema versioning and automate incremental schema updates. It functions as a version control system for database structures, ensuring consistency across environments by tracking applied migrations in a dedicated metadata table and executing scripts in a sequential, reliable manner. The tool distinguishes itself through a driver-based abstraction layer that supports a wide range of database engines, including various SQL and distributed cloud databases. It provides robust concurrency control through advisory locking, which prev
Go · Database Migrations · Database Schema Migrations · Automated Migrations
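The migrate entry describes tracking applied migrations in a metadata table and executing scripts in sequence. The sketch below reproduces that idea with Python's built-in `sqlite3` module: it reads the highest recorded version, then applies each newer migration together with its version row in one explicit transaction. The `schema_version` table, the in-code `MIGRATIONS` list, and the `migrate_up` function are assumptions for illustration; migrate's own metadata layout, file naming, and advisory locking are not reproduced.

```python
import sqlite3

# Hypothetical ordered migrations: (version, SQL). Real tools usually load these from files.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate_up(conn):
    """Apply pending migrations in order.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below control transactions.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version, sql in MIGRATIONS:
        if version <= current:
            continue  # already applied
        # SQLite supports transactional DDL: the schema change and its version row commit together.
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
```

Running `migrate_up` a second time is a no-op, because every listed version is already recorded; this is what makes the operation safe to run on every deploy.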
facebookresearch/faiss
Stars: 40,302
Faiss is a high-performance library designed for the similarity search and clustering of dense vectors across massive datasets. It functions as a vector similarity search engine, providing the necessary tools to organize complex numerical data into specialized structures that facilitate rapid retrieval and efficient querying of millions of records. The library distinguishes itself through a variety of advanced indexing and compression techniques, including hierarchical navigable small world (HNSW) graphs for roughly logarithmic search complexity and inverted file indexing to partition vector spaces into mana
C++ · Vector Search Engines · Approximate Nearest Neighbor Search · Vector Similarity Search
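Faiss, Milvus, and Qdrant all answer nearest-neighbor queries over embeddings. Approximate indexes such as HNSW or inverted-file indexes are judged against exact search, which is simply a scan over every stored vector. The sketch below performs that exact scan with cosine similarity in plain Python; it is a correctness baseline, not an indexing technique, and the sample vectors are made up.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query, vectors, k):
    """Exact top-k by cosine similarity: a linear scan over every stored vector."""
    scored = ((cosine(query, vec), vid) for vid, vec in vectors.items())
    return [vid for _, vid in heapq.nlargest(k, scored)]
```

The scan costs time proportional to the number of vectors times their dimension for every query, which is the cost that approximate indexes are built to avoid.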
nextcloud/all-in-one
Stars: 9,082
all-in-one is a containerized deployment system designed to install and manage a complete suite of productivity and collaboration services. It functions as a cloud suite deployer that orchestrates the installation of a self-hosted content platform, incorporating necessary dependencies via Docker or Kubernetes. The project distinguishes itself by providing a web-based dashboard for orchestrating, updating, and monitoring the lifecycle of service containers. It also serves as a local AI inference server, enabling the execution of generative text models, image diffusion, and speech processing on
PHP · Cloud Suite Deployers · Cloud Suite Orchestrators · Private Cloud Storage
sequelize/sequelize
Stars: 30,349
Sequelize is an object-relational mapping library that provides a unified interface for managing relational data through code. By implementing the Active Record pattern, it maps database tables to application objects, allowing developers to perform standard create, read, update, and delete operations using high-level method calls. The library abstracts complex database interactions by translating these calls into optimized, engine-specific SQL statements, ensuring consistent behavior across different database systems. The project distinguishes itself through a comprehensive suite of tools for
TypeScript · Object-Relational Mappers · Database Drivers · Object-Relational Mapping
pingcap/awesome-database-learning
Stars: 10,672
This project is a curated collection of academic papers, books, and technical resources designed for studying the architecture and implementation of database management systems. It serves as a comprehensive educational guide for engineers and researchers looking to understand the fundamental principles behind modern data storage and retrieval. The repository distinguishes itself by providing structured learning paths across critical database domains, including the design of persistent storage engines, the mechanics of query optimization, and the complexities of distributed transaction managem
Awesome List · Database Architectures · Database Internals
typeorm/typeorm
36,540 stars
TypeORM is an object-relational mapper for TypeScript and JavaScript that bridges the gap between object-oriented application code and relational database tables. It provides a data persistence layer that lets developers define database entities using class decorators or configuration objects and then work with stored data through ordinary objects. The project distinguishes itself through a flexible architecture that supports both the Active Record and Data Mapper patterns, alongside a fluent query builder that translates high-level method calls into platform…
TypeScript · Object-Relational Mappers · Query Builders · Data Access Layers
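For contrast with Active Record, the data mapper pattern mentioned in the TypeORM description keeps entities as plain data objects and moves all persistence into a separate repository. A minimal Python sketch over an in-memory `sqlite3` database; the `Post` entity and `PostRepository` class are hypothetical and are not TypeORM's API.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    # Plain entity: holds data only and has no persistence methods.
    title: str
    body: str
    id: Optional[int] = None


class PostRepository:
    """Hypothetical data mapper that moves Post objects to and from `posts` rows."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS posts "
            "(id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
        )

    def save(self, post: Post) -> Post:
        # INSERT new entities, UPDATE ones that already carry an id.
        if post.id is None:
            cur = self.conn.execute(
                "INSERT INTO posts (title, body) VALUES (?, ?)", (post.title, post.body)
            )
            post.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE posts SET title = ?, body = ? WHERE id = ?",
                (post.title, post.body, post.id),
            )
        return post

    def find_by_title(self, title: str) -> List[Post]:
        rows = self.conn.execute(
            "SELECT id, title, body FROM posts WHERE title = ? ORDER BY id", (title,)
        ).fetchall()
        # Rebuild plain entities from the raw rows.
        return [Post(title=t, body=b, id=i) for i, t, b in rows]


repo = PostRepository(sqlite3.connect(":memory:"))
repo.save(Post("Hello", "First post"))
repo.save(Post("Hello", "Second post"))
```

Because `Post` never touches `sqlite3`, swapping the storage backend only means writing a different repository; the entity class stays unchanged.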
tigerbeetle/tigerbeetle
16,291 stars
TigerBeetle is a distributed financial accounting database designed for high-volume transaction processing. It functions as a specialized transaction engine that enforces strict double-entry bookkeeping invariants, ensuring that every debit and credit is balanced and accounted for with absolute consistency. By utilizing a consensus-based replication model, the system provides high availability and data durability across geographically distributed clusters, making it suitable for mission-critical financial infrastructure. The system distinguishes itself through a performance-oriented architect…
Zig · Distributed Ledgers · Accounting Engines · Accounting Invariants
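The double-entry invariant in the TigerBeetle description, that every transfer debits one account and credits another by the same amount, can be illustrated with a small in-memory ledger. This Python sketch is not TigerBeetle's client API; the `Ledger` and `Account` classes, the `transfer` method, and the optional no-overdraft flag are hypothetical, chosen only to show how a transfer is validated in full before either side is posted.

```python
from dataclasses import dataclass


@dataclass
class Account:
    debits_posted: int = 0
    credits_posted: int = 0
    # Hypothetical rule: reject transfers that would overdraw this account.
    debits_must_not_exceed_credits: bool = False


class Ledger:
    """Hypothetical in-memory double-entry ledger; amounts are integers (e.g. cents)."""

    def __init__(self):
        self.accounts = {}

    def open_account(self, account_id, **flags):
        self.accounts[account_id] = Account(**flags)

    def transfer(self, debit_id, credit_id, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if debit_id == credit_id:
            raise ValueError("debit and credit accounts must differ")
        debit, credit = self.accounts[debit_id], self.accounts[credit_id]
        # Check every rule before mutating anything, so a rejected
        # transfer leaves no half-posted entry behind.
        if debit.debits_must_not_exceed_credits and (
            debit.debits_posted + amount > debit.credits_posted
        ):
            raise ValueError("transfer would exceed account credits")
        # Both legs are posted together: one debit and one equal credit.
        debit.debits_posted += amount
        credit.credits_posted += amount

    def is_balanced(self):
        total_debits = sum(a.debits_posted for a in self.accounts.values())
        total_credits = sum(a.credits_posted for a in self.accounts.values())
        return total_debits == total_credits


ledger = Ledger()
ledger.open_account(1)  # e.g. an operator or funding account
ledger.open_account(2, debits_must_not_exceed_credits=True)
ledger.transfer(1, 2, 100)
ledger.transfer(2, 1, 30)
```

Since each successful transfer adds the same amount to one account's debits and another's credits, `is_balanced()` holds after any sequence of accepted transfers.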
qdrant/qdrant
32,372 stars
Qdrant is a high-performance vector similarity database designed to store, index, and search high-dimensional vectors alongside structured metadata. It functions as a distributed search engine that manages large-scale data clusters, providing low-latency retrieval and complex filtering capabilities. The system is built to serve as a specialized middleware layer, connecting machine learning pipelines and AI agents to persistent storage for intelligent information retrieval and recommendation tasks. The platform distinguishes itself through advanced retrieval techniques, including support for h…
Rust · Vector Databases · Vector Search Engines · Hybrid Search Engines
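The core query described for Qdrant, nearest-neighbour search restricted by a metadata filter, can be sketched as an exact (brute-force) cosine-similarity scan. Production vector databases use approximate indexes rather than a full scan; the `points` layout, the `payload` dictionaries, and the `search` function here are assumptions for illustration, not Qdrant's API.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points, query, must, limit=3):
    """Exact top-k by cosine similarity over the points whose payload
    matches every key/value pair in `must`."""
    # Filter on metadata first, then score only the surviving vectors.
    candidates = [
        p for p in points if all(p["payload"].get(k) == v for k, v in must.items())
    ]
    scored = [(cosine(p["vector"], query), p["id"]) for p in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [point_id for _, point_id in scored[:limit]]


points = [
    {"id": 1, "vector": [0.9, 0.1, 0.0], "payload": {"lang": "en"}},
    {"id": 2, "vector": [0.8, 0.2, 0.1], "payload": {"lang": "de"}},
    {"id": 3, "vector": [0.1, 0.9, 0.2], "payload": {"lang": "en"}},
    {"id": 4, "vector": [0.7, 0.3, 0.0], "payload": {"lang": "en"}},
]
print(search(points, [1.0, 0.0, 0.0], {"lang": "en"}, limit=2))  # prints [1, 4]
```

Point 2 would rank second without the filter; the metadata condition removes it before scoring, which is why filtering and similarity search are usually treated as one query rather than two passes.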
chroma-core/chroma
26,198 stars
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema…
Rust · Vector Databases · Hybrid Search Engines · Vector Search
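One simple way to combine the dense and sparse signals mentioned in the Chroma description is a weighted sum of cosine similarity and a keyword-match score, applied after an optional regular-expression filter. This is a minimal sketch under that assumption; Chroma's own query API and scoring may combine these signals differently, and the `hybrid_query` function, the `alpha` weight, and the toy two-dimensional embeddings are hypothetical.

```python
import math
import re


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(text, terms):
    # Sparse signal: fraction of query terms that appear as whole words in the text.
    words = set(re.findall(r"\w+", text.lower()))
    return sum(t.lower() in words for t in terms) / len(terms) if terms else 0.0


def hybrid_query(docs, query_vec, terms, pattern=None, alpha=0.5, k=2):
    """Rank documents by alpha * dense cosine + (1 - alpha) * keyword score,
    skipping any document whose raw text does not match `pattern`."""
    results = []
    for doc_id, (text, vec) in docs.items():
        if pattern is not None and not re.search(pattern, text):
            continue
        score = alpha * cosine(vec, query_vec) + (1 - alpha) * keyword_score(text, terms)
        results.append((score, doc_id))
    results.sort(key=lambda r: r[0], reverse=True)
    return [doc_id for _, doc_id in results[:k]]


docs = {
    "a": ("Raft consensus keeps replicas in sync", [0.9, 0.1]),
    "b": ("Paxos and Raft compared", [0.6, 0.4]),
    "c": ("Error code E1234 in the storage layer", [0.2, 0.8]),
}
print(hybrid_query(docs, [1.0, 0.0], ["raft"]))  # prints ['a', 'b']
```

The regular-expression path is useful for exact identifiers such as error codes, which embeddings tend to blur; `alpha` controls how much the semantic signal outweighs literal matching.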
ray-project/ray
42,895 stars
Ray is a distributed computing framework designed to scale Python and Java applications across clusters by abstracting task scheduling and resource management. It functions as a resource-aware execution engine that manages task dependencies, placement, and fault tolerance across networked compute nodes. At its core, the system provides a stateful actor model, allowing developers to define classes that run in dedicated processes to maintain and mutate internal state across remote method calls. The framework distinguishes itself through a robust cross-language interoperability layer, enabling f…
Python · Actor Models · Distributed Computing Frameworks · Distributed Datasets
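The stateful actor idea in the Ray description, an object whose state is mutated only by its own worker while callers receive futures, can be sketched in a single process with Python's standard `concurrent.futures`. Ray places actors in separate processes across a cluster; this sketch uses one worker thread instead, and the `ActorHandle` class, its `call` method, and the `Counter` example are hypothetical rather than Ray's `@ray.remote` API.

```python
from concurrent.futures import ThreadPoolExecutor


class ActorHandle:
    """Hypothetical handle: every method call on the wrapped object runs on one
    dedicated worker thread, in submission order, and returns a Future."""

    def __init__(self, instance):
        self._instance = instance
        # max_workers=1 serializes calls, so the wrapped state is never
        # mutated by two callers at once.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def call(self, method_name, *args, **kwargs):
        method = getattr(self._instance, method_name)
        return self._executor.submit(method, *args, **kwargs)

    def shutdown(self):
        self._executor.shutdown(wait=True)


class Counter:
    def __init__(self):
        self.value = 0

    def increment(self, by=1):
        self.value += by
        return self.value


handle = ActorHandle(Counter())
futures = [handle.call("increment") for _ in range(5)]
print([f.result() for f in futures])  # prints [1, 2, 3, 4, 5]
```

Callers never touch `Counter.value` directly; they send method calls and wait on the returned futures, which is the property that lets an actor hold mutable state safely.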
localForage/localForage
25,755 stars
This project is an asynchronous key-value store designed for client-side data persistence. It provides a unified interface that allows applications to save and retrieve complex data types, including binary objects, while maintaining responsiveness through non-blocking operations. By enabling offline-first functionality, it ensures that data remains accessible even when a network connection is unavailable. The library distinguishes itself through a driver-based abstraction layer that automatically detects the most efficient storage mechanism available in the current browser or mobile environme…
JavaScript · Key-Value Stores · Offline-First Web Apps · Storage Abstraction Layers
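The driver-based abstraction in the localForage description amounts to trying storage backends in priority order and delegating an async key-value interface to the first one the environment supports. localForage's own drivers wrap browser mechanisms such as IndexedDB and localStorage; this Python sketch only mirrors the selection logic, and the `DictDriver` and `Store` classes, the `available` flag, and the JSON serialization are hypothetical.

```python
import asyncio
import json


class DictDriver:
    """Hypothetical storage driver backed by a dict; `available` simulates
    whether the underlying mechanism exists in the current environment."""

    def __init__(self, name, available):
        self.name = name
        self.available = available
        self._data = {}

    def supported(self):
        return self.available

    async def set_item(self, key, value):
        # Serialize so structured values round-trip through a string-only backend.
        self._data[key] = json.dumps(value)
        return value

    async def get_item(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


class Store:
    """Picks the first supported driver in priority order and delegates to it."""

    def __init__(self, drivers):
        self.driver = next((d for d in drivers if d.supported()), None)
        if self.driver is None:
            raise RuntimeError("no supported storage driver")

    async def set_item(self, key, value):
        return await self.driver.set_item(key, value)

    async def get_item(self, key):
        return await self.driver.get_item(key)


async def demo():
    store = Store([DictDriver("preferred", False), DictDriver("fallback", True)])
    await store.set_item("settings", {"theme": "dark", "tabs": [1, 2]})
    return store.driver.name, await store.get_item("settings")


print(asyncio.run(demo()))  # prints ('fallback', {'theme': 'dark', 'tabs': [1, 2]})
```

Application code only ever calls `Store.set_item` and `Store.get_item`, so the choice of backend can change per environment without touching callers.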
go-gorm/gorm
39,798 stars
GORM is a developer-focused object-relational mapping library for Go that provides a comprehensive data persistence framework. It serves as a database access layer, allowing developers to map application structures to database tables and perform CRUD operations using a fluent, type-safe query builder instead of writing raw SQL. The library distinguishes itself through its association-aware persistence, which automatically tracks and synchronizes complex entity relationships during database operations. It utilizes a driver-agnostic interface to maintain consistent behavior across various stora…
Go · Database Transactions · Object-Relational Mappers · Object-Relational Mapping
alibaba/druid
28,221 stars
Druid is a database connection management and monitoring framework designed to maintain persistent, high-performance links between applications and relational databases. It functions as a resource manager that automates the lifecycle of connection pools, reducing the overhead associated with repeatedly opening and closing network connections. The project distinguishes itself through an integrated query analysis engine that decomposes database statements into structured components. This capability enables real-time security auditing, syntax validation, and metadata extraction, allowing for the…
Java · Connection Pools · Database Abstraction Layers · Query Analyzers
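The pooling behaviour in the Druid description, reusing open connections instead of opening and closing one per request, can be sketched with Python's `queue` module and the standard `sqlite3` driver. Druid itself is a Java library with monitoring and SQL analysis on top; this sketch covers pooling only, and the `ConnectionPool` class, its `max_size` and `timeout` parameters, and the `connection()` context manager are hypothetical.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool: connections are created lazily up to
    max_size, lent to callers, and returned for reuse instead of being closed."""

    def __init__(self, factory, max_size=4, timeout=5.0):
        self._factory = factory
        self._max_size = max_size
        self._timeout = timeout
        self._idle = queue.LifoQueue()  # most recently returned connection is reused first
        self._created = 0
        self._lock = threading.Lock()

    def _acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection if one is available
        except queue.Empty:
            pass
        with self._lock:
            can_create = self._created < self._max_size
            if can_create:
                self._created += 1
        if can_create:
            try:
                return self._factory()
            except Exception:
                with self._lock:
                    self._created -= 1  # creation failed, so release the reserved slot
                raise
        # Pool exhausted: block until another caller releases a connection,
        # raising queue.Empty after `timeout` seconds.
        return self._idle.get(timeout=self._timeout)

    @contextmanager
    def connection(self):
        conn = self._acquire()
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return to the pool rather than closing


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), max_size=2, timeout=0.1)
with pool.connection() as conn:
    print(conn.execute("SELECT 1").fetchone())  # prints (1,)
```

The cap on `max_size` bounds how many connections the database sees, while the timeout turns pool exhaustion into an explicit error instead of an unbounded wait.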

Specialized and distributed databases

pingcap/tidb

cockroachdb/cockroach

taosdata/TDengine

valkey-io/valkey

surrealdb/surrealdb

tursodatabase/libsql

orbitdb/orbit-db

rethinkdb/rethinkdb

dolthub/dolt

seaweedfs/seaweedfs

mongodb/mongo

ClickHouse/ClickHouse

orbitdb/orbitdb

dragonflydb/dragonfly

dgraph-io/dgraph

apache/kafka

duckdb/duckdb

scylladb/scylladb

grafana/loki

milvus-io/milvus

citusdata/citus

pola-rs/polars

rqlite/rqlite

boto/boto3

redis/redis

yugabyte/yugabyte-db

alibaba/canal

druid-io/druid

meilisearch/meilisearch

neo4j/neo4j

pathwaycom/llm-app

slatedb/slatedb

prisma/prisma

tikv/tikv

influxdata/influxdb

golang-migrate/migrate

facebookresearch/faiss

nextcloud/all-in-one

sequelize/sequelize

pingcap/awesome-database-learning

typeorm/typeorm

tigerbeetle/tigerbeetle

qdrant/qdrant

chroma-core/chroma

ray-project/ray

localForage/localForage

go-gorm/gorm

alibaba/druid