Explore open-source implementations of storage engines, indexing structures, and core database management system architectures.
redb is an embedded key-value store and ACID-compliant storage engine. It functions as a persistent storage system for saving and retrieving data as key-value pairs within a tree structure. The engine is built as an MVCC transactional database, utilizing multi-version concurrency control to manage simultaneous reads and writes without blocking. It employs a single-writer multi-reader model to ensure data consistency while allowing multiple threads to access the store. The system provides persistent state management and atomic transaction management to prevent data corruption during crashes. It handles concurrent data access and ensures that groups of changes are applied as single units.
redb is an embedded, ACID-compliant storage engine that provides a clear, modern implementation of tree-structured storage, MVCC, and single-writer transactions, making it a good reference for studying database internals.
LevelDB is an embedded database library and persistent storage engine that provides a sorted key-value store. It uses a log-structured merge-tree architecture to map byte-array keys to byte-array values, running directly within the host process so that no separate server is needed. The system is distinguished by its support for custom comparison functions that define key ordering, enabling efficient range scans and ordered iteration. It ensures data reliability through atomic batch execution, consistent snapshots, and log-based recovery after failures. The engine implements write-ahead logging and multi-level compaction to manage disk space, and uses block-based caching and data compression to improve read and write throughput. Low-level integration is handled through an abstract file system interface that decouples the storage engine from the operating system.
LevelDB is a foundational, industry-standard reference implementation of an LSM-tree storage engine that provides clear, low-level examples of write-ahead logging, atomic batching, and persistent storage mechanics.
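The LSM write path described for LevelDB can be sketched in a few lines. This is a minimal illustration in Python, not LevelDB's code or API: the class `TinyLSM`, its `memtable_limit` parameter, and the in-memory list standing in for the log file are all hypothetical.

```python
import bisect

TOMBSTONE = object()  # marks a deleted key until compaction can drop it


class TinyLSM:
    """Toy LSM tree: log first, buffer in memory, flush to sorted runs."""

    def __init__(self, memtable_limit=2):
        self.wal = []        # stand-in for an append-only log file
        self.memtable = {}   # newest writes, mutable
        self.runs = []       # immutable (keys, values) runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))  # logged before it is applied
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)  # a delete is a write of a marker

    def _flush(self):
        keys = sorted(self.memtable)
        values = [self.memtable[k] for k in keys]
        self.runs.append((keys, values))
        self.memtable = {}
        self.wal = []  # flushed entries no longer need replaying

    def get(self, key):
        if key in self.memtable:
            return self._visible(self.memtable[key])
        for keys, values in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return self._visible(values[i])
        return None

    @staticmethod
    def _visible(value):
        return None if value is TOMBSTONE else value
```

Reads consult the memtable first and then the runs from newest to oldest, which is why an overwrite or a tombstone in a newer run hides older values without touching them.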
PostgreSQL is an object-relational database management system designed for the persistent storage and retrieval of structured data. It runs as an ACID-compliant database server queried through SQL, maintaining data consistency and reliability across large application datasets. The system distinguishes itself through an extensible architecture that allows for the definition of custom data types, operators, and indexing methods. It employs multi-version concurrency control so that readers and writers do not block each other, supported by a cost-based query optimizer that uses table statistics to choose efficient execution plans. The engine enforces data integrity through declared rules and constraints, ensuring accuracy across related tables. It uses write-ahead logging to guarantee durability across system failures and maintains a shared buffer cache to minimize disk I/O during concurrent access.
PostgreSQL is a production-grade, open-source database engine and a widely studied reference implementation of core storage mechanics such as B-tree indexing, write-ahead logging, and multi-version concurrency control.
Bolt is a single-file embedded key-value store for Go applications. It is an ACID transactional database that organizes data in B+trees on disk to provide efficient sorted key retrieval and range scans. The system uses a memory-mapped model to map the database file directly into the process address space for fast random-access reads. The project distinguishes itself through a multi-version concurrency control architecture that allows multiple simultaneous readers to access a consistent snapshot of data without blocking a writer. It employs a single-writer multi-reader locking model and uses advisory filesystem locking to prevent multiple processes from opening the same database file simultaneously. The database provides hierarchical data organization through nested buckets and supports unique identifier generation for records. It includes capabilities for hot backups, batch write updates to reduce disk commit overhead, and cursor-based key seeking and iteration. Operational telemetry is available via internal database performance metrics to track system activity.
Bolt is a production-grade embedded key-value store that serves as an excellent reference implementation for B+tree storage, ACID transactions, and memory-mapped file management.
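To make the nested-bucket and cursor ideas concrete, here is a small Python sketch. It is an in-memory illustration only and does not mirror Bolt's Go API: the `Bucket` class and its `bucket`, `put`, `seek`, and `scan_prefix` methods are hypothetical names, and a sorted list stands in for the on-disk B+tree.

```python
import bisect


class Bucket:
    """Toy bucket: sorted keys plus named child buckets."""

    def __init__(self):
        self._keys = []      # kept sorted so seeks can use binary search
        self._values = {}
        self._children = {}

    def bucket(self, name):
        # Create the nested bucket on first use, then return the same one.
        return self._children.setdefault(name, Bucket())

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def seek(self, start):
        # Position at the first key >= start, then iterate in key order.
        i = bisect.bisect_left(self._keys, start)
        for key in self._keys[i:]:
            yield key, self._values[key]

    def scan_prefix(self, prefix):
        for key, value in self.seek(prefix):
            if not key.startswith(prefix):
                break  # keys are sorted, so the prefix range has ended
            yield key, value
```

Because keys are kept in sorted order, a prefix scan is just a seek followed by iteration until the first key that no longer matches.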
LMDB is an embedded key-value storage engine that provides ACID-compliant data persistence. It is a memory-mapped database that utilizes B+ trees to store key-value pairs, ensuring atomicity, consistency, isolation, and durability. The engine maps files directly into the virtual address space to minimize data copying and system calls. This approach enables high-performance local caching and low-latency data access, specifically optimizing for read-heavy database workflows. The system implements a transactional model with copy-on-write versioning and single-writer multi-reader locking. These mechanisms allow multiple concurrent read-only transactions to access consistent snapshots without locks while restricting modifications to a single writer.
LMDB is a high-performance, production-grade embedded storage engine that provides a concrete, low-level implementation of B+ trees, ACID transactions, and concurrency control, making it an excellent reference for studying database internals.
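The copy-on-write, single-writer multi-reader model can be illustrated with a short Python sketch. This is not LMDB's implementation: `CowStore` is a hypothetical class, and it copies the whole dictionary on each commit, whereas a real engine would copy only the tree pages a transaction touches.

```python
import threading
from types import MappingProxyType


class CowStore:
    """Toy copy-on-write store: readers see an immutable committed version."""

    def __init__(self):
        self._root = MappingProxyType({})     # committed version, never mutated
        self._writer_lock = threading.Lock()  # one write transaction at a time

    def snapshot(self):
        # Readers take no lock; they keep a reference to the current root.
        return self._root

    def write(self, changes):
        with self._writer_lock:
            new_version = dict(self._root)   # copy instead of modifying in place
            new_version.update(changes)
            # Publishing the commit is a single reference swap.
            self._root = MappingProxyType(new_version)
```

A reader that called `snapshot()` before a commit keeps seeing the old version for as long as it holds the reference, which is the behavior the consistent-snapshot guarantee describes.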
ToyDB is a distributed SQL database that provides a system for storing and querying data across multiple nodes. It focuses on maintaining strong consistency and fault tolerance through the implementation of a distributed consensus algorithm. The project distinguishes itself by supporting historical data versioning, enabling time-travel queries to retrieve the state of the database from a specific point in the past. It utilizes multi-version concurrency control to manage ACID transactions and ensure data integrity during concurrent operations. The system covers relational data modeling with table schemas, primary keys, and foreign keys to enforce referential integrity. Its query engine parses SQL statements and executes them through an optimized pipeline of operators, supporting joins, aggregations, and heuristic query optimization. Data is persisted via pluggable storage backends, including log-structured and in-memory engines.
ToyDB is a purpose-built educational database that implements core mechanics such as log-structured storage, consensus-based replication, and MVCC-backed ACID transactions, making it an ideal reference for studying database internals.
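Time-travel reads over versioned data, as described for ToyDB, can be sketched as follows. This is a simplified Python illustration under stated assumptions, not ToyDB's code: `VersionedStore`, `commit`, and the `as_of` parameter are hypothetical names, and every commit is treated as a single already-committed transaction.

```python
import bisect


class VersionedStore:
    """Toy MVCC store: every commit adds a new version instead of overwriting."""

    def __init__(self):
        self.version = 0
        self.history = {}  # key -> (commit versions, values), both append-only

    def commit(self, changes):
        self.version += 1
        for key, value in changes.items():
            versions, values = self.history.setdefault(key, ([], []))
            versions.append(self.version)
            values.append(value)
        return self.version

    def get(self, key, as_of=None):
        if as_of is None:
            as_of = self.version  # default to the latest committed version
        versions, values = self.history.get(key, ([], []))
        # Find the newest version that is not later than the requested one.
        i = bisect.bisect_right(versions, as_of)
        return values[i - 1] if i else None
```

Reading with an older `as_of` value returns the state as it was at that commit, because older versions are retained rather than replaced.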
LiteDB is a serverless, embedded NoSQL document database for .NET applications. It persists data into a single portable file, functioning as a BSON data store that resides within the application process rather than running as a separate server. The system is ACID compliant, utilizing write-ahead logging to ensure atomic, consistent, isolated, and durable transactions. It includes built-in encryption to provide secure local data storage and protect files on disk from unauthorized access. The project covers object-document mapping to convert classes into document formats, indexed search capabilities via B-tree indexing, and specialized streaming for large binary objects. It also provides a dedicated administrative studio for visual data administration and modification.
LiteDB is a fully functional embedded NoSQL database that provides a practical reference for B-tree indexing, ACID compliance, and write-ahead logging, making it a useful resource for studying the mechanics of a production-grade storage engine.
RocksDB is a high-performance, embeddable persistent key-value library and storage engine based on Log-Structured Merge-trees. It is designed to provide durable storage for large-scale datasets, integrating directly into applications to manage data on flash and RAM-based hardware. The engine is distinguished by its focus on minimizing read and write amplification through multi-threaded compaction and custom memory allocators. It features specialized optimizations for flash storage, including support for zoned block devices, and provides the ability to extend store behavior via external plugins. Its broad capability surface includes atomic transactions, column family partitioning for logical keyspace division, and data-at-rest encryption. The system also supports secondary indexing, time-to-live data expiration, and integration with distributed filesystems. Observability is provided through internal statistics tracking, component performance benchmarking, and crash recovery simulation.
RocksDB is a production-grade, embeddable storage engine that serves as a definitive reference implementation for LSM-tree architecture, write-ahead logging, and ACID-compliant transactional operations.
MiniOB is an open-source educational relational database kernel designed for learning the internals of database systems. It implements a dual-engine storage architecture combining B+ Tree and LSM-Tree, supports SQL parsing and query execution, and provides transactional processing with multi-version concurrency control. The system communicates with clients using the MySQL wire protocol and includes a vector database extension for storing and querying high-dimensional vectors. The project distinguishes itself through its comprehensive coverage of core database concepts in a single, learnable codebase. It features a volcano-style execution model with SIMD-accelerated batch processing, write-ahead logging with checkpoint-based crash recovery, and a frame-based buffer pool for disk page management. The system supports multiple transaction isolation levels, deadlock detection and resolution, and includes an IVF-Flat vector index for approximate nearest neighbor search. The database includes a full SQL processing pipeline with query optimization techniques such as predicate pushdown, join reordering, and subquery flattening. It provides configurable thread pool models for client connection handling, supports custom SQL statement extensions, and includes monitoring tools for concurrency debugging and memory usage tracking. The project ships with a Docker-based development environment and an automated testing framework that can compare outputs against MySQL.
MiniOB is a purpose-built educational database kernel that provides a comprehensive reference implementation of B+ trees, LSM trees, write-ahead logging, and ACID-compliant transaction management within a single, accessible codebase.
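The volcano-style execution model mentioned for MiniOB has each operator pull rows from its child one at a time. The Python sketch below shows the idea only; the operator classes `Scan`, `Filter`, and `Project` and the `run` helper are hypothetical and omit the batching and SIMD aspects described above.

```python
class Scan:
    """Leaf operator: yields rows from an in-memory table."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)  # None signals end of input


class Filter:
    """Passes through only the rows that satisfy the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


class Project:
    """Keeps only the requested columns of each row."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def next(self):
        row = self.child.next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}


def run(plan):
    # The root drives execution by repeatedly asking for the next row.
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    return out
```

A plan is built by nesting operators, for example `Project(Filter(Scan(rows), pred), ["name"])`, and nothing executes until the root's `next()` is called.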
Sled is an embedded key-value store and ACID-compliant database designed for high-performance data persistence. It functions as a log-structured storage engine that organizes data using B+ trees to support efficient range queries and prefix scans. The engine implements a zero-copy data store model, utilizing epoch-based reclamation to provide direct references to cached values without memory allocations. It distinguishes itself through a combination of write-ahead logging, page cache optimizations to reduce write amplification on flash storage, and serializable transactions for atomic multi-key updates. The library covers a broad range of capabilities, including crash recovery through checkpointing, disk storage defragmentation, and binary stream replication. It also provides reactive data observation via key-prefix event subscriptions and supports custom merge logic for atomic value transformations.
Sled is a high-performance, ACID-compliant embedded storage engine that provides a concrete implementation of B+ trees, write-ahead logging, and concurrency control mechanisms, making it useful for studying database internals.
Pebble is an embedded key-value storage engine written in Go, designed as a library that provides durable, write-optimized data persistence directly within applications. It organizes data using a log-structured merge-tree (LSM-tree) structure, where writes are first buffered in an in-memory skiplist memtable and persisted to a write-ahead log before being flushed to block-based SSTable files on disk. The engine supports atomic batch commits, configurable write synchronization, and automatic background compaction that merges and rewrites sorted runs to reclaim space and maintain read performance. The storage engine distinguishes itself through support for range keys, which allow key-value pairs to apply to contiguous ranges of keyspace and be interleaved with point keys during iteration. It also includes property-based filtering, where user-defined key-value properties attached to SSTable blocks and files enable iteration to skip irrelevant data blocks and entire files during scans. Pebble provides a storage format migrator that can upgrade on-disk database files to newer physical formats at runtime, using either background or blocking compaction to apply the change. The engine offers comprehensive key-value operations including point lookups, range scans in both forward and reverse directions, and batch write operations that group multiple mutations into a single atomic commit. It includes built-in benchmarking tools for measuring read, write, and mixed workload throughput and latency under realistic access patterns, using configurable key distributions and value sizes.
Pebble is a production-grade LSM-tree storage engine that provides a clear, well-documented reference for low-level database mechanics like write-ahead logging, SSTable management, and compaction strategies.
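Background compaction, which merges sorted runs and rewrites them to reclaim space, can be sketched as a k-way merge. This Python example is an illustration rather than Pebble's algorithm: the `compact` function is hypothetical, `None` is assumed to be the deletion marker, and the merge is assumed to cover every run that could contain an older copy of a key.

```python
import heapq

TOMBSTONE = None  # assumed deletion marker for this sketch


def compact(runs):
    """Merge sorted runs (oldest first) into one run, newest value per key."""
    newest = len(runs) - 1
    tagged = []
    for age, run in enumerate(runs):
        rank = newest - age  # 0 means the newest run
        tagged.append([(key, rank, value) for key, value in run])

    out = []
    last_key = object()  # sentinel that equals no real key
    # Order by key, then by rank, so the newest version of a key comes first.
    for key, rank, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
        if key == last_key:
            continue  # an older version shadowed by a newer one
        last_key = key
        if value is not TOMBSTONE:
            out.append((key, value))
        # Dropping the tombstone is safe only because all runs are merged here.
    return out
```

For example, merging `[("a", 1), ("b", 2)]` with a newer run `[("b", None)]` yields `[("a", 1)]`: the tombstone removes `b`, and both the tombstone and the shadowed value disappear from the output.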
SlateDB is a cloud-native key-value store and distributed database engine that utilizes a log-structured merge-tree architecture. It serves as a transactional storage layer designed to persist data directly to cloud object storage. The engine differentiates itself by optimizing read performance for remote storage through the use of bloom filters and multi-level block caching. It employs a single-writer multi-reader model and provides the ability to create zero-copy clones via copy-on-write checkpointing. The system supports atomic transactions, range queries, and snapshot-based concurrency control to ensure data consistency. It also includes capabilities for capturing data changes to enable real-time synchronization and auditing.
SlateDB is a functional LSM-tree storage engine written in Rust that provides a clear, modern implementation of transactional storage mechanics, making it a valuable reference for studying cloud-native database internals.
Badger is an embeddable key-value store written in Go that provides persistent data storage for byte keys and values. It is a persistent database that utilizes a tiered LSM tree storage model to optimize disk storage and retrieval efficiency. The system features an ACID transaction engine that ensures data integrity through serializable snapshot isolation and multi-version concurrency control. It also provides an encrypted key-value store with data-at-rest encryption and a managed encrypted key registry to secure stored information. The engine covers a broad set of capabilities including high-throughput data ingestion, persistent data indexing, and crash recovery via write-ahead logging. It supports advanced data management such as asynchronous value merging, background compaction, and prefix-based data streaming. Monitoring and observability tools are included for data integrity validation and performance metrics. The project includes command-line utilities for performing offline database backup and restoration of the database state.
Badger is a production-grade, embeddable LSM-tree storage engine that provides a clear, high-performance reference for implementing write-ahead logging, ACID transactions, and concurrency control in Go.
This project is an open-source relational database management system and SQL database for storing and managing structured data. Alongside its relational role, which focuses on consistency and reliability, it operates as a vector database that stores high-dimensional vector embeddings and supports similarity search over them. A columnar storage engine optimizes analytical query processing and large-scale data aggregation. The software also collects database telemetry on hardware and configuration statistics.
This is a full-scale production relational database management system that includes complex storage engine implementations like InnoDB and columnar engines, providing a comprehensive, albeit highly advanced, reference for database internals and storage mechanics.
etcd is a distributed, strongly consistent key-value store designed to provide reliable storage for critical system metadata and coordination primitives. It functions as a distributed consensus engine, using a replicated log and leader-based state machine to ensure that all nodes in a cluster maintain a synchronized view of data. By providing atomic operations and linearizable reads and writes, it serves as a foundational component for distributed systems requiring high availability and fault tolerance. The system distinguishes itself through its multi-version concurrency control, which enables non-blocking read operations while maintaining strict consistency for concurrent writes. It supports distributed coordination through lease-based expiration, which automatically removes keys when a client stops renewing its lease, and asynchronous key change monitoring, which provides real-time event notifications for data modifications. These capabilities are supported by a persistent B+ tree-based storage engine and write-ahead logging to ensure durability across system crashes. Beyond its core storage functions, the project provides tools for cluster management, including automated peer discovery via DNS or service registries. It includes built-in mechanisms for transport layer security, role-based access control, and certificate management to protect data in transit and restrict access. Operational reliability is further maintained through snapshot-based disaster recovery, cluster health monitoring, and performance tuning for disk and network resources. The system is configured through structured files or command-line flags, allowing for flexible deployment across diverse infrastructure environments.
While etcd is a production-grade distributed key-value store rather than a pedagogical project, it serves as a high-quality reference implementation for B+ tree storage, write-ahead logging, and concurrency control in a distributed context.
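Lease-based expiration can be illustrated with a small single-process Python sketch. It does not use or mirror etcd's API: `LeaseStore` and its `grant`, `keep_alive`, and `expire` methods are hypothetical, and an injected `clock` callable replaces real time so the behavior is deterministic.

```python
class LeaseStore:
    """Toy key-value store where keys can be tied to an expiring lease."""

    def __init__(self, clock):
        self.clock = clock   # callable returning the current time in seconds
        self.data = {}
        self.leases = {}     # lease id -> {"ttl", "deadline", "keys"}
        self._next_id = 1

    def grant(self, ttl):
        lease_id = self._next_id
        self._next_id += 1
        self.leases[lease_id] = {
            "ttl": ttl,
            "deadline": self.clock() + ttl,
            "keys": set(),
        }
        return lease_id

    def put(self, key, value, lease_id=None):
        self.data[key] = value
        if lease_id is not None:
            self.leases[lease_id]["keys"].add(key)

    def keep_alive(self, lease_id):
        # A live client renews its lease, pushing the deadline forward.
        lease = self.leases[lease_id]
        lease["deadline"] = self.clock() + lease["ttl"]

    def expire(self):
        # Remove every lease past its deadline along with its attached keys.
        now = self.clock()
        expired = [i for i, l in self.leases.items() if l["deadline"] <= now]
        for lease_id in expired:
            for key in self.leases.pop(lease_id)["keys"]:
                self.data.pop(key, None)
```

Keys written without a lease are never touched by `expire()`, while keys attached to a lease vanish once the owning client stops calling `keep_alive`.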
bbolt is an ACID-compliant embedded key-value store for Go applications. It persists all data in a single memory-mapped file on disk, organizing information using B+ trees to facilitate sorted key iteration and efficient range queries. The project distinguishes itself through a hierarchical data organization model, allowing buckets to be nested within other buckets to create a tree-like structure. It employs a single-writer, multi-reader locking mechanism and copy-on-write transactions to ensure serializable isolation and data integrity. The system includes comprehensive data management capabilities, such as unique identifier generation, cursor-based iteration, and hot backup generation. Maintenance tools are provided for database compaction, consistency verification, and the repair of corrupted pages. Command-line utilities are available for querying database content and inspecting internal structural metadata.
bbolt is a production-grade embedded key-value store that serves as an excellent reference implementation for B+ tree architecture, ACID-compliant transactions, and low-level disk-based storage mechanics.
Helix DB is a distributed graph database and knowledge graph platform that persists nodes and edges on object storage for durable and unlimited scaling. It operates as an ACID-compliant system, ensuring data consistency through serializable snapshot isolation during concurrent operations. The project distinguishes itself by combining a vector search engine and a property graph, utilizing hybrid vector and full-text search to locate entry points for graph traversals. It enables dynamic graph querying through a domain-specific language, allowing complex logic and recursive queries to be executed via an API without redeploying application code. The system provides high availability through a distributed cluster of gateways and reader nodes that scale automatically based on load. Its broader capabilities include graph data mutation, multi-hop relationship traversal, and query output shaping with filtering and pagination. A command-line interface is provided for cluster management and project bootstrapping.
Helix DB is a distributed graph database that implements core storage engine concepts like ACID compliance and query execution, serving as a practical example of a modern, cloud-native database architecture.
RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations. A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data updates. Instead of polling for changes, developers can maintain persistent cursors on tables to stream document modifications in real-time. This is complemented by a fluent, functional query language that translates native code constructs into optimized, parallelized execution plans. By embedding these queries directly into application code, the system provides a type-safe interface that helps prevent injection vulnerabilities while enabling complex data manipulation and aggregation. The platform provides a comprehensive suite of administrative tools for managing production environments, including granular user permissions, TLS network encryption, and visual cluster monitoring. It supports advanced data modeling through document embedding and cross-table linking, as well as specialized geospatial processing for proximity-based queries. The system is designed for integration with modern web frameworks and message brokers, facilitating real-time synchronization with external services and search engines. RethinkDB is configured via key-value files and command-line interfaces, with support for containerized deployment and automated infrastructure orchestration.
RethinkDB is a full-scale distributed document database that provides a practical, production-grade implementation of B-tree indexing and log-structured storage, serving as a robust reference for how these components function in a real-world system.
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to process requests efficiently, while maintaining data durability through append-only persistence logs and asynchronous snapshotting mechanisms. What distinguishes Redis is its ability to handle complex data structures—including strings, hashes, lists, sets, and sorted sets—alongside hierarchical JSON documents and high-dimensional vector embeddings. It supports advanced operational patterns such as active-active database deployment for global distribution, real-time data streaming, and probabilistic statistics for large-scale data analysis. These capabilities are complemented by a pluggable indexing engine that enables semantic similarity matching and full-text retrieval. The platform offers a comprehensive ecosystem for managing distributed state, including master-replica replication, automated cluster management, and granular security controls like access control lists and TLS encryption. Developers can interact with the database through language-specific client libraries that support connection multiplexing and object mapping, or via a command-line interface for direct administrative tasks and scripting. Redis is deployed through standard package managers and supports both self-managed clusters and managed cloud instances. Observability is provided through integrated tools for performance analysis, slow log monitoring, and bulk data management.
Redis is a production-grade, in-memory key-value store that serves as a valuable reference for understanding event-driven architectures, append-only persistence logs, and efficient data structure implementation in C.
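The idea behind an append-only persistence log is that every mutation is written to a file before it is applied, and the in-memory state is rebuilt by replaying that file on startup. The Python sketch below shows this under stated assumptions: `AppendOnlyStore` is a hypothetical class, and the JSON-lines record format is chosen for readability and is not Redis's on-disk format.

```python
import json
import os


class AppendOnlyStore:
    """Toy in-memory store that is durable through an append-only log."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            # Recovery: replay every logged operation in order.
            with open(path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))
        self._log = open(path, "a", encoding="utf-8")

    def _apply(self, op):
        if op["op"] == "set":
            self.data[op["key"]] = op["value"]
        elif op["op"] == "del":
            self.data.pop(op["key"], None)

    def _append(self, op):
        self._log.write(json.dumps(op) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # force the record to stable storage
        self._apply(op)               # only then change in-memory state

    def set(self, key, value):
        self._append({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._append({"op": "del", "key": key})

    def close(self):
        self._log.close()
```

Calling `fsync` on every write is the most conservative choice; syncing less often trades some durability for throughput, and the log grows until it is rewritten or replaced by a snapshot.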
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through advanced storage and execution techniques, including vectorized query processing and a merge tree storage engine that maintains performance during massive insertions. It features adaptive subcolumn mapping for semi-structured data and supports native vector search for machine learning and generative AI applications. To facilitate efficient data movement, the engine utilizes zero-copy shared memory buffers, minimizing overhead when interacting with external analytical tools or processing diverse file formats like Parquet, JSON, and Arrow. Beyond its core storage and processing capabilities, the project provides a comprehensive suite of tools for observability, security, and data integration. It includes built-in support for natural language querying, automated workflow orchestration for AI agents, and extensive diagnostic features for query plan inspection. The platform also offers robust cloud infrastructure management, including support for private networking, compliant deployment strategies, and integrated billing consolidation.
ClickHouse is a production-grade, high-performance analytical database that serves as a sophisticated reference for advanced storage techniques like merge trees and vectorized execution, though it is a complex industrial system rather than a simplified pedagogical implementation.