Educational projects and resources for implementing storage engines, query parsers, and database internals from scratch.
Pebble is an embedded key-value storage engine written in Go, designed as a library that provides durable, write-optimized data persistence directly within applications. It organizes data as a log-structured merge-tree (LSM-tree): each write is appended to a write-ahead log and buffered in an in-memory skiplist memtable, and full memtables are flushed to block-based SSTable files on disk. The engine supports atomic batch commits, configurable write synchronization, and automatic background compaction that merges and rewrites sorted runs to reclaim space and maintain read performance. The storage engine distinguishes itself through support for range keys, which associate a value with a contiguous span of the keyspace and are interleaved with point keys during iteration. It also includes property-based filtering: user-defined properties collected for SSTable blocks and files let an iterator skip irrelevant data blocks and entire files during scans. Pebble provides a storage format migrator that can upgrade on-disk database files to newer physical formats at runtime, using either background or blocking compaction to apply the change. The engine offers point lookups, range scans in both forward and reverse directions, and batch writes that group multiple mutations into a single atomic commit. It includes built-in benchmarking tools for measuring read, write, and mixed-workload throughput and latency under realistic access patterns, using configurable key distributions and value sizes.
Pebble is a production-grade LSM-tree storage engine that provides a clear, well-documented codebase for studying core storage components like write-ahead logging, SSTable management, and compaction strategies.
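The write path described for Pebble can be illustrated with a toy model. The sketch below is a minimal Python illustration under stated assumptions, not Pebble's Go API: the class name `TinyLSM`, the `memtable_limit` parameter, and the use of a dict, a Python list, and in-memory sorted lists as stand-ins for the skiplist memtable, the WAL file, and SSTables are all hypothetical. Deletes and compaction are omitted.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: WAL + memtable + immutable sorted runs standing in for SSTables."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.wal = []        # append-only log; a real engine appends to a file and may fsync
        self.memtable = {}   # Pebble uses a skiplist; a dict keeps the sketch short
        self.runs = []       # immutable sorted runs, newest first: (keys, values)

    def put(self, key: bytes, value: bytes) -> None:
        self.wal.append((key, value))   # 1. log the write for durability
        self.memtable[key] = value      # 2. apply it to the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Persist the memtable as a sorted run; once that run is durable,
        # the log entries covering it are no longer needed for recovery.
        keys = sorted(self.memtable)
        self.runs.insert(0, (keys, [self.memtable[k] for k in keys]))
        self.memtable = {}
        self.wal = []

    def get(self, key: bytes):
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in self.runs:          # search newest to oldest
            i = bisect.bisect_left(keys, key)   # binary search within a sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


db = TinyLSM(memtable_limit=2)
db.put(b"a", b"1")
db.put(b"b", b"2")   # memtable is full, so it is flushed to a sorted run
db.put(b"a", b"3")   # the newer memtable entry shadows the flushed value
print(db.get(b"a"))  # b'3'
```

Reads check the memtable first and then the runs from newest to oldest, which is why a newer write shadows an older one without rewriting it.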
LMDB is an embedded key-value storage engine that provides ACID-compliant data persistence. It is a memory-mapped database that stores key-value pairs in B+ trees and guarantees atomicity, consistency, isolation, and durability for its transactions. Because the database file is mapped directly into the process's virtual address space, reads avoid extra data copying and system calls, which makes LMDB effective for low-latency local caching and particularly well suited to read-heavy workloads. The system implements a transactional model with copy-on-write versioning and single-writer multi-reader concurrency: multiple concurrent read-only transactions each see a consistent snapshot without taking locks, while modifications are restricted to a single writer at a time.
LMDB is a production-grade embedded storage engine that provides a concrete, high-performance implementation of B+ trees, ACID-compliant transactions, and copy-on-write concurrency control, making it an excellent reference for studying these core storage components.
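Copy-on-write versioning is what lets LMDB readers hold a stable snapshot without locks. The sketch below shows the idea with path copying in a persistent binary search tree; LMDB uses B+ trees and page-level copy-on-write, so the tree shape, the `Node`/`Store` names, and the use of a Python lock for the single writer are simplifying assumptions. An update copies only the nodes on the path from the root to the changed key, and a reader that captured the old root keeps seeing the old version.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: bytes
    value: bytes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def cow_insert(node: Optional[Node], key: bytes, value: bytes) -> Node:
    """Return a new root; only nodes on the search path are copied."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, cow_insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, cow_insert(node.right, key, value))
    return Node(key, value, node.left, node.right)   # replace value, share both subtrees


def lookup(node: Optional[Node], key: bytes) -> Optional[bytes]:
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


class Store:
    def __init__(self):
        self.root: Optional[Node] = None
        self.writer_lock = threading.Lock()   # only one writer at a time

    def begin_read(self) -> Optional[Node]:
        return self.root   # a snapshot is just the current root; readers take no lock

    def write(self, key: bytes, value: bytes) -> None:
        with self.writer_lock:
            self.root = cow_insert(self.root, key, value)   # publish the new version
```

Because untouched subtrees are shared between versions, each commit allocates only as many new nodes as the tree is deep; an old version can be reclaimed once no reader still references its root.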
LevelDB is an embedded database library and persistent storage engine that provides a sorted key-value store. It uses a log-structured merge-tree architecture to map arbitrary byte-array keys to byte-array values, running inside the host process so that no separate server is needed. Applications can supply a custom comparison function to define key ordering, which determines the order of range scans and ordered iteration. Data reliability comes from atomic batch writes, consistent snapshots, and log-based recovery after failures. The engine implements write-ahead logging and multi-level compaction to manage disk space, and uses a block cache and data compression to improve read and write throughput. Low-level integration is handled through an abstract environment (file system) interface that decouples the storage engine from the operating system.
LevelDB is a production-grade implementation of an LSM-tree storage engine that provides a clear, readable codebase for studying write-ahead logging, disk I/O management, and persistent key-value storage.
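A custom comparator only has to define a total order over keys; sorted storage and range scans follow from it. The sketch below assumes a hypothetical comparator, `numeric_compare`, that orders ASCII-decimal keys numerically, and a toy `SortedTable` that sorts on demand. It is not LevelDB's C++ `Comparator` interface, which additionally requires a `Name()` so that a database is never reopened with a different ordering.

```python
from functools import cmp_to_key


def numeric_compare(a: bytes, b: bytes) -> int:
    """Hypothetical comparator: ASCII-decimal keys in numeric order (negative/zero/positive)."""
    x, y = int(a), int(b)
    return (x > y) - (x < y)


class SortedTable:
    """Toy key-value table whose iteration order is defined by a comparator."""

    def __init__(self, compare):
        self.compare = compare
        self.sort_key = cmp_to_key(compare)
        self.items = {}

    def put(self, key: bytes, value: bytes) -> None:
        self.items[key] = value

    def scan(self, start: bytes, end: bytes):
        """Yield (key, value) with start <= key < end, in comparator order."""
        for key in sorted(self.items, key=self.sort_key):
            if self.compare(key, start) >= 0 and self.compare(key, end) < 0:
                yield key, self.items[key]


table = SortedTable(numeric_compare)
for k in [b"10", b"9", b"100", b"2"]:
    table.put(k, b"v" + k)
print([k for k, _ in table.scan(b"5", b"50")])   # [b'9', b'10']
```

With the default bytewise ordering the same keys would sort as `b"10"`, `b"100"`, `b"2"`, `b"9"`, so the same scan bounds would select a different set of keys.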
This is an educational relational database engine used in Carnegie Mellon University's database systems course. Students learn database internals by implementing core components of a working database, including storage, indexing, concurrency control, and crash recovery. The system covers the key parts of a database architecture: a B+ tree index for fast key-based lookups and range scans, a disk-oriented buffer pool that caches pages from disk, an iterator-based query execution model that composes physical operators, page-based record storage, two-phase locking for coordinating concurrent ACID transactions, and write-ahead logging with checkpoints to ensure durability and recoverability after failures. It also provides an interactive SQL shell for executing queries against a relational schema, giving students a complete hands-on environment for building and experimenting with database internals.
This is a purpose-built educational database engine that provides a hands-on implementation of B+ trees, write-ahead logging, buffer pool management, and concurrency control, covering the core components of a storage engine in a single codebase.
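The buffer pool is usually the first component students build. The sketch below is a minimal Python model of its core contract under simplifying assumptions: a `disk` dict stands in for the disk manager, pages are pinned while in use and may only be evicted when their pin count is zero, dirty pages are written back on eviction, and victims are chosen by plain LRU. The class and method names are hypothetical, and the course's own C++ interfaces and replacement policy differ.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: a fixed number of frames, pin counts, and LRU eviction."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk               # dict page_id -> bytes, standing in for the disk manager
        self.frames = OrderedDict()    # page_id -> [data, pin_count, dirty], least recent first

    def fetch(self, page_id):
        """Return a page's data, reading it from disk if needed, and pin it."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)           # now the most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], 0, False]
        self.frames[page_id][1] += 1
        return self.frames[page_id][0]

    def unpin(self, page_id, new_data=None):
        """Release one pin; passing new_data marks the page dirty."""
        entry = self.frames[page_id]
        if entry[1] == 0:
            raise ValueError("page is not pinned")
        entry[1] -= 1
        if new_data is not None:
            entry[0], entry[2] = new_data, True

    def _evict(self):
        # Pick the least recently used page that nobody has pinned.
        victim = next((pid for pid, (_, pins, _) in self.frames.items() if pins == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        data, _, dirty = self.frames.pop(victim)
        if dirty:
            self.disk[victim] = data      # write back before the frame is reused
```

Pinning is what makes eviction safe: a page that an operator or index is still reading can never be swapped out underneath it.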
MiniOB is an open-source educational relational database kernel designed for learning the internals of database systems. It implements a dual-engine storage architecture combining B+ Tree and LSM-Tree, supports SQL parsing and query execution, and provides transactional processing with multi-version concurrency control. The system communicates with clients using the MySQL wire protocol and includes a vector database extension for storing and querying high-dimensional vectors. The project distinguishes itself through its comprehensive coverage of core database concepts in a single, learnable codebase. It features a volcano-style execution model with SIMD-accelerated batch processing, write-ahead logging with checkpoint-based crash recovery, and a frame-based buffer pool for disk page management. The system supports multiple transaction isolation levels, deadlock detection and resolution, and includes an IVF-Flat vector index for approximate nearest neighbor search. The database includes a full SQL processing pipeline with query optimization techniques such as predicate pushdown, join reordering, and subquery flattening. It provides configurable thread pool models for client connection handling, supports custom SQL statement extensions, and includes monitoring tools for concurrency debugging and memory usage tracking. The project ships with a Docker-based development environment and an automated testing framework that can compare outputs against MySQL.
MiniOB is a purpose-built educational database kernel that explicitly implements B+ trees, LSM-trees, write-ahead logging, and buffer management, making it an ideal codebase for studying storage engine internals.
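The volcano-style execution model mentioned above composes operators that each expose `open`, `next`, and `close`, with the root pulling one row at a time from its children. The Python sketch below shows that iterator interface with three hypothetical operators (`SeqScan`, `Filter`, `Project`) over in-memory rows; it is not MiniOB's C++ operator hierarchy and does not model its SIMD-accelerated batch path.

```python
class SeqScan:
    """Leaf operator: returns stored rows one at a time."""

    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None                 # None signals end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Filter:
    """Pulls rows from its child and passes on those matching a predicate."""

    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def open(self):
        self.child.open()

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self):
        self.child.close()


class Project:
    """Keeps only the requested columns of each row."""

    def __init__(self, child, columns):
        self.child, self.columns = child, columns

    def open(self):
        self.child.open()

    def next(self):
        row = self.child.next()
        return None if row is None else tuple(row[c] for c in self.columns)

    def close(self):
        self.child.close()


def execute(plan):
    """Drive the plan from the root: each next() call pulls one row up the tree."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


rows = [{"id": 1, "age": 30}, {"id": 2, "age": 17}, {"id": 3, "age": 45}]
plan = Project(Filter(SeqScan(rows), lambda r: r["age"] >= 18), ["id"])
print(execute(plan))   # [(1,), (3,)]
```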
Sled is an embedded key-value store and ACID-compliant database designed for high-performance data persistence. It functions as a log-structured storage engine that organizes data using B+ trees to support efficient range queries and prefix scans. The engine implements a zero-copy data store model, utilizing epoch-based reclamation to provide direct references to cached values without memory allocations. It distinguishes itself through a combination of write-ahead logging, page cache optimizations to reduce write amplification on flash storage, and serializable transactions for atomic multi-key updates. The library covers a broad range of capabilities, including crash recovery through checkpointing, disk storage defragmentation, and binary stream replication. It also provides reactive data observation via key-prefix event subscriptions and supports custom merge logic for atomic value transformations.
Sled is a high-performance embedded database engine that provides a practical, real-world implementation of B+ trees, log-structured storage, and write-ahead logging, making it an excellent codebase for studying these core storage components.
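Custom merge logic moves a read-modify-write into the store so callers do not race on a get followed by a set. The Python sketch below is hypothetical: `MergeStore`, `add_u64`, and `append` are illustrative names, the operator is applied eagerly under a lock, and the callback shape (key, old value or None, operand, returning a new value or None to delete) is an assumption that models the idea rather than copying sled's Rust API. Engines with merge operators may instead record operands and combine them lazily on read or during compaction.

```python
import threading
from typing import Callable, Optional

# merge_fn(key, old_value, operand) -> new value, or None to delete the key
MergeFn = Callable[[bytes, Optional[bytes], bytes], Optional[bytes]]


class MergeStore:
    """Toy store with a user-supplied merge operator applied eagerly under a lock."""

    def __init__(self, merge_fn: MergeFn):
        self.data = {}
        self.merge_fn = merge_fn
        self.lock = threading.Lock()

    def merge(self, key: bytes, operand: bytes) -> None:
        # The read-modify-write happens inside the store, so concurrent callers
        # cannot interleave a get with a set and lose an update.
        with self.lock:
            new = self.merge_fn(key, self.data.get(key), operand)
            if new is None:
                self.data.pop(key, None)
            else:
                self.data[key] = new


def add_u64(key, old, operand):
    """Hypothetical operator: treat values as big-endian u64 counters and add."""
    total = int.from_bytes(old or bytes(8), "big") + int.from_bytes(operand, "big")
    return total.to_bytes(8, "big")


def append(key, old, operand):
    """Hypothetical operator: append the operand to the existing value."""
    return (old or b"") + operand


counter = MergeStore(add_u64)
counter.merge(b"hits", (1).to_bytes(8, "big"))
counter.merge(b"hits", (2).to_bytes(8, "big"))
print(int.from_bytes(counter.data[b"hits"], "big"))   # 3
```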
SlateDB is a cloud-native key-value store and embedded storage engine built on a log-structured merge-tree architecture. It serves as a transactional storage layer that persists data directly to cloud object storage. The engine optimizes reads against remote storage through bloom filters and multi-level block caching. It employs a single-writer multi-reader model and can create zero-copy clones via copy-on-write checkpoints. The system supports atomic transactions, range queries, and snapshot-based concurrency control to ensure data consistency. It also includes change data capture, enabling real-time synchronization and auditing.
SlateDB is a functional, production-oriented LSM-tree storage engine that provides a concrete codebase for studying modern log-structured storage, concurrency control, and cloud-native disk I/O management.
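Bloom filters matter more when every block read is a request to object storage: a negative answer means a block or table need not be fetched at all. The sketch below is a generic Bloom filter in Python, not SlateDB's Rust implementation; the `BloomFilter` class, the choice of BLAKE2b, and the double-hashing scheme for deriving `num_hashes` bit positions are assumptions. A filter can return false positives but never false negatives.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(key, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1   # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent", so the caller can skip the remote fetch.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=7)
bf.add(b"user:42")
print(bf.might_contain(b"user:42"))   # True
```

A false positive only costs one unnecessary fetch, so the bit budget per key is a trade-off between filter size and wasted remote reads.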
etcd is a distributed, strongly consistent key-value store designed to provide reliable storage for critical system metadata and coordination primitives. It uses the Raft consensus algorithm: an elected leader replicates a log of operations to the other members, and each node applies that log to its state machine so that every node in the cluster maintains a synchronized view of the data. By providing atomic operations and linearizable reads and writes, it serves as a foundational component for distributed systems that require high availability and fault tolerance. The system distinguishes itself through multi-version concurrency control, which lets reads proceed without blocking while keeping concurrent writes strictly consistent. It supports distributed coordination through lease-based expiration, which automatically removes keys when the client holding the lease stops renewing it, and watches, which deliver real-time event notifications when keys change. These capabilities are backed by a persistent B+ tree storage engine (bbolt) and write-ahead logging that keep data durable across crashes. Beyond its core storage functions, the project provides a comprehensive suite of tools for cluster management, including automated peer discovery via DNS or a discovery service, and robust security enforcement. It includes built-in transport layer security, role-based access control, and certificate management to protect data in transit. Operational reliability is further maintained through snapshot-based disaster recovery, cluster health monitoring, and granular performance tuning for disk and network resources. The system is configured through configuration files or command-line flags, allowing flexible deployment across diverse infrastructure environments.
This is a production-grade distributed key-value store that utilizes a B-tree storage engine and write-ahead logging, providing a concrete, high-performance example of these components in a real-world codebase.
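Lease-based expiration ties keys to a lease that clients must keep renewing. The following is a minimal single-process Python sketch under stated assumptions: `LeaseStore` and its methods are hypothetical names rather than etcd's gRPC API, time is passed in explicitly as `now` to keep the example deterministic, and there is no replication (a real cluster has to replicate lease revocation through consensus).

```python
class LeaseStore:
    """Toy key-value store where keys can be attached to expiring leases."""

    def __init__(self):
        self.kv = {}
        self.leases = {}     # lease_id -> {"ttl": ..., "deadline": ..., "keys": set()}
        self.next_id = 1

    def grant(self, ttl: int, now: int) -> int:
        lease_id, self.next_id = self.next_id, self.next_id + 1
        self.leases[lease_id] = {"ttl": ttl, "deadline": now + ttl, "keys": set()}
        return lease_id

    def put(self, key, value, lease_id=None):
        self.kv[key] = value
        if lease_id is not None:
            self.leases[lease_id]["keys"].add(key)

    def keep_alive(self, lease_id: int, now: int) -> None:
        lease = self.leases[lease_id]
        lease["deadline"] = now + lease["ttl"]    # push the deadline out by one TTL

    def expire(self, now: int) -> None:
        # Revoke every lease whose deadline has passed, deleting its keys.
        expired = [lid for lid, lease in self.leases.items() if lease["deadline"] <= now]
        for lid in expired:
            for key in self.leases.pop(lid)["keys"]:
                self.kv.pop(key, None)


store = LeaseStore()
lease = store.grant(ttl=10, now=0)
store.put("/services/api", "10.0.0.5:8080", lease_id=lease)
store.keep_alive(lease, now=8)          # deadline moves from 10 to 18
store.expire(now=15)
print("/services/api" in store.kv)      # True
store.expire(now=18)
print("/services/api" in store.kv)      # False
```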
RocksDB is a high-performance, embeddable persistent key-value library and storage engine based on Log-Structured Merge-trees. It is designed to provide durable storage for large-scale datasets, integrating directly into applications to manage data on flash and RAM-based hardware. The engine is distinguished by its focus on minimizing read and write amplification through multi-threaded compaction and custom memory allocators. It features specialized optimizations for flash storage, including support for zoned block devices, and provides the ability to extend store behavior via external plugins. Its broad capability surface includes atomic transactions, column family partitioning for logical keyspace division, and data-at-rest encryption. The system also supports secondary indexing, time-to-live data expiration, and integration with distributed filesystems. Observability is provided through internal statistics tracking, component performance benchmarking, and crash recovery simulation.
RocksDB is a production-grade, embeddable storage engine that provides a comprehensive implementation of LSM-trees, write-ahead logging, and advanced disk I/O management, making it an ideal reference for studying high-performance storage architecture.
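Compaction is at heart a k-way merge of sorted runs. The Python sketch below, built on `heapq.merge`, keeps only the newest version of each key and drops deletion markers once the output is the bottommost level, where no older data they could shadow remains. It is a simplified model under assumptions: the `compact` function, the use of `None` as a tombstone, and the newest-first ordering of `runs` are hypothetical, and RocksDB's real compaction also handles sequence numbers, snapshots, compaction filters, and merge operands.

```python
import heapq

TOMBSTONE = None   # hypothetical marker for a deleted key


def compact(runs, bottommost):
    """Merge sorted runs (newest first) into one sorted run.

    Each run is a list of (key, value) pairs sorted by key. For duplicate keys
    the newest version wins; tombstones are dropped only at the bottommost level.
    """
    # Tag entries with the run's age (0 = newest) so equal keys sort newest first.
    tagged = ([(key, age, value) for key, value in run] for age, run in enumerate(runs))
    out, last_key = [], object()
    for key, age, value in heapq.merge(*tagged):
        if key == last_key:
            continue                  # an older version of a key already handled
        last_key = key
        if value is TOMBSTONE and bottommost:
            continue                  # nothing older can exist below, so drop the marker
        out.append((key, value))
    return out


newer = [(b"a", b"2"), (b"c", TOMBSTONE)]
older = [(b"a", b"1"), (b"b", b"x"), (b"c", b"old")]
print(compact([newer, older], bottommost=True))   # [(b'a', b'2'), (b'b', b'x')]
```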
Bolt is a single-file embedded key-value store for Go applications. It is an ACID transactional database that organizes data in B+trees on disk to provide efficient sorted key retrieval and range scans. The system uses a memory-mapped model to map the database file directly into the process address space for fast random-access reads. The project distinguishes itself through a multi-version concurrency control architecture that allows multiple simultaneous readers to access a consistent snapshot of data without blocking a writer. It employs a single-writer multi-reader locking model and uses advisory filesystem locking to prevent multiple processes from opening the same database file simultaneously. The database provides hierarchical data organization through nested buckets and supports unique identifier generation for records. It includes capabilities for hot backups, batch write updates to reduce disk commit overhead, and cursor-based key seeking and iteration. Operational telemetry is available via internal database performance metrics to track system activity.
This is a production-grade embedded key-value store that provides a clear, readable implementation of B+tree storage, ACID transactions, and copy-on-write page management (Bolt commits by writing new pages and then switching a meta page, rather than by using a write-ahead log), making it an excellent codebase for studying these core engine components.
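Bolt's buckets, sequences, and cursors can be mimicked in a few lines to show how they fit together. In the Python sketch below, `Bucket` is a hypothetical stand-in that loosely mirrors the Go methods `CreateBucketIfNotExists`, `NextSequence`, and `Cursor.Seek`; it keeps keys sorted with `bisect` instead of storing them in B+tree pages, and it ignores transactions. Sequence numbers are encoded big-endian so that byte order matches numeric order during a prefix scan.

```python
import bisect


class Bucket:
    """Toy bucket: sorted keys, nested sub-buckets, and a per-bucket sequence."""

    def __init__(self):
        self.keys = []        # kept sorted so a cursor can seek with a binary search
        self.values = {}
        self.children = {}    # nested buckets by name
        self.sequence = 0

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def create_bucket_if_not_exists(self, name: bytes) -> "Bucket":
        if name not in self.children:
            self.children[name] = Bucket()
        return self.children[name]

    def next_sequence(self) -> int:
        self.sequence += 1
        return self.sequence

    def seek(self, prefix: bytes):
        """Position at the first key >= prefix and yield entries sharing the prefix."""
        i = bisect.bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i], self.values[self.keys[i]]
            i += 1


root = Bucket()
users = root.create_bucket_if_not_exists(b"users")
for name in [b"carol", b"alice", b"bob"]:
    users.put(b"user:" + users.next_sequence().to_bytes(8, "big"), name)
users.put(b"admin:1", b"root")
print([v for _, v in users.seek(b"user:")])   # [b'carol', b'alice', b'bob']
```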
LiteDB is a serverless, embedded NoSQL document database for .NET applications. It persists data into a single portable file, functioning as a BSON data store that resides within the application process rather than running as a separate server. The system is ACID compliant, utilizing write-ahead logging to ensure atomic, consistent, isolated, and durable transactions. It includes built-in encryption to provide secure local data storage and protect files on disk from unauthorized access. The project covers object-document mapping to convert classes into document formats, indexed search capabilities via B-tree indexing, and specialized streaming for large binary objects. It also provides a dedicated administrative studio for visual data administration and modification.
LiteDB is a functional embedded database engine that demonstrates core storage concepts like B-tree indexing, write-ahead logging, and ACID-compliant transaction management within a single-file architecture.
Badger is an embeddable key-value store written in Go that provides persistent storage for byte keys and values. It organizes keys in an LSM tree and, following the WiscKey design, can store larger values separately in a value log so that compactions rewrite less data. The system features an ACID transaction engine that provides serializable snapshot isolation through multi-version concurrency control. It also supports data-at-rest encryption, with a managed registry of encrypted data keys to secure stored information. The engine covers a broad set of capabilities, including high-throughput data ingestion, persistent data indexing, and crash recovery via write-ahead logging. It supports advanced data management such as asynchronous value merging, background compaction, and prefix-based data streaming. Monitoring and observability tools are included for data integrity validation and performance metrics. The project includes command-line utilities for offline backup and restoration of the database state.
Badger is a production-grade, embeddable key-value store that provides a practical, high-performance implementation of an LSM-tree storage engine, including essential components like write-ahead logging and concurrency control.
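Serializable snapshot isolation in an optimistic engine comes down to a check at commit time: a transaction aborts if any key it read was written by another transaction that committed after it took its snapshot. The Python sketch below models that check under simplifying assumptions: `Oracle`, `Txn`, and `ConflictError` are hypothetical names, exact keys are tracked, and old commit records are never garbage-collected, which a production engine would need to handle.

```python
class ConflictError(Exception):
    """Raised when a transaction's reads were invalidated by a later commit."""


class Txn:
    def __init__(self, oracle, read_ts):
        self.oracle, self.read_ts = oracle, read_ts
        self.reads, self.writes = set(), {}

    def get(self, key):
        self.reads.add(key)                  # remember what this txn depended on
        if key in self.writes:
            return self.writes[key]
        visible = [v for ts, v in self.oracle.data.get(key, []) if ts <= self.read_ts]
        return visible[-1] if visible else None   # newest version in the snapshot

    def set(self, key, value):
        self.writes[key] = value             # buffered until commit


class Oracle:
    def __init__(self):
        self.ts = 0
        self.data = {}        # key -> [(commit_ts, value), ...] in commit order
        self.committed = []   # (commit_ts, keys written) for conflict checks

    def begin(self):
        return Txn(self, read_ts=self.ts)

    def commit(self, txn):
        for commit_ts, keys in self.committed:
            if commit_ts > txn.read_ts and keys & txn.reads:
                raise ConflictError("a key in the read set changed after the snapshot")
        self.ts += 1
        for key, value in txn.writes.items():
            self.data.setdefault(key, []).append((self.ts, value))
        self.committed.append((self.ts, set(txn.writes)))
        return self.ts


db = Oracle()
setup = db.begin()
setup.set("x", 1)
db.commit(setup)
a, b = db.begin(), db.begin()
a.set("x", a.get("x") + 1)
b.set("x", b.get("x") + 10)
db.commit(a)
try:
    db.commit(b)          # b read x before a's commit, so it must not overwrite it
except ConflictError:
    print("b aborted; retry it on a fresh snapshot")
```

Blind writes (writes to keys the transaction never read) do not cause conflicts in this model, since only the read set is checked.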
redb is an embedded, ACID-compliant key-value storage engine that persists data in a collection of copy-on-write B-trees. It is an MVCC transactional database: multi-version concurrency control lets reads and writes proceed simultaneously without blocking each other. It employs a single-writer multi-reader model to keep data consistent while allowing multiple threads to access the store. Transactions apply groups of changes atomically, as single units, which prevents data corruption if the process crashes mid-update.
This is a functional, ACID-compliant embedded storage engine that provides a practical codebase for studying B-tree structures, MVCC, and write-ahead logging in a production-grade Rust implementation.
ToyDB is a distributed SQL database that replicates data across multiple nodes. It maintains strong consistency and fault tolerance by implementing the Raft distributed consensus algorithm. The project supports historical data versioning, enabling time-travel queries that retrieve the state of the database as of a specific point in the past. It uses multi-version concurrency control to run ACID transactions and preserve data integrity under concurrent operations. The system covers relational data modeling with table schemas, primary keys, and foreign keys that enforce referential integrity. Its query engine parses SQL statements and executes them through a pipeline of operators, supporting joins, aggregations, and heuristic query optimization. Data is persisted via pluggable storage backends, including log-structured and in-memory engines.
This project serves as a comprehensive educational codebase that implements core storage engine components, including log-structured storage and write-ahead logging, within a functional distributed SQL database.
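Time-travel queries fall out of MVCC almost for free: if old versions are kept, a read simply picks the newest version at or below the requested one. The Python sketch below is a minimal model with hypothetical names (`VersionedStore`, `write`, `read_as_of`); it assigns one version per write rather than per transaction, represents deletes as `None`, and keeps versions in per-key lists instead of encoding them into storage keys as a real engine might.

```python
class VersionedStore:
    """Toy MVCC store: every write creates a new version; reads pick a version."""

    def __init__(self):
        self.version = 0
        self.rows = {}     # key -> [(version, value), ...], ascending by version

    def write(self, key, value):
        """Record a new version of key; a value of None marks a delete."""
        self.version += 1
        self.rows.setdefault(key, []).append((self.version, value))
        return self.version

    def read_as_of(self, key, version):
        """Return the newest value written at or before `version` (None if absent)."""
        result = None
        for v, value in self.rows.get(key, []):
            if v > version:
                break          # versions are ascending, so nothing later can qualify
            result = value
        return result


store = VersionedStore()
v1 = store.write("balance", 100)
v2 = store.write("balance", 80)
print(store.read_as_of("balance", v1), store.read_as_of("balance", v2))   # 100 80
```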
PostgreSQL is an object-relational database management system designed for the persistent storage and retrieval of structured information. It functions as an ACID-compliant database server queried through standard SQL, maintaining data consistency and reliability across large application datasets. The system distinguishes itself through an extensible architecture that allows users to define custom data types, operators, and index access methods. It employs multi-version concurrency control so that reads and writes can proceed simultaneously without blocking each other, together with a cost-based query optimizer that uses table statistics to choose efficient execution plans. The engine enforces data integrity through constraints such as primary keys, foreign keys, unique constraints, and check constraints, keeping related data consistent. It uses write-ahead logging to guarantee durability across system failures and maintains a shared buffer cache to reduce disk I/O under concurrent access.
PostgreSQL is a production-grade relational database management system that provides a comprehensive, real-world implementation of core storage engine components like B-trees, write-ahead logging, and multi-version concurrency control.
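PostgreSQL's MVCC records, in each row version's header, the transaction that created it (xmin) and the one that deleted or replaced it (xmax), and decides visibility by comparing those ids against the reader's snapshot. The Python sketch below is a heavily simplified model of that rule under stated assumptions: the `Snapshot` and `TupleVersion` classes and the `is_visible` function are hypothetical, and the real rules also handle a transaction's own changes, subtransactions, command ids, and hint bits.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Snapshot:
    xmax: int                     # transactions with id >= xmax had not started
    in_progress: FrozenSet[int]   # transactions still running when the snapshot was taken

    def sees_as_committed(self, xid: int, committed: set) -> bool:
        return xid < self.xmax and xid not in self.in_progress and xid in committed


@dataclass
class TupleVersion:
    value: str
    xmin: int                     # transaction that created this version
    xmax: Optional[int] = None    # transaction that deleted or replaced it, if any


def is_visible(tup: TupleVersion, snap: Snapshot, committed: set) -> bool:
    if not snap.sees_as_committed(tup.xmin, committed):
        return False              # creator not committed as of this snapshot
    if tup.xmax is None:
        return True               # never deleted
    return not snap.sees_as_committed(tup.xmax, committed)   # deleter not yet visible
```

An UPDATE thus produces two row versions, and which one a query sees depends only on its snapshot: a snapshot taken before (or during) the updating transaction sees the old version, and one taken after it committed sees the new one.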
This project is an open-source relational database management system for storing and managing structured data. Alongside its role as a relational database that ensures consistency and reliability, it operates as a vector database for storing high-dimensional vector embeddings and running vector similarity searches over them, so users can find similar items by querying embeddings. The system also incorporates a columnar storage engine that optimizes analytical query processing and large-scale data aggregation. Its capabilities span relational data management, analytical query execution, and telemetry collection that gathers hardware and configuration statistics.
While this is a full-scale relational database management system that utilizes B-trees and write-ahead logging, it is a production-ready application rather than an educational resource or codebase designed to demonstrate the implementation of core storage engine components.
Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges and rebalances them across the cluster as storage and throughput requirements grow. Because replicas use the Raft consensus protocol to agree on the order of operations, the system remains fault tolerant and continuously available when individual machines fail. The system distinguishes itself through a layered architecture that separates the relational SQL layer from an underlying distributed key-value store. It achieves global consistency without perfectly synchronized hardware clocks by using hybrid logical clocks. For high-concurrency workloads, it combines multi-version concurrency control with serializable transaction isolation, providing consistent snapshots and conflict resolution between concurrent transactions. The engine also implements the PostgreSQL wire protocol, so existing PostgreSQL drivers and tools can connect to it. Beyond its core transactional capabilities, the platform includes tooling for cluster orchestration, security, and performance diagnostics. It supports deployment models ranging from self-hosted on-premises clusters to fully managed cloud services. A command-line interface provides session management and query execution, letting administrators monitor cluster health and manage workloads through standard relational interfaces.
This is a production-grade distributed SQL database that contains complex implementations of storage engine components like LSM-trees and Raft-based consensus, though it is a full-scale system rather than a simplified educational codebase.
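Hybrid logical clocks pair a physical wall-clock reading with a logical counter so that timestamps respect causality even when node clocks drift. The Python sketch below follows the hybrid logical clock algorithm of Kulkarni et al.; the class and method names are hypothetical, physical time is passed in explicitly, and CockroachDB's own implementation additionally enforces a maximum tolerated clock offset between nodes.

```python
class HybridLogicalClock:
    """Hybrid logical clock: timestamps are (wall, logical) pairs compared lexicographically."""

    def __init__(self):
        self.wall = 0
        self.logical = 0

    def now(self, physical):
        """Timestamp a local or send event, given the node's physical clock reading."""
        if physical > self.wall:
            self.wall, self.logical = physical, 0
        else:
            self.logical += 1          # physical clock has not advanced past our wall time
        return (self.wall, self.logical)

    def update(self, remote, physical):
        """Merge a timestamp received from another node."""
        r_wall, r_logical = remote
        new_wall = max(self.wall, r_wall, physical)
        if new_wall == self.wall and new_wall == r_wall:
            self.logical = max(self.logical, r_logical) + 1
        elif new_wall == self.wall:
            self.logical += 1
        elif new_wall == r_wall:
            self.logical = r_logical + 1
        else:
            self.logical = 0           # local physical time dominates both
        self.wall = new_wall
        return (self.wall, self.logical)


a, b = HybridLogicalClock(), HybridLogicalClock()
sent = a.now(physical=100)                 # (100, 0)
received = b.update(sent, physical=90)     # b's clock lags, yet (100, 1) > sent
print(sent < received)                     # True
```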
TiKV is a distributed transactional key-value store designed for horizontal scalability and high availability. It stores large datasets across a cluster of physical nodes and keeps data accessible and consistent even when individual machines fail. Data is replicated with the Raft consensus algorithm so that all replicas agree on the order of operations. The keyspace is sharded into contiguous ranges called Regions, each replicated by its own independent Raft group. To handle concurrent access, the engine uses multi-version concurrency control, allowing consistent reads without blocking ongoing writes. Distributed transactions use a two-phase commit protocol modeled on Google's Percolator, so that all participating nodes either commit or abort a change together. On each node, data is persisted in RocksDB, which organizes it into sorted files on disk. The cluster keeps a consistent view of its state and topology through communication between nodes and a centralized Placement Driver (PD) that schedules data placement.
TiKV is a production-grade distributed storage engine that builds on RocksDB for its LSM-tree storage and uses write-ahead logging, making it a robust codebase for studying advanced storage engine architecture.
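The commit-or-abort-together guarantee for distributed transactions is the job of a two-phase commit protocol. The Python sketch below shows the classic coordinator-driven form under stated assumptions: `Participant` and `two_phase_commit` are hypothetical, votes and staged writes live in memory, and failures during phase two are ignored. TiKV actually uses a Percolator-style variant in which a primary lock records the transaction's outcome and timestamps come from PD, so this illustrates the principle rather than TiKV's protocol.

```python
class Participant:
    """Toy participant: stages writes in phase 1, applies or discards them in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
        self.staged, self.data = {}, {}

    def prepare(self, txn_id, writes):
        # Phase 1: stage the writes (a real system would lock and persist them) and vote.
        if not self.can_commit:
            return False
        self.staged[txn_id] = writes
        return True

    def commit(self, txn_id):
        self.data.update(self.staged.pop(txn_id))

    def abort(self, txn_id):
        self.staged.pop(txn_id, None)


def two_phase_commit(txn_id, writes_by_participant):
    """Commit on every participant only if all of them vote yes; otherwise abort all."""
    participants = list(writes_by_participant)
    votes = [p.prepare(txn_id, writes_by_participant[p]) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(txn_id)          # Phase 2: unanimous yes, so apply everywhere
        return True
    for p in participants:
        p.abort(txn_id)               # Phase 2: any no vote aborts everywhere
    return False


a, b = Participant("a"), Participant("b")
print(two_phase_commit(1, {a: {"x": 1}, b: {"y": 2}}))   # True
c = Participant("c", can_commit=False)
print(two_phase_commit(2, {a: {"x": 9}, c: {"z": 3}}))   # False; a keeps x == 1
```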