# Database Internals and Storage Engines

> Search results for `learn database internals and storage engine design` on awesome-repositories.com. 117 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/learn-database-internals-and-storage-engine-design

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/learn-database-internals-and-storage-engine-design).**

## Results

- [pingcap/awesome-database-learning](https://awesome-repositories.com/repository/pingcap-awesome-database-learning.md) (10,672 ⭐) — This project is a curated collection of academic papers, books, and technical resources designed for studying the architecture and implementation of database management systems. It serves as a comprehensive educational guide for engineers and researchers looking to understand the fundamental principles behind modern data storage and retrieval.

The repository distinguishes itself by providing structured learning paths across critical database domains, including the design of persistent storage engines, the mechanics of query optimization, and the complexities of distributed transaction managem
- [etcd-io/etcd](https://awesome-repositories.com/repository/etcd-io-etcd.md) (51,838 ⭐) — etcd is a distributed, strongly consistent key-value store designed to provide reliable storage for critical system metadata and coordination primitives. It functions as a distributed consensus engine, utilizing a replicated log and leader-based state machine to ensure that all nodes in a cluster maintain a synchronized view of data. By providing atomic operations and linearizable reads and writes, it serves as a foundational component for distributed systems requiring high availability and fault tolerance.

The system distinguishes itself through its multi-version concurrency control, which e
- [clickhouse/clickhouse](https://awesome-repositories.com/repository/clickhouse-clickhouse.md) (48,229 ⭐) — ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring.

The platform distinguishes itself through ad
- [alextanhongpin/database-design](https://awesome-repositories.com/repository/alextanhongpin-database-design.md) (500 ⭐) — Ideas on better database design
- [bytebytegohq/system-design-101](https://awesome-repositories.com/repository/bytebytegohq-system-design-101.md) (83,491 ⭐) — This project is a centralized engineering knowledge repository that provides a structured curriculum for mastering system design, architectural patterns, and fundamental software development workflows. It serves as a professional development resource for engineers, offering foundational knowledge and real-world case studies to support the design of scalable, secure, and efficient distributed systems.

The repository distinguishes itself through a visual-first approach to knowledge synthesis, distilling complex technical concepts into high-density graphical diagrams and succinct illustrations.
- [oceanbase/miniob](https://awesome-repositories.com/repository/oceanbase-miniob.md) (4,318 ⭐) — MiniOB is an open-source educational relational database kernel designed for learning the internals of database systems. It implements a dual-engine storage architecture combining B+ Tree and LSM-Tree, supports SQL parsing and query execution, and provides transactional processing with multi-version concurrency control. The system communicates with clients using the MySQL wire protocol and includes a vector database extension for storing and querying high-dimensional vectors.

The project distinguishes itself through its comprehensive coverage of core database concepts in a single, learnable c
- [flipperdevices/flipperzero-firmware](https://awesome-repositories.com/repository/flipperdevices-flipperzero-firmware.md) (15,563 ⭐) — This project provides an open-source firmware platform and complete build environment for portable multi-tool hardware. It functions as an embedded operating system designed to manage radio, infrared, and physical interface components, enabling users to develop custom applications and system logic for specialized hardware devices.

The firmware distinguishes itself through a modular architecture that organizes system functionality into isolated units, allowing for the development of custom user interfaces and logic. It includes a comprehensive collection of low-level drivers and applications s
- [dagster-io/dagster](https://awesome-repositories.com/repository/dagster-io-dagster.md) (14,974 ⭐) — Dagster is a data orchestration platform designed to manage the entire lifecycle of data assets through declarative modeling and version-controlled code. It functions as a workflow engine that treats data assets as first-class primitives, allowing teams to define, schedule, and monitor complex pipelines while maintaining clear visibility into lineage, dependencies, and data quality.

The platform distinguishes itself by using a code-as-configuration framework that enables standard software engineering practices, such as unit testing and local mocking, to be applied directly to data workflows.
- [thumlp/internal](https://awesome-repositories.com/repository/thumlp-internal.md) (1 ⭐) — Code of the NeurIPS 2025 paper "Investigating and Mitigating Catastrophic Forgetting in Medical Knowledge Injection through Internal Knowledge Augmentation Learning"
- [charlax/professional-programming](https://awesome-repositories.com/repository/charlax-professional-programming.md) (51,116 ⭐) — This project is a curated knowledge repository designed to support the professional development of software engineers. It functions as a comprehensive index of industry best practices, methodologies, and design principles, providing a structured roadmap for those seeking to improve their technical skills, architectural decision-making, and career trajectory.

The repository distinguishes itself through a community-driven approach, relying on peer-reviewed contributions to maintain an up-to-date collection of resources. It organizes vast amounts of technical information into a hierarchical taxo
- [fireproof-storage/mcp-database-server](https://awesome-repositories.com/repository/fireproof-storage-mcp-database-server.md) (32 ⭐) — Store and load JSON documents from LLM tool use
- [pubkey/rxdb](https://awesome-repositories.com/repository/pubkey-rxdb.md) (23,048 ⭐) — This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored.

The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
- [agno-agi/agno](https://awesome-repositories.com/repository/agno-agi-agno.md) (40,717 ⭐) — Agno is an agent operating system designed to manage the lifecycle, tool execution, and persistent state of autonomous agents across distributed infrastructure. It provides a unified runtime environment that wraps diverse agent frameworks into a consistent, interoperable protocol, allowing developers to build and deploy complex multi-agent systems that coordinate tasks and delegate sub-processes.

The platform distinguishes itself through a robust governance and orchestration layer that includes human-in-the-loop approval gates, role-based access control, and a centralized API gateway. It feat
- [tidwall/buntdb](https://awesome-repositories.com/repository/tidwall-buntdb.md) (4,834 ⭐) — BuntDB is an embedded key-value store for Go applications, providing in-memory storage with optional disk persistence. It structures data using a B-tree for ordered key-value access and an R-tree for spatial indexing, allowing both range scans and geometric intersection queries. Support for indexing on nested JSON document fields enables efficient lookups by values within JSON objects, and per-key time-to-live (TTL) expiration automatically removes stale entries.

The store uses copy-on-write transaction isolation, ensuring each transaction sees a consistent snapshot and changes are applied at
- [gofiber/storage](https://awesome-repositories.com/repository/gofiber-storage.md) (332 ⭐) — Premade storage drivers that implement the Storage interface, designed to be used with various Fiber middlewares.
- [atom-archive/xray](https://awesome-repositories.com/repository/atom-archive-xray.md) (8,420 ⭐) — Xray is a collaborative text editor and distributed workspace manager that utilizes conflict-free replicated data types to synchronize real-time edits and directory structures across peers. It functions as both an Electron-based desktop application and a headless editor server that manages workspaces and file systems remotely for connected clients.

The project distinguishes itself by integrating fine-grained version control tracking, recording keystroke-level changes and uncommitted edits between Git commits. It employs a decentralized synchronization model for working copies and uses operati
- [duplicati/duplicati](https://awesome-repositories.com/repository/duplicati-duplicati.md) (14,283 ⭐) — Duplicati is a self-hosted backup server designed to perform encrypted, incremental, and compressed backups to a wide range of local, network, and cloud-based storage providers. It functions as a background service that automates recurring data protection tasks, ensuring that only changed data blocks are stored to maximize efficiency and minimize bandwidth usage.

The project distinguishes itself through a centralized management console that allows for the orchestration of multiple distributed backup agents from a single web-based dashboard. It supports multi-tenant management, enabling the or
- [managedcode/storage](https://awesome-repositories.com/repository/managedcode-storage.md) (134 ⭐) — Storage library provides a universal interface for accessing and manipulating data in different cloud blob storage providers
- [balloonwj/cppguide](https://awesome-repositories.com/repository/balloonwj-cppguide.md) (6,030 ⭐) — CppGuide is a curated collection of educational resources and practical guides focused on C++ server development, Linux kernel internals, concurrent programming, network protocols, and security exploitation. It provides structured learning paths for backend developers, covering everything from interview preparation to building high-performance network servers and understanding operating system fundamentals.

The guide distinguishes itself by offering in-depth, hands-on tutorials that walk through real-world implementations, including building a Redis-like server from scratch, designing custom
- [curl/curl](https://awesome-repositories.com/repository/curl-curl.md) (42,214 ⭐) — Curl is a command-line tool and portable library for transferring data across a wide range of network protocols. It functions as a unified engine that abstracts diverse communication standards, allowing users and developers to move files and information between servers using a consistent interface. The project provides both a versatile command-line client for terminal-based automation and a stable programmatic interface for integrating complex network operations into applications.

The system is distinguished by its protocol-agnostic core and its ability to manage both synchronous and asynchro
- [encode/databases](https://awesome-repositories.com/repository/encode-databases.md) (4,002 ⭐) — Async database support for Python. 🗄
- [spacejam/sled](https://awesome-repositories.com/repository/spacejam-sled.md) (8,928 ⭐) — Sled is an embedded key-value store and ACID-compliant database designed for high-performance data persistence. It functions as a log-structured storage engine that organizes data using B+ trees to support efficient range queries and prefix scans.

The engine implements a zero-copy data store model, utilizing epoch-based reclamation to provide direct references to cached values without memory allocations. It distinguishes itself through a combination of write-ahead logging, page cache optimizations to reduce write amplification on flash storage, and serializable transactions for atomic multi-k
- [dexidp/dex](https://awesome-repositories.com/repository/dexidp-dex.md) (10,902 ⭐) — Dex is an OpenID Connect provider and identity federation proxy that translates authentication signals from various upstream sources into a unified OpenID Connect interface. It functions as a multi-protocol identity broker, enabling client applications to implement a single standard while delegating user verification to external identity providers.

The project distinguishes itself through a pluggable connector architecture that bridges disparate protocols including LDAP, SAML, and OAuth2. It provides specific integrations for services such as GitHub, Google, GitLab, and Microsoft, while offer
- [react-native-async-storage/async-storage](https://awesome-repositories.com/repository/react-native-async-storage-async-storage.md) (5,067 ⭐) — React Native AsyncStorage is a persistent key-value storage library designed for React Native applications. It provides a unified local storage interface that works identically on both iOS and Android, ensuring saved data remains available across app restarts and when the device has no network connectivity.

The library uses an asynchronous background I/O queue to handle all storage operations without blocking the JavaScript thread, communicating with native storage engines through React Native's bridge protocol. It includes a serialization layer that converts JavaScript values to strings for
- [boltdb/bolt](https://awesome-repositories.com/repository/boltdb-bolt.md) (14,642 ⭐) — Bolt is a single-file embedded key-value store for Go applications. It is an ACID transactional database that organizes data in B+trees on disk to provide efficient sorted key retrieval and range scans. The system uses a memory-mapped model to map the database file directly into the process address space for fast random-access reads.

The project distinguishes itself through a multi-version concurrency control architecture that allows multiple simultaneous readers to access a consistent snapshot of data without blocking a writer. It employs a single-writer multi-reader locking model and uses a
- [gokumohandas/made-with-ml](https://awesome-repositories.com/repository/gokumohandas-made-with-ml.md) (48,343 ⭐) — Made With ML is an open-source course on designing, developing, deploying, and iterating on production-grade machine learning applications.
- [enggen/deepmind-advanced-deep-learning-and-reinforcement-learning](https://awesome-repositories.com/repository/enggen-deepmind-advanced-deep-learning-and-reinforcement-learning.md) (862 ⭐) — Advanced Deep Learning and Reinforcement Learning course taught at UCL in partnership with Deepmind
- [lmdb/lmdb](https://awesome-repositories.com/repository/lmdb-lmdb.md) (2,907 ⭐) — LMDB is an embedded key-value storage engine that provides ACID-compliant data persistence. It is a memory-mapped database that stores key-value pairs in B+ trees.

The engine maps files directly into the virtual address space to minimize data copying and system calls. This approach enables high-performance local caching and low-latency data access, specifically optimizing for read-heavy database workflows.

The system implements a transactional model with copy-on-write versioning and single-writer multi-reader locking. These
- [sindresorhus/internal-ip](https://awesome-repositories.com/repository/sindresorhus-internal-ip.md) (350 ⭐) — Install: `npm install internal-ip`
- [camel-ai/camel](https://awesome-repositories.com/repository/camel-ai-camel.md) (17,253 ⭐) — This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer.

The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
- [zuzoovn/machine-learning-for-software-engineers](https://awesome-repositories.com/repository/zuzoovn-machine-learning-for-software-engineers.md) (28,797 ⭐) — A complete daily plan for studying to become a machine learning engineer.
- [louischatriot/nedb](https://awesome-repositories.com/repository/louischatriot-nedb.md) (13,540 ⭐) — NeDB is a JavaScript embedded NoSQL document store designed for Node.js and the browser. It functions as an in-memory data store with the option to persist documents to a local file system, ensuring data survives application restarts.

The project utilizes a MongoDB-compatible API to perform data operations, allowing it to serve as a lightweight document indexing system and a persistent file database without requiring a separate database server.

Capabilities include querying, inserting, updating, and deleting documents, as well as the ability to create indexes on specific fields to accelerate
- [deuxfleurs-org/garage](https://awesome-repositories.com/repository/deuxfleurs-org-garage.md) (2,944 ⭐) — Garage is a distributed object storage system that provides an S3-compatible API gateway. It is designed to synchronize metadata across distributed nodes using conflict-free replicated data types and Merkle-tree state alignment to maintain cluster-wide consistency.

The system ensures data resilience through zone-aware replication, distributing data copies across multiple physical locations. It employs quorum-based request routing and versioned layout management to validate and commit cluster configuration changes.

The project covers a broad range of operational capabilities, including automa
- [mariadb/server](https://awesome-repositories.com/repository/mariadb-server.md) (7,196 ⭐) — This project is an open source relational database management system for storing and managing structured data with SQL. Alongside its relational guarantees of consistency and reliability, it can operate as a vector database for storing and querying high-dimensional vector embeddings.

The system incorporates a columnar storage engine to optimize analytical query processing and large-scale data aggregation. It further enables vector similarity search, allowing users to find similar items by querying vector embeddings.

The software covers a broad capability su
- [fermyon/spin](https://awesome-repositories.com/repository/fermyon-spin.md) (6,443 ⭐) — Spin is a WebAssembly serverless framework and development toolchain for building and running portable microservices. It functions as an event-driven orchestrator and runtime that executes WebAssembly components, allowing developers to map HTTP requests, Redis messages, and cron schedules to specific modules.

The project distinguishes itself by implementing a Wasm-based AI inference gateway, enabling components to perform model inference and generate text embeddings. It utilizes the WebAssembly Component Model and WASI for language-agnostic composition and portable host interfacing, while emp
- [internlm/intern-s1](https://awesome-repositories.com/repository/internlm-intern-s1.md) (814 ⭐)
- [aws/aws-cdk](https://awesome-repositories.com/repository/aws-aws-cdk.md) (12,817 ⭐) — The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane.

The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It
- [vonng/ddia](https://awesome-repositories.com/repository/vonng-ddia.md) (22,648 ⭐) — This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure.

The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
- [illuminate/database](https://awesome-repositories.com/repository/illuminate-database.md) (2,766 ⭐) — [READ ONLY] Subtree split of the Illuminate Database component (see laravel/framework)
- [realm/realm-java](https://awesome-repositories.com/repository/realm-realm-java.md) (11,464 ⭐) — Realm Java is a NoSQL mobile object database and reactive database engine. It provides a persistent local data store that saves native objects directly to disk, replacing traditional SQL storage and object-relational mapping layers.

The system functions as a real-time data synchronizer, coordinating local database changes with a cloud backend across multiple devices. It integrates a reactive engine that uses change listeners and asynchronous event streams to automatically update user interfaces when underlying data changes.

The project covers object-oriented data modeling, CRUD operations, a
- [facebook/react](https://awesome-repositories.com/repository/facebook-react.md) (245,669 ⭐) — React is a JavaScript library for building user interfaces based on a component-driven architecture and unidirectional data flow.
- [stemmlerjs/software-design-and-architecture-roadmap](https://awesome-repositories.com/repository/stemmlerjs-software-design-and-architecture-roadmap.md) (3,402 ⭐) — 🧱 The software design and architecture roadmap for any developer
- [dask/dask](https://awesome-repositories.com/repository/dask-dask.md) (13,746 ⭐) — Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements.

The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
- [realm/realm-cocoa](https://awesome-repositories.com/repository/realm-realm-cocoa.md) (16,608 ⭐) — Realm-Cocoa is a NoSQL mobile database engine and reactive object database designed for local data storage on mobile devices. It serves as a non-relational alternative to Core Data and SQLite, storing data as objects rather than tables.

The system provides an encrypted local store to protect sensitive application data. It also offers reactive data synchronization, so application objects and user interfaces update automatically when the underlying database changes.
- [huggingface/ml-intern](https://awesome-repositories.com/repository/huggingface-ml-intern.md) (10,521 ⭐) — This project is an autonomous AI agent framework and workflow orchestrator designed to automate machine learning engineering. It functions as a reasoning engine that reads research papers and writes code to train and deploy machine learning models through iterative reasoning loops and tool execution.

The system distinguishes itself by integrating a GPU-accelerated sandboxed execution environment, allowing it to run and verify machine learning scripts in isolated remote containers. It utilizes a model provider integration gateway to route inference requests across various hosted or local endpo
- [etcd-io/bbolt](https://awesome-repositories.com/repository/etcd-io-bbolt.md) (9,573 ⭐) — bbolt is an ACID-compliant embedded key-value store for Go applications. It persists all data in a single memory-mapped file on disk, organizing information using B+ trees to facilitate sorted key iteration and efficient range queries.

The project distinguishes itself through a hierarchical data organization model, allowing buckets to be nested within other buckets to create a tree-like structure. It employs a single-writer, multi-reader locking mechanism and copy-on-write transactions to ensure serializable isolation and data integrity.

The system includes comprehensive data management capa
- [0voice/interview_internal_reference](https://awesome-repositories.com/repository/0voice-interview-internal-reference.md) (37,235 ⭐) — This project is a comprehensive technical interview question bank and reference library designed for software engineering roles at major technology companies. It serves as a study guide and knowledge base covering the core principles of high-performance systems programming and computer science theory.

The collection focuses on deep technical domains, including C++ language mastery, distributed systems design, and database engineering. It provides detailed material on consensus protocols, cluster coordination, and the architectural differences between SQL and NoSQL implementations.

The resour
- [dragonflydb/dragonfly](https://awesome-repositories.com/repository/dragonflydb-dragonfly.md) (30,688 ⭐) — Dragonfly is a high-performance, multi-model in-memory data store designed to serve as a drop-in replacement for existing database infrastructures. By utilizing a multi-threaded, shared-nothing architecture and a fiber-based concurrency model, it maximizes CPU utilization and minimizes latency for read and write operations. The system supports a wide range of data structures, including strings, hashes, lists, sets, sorted sets, and JSON documents, while maintaining full compatibility with standard industry wire protocols and client libraries.

What distinguishes Dragonfly is its focus on effic
- [susom/database](https://awesome-repositories.com/repository/susom-database.md) (43 ⭐) — The point of this project is to provide a simplified way of accessing databases. It is a wrapper around the JDBC driver, and tries to hide some of the more error-prone, unsafe, and non-portable parts of the standard API. It uses standard Java types for all operations (as opposed to java.sql.*),…
- [coollabsio/coolify](https://awesome-repositories.com/repository/coollabsio-coolify.md) (57,055 ⭐) — This project is a self-hosted platform-as-a-service that provides a centralized management interface for deploying, configuring, and monitoring containerized applications and databases on private infrastructure. It functions as a visual control plane, automating the end-to-end lifecycle of services from source code to production. By managing container orchestration, networking, and resource allocation, it allows users to maintain full control over their own hardware while streamlining the delivery of software.

The platform distinguishes itself through its agentless architecture, which uses se
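
Several entries above (Bolt, bbolt, LMDB, BuntDB) describe copy-on-write transactions combined with a single-writer, multi-reader model, where readers keep a consistent snapshot while a writer commits. The sketch below illustrates that idea only. It is not code from any of those projects: `CowStore`, `snapshot`, and `update` are hypothetical names, and copying a whole dictionary stands in for the page-level B+ tree copying a real engine would do.

```python
import threading
from types import MappingProxyType


class CowStore:
    """Toy copy-on-write key-value store (hypothetical, for illustration).

    One writer at a time; readers take snapshots without locking.
    """

    def __init__(self):
        # Current committed version, exposed as a read-only view.
        self._root = MappingProxyType({})
        # Serializes writers so only one draft exists at a time.
        self._write_lock = threading.Lock()

    def snapshot(self):
        # Readers hold on to the root they saw; commits never mutate it.
        return self._root

    def update(self, fn):
        # Copy the committed state, apply changes to the copy,
        # then publish the new version by swapping the root reference.
        with self._write_lock:
            draft = dict(self._root)
            fn(draft)  # if fn raises, the draft is discarded (rollback)
            self._root = MappingProxyType(draft)


store = CowStore()
store.update(lambda d: d.update(a=1))

old = store.snapshot()                      # reader pins version 1
store.update(lambda d: d.update(a=2, b=3))  # writer commits version 2

print(old["a"], "b" in old)     # the pinned snapshot is unchanged
print(store.snapshot()["a"])    # new readers see the latest commit
```

Swapping a single reference is what makes the commit appear atomic to readers in this sketch. Real engines add durability (for example, syncing pages to disk before publishing a new root) and reclaim old versions once no reader still holds them, neither of which is modeled here.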
