Explore open-source relational, NoSQL, and distributed database systems for managing and scaling your application data.
This project is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automated sharding and maintains high availability across global clusters using a multi-node replication protocol. By executing multi-document operations as atomic units, the system ensures data integrity and consistency across distributed environments. The platform distinguishes itself by integrating advanced vector-based indexing, which enables semantic similarity searches alongside traditional geospatial and lexical quer
This is a comprehensive, distributed document-oriented database that natively supports vector search, high availability through clustering, and flexible data modeling, making it a flagship example of the requested category.
Manticoresearch is a high-performance search engine and database designed for indexing and retrieving large datasets. It functions as a full-text search engine, a vector search database, and a SQL-based search database, providing a distributed search cluster architecture. The system provides an alternative to the Elasticsearch stack, offering a compatible API for indexing and searching structured and unstructured data. It distinguishes itself by supporting multiple retrieval methods, including vector matching for similarity search, geospatial queries, and traditional full-text ranking. The p
Manticore Search is a high-performance database management system that supports SQL, vector search, and distributed clustering, making it a comprehensive solution for managing both structured and unstructured data.
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large-scale datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that distributes data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous event-driven engine to maximize hardwa
ScyllaDB is a high-performance, distributed NoSQL database that natively supports vector search, time-series data, and cluster-based high availability, making it a comprehensive solution for managing large-scale structured and unstructured data.
YugabyteDB is a distributed SQL database and relational data store designed for horizontal scalability and high availability across multiple nodes or regions. It functions as a cloud-native system that ensures continuous availability and supports PostgreSQL compatible query languages and drivers. The system includes specialized capabilities as a vector database for AI, utilizing high-dimensional indexing to perform similarity searches. It is engineered as a multi-region cloud database that synchronizes data across different geographic locations to maintain global availability. The project co
YugabyteDB is a distributed SQL database that provides high availability, horizontal scaling, and native vector search capabilities, making it a comprehensive solution for managing structured data in a self-hostable environment.
TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions. The platform distinguishes itself through its hybrid transactional and analytical proc
TiDB is a distributed, self-hostable SQL database that supports high availability, clustering, and vector search, making it a comprehensive solution for managing structured data at scale.
This project is an open source relational database management system and SQL database designed for storing and managing structured data. It functions as a relational database for ensuring consistency and reliability, while also operating as a vector database for storing and querying high-dimensional vector embeddings. The system incorporates a columnar storage engine to optimize analytical query processing and large-scale data aggregation. It further enables vector similarity search, allowing users to find similar items by querying vector embeddings. The software covers a broad capability su
MariaDB is a robust relational database management system that provides SQL support, high availability through clustering, and modern vector search capabilities, though it lacks native NoSQL document store functionality.
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema
Chroma is a specialized vector database that manages unstructured data and metadata, fitting the category of a database management system even though it focuses on vector-based retrieval rather than traditional SQL or time-series workloads.
Databend is a cloud-native data warehouse and OLAP database designed for large-scale analytics. It functions as a SQL-compliant engine and serverless analytics platform that separates compute from storage to allow for independent scaling. The system integrates vector database capabilities, indexing high-dimensional embeddings to enable semantic, hybrid, and full-text searches across massive datasets. It further distinguishes itself through serverless compute management that automatically scales resources based on demand and shuts them down during idle periods. The platform covers a broad set
Databend is a cloud-native OLAP database and data warehouse that provides SQL support and vector search capabilities, making it a powerful tool for large-scale analytical data management.
QuestDB is a high-performance, distributed time-series database designed for the ingestion, storage, and analysis of massive datasets. It functions as a real-time analytics platform that utilizes a columnar storage engine to optimize disk input and output, enabling efficient analytical scans and complex windowing operations on streaming data. The platform distinguishes itself through specialized capabilities for handling asynchronous time-series streams, including advanced join algorithms that align disparate data sets based on precise timestamp lookups. It supports high-volume ingestion thro
QuestDB is a high-performance, SQL-compatible database specifically optimized for time-series data, making it a strong choice for analytical workloads even though it lacks native NoSQL document storage or general-purpose vector search features.
immudb is a tamperproof database that maintains an immutable record of entries using cryptographic commit logging. It ensures verifiable database integrity by utilizing Merkle trees to generate membership and consistency proofs that detect unauthorized data alterations. The system employs a multi-model storage engine that unifies key-value, document, and relational data structures within a single immutable backend. It provides compatibility with the PostgreSQL wire protocol, allowing it to integrate with standard SQL clients, ORMs, and database tools. The project covers broad capabilities in
This is a multi-model database system that supports SQL, document storage, and temporal queries, making it a capable tool for managing structured and unstructured data with a focus on immutability and auditability.
KeyDB is a multithreaded in-memory key-value store and distributed cache. It functions as a NoSQL database utilizing multi-version concurrency control to execute non-blocking queries and scans. The project is a multithreaded fork of Redis that maintains protocol compatibility while utilizing a multithreaded architecture to scale across multi-core hardware. It distinguishes itself with flash-tiered storage, allowing the system to offload data from primary RAM to SSD or flash storage to increase total capacity. The system supports high availability through active-active mesh replication and mu
KeyDB is a high-performance, multithreaded NoSQL key-value store that provides robust clustering and high availability, though it lacks native SQL support and specialized vector search capabilities.
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
LibSQL is a distributed SQL database engine that extends SQLite with vector search and remote synchronization capabilities, making it a capable choice for edge and serverless environments despite its primary identity as an embedded library rather than a standalone server-based DBMS.
rqlite is a distributed relational database that replicates SQLite data across a cluster using the Raft consensus algorithm. It functions as a fault-tolerant storage system that provides high availability and a web API for executing SQL queries and managing relational data without requiring native database drivers. The system distinguishes itself by using an HTTP SQL interface to expose database operations and cluster management. It features a real-time change data capture stream that pushes database mutations to external HTTP endpoints via webhooks and supports the scaling of read throughput
This is a distributed relational database that provides high availability and SQL support by clustering SQLite, though it lacks native NoSQL, vector search, or time-series specific optimizations.
RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations. A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data update
RethinkDB is a distributed, document-oriented database that provides high availability and clustering for JSON data, though it lacks native SQL support and specialized vector search capabilities.
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to process requests efficiently, while maintaining data durability through append-only persistence logs and asynchronous snapshotting mechanisms. What distinguishes Redis is its ability to handle complex data structures—including strings, hashes, lists, sets, and sorted sets—alongsid
Redis is a high-performance, self-hostable NoSQL database that supports vector search, clustering, and time-series data, though it lacks native SQL support.
TDengine is a distributed time-series database designed for the high-speed ingestion, compression, and retrieval of timestamped metrics and sensor data. It functions as a SQL-compatible analytics engine, allowing users to perform complex operations on massive volumes of time-ordered information using standard relational syntax. The platform is built to serve as a backend foundation for industrial IoT environments, managing real-time data streams and device metadata through a cluster-based architecture. The system distinguishes itself through a distributed sharding architecture that uses consi
TDengine is a specialized, self-hostable distributed database that provides SQL-compatible querying for time-series and IoT data, though it is optimized for high-speed metrics rather than general-purpose document or vector storage.
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
This is a reactive, local-first NoSQL database engine designed for client-side and cross-environment synchronization, which fits the category of a database management system despite its primary focus on browser and application-state persistence rather than traditional server-side clustering.
DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation. The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adapti
DuckDB is a high-performance, in-process analytical database that provides robust SQL support and efficient data processing, though it is designed as an embedded engine rather than a traditional client-server system for high-availability clustering.
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
LanceDB is a specialized vector database and columnar store that provides efficient indexing and retrieval for high-dimensional data, though it lacks the broad general-purpose SQL or time-series features of a traditional relational database management system.
AliSQL is a fork of MySQL by Alibaba that extends the relational database management system with enhancements for high performance, scalability, and enterprise-grade availability. It retains the core MySQL identity as a SQL-based database for storing, organizing, and retrieving structured data, while adding optimizations for large-scale transactional and analytical workloads. The project differentiates itself through a set of Alibaba-specific improvements, including a columnar engine for accelerating analytical queries directly on MySQL tables, and a distributed, shared-nothing NDB Cluster en
AliSQL is a high-performance fork of MySQL that provides a robust relational database management system with added support for vector search, JSON document storage, and enterprise-grade clustering features.
Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures. The system distinguishes itself through
CockroachDB is a robust, distributed SQL database that excels at horizontal scaling and high availability, though it focuses primarily on relational data rather than providing native NoSQL or specialized time-series storage engines.
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
Redis is a high-performance, self-hostable NoSQL database that supports vector search, clustering, and high availability, though it lacks native SQL support.
Turso is a distributed SQL database platform that provides managed, edge-hosted SQLite instances. It functions as a serverless database provider, enabling the deployment of relational databases that synchronize data across multiple geographic regions to support high availability and performance. The platform distinguishes itself by utilizing a fork of SQLite as its core storage engine, which supports both local file storage and remote network-based replication. It employs an edge-optimized proxy to route queries through a global network, minimizing latency by connecting users to the nearest d
Turso is a distributed SQL database built on SQLite that provides high availability and edge-optimized performance, though it is primarily a managed platform rather than a traditional self-hosted database server.
GreptimeDB is a distributed, open-source time-series database built for unified observability. It stores and queries metrics, logs, and traces together in a single columnar engine, supporting both SQL and PromQL for analysis. The database is designed as a Kubernetes-native operator with a decoupled compute and storage architecture, enabling horizontal scaling and multi-region deployment. What distinguishes GreptimeDB is its role as a multi-protocol ingestion gateway, accepting data through OpenTelemetry, Prometheus Remote Write, InfluxDB, Loki, Elasticsearch, Kafka, and MQTT protocols without
GreptimeDB is a distributed, cloud-native database specifically optimized for time-series data and observability, offering SQL support and high availability while functioning as a powerful, self-hostable engine for structured and semi-structured metrics, logs, and traces.
Pglite is a client-side relational database engine that runs a full-featured PostgreSQL instance directly within browser and Node.js environments. By leveraging WebAssembly, it provides a persistent SQL storage solution that enables complex data management and querying without requiring an external database server. The project distinguishes itself through a reactive SQL data layer that automatically synchronizes user interface components with live query results. It manages database operations using worker threads to prevent main-thread blocking and coordinates access across multiple browser t
Pglite is a full-featured PostgreSQL instance running in WebAssembly, providing a robust SQL-based database engine for local-first applications, though it lacks the clustering and time-series features required for large-scale server-side deployments.