What are the best open-source GitHub repositories for databases?

mongodb/mongo is the closest match: it is a comprehensive, distributed document-oriented database that natively supports vector search, high availability through clustering, and flexible data modeling, making it a flagship example of the requested category. Other strong matches: manticoresoftware/manticoresearch, scylladb/scylladb, yugabyte/yugabyte-db, pingcap/tidb.

Why does mongodb/mongo match “databases”?

This is a comprehensive, distributed document-oriented database that natively supports vector search, high availability through clustering, and flexible data modeling, making it a flagship example of the requested category.

Why does manticoresoftware/manticoresearch match “databases”?

Manticore Search is a high-performance database management system that supports SQL, vector search, and distributed clustering, making it a comprehensive solution for managing both structured and unstructured data.

Why does scylladb/scylladb match “databases”?

ScyllaDB is a high-performance, distributed NoSQL database that natively supports vector search, time-series data, and cluster-based high availability, making it a comprehensive solution for managing large-scale structured and unstructured data.

Why does yugabyte/yugabyte-db match “databases”?

YugabyteDB is a distributed SQL database that provides high availability, horizontal scaling, and native vector search capabilities, making it a comprehensive solution for managing structured data in a self-hostable environment.

Why does pingcap/tidb match “databases”?

TiDB is a distributed, self-hostable SQL database that supports high availability, clustering, and vector search, making it a comprehensive solution for managing structured data at scale.

Databases

Explore open-source relational, NoSQL, and distributed database systems for managing and scaling your application data.


mongodb/mongo
28,158 · View on GitHub
This project is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automated sharding and maintains high availability across global clusters using a multi-node replication protocol. By executing multi-document operations as atomic units, the system ensures data integrity and consistency across distributed environments. The platform distinguishes itself by integrating advanced vector-based indexing, which enables semantic similarity searches alongside traditional geospatial and lexical quer
This is a comprehensive, distributed document-oriented database that natively supports vector search, high availability through clustering, and flexible data modeling, making it a flagship example of the requested category.
C++, Document Databases, Document Stores, High Availability Configurations
manticoresoftware/manticoresearch
11,819 · View on GitHub
Manticoresearch is a high-performance search engine and database designed for indexing and retrieving large datasets. It functions as a full-text search engine, a vector search database, and a SQL-based search database, providing a distributed search cluster architecture. The system provides an alternative to the Elasticsearch stack, offering a compatible API for indexing and searching structured and unstructured data. It distinguishes itself by supporting multiple retrieval methods, including vector matching for similarity search, geospatial queries, and traditional full-text ranking. The p
Manticore Search is a high-performance database management system that supports SQL, vector search, and distributed clustering, making it a comprehensive solution for managing both structured and unstructured data.
C++, High Availability Architectures, Vector Search, Vector Databases
scylladb/scylladb
15,355 · View on GitHub
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large-scale datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that distributes data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous event-driven engine to maximize hardwa
ScyllaDB is a high-performance, distributed NoSQL database that natively supports vector search, time-series data, and cluster-based high availability, making it a comprehensive solution for managing large-scale structured and unstructured data.
C++, Time Series, Vector Databases, NoSQL Databases
yugabyte/yugabyte-db
10,349 · View on GitHub
YugabyteDB is a distributed SQL database and relational data store designed for horizontal scalability and high availability across multiple nodes or regions. It functions as a cloud-native system that ensures continuous availability and supports PostgreSQL compatible query languages and drivers. The system includes specialized capabilities as a vector database for AI, utilizing high-dimensional indexing to perform similarity searches. It is engineered as a multi-region cloud database that synchronizes data across different geographic locations to maintain global availability. The project co
YugabyteDB is a distributed SQL database that provides high availability, horizontal scaling, and native vector search capabilities, making it a comprehensive solution for managing structured data in a self-hostable environment.
C, High Availability Architectures, Relational, Vector Databases
pingcap/tidb
40,166 · View on GitHub
TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions. The platform distinguishes itself through its hybrid transactional and analytical proc
TiDB is a distributed, self-hostable SQL database that supports high availability, clustering, and vector search, making it a comprehensive solution for managing structured data at scale.
Go, Data Replication, Database Replication, High Availability Architectures
MariaDB/server
7,196 · View on GitHub
This project is an open source relational database management system and SQL database designed for storing and managing structured data. It functions as a relational database for ensuring consistency and reliability, while also operating as a vector database for storing and querying high-dimensional vector embeddings. The system incorporates a columnar storage engine to optimize analytical query processing and large-scale data aggregation. It further enables vector similarity search, allowing users to find similar items by querying vector embeddings. The software covers a broad capability su
MariaDB is a robust relational database management system that provides SQL support, high availability through clustering, and modern vector search capabilities, though it lacks native NoSQL document store functionality.
C++, Vector Search, Relational, Vector Databases
chroma-core/chroma
26,198 · View on GitHub
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema
Chroma is a specialized vector database that manages unstructured data and metadata, fitting the category of a database management system even though it focuses on vector-based retrieval rather than traditional SQL or time-series workloads.
Rust, Document Stores, Vector Search, Vector Databases
databendlabs/databend
9,351 · View on GitHub
Databend is a cloud-native data warehouse and OLAP database designed for large-scale analytics. It functions as a SQL-compliant engine and serverless analytics platform that separates compute from storage to allow for independent scaling. The system integrates vector database capabilities, indexing high-dimensional embeddings to enable semantic, hybrid, and full-text searches across massive datasets. It further distinguishes itself through serverless compute management that automatically scales resources based on demand and shuts them down during idle periods. The platform covers a broad set
Databend is a cloud-native OLAP database and data warehouse that provides SQL support and vector search capabilities, making it a powerful tool for large-scale analytical data management.
Rust, SQL Engines, Vector Search, Vector Databases
questdb/questdb
17,062 · View on GitHub
QuestDB is a high-performance, distributed time-series database designed for the ingestion, storage, and analysis of massive datasets. It functions as a real-time analytics platform that utilizes a columnar storage engine to optimize disk input and output, enabling efficient analytical scans and complex windowing operations on streaming data. The platform distinguishes itself through specialized capabilities for handling asynchronous time-series streams, including advanced join algorithms that align disparate data sets based on precise timestamp lookups. It supports high-volume ingestion thro
QuestDB is a high-performance, SQL-compatible database specifically optimized for time-series data, making it a strong choice for analytical workloads even though it lacks native NoSQL document storage or general-purpose vector search features.
Java, Data Replication, High Availability Architectures, Time Series Databases
codenotary/immudb
8,982 · View on GitHub
immudb is a tamperproof database that maintains an immutable record of entries using cryptographic commit logging. It ensures verifiable database integrity by utilizing Merkle trees to generate membership and consistency proofs that detect unauthorized data alterations. The system employs a multi-model storage engine that unifies key-value, document, and relational data structures within a single immutable backend. It provides compatibility with the PostgreSQL wire protocol, allowing it to integrate with standard SQL clients, ORMs, and database tools. The project covers broad capabilities in
This is a multi-model database system that supports SQL, document storage, and temporal queries, making it a capable tool for managing structured and unstructured data with a focus on immutability and auditability.
Go, Data Replication, Database Replication, Relational
Snapchat/KeyDB
12,487 · View on GitHub
KeyDB is a multithreaded in-memory key-value store and distributed cache. It functions as a NoSQL database utilizing multi-version concurrency control to execute non-blocking queries and scans. The project is a multithreaded fork of Redis that maintains protocol compatibility while utilizing a multithreaded architecture to scale across multi-core hardware. It distinguishes itself with flash-tiered storage, allowing the system to offload data from primary RAM to SSD or flash storage to increase total capacity. The system supports high availability through active-active mesh replication and mu
KeyDB is a high-performance, multithreaded NoSQL key-value store that provides robust clustering and high availability, though it lacks native SQL support and specialized vector search capabilities.
C++, Data Replication, Database Replication, NoSQL Databases
tursodatabase/libsql
16,887 · View on GitHub
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
LibSQL is a distributed SQL database engine that extends SQLite with vector search and remote synchronization capabilities, making it a capable choice for edge and serverless environments despite its primary identity as an embedded library rather than a standalone server-based DBMS.
C, Data Replication, Vector Search, Vector Databases
rqlite/rqlite
17,586 · View on GitHub
rqlite is a distributed relational database that replicates SQLite data across a cluster using the Raft consensus algorithm. It functions as a fault-tolerant storage system that provides high availability and a web API for executing SQL queries and managing relational data without requiring native database drivers. The system distinguishes itself by using an HTTP SQL interface to expose database operations and cluster management. It features a real-time change data capture stream that pushes database mutations to external HTTP endpoints via webhooks and supports the scaling of read throughput
This is a distributed relational database that provides high availability and SQL support by clustering SQLite, though it lacks native NoSQL, vector search, or time-series specific optimizations.
Go, Data Replication, High Availability Configurations
rethinkdb/rethinkdb
26,996 · View on GitHub
RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations. A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data update
RethinkDB is a distributed, document-oriented database that provides high availability and clustering for JSON data, though it lacks native SQL support and specialized vector search capabilities.
C++, Document Databases, High Availability Configurations
redis/redis
74,906 · View on GitHub
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to process requests efficiently, while maintaining data durability through append-only persistence logs and asynchronous snapshotting mechanisms. What distinguishes Redis is its ability to handle complex data structures—including strings, hashes, lists, sets, and sorted sets—alongsid
Redis is a high-performance, self-hostable NoSQL database that supports vector search, clustering, and time-series data, though it lacks native SQL support.
C, High-Availability Configurations, Vector Search, Vector Databases
taosdata/TDengine
24,734 · View on GitHub
TDengine is a distributed time-series database designed for the high-speed ingestion, compression, and retrieval of timestamped metrics and sensor data. It functions as a SQL-compatible analytics engine, allowing users to perform complex operations on massive volumes of time-ordered information using standard relational syntax. The platform is built to serve as a backend foundation for industrial IoT environments, managing real-time data streams and device metadata through a cluster-based architecture. The system distinguishes itself through a distributed sharding architecture that uses consi
TDengine is a specialized, self-hostable distributed database that provides SQL-compatible querying for time-series and IoT data, though it is optimized for high-speed metrics rather than general-purpose document or vector storage.
C, Time Series Databases
pubkey/rxdb
23,048 · View on GitHub
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
This is a reactive, local-first NoSQL database engine designed for client-side and cross-environment synchronization, which fits the category of a database management system despite its primary focus on browser and application-state persistence rather than traditional server-side clustering.
TypeScript, Data Replication, Database Replication, Document Stores
duckdb/duckdb
38,805 · View on GitHub
DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation. The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adapti
DuckDB is a high-performance, in-process analytical database that provides robust SQL support and efficient data processing, though it is designed as an embedded engine rather than a traditional client-server system for high-availability clustering.
C++, SQL Engines
lancedb/lancedb
9,031 · View on GitHub
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
LanceDB is a specialized vector database and columnar store that provides efficient indexing and retrieval for high-dimensional data, though it lacks the broad general-purpose SQL or time-series features of a traditional relational database management system.
HTML, High Availability Architectures, Vector Search, Vector Databases
alibaba/AliSQL
5,706 · View on GitHub
AliSQL is a fork of MySQL by Alibaba that extends the relational database management system with enhancements for high performance, scalability, and enterprise-grade availability. It retains the core MySQL identity as a SQL-based database for storing, organizing, and retrieving structured data, while adding optimizations for large-scale transactional and analytical workloads. The project differentiates itself through a set of Alibaba-specific improvements, including a columnar engine for accelerating analytical queries directly on MySQL tables, and a distributed, shared-nothing NDB Cluster en
AliSQL is a high-performance fork of MySQL that provides a robust relational database management system with added support for vector search, JSON document storage, and enterprise-grade clustering features.
C++, Data Replication, Vector Search, Raw SQL Execution
cockroachdb/cockroach
32,207 · View on GitHub
Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures. The system distinguishes itself through
CockroachDB is a robust, distributed SQL database that excels at horizontal scaling and high availability, though it focuses primarily on relational data rather than providing native NoSQL or specialized time-series storage engines.
Go, Distributed Relational Databases, Distributed SQL Databases, Distributed SQL Engines
tporadowski/redis
9,987 · View on GitHub
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
Redis is a high-performance, self-hostable NoSQL database that supports vector search, clustering, and high availability, though it lacks native SQL support.
C, High Availability Architectures, Vector Databases, NoSQL Databases
tursodatabase/turso
21,655 · View on GitHub
Turso is a distributed SQL database platform that provides managed, edge-hosted SQLite instances. It functions as a serverless database provider, enabling the deployment of relational databases that synchronize data across multiple geographic regions to support high availability and performance. The platform distinguishes itself by utilizing a fork of SQLite as its core storage engine, which supports both local file storage and remote network-based replication. It employs an edge-optimized proxy to route queries through a global network, minimizing latency by connecting users to the nearest d
Turso is a distributed SQL database built on SQLite that provides high availability and edge-optimized performance, though it is primarily a managed platform rather than a traditional self-hosted database server.
Rust, SQLite Databases, Distributed SQL Databases, Edge Databases
GreptimeTeam/greptimedb
5,968 · View on GitHub
GreptimeDB is a distributed, open-source time-series database built for unified observability. It stores and queries metrics, logs, and traces together in a single columnar engine, supporting both SQL and PromQL for analysis. The database is designed as a Kubernetes-native operator with a decoupled compute and storage architecture, enabling horizontal scaling and multi-region deployment. What distinguishes GreptimeDB is its role as a multi-protocol ingestion gateway, accepting data through OpenTelemetry, Prometheus Remote Write, InfluxDB, Loki, Elasticsearch, Kafka, and MQTT protocols without
GreptimeDB is a distributed, cloud-native database specifically optimized for time-series data and observability, offering SQL support and high availability while functioning as a powerful, self-hostable engine for structured and semi-structured metrics, logs, and traces.
Rust, Time Series Databases
electric-sql/pglite
14,707 · View on GitHub
Pglite is a client-side relational database engine that runs a full-featured PostgreSQL instance directly within browser and Node.js environments. By leveraging WebAssembly, it provides a persistent SQL storage solution that enables complex data management and querying without requiring an external database server. The project distinguishes itself through a reactive SQL data layer that automatically synchronizes user interface components with live query results. It manages database operations using worker threads to prevent main-thread blocking and coordinates access across multiple browser t
Pglite is a full-featured PostgreSQL instance running in WebAssembly, providing a robust SQL-based database engine for local-first applications, though it lacks the clustering and time-series features required for large-scale server-side deployments.
TypeScript, Browser Databases, Client-Side Databases, In-Browser Database Engines
