What are the best open-source GitHub repositories for databases & data?

scylladb/scylladb is the closest match: ScyllaDB is a high-performance, distributed NoSQL database that natively supports vector search, time-series data, and self-hosted cluster deployments, making it a comprehensive solution for modern application data storage. Other strong matches: mongodb/mongo, yugabyte/yugabyte-db, cockroachdb/cockroach, oceanbase/oceanbase.

Why does scylladb/scylladb match “databases & data”?

ScyllaDB is a high-performance, distributed NoSQL database that natively supports vector search, time-series data, and self-hosted cluster deployments, making it a comprehensive solution for modern application data storage.

Why does mongodb/mongo match “databases & data”?

MongoDB is a distributed, document-oriented database system that natively supports vector search, horizontal scaling, and self-hosting, making it a comprehensive solution for modern application data storage.

Why does yugabyte/yugabyte-db match “databases & data”?

YugabyteDB is a distributed, self-hostable SQL database that natively supports vector search and horizontal scaling, making it a comprehensive solution for modern application development.

Why does cockroachdb/cockroach match “databases & data”?

CockroachDB is a distributed, self-hostable SQL database that provides horizontal scalability and ACID compliance, making it a robust choice for modern application development.

Why does oceanbase/oceanbase match “databases & data”?

OceanBase is a distributed SQL database that natively supports vector search and hybrid transactional/analytical processing, making it a comprehensive solution for modern application data storage.

Databases & Data

Explore database management systems, data processing frameworks, and storage solutions for modern software architectures.


scylladb/scylladb
15,355 · View on GitHub
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large-scale datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that distributes data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous event-driven engine to maximize hardwa
ScyllaDB is a high-performance, distributed NoSQL database that natively supports vector search, time-series data, and self-hosted cluster deployments, making it a comprehensive solution for modern application data storage.
C++ · Distributed Databases · Time Series · Vector Databases

mongodb/mongo
28,158 · View on GitHub
This project is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automated sharding and maintains high availability across global clusters using a multi-node replication protocol. By executing multi-document operations as atomic units, the system ensures data integrity and consistency across distributed environments. The platform distinguishes itself by integrating advanced vector-based indexing, which enables semantic similarity searches alongside traditional geospatial and lexical quer
MongoDB is a distributed, document-oriented database system that natively supports vector search, horizontal scaling, and self-hosting, making it a comprehensive solution for modern application data storage.
C++ · Distributed Databases · Document Stores

yugabyte/yugabyte-db
10,349 · View on GitHub
YugabyteDB is a distributed SQL database and relational data store designed for horizontal scalability and high availability across multiple nodes or regions. It functions as a cloud-native system that ensures continuous availability and supports PostgreSQL compatible query languages and drivers. The system includes specialized capabilities as a vector database for AI, utilizing high-dimensional indexing to perform similarity searches. It is engineered as a multi-region cloud database that synchronizes data across different geographic locations to maintain global availability. The project co
YugabyteDB is a distributed, self-hostable SQL database that natively supports vector search and horizontal scaling, making it a comprehensive solution for modern application development.
C · Vector Similarity Search · Distributed Relational Databases · Distributed SQL Databases

cockroachdb/cockroach
32,207 · View on GitHub
Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures. The system distinguishes itself through
CockroachDB is a distributed, self-hostable SQL database that provides horizontal scalability and ACID compliance, making it a robust choice for modern application development.
Go · Distributed Relational Databases · Distributed SQL Databases

oceanbase/oceanbase
9,980 · View on GitHub
OceanBase is a distributed SQL database designed for high availability and strong consistency across multiple nodes and regions. It functions as a hybrid transactional and analytical processing engine, allowing real-time analytics and transactions to execute on a single data copy. The system also serves as a vector database engine for indexing and querying vector data to power semantic search and recommendation systems. The platform features native compatibility layers for MySQL and Oracle, enabling the migration of legacy workloads without rewriting SQL code. It utilizes a Paxos-based distri
OceanBase is a distributed SQL database that natively supports vector search and hybrid transactional/analytical processing, making it a comprehensive solution for modern application data storage.
C++ · Vector Search · Distributed SQL Databases · Vector Databases

pingcap/tidb
40,166 · View on GitHub
TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions. The platform distinguishes itself through its hybrid transactional and analytical proc
TiDB is a distributed, cloud-native SQL database that supports horizontal scaling, transactional and analytical processing, and integrated vector search, making it a comprehensive solution for modern application development.
Go · Distributed Databases · Distributed SQL Databases · Vector Databases

tursodatabase/libsql
16,887 · View on GitHub
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
LibSQL is a distributed, SQLite-compatible SQL database engine that supports vector search and edge-native synchronization, making it a powerful storage solution for modern application development.
C · Distributed Databases · Vector Search · Vector Similarity Search

chroma-core/chroma
26,198 · View on GitHub
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema
Chroma is a specialized vector database that provides robust document storage and retrieval capabilities, making it a highly effective solution for AI-driven application development despite lacking traditional SQL or time-series features.
Rust · Document Stores · Vector Search · Vector Databases
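
Several entries in this list mention vector similarity search. As a rough illustration of the idea only, here is a brute-force sketch in plain Python. It is not Chroma's API or implementation; the function names and the sample vectors are hypothetical, and production vector databases typically rely on approximate nearest-neighbor indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical three-dimensional embeddings, for illustration only.
documents = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], documents, k=2)
```

Here `results` ranks `doc-a` first and `doc-b` second, because their vectors point in nearly the same direction as the query, while `doc-c` is orthogonal to it.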

MariaDB/server
7,196 · View on GitHub
This project is an open source relational database management system and SQL database designed for storing and managing structured data. It functions as a relational database for ensuring consistency and reliability, while also operating as a vector database for storing and querying high-dimensional vector embeddings. The system incorporates a columnar storage engine to optimize analytical query processing and large-scale data aggregation. It further enables vector similarity search, allowing users to find similar items by querying vector embeddings. The software covers a broad capability su
MariaDB is a robust, self-hostable relational database management system that provides comprehensive SQL support and has expanded its capabilities to include vector search, though it lacks native NoSQL document store and time-series specific engines compared to specialized multi-model databases.
C++ · Vector Search · Vector Similarity Search · Relational

surrealdb/surrealdb
32,397 · View on GitHub
SurrealDB is a multi-model database engine designed to store and query document, graph, relational, and vector data within a single ACID-compliant platform. It functions as an AI-native data store, integrating vector search, graph traversal, and machine learning model execution directly into its query layer. By providing a unified declarative query language, the platform eliminates the need for external middleware to synchronize data across different storage models. The platform distinguishes itself through its ability to manage agent memory and complex workflows natively. It allows developer
SurrealDB is a multi-model database that natively supports SQL, document storage, vector search, and distributed deployment, making it a comprehensive solution for modern application development.
Rust · Distributed Databases

opensearch-project/OpenSearch
13,196 · View on GitHub
OpenSearch is a distributed search and analytics engine designed for indexing, searching, and analyzing massive volumes of structured and unstructured data in real time. It functions as a comprehensive platform that integrates enterprise-grade search capabilities, a vector database for high-dimensional similarity lookups, and a unified observability suite for monitoring logs, metrics, and traces across complex distributed environments. The platform distinguishes itself through its support for agentic workflow automation, allowing users to orchestrate multi-agent tasks and integrate foundation
OpenSearch is a distributed search and analytics engine that functions as a powerful NoSQL document store and vector database, making it a highly capable data storage solution for modern application development.
Java · Vector Search · Vector Similarity Search · Vector Databases

dolthub/dolt
23,592 · View on GitHub
Dolt is a relational database engine that integrates version control directly into the database management layer. It functions as a version-controlled SQL database that tracks every row and schema change using a commit-based history, allowing users to branch, merge, and audit data modifications. By implementing a wire-protocol-compatible server, the system enables standard SQL clients and tools to interact with versioned data as if they were connecting to a traditional relational database. The platform distinguishes itself by applying repository-style workflows to data management, including s
Dolt is a relational SQL database engine that provides unique version control capabilities, making it a robust choice for application development that requires data auditing and branching.
Go · Distributed Databases · Distributed SQL Databases · Relational Database Engines

louischatriot/nedb
13,540 · View on GitHub
NeDB is a JavaScript embedded NoSQL document store designed for Node.js and the browser. It functions as an in-memory data store with the option to persist documents to a local file system, ensuring data survives application restarts. The project utilizes a MongoDB-compatible API to perform data operations, allowing it to serve as a lightweight document indexing system and a persistent file database without requiring a separate database server. Capabilities include querying, inserting, updating, and deleting documents, as well as the ability to create indexes on specific fields to accelerate
NeDB is a lightweight, embedded NoSQL document store for Node.js and browser environments that provides a simple, serverless way to manage persistent JSON data within an application.
JavaScript · Document Stores · JSON Document Stores · NoSQL Databases

pubkey/rxdb
23,048 · View on GitHub
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
This is a reactive, client-side NoSQL database engine that excels at local-first data synchronization and state management, though it functions as an embedded library for applications rather than a standalone server-side database management system.
TypeScript · Document Stores · JSON Document Stores · Vector Similarity Search

manticoresoftware/manticoresearch
11,819 · View on GitHub
Manticoresearch is a high-performance search engine and database designed for indexing and retrieving large datasets. It functions as a full-text search engine, a vector search database, and a SQL-based search database, providing a distributed search cluster architecture. The system provides an alternative to the Elasticsearch stack, offering a compatible API for indexing and searching structured and unstructured data. It distinguishes itself by supporting multiple retrieval methods, including vector matching for similarity search, geospatial queries, and traditional full-text ranking. The p
Manticore Search is a high-performance database and search engine that supports SQL, vector search, and distributed architecture, making it a robust choice for application data storage despite lacking native time-series specific features.
C++ · Vector Search · Vector Similarity Search · Vector Databases

taosdata/TDengine
24,734 · View on GitHub
TDengine is a distributed time-series database designed for the high-speed ingestion, compression, and retrieval of timestamped metrics and sensor data. It functions as a SQL-compatible analytics engine, allowing users to perform complex operations on massive volumes of time-ordered information using standard relational syntax. The platform is built to serve as a backend foundation for industrial IoT environments, managing real-time data streams and device metadata through a cluster-based architecture. The system distinguishes itself through a distributed sharding architecture that uses consi
TDengine is a distributed, SQL-compatible database specifically optimized for time-series and IoT data, making it a robust choice for applications requiring high-throughput ingestion and storage of timestamped metrics.
C · Distributed Databases · Time Series Databases

databendlabs/databend
9,351 · View on GitHub
Databend is a cloud-native data warehouse and OLAP database designed for large-scale analytics. It functions as a SQL-compliant engine and serverless analytics platform that separates compute from storage to allow for independent scaling. The system integrates vector database capabilities, indexing high-dimensional embeddings to enable semantic, hybrid, and full-text searches across massive datasets. It further distinguishes itself through serverless compute management that automatically scales resources based on demand and shuts them down during idle periods. The platform covers a broad set
Databend is a cloud-native OLAP database and data warehouse that provides SQL support and vector search capabilities, making it a powerful tool for large-scale analytical data storage.
Rust · Vector Search · Vector Similarity Search · Vector Databases

redis/redis
74,906 · View on GitHub
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to process requests efficiently, while maintaining data durability through append-only persistence logs and asynchronous snapshotting mechanisms. What distinguishes Redis is its ability to handle complex data structures—including strings, hashes, lists, sets, and sorted sets—alongsid
Redis is a high-performance, self-hostable, distributed key-value store that supports NoSQL document storage, vector search, and time-series data, though it lacks native SQL support.
C · Vector Search · Active-Active Database Clusters · Vector Databases

rqlite/rqlite
17,586 · View on GitHub
rqlite is a distributed relational database that replicates SQLite data across a cluster using the Raft consensus algorithm. It functions as a fault-tolerant storage system that provides high availability and a web API for executing SQL queries and managing relational data without requiring native database drivers. The system distinguishes itself by using an HTTP SQL interface to expose database operations and cluster management. It features a real-time change data capture stream that pushes database mutations to external HTTP endpoints via webhooks and supports the scaling of read throughput
This is a distributed relational database that provides SQL support and high availability through SQLite replication, making it a robust self-hostable storage solution for application development.
Go · Distributed Relational Databases

alibaba/AliSQL
5,706 · View on GitHub
AliSQL is a fork of MySQL by Alibaba that extends the relational database management system with enhancements for high performance, scalability, and enterprise-grade availability. It retains the core MySQL identity as a SQL-based database for storing, organizing, and retrieving structured data, while adding optimizations for large-scale transactional and analytical workloads. The project differentiates itself through a set of Alibaba-specific improvements, including a columnar engine for accelerating analytical queries directly on MySQL tables, and a distributed, shared-nothing NDB Cluster en
AliSQL is a high-performance, enterprise-grade fork of MySQL that provides robust SQL support, vector search, and document store capabilities, making it a powerful self-hostable database solution for application development.
C++ · JSON Document Storage · Raw SQL Execution · Vector Search

lancedb/lancedb
9,031 · View on GitHub
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
LanceDB is a specialized vector database and columnar store that provides high-performance embedding retrieval and SQL-like filtering, making it a capable data storage solution for AI-driven application development.
HTML · Vector Search · Vector Similarity Search · Vector Databases

tporadowski/redis
9,987 · View on GitHub
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
Redis is a high-performance, distributed NoSQL database that supports vector search and self-hosting, though it lacks native SQL support and is primarily optimized for in-memory key-value storage rather than traditional relational data management.
C · JSON Document Storage · Vector Similarity Search · Active-Active Database Clusters

memvid/memvid
15,679 · View on GitHub
Memvid is an embedded memory framework designed to provide persistent, versioned context for intelligent agents. It functions as a local vector database library that stores all data within a single binary file, removing the need for external database infrastructure or network dependencies. The system distinguishes itself by integrating in-process vector indexing with append-only versioning, allowing for high-speed semantic similarity searches alongside the ability to track and roll back state changes over time. It includes built-in transparent data encryption and masking to secure sensitive i
This is an embedded vector database library designed for in-process agent memory rather than a general-purpose database management system for broader application development.
Rust · Vector Search · Vector Databases · Vector Search Engines

rethinkdb/rethinkdb
26,996 · View on GitHub
RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations. A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data update
RethinkDB is a distributed, self-hostable document-oriented database that excels at real-time data streaming, though it lacks native SQL support and specialized vector search capabilities.
C++ · Document Databases · Change Data Capture · Query Builders

duckdb/duckdb
38,805 · View on GitHub
DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation. The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adapti
DuckDB is a high-performance, in-process analytical database engine that provides robust SQL support for application development, though it is designed as an embedded library rather than a distributed, multi-model server.
C++ · Analytical Databases · Columnar Engines · Embedded Databases

electric-sql/pglite
14,707 · View on GitHub
Pglite is a client-side relational database engine that runs a full-featured PostgreSQL instance directly within browser and Node.js environments. By leveraging WebAssembly, it provides a persistent SQL storage solution that enables complex data management and querying without requiring an external database server. The project distinguishes itself through a reactive SQL data layer that automatically synchronizes user interface components with live query results. It manages database operations using worker threads to prevent main-thread blocking and coordinates access across multiple browser t
Pglite is a full-featured PostgreSQL instance running in WebAssembly, providing a robust SQL-based storage solution for local-first and browser-based application development.
TypeScript · Browser Databases · Client-Side Databases · In-Browser Database Engines
