High-performance database systems designed for storing, indexing, and querying high-dimensional vector embeddings for machine learning.
This project is a high-performance library for similarity search and clustering of dense vectors across massive datasets. It functions as a vector similarity search engine, providing tools to organize numerical data into specialized index structures that support rapid retrieval and efficient querying over millions of records. The library distinguishes itself through a variety of indexing and compression techniques, including Hierarchical Navigable Small World (HNSW) graphs for roughly logarithmic search complexity and inverted file (IVF) indexing to partition the vector space into manageable clusters.
This is a high-performance library for similarity search and indexing that provides the core engine capabilities required for vector databases, though it functions as a foundational component rather than a full-featured, standalone database server with built-in CRUD APIs.
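The sketch below is a rough illustration of the inverted-file (IVF) idea. It assigns each vector to its nearest centroid and, at query time, scans only the `nprobe` closest inverted lists. It is plain Python, not this library's API. The hand-picked centroids stand in for a trained coarse quantizer (normally k-means), and `build_ivf`/`search_ivf` are hypothetical names.

```python
def sq_l2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, centroids):
    # Inverted lists: centroid index -> [(vector id, vector), ...].
    lists = {c: [] for c in range(len(centroids))}
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(v, centroids[c]))
        lists[nearest].append((vid, v))
    return lists


def search_ivf(query, centroids, lists, k=2, nprobe=1):
    # Visit only the nprobe lists whose centroids are closest to the query.
    probe = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))[:nprobe]
    candidates = sorted((sq_l2(query, v), vid) for c in probe for vid, v in lists[c])
    return [vid for _, vid in candidates[:k]]


centroids = [(0.0, 0.0), (10.0, 10.0)]   # assumed output of a k-means training step
vectors = [(0.1, 0.2), (9.8, 10.1), (0.5, -0.3), (10.2, 9.7)]
lists = build_ivf(vectors, centroids)
print(search_ivf((0.0, 0.1), centroids, lists, k=2, nprobe=1))  # [0, 2]
```

Raising `nprobe` scans more lists, trading speed for recall.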
Milvus is a specialized vector database engine designed for indexing, managing, and performing high-speed similarity retrieval over high-dimensional vector embeddings. It functions as a similarity search engine that identifies nearest neighbors in large-scale vector spaces, supporting the storage and retrieval of billions of data points while maintaining consistent performance. The system uses a distributed architecture that decouples storage, query, and coordination into independent services, allowing horizontal scaling across clusters. It employs a global indexing mechanism that…
Milvus is a purpose-built, distributed vector database engine that provides comprehensive support for high-dimensional indexing, similarity search, and scalable storage, making it a flagship solution for managing large-scale embedding data.
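A distributed engine that splits query work across independent services usually answers a top-k request in two steps. It scatters the request to the nodes holding each data shard, then merges their partial results. The sketch below shows only that scatter-gather merge, run in-process with hypothetical shard contents. It is not Milvus's proxy or query-node implementation. Real shards would typically run ANN indexes rather than the exact scan used here.

```python
import heapq
from itertools import islice


def search_shard(shard, query, k):
    # Runs on one worker: exact local top-k as sorted (squared distance, id) pairs.
    scored = [(sum((a - b) ** 2 for a, b in zip(vec, query)), vid) for vid, vec in shard]
    return heapq.nsmallest(k, scored)


def scatter_gather(shards, query, k):
    # Coordinator: fan the query out, then merge the already-sorted partial results.
    partials = [search_shard(shard, query, k) for shard in shards]
    return [vid for _, vid in islice(heapq.merge(*partials), k)]


shard_a = [("a1", (0.0, 0.0)), ("a2", (5.0, 5.0))]
shard_b = [("b1", (1.0, 0.0)), ("b2", (0.5, 0.5))]
print(scatter_gather([shard_a, shard_b], (0.0, 0.0), k=3))  # ['a1', 'b2', 'b1']
```

Each shard returns its own exact top-k, so the global top-k is always among the merged candidates.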
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing a foundation for machine learning data pipelines. The system distinguishes itself by combining cloud-native object storage with immutable version tracking, which enables time-travel queries over earlier dataset versions and reproducible AI experiments. It integrates hybrid search, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters.
LanceDB is a purpose-built vector database that provides high-dimensional indexing, similarity search, and embedding integration, making it a comprehensive solution for managing and querying vector data at scale.
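The immutable-versioning idea can be sketched as an append-only list of snapshots: every write commits a new version, and older versions stay readable. The `VersionedTable` class below is hypothetical. It copies whole snapshots for clarity, whereas a production columnar format would normally share unchanged data files between versions instead of duplicating them.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    # versions[i] is an immutable snapshot (a tuple of rows); version 0 is empty.
    versions: list = field(default_factory=lambda: [()])

    @property
    def latest(self):
        return len(self.versions) - 1

    def append(self, rows):
        # Writes never modify an existing snapshot; they commit a new version.
        self.versions.append(self.versions[-1] + tuple(rows))
        return self.latest

    def delete(self, predicate):
        self.versions.append(tuple(r for r in self.versions[-1] if not predicate(r)))
        return self.latest

    def checkout(self, version):
        # Time travel: read the table exactly as it was after `version` was committed.
        return self.versions[version]


table = VersionedTable()
v1 = table.append([{"id": 1, "vec": (0.1, 0.2)}, {"id": 2, "vec": (0.3, 0.4)}])
v2 = table.delete(lambda row: row["id"] == 1)
print([row["id"] for row in table.checkout(v1)])  # [1, 2]
print([row["id"] for row in table.checkout(v2)])  # [2]
```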
USearch is a high-performance vector similarity search engine and approximate nearest neighbor (ANN) index designed for dense embeddings. It functions as a low-level vector database core, providing the primitives needed to store and retrieve vectors across massive datasets. The engine distinguishes itself through hardware-level SIMD acceleration of distance kernels and a proximity-graph index that enables fast retrieval across billions of vectors. It supports multi-precision vector quantization to balance memory usage against accuracy, and utilizes memory…
USearch is a high-performance vector search engine that provides the core indexing and similarity search primitives required for a vector database, though it functions more as a specialized library for embedding management than a full-featured, standalone database server with built-in API networking.
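USearch's proximity-graph index is based on HNSW. At the heart of HNSW search is a best-first beam search on a single graph layer, sketched below in plain Python over a tiny hand-built graph. The sketch leaves out the layer hierarchy, graph construction, quantization, and the SIMD distance kernels. The function names are illustrative, not USearch's API.

```python
import heapq


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_layer(graph, vectors, query, entry, ef=2):
    # Best-first beam search over one proximity-graph layer (the core of HNSW search).
    visited = {entry}
    d0 = sq_dist(query, vectors[entry])
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    results = [(-d0, entry)]     # max-heap via negation: the ef best nodes so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                # closest candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(query, vectors[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # drop the current worst
    return [n for _, n in sorted((-neg, n) for neg, n in results)]


vectors = {i: (float(i),) for i in range(5)}   # points 0..4 on a line
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(search_layer(graph, vectors, (3.9,), entry=0, ef=2))  # [4, 3]
```

The shortcut edge 0-4 lets the search jump straight to the query's neighborhood, which is what makes navigable small-world graphs fast.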
Chroma is a specialized vector database designed to index and retrieve high-dimensional embeddings for semantic similarity search. It functions as an information-retrieval platform that stores and manages unstructured documents alongside structured metadata. By mapping data into embedding vectors, the system supports rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular-expression matching to balance semantic…
Chroma is a purpose-built vector database that provides high-dimensional indexing, embedding model integration, and hybrid search capabilities, making it a comprehensive solution for managing and querying vector embeddings.
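Merging a dense-vector ranking with a keyword or regular-expression ranking requires some fusion step. Reciprocal rank fusion (RRF) is one widely used option and is shown below as a generic sketch. The description does not say how Chroma combines its signals. The documents, rankings, and `k=60` constant are illustrative assumptions, and the regex hits are simply listed in id order.

```python
import re


def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking lists document ids best-first; a document's fused score is sum(1 / (k + rank)).
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {"d1": "HNSW index tuning", "d2": "embedding models",
        "d3": "index-v2 release notes", "d4": "index-v3 draft"}
dense_ranking = ["d1", "d2", "d3"]   # assumed output of a vector similarity search
regex_ranking = [d for d in sorted(docs) if re.search(r"index-v\d", docs[d])]
print(reciprocal_rank_fusion([dense_ranking, regex_ranking]))  # ['d3', 'd1', 'd2', 'd4']
```

RRF uses only rank positions, so it avoids calibrating cosine similarities against keyword scores.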
Cozo is a logic-based database engine that functions as a relational data store, an embedded graph database, and a temporal vector database. It uses a Datalog-inspired query language to execute relational, recursive, and graph queries. The system distinguishes itself through specialized indexing for high-dimensional vector similarity search and near-duplicate detection using locality-sensitive hashing (LSH). It also provides built-in temporal versioning, which supports time-travel queries that retrieve data as it existed at specific points in time. Its broader capabilities…
Cozo is a multi-model database engine that includes native support for high-dimensional vector similarity search and indexing, making it a capable choice for vector-heavy workloads despite its broader focus on relational and graph data.
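Near-duplicate detection with LSH is commonly built from MinHash signatures over text shingles plus a banding step. The sketch below illustrates that general scheme in plain Python. Whether Cozo's LSH index uses exactly these steps is an assumption here. The shingle size, `num_perm`, and `bands` values are arbitrary.

```python
import hashlib


def shingles(text, n=3):
    # Character n-grams of the lower-cased, whitespace-normalised text.
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}


def minhash(items, num_perm=32):
    # One seeded 64-bit hash per "permutation"; keep the minimum hash value for each.
    def h(seed, x):
        digest = hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return [min(h(seed, x) for x in items) for seed in range(num_perm)]


def band_keys(signature, bands=8):
    # Split the signature into bands; two texts become candidates if any band matches exactly.
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}


def estimated_jaccard(sig_a, sig_b):
    # The fraction of equal MinHash slots estimates the Jaccard similarity of the shingle sets.
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = minhash(shingles("Vector search with locality sensitive hashing"))
b = minhash(shingles("Vector search with locality-sensitive hashing"))
print(bool(band_keys(a) & band_keys(b)), round(estimated_jaccard(a, b), 2))
```

Banding turns the pairwise comparison into bucket lookups, so only texts that share a band key need a closer check.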
TurboVec is a high-performance vector database and quantized search index, written in Rust, for storing and retrieving high-dimensional embeddings. It functions as a pluggable vector store for large language model (LLM) orchestration frameworks, providing a memory-efficient alternative to standard in-memory storage. The project distinguishes itself through a vector compressor that applies random rotation and data-oblivious scalar quantization to reduce memory footprint. Retrieval is accelerated by SIMD kernels that perform distance calculations and search operations for increased throughput.
TurboVec is a specialized vector database engine that provides high-performance similarity search and memory-efficient quantization, though it is primarily designed as a pluggable store for orchestration frameworks rather than a full-scale distributed database system.
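A random rotation spreads a vector's energy evenly across its coordinates. A single fixed quantization range then suits every dimension without being fitted to the data. The sketch below assumes random sign flips plus a normalised Walsh-Hadamard transform as the rotation, 8-bit codes, and a fixed [-1, 1] range suited to unit-normalised embeddings. TurboVec's actual rotation, bit widths, and API are not described here, so these choices are illustrative.

```python
import random


def fwht(v):
    # Normalised fast Walsh-Hadamard transform; orthogonal, so it preserves vector norms.
    # The length must be a power of two.
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    scale = len(v) ** -0.5
    return [x * scale for x in v]


def make_rotation(dim, seed=0):
    # Random sign flips followed by a Hadamard transform: a cheap, data-independent rotation.
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    return lambda v: fwht([s * x for s, x in zip(signs, v)])


def quantize(v, lo=-1.0, hi=1.0, bits=8):
    # Data-oblivious scalar quantization: the [lo, hi] range is fixed, not learned from the dataset.
    levels = (1 << bits) - 1
    return [round((min(max(x, lo), hi) - lo) / (hi - lo) * levels) for x in v]


def dequantize(codes, lo=-1.0, hi=1.0, bits=8):
    levels = (1 << bits) - 1
    return [lo + c / levels * (hi - lo) for c in codes]


rotate = make_rotation(4, seed=1)
vec = [0.5, -0.25, 0.1, 0.3]
codes = quantize(rotate(vec))   # one byte per dimension instead of four for float32
approx = dequantize(codes)      # approximate rotated vector, used for distance computation
```

The rotation is orthogonal, so distances between rotated vectors equal distances between the originals, provided queries are rotated the same way.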
Vector similarity search extension for PostgreSQL.
This is a specialized extension that transforms PostgreSQL into a vector database, providing robust similarity search and indexing capabilities while allowing you to combine vector operations with traditional relational data.
This project is an open-source relational database management system (RDBMS) designed for storing and managing structured data. It functions as a relational database that ensures consistency and reliability, and also operates as a vector database for storing and querying high-dimensional vector embeddings. The system incorporates a columnar storage engine to optimize analytical query processing and large-scale data aggregation. Its vector similarity search lets users find similar items by querying vector embeddings. The software covers a broad capability…
MariaDB is a mature relational database that has integrated vector storage and similarity search capabilities, making it a viable option for users who need to combine traditional SQL data management with vector search functionality.
This project is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automatic sharding and maintains high availability across global clusters using multi-node replication. By executing multi-document operations as atomic units, the system preserves data integrity and consistency across distributed environments. The platform distinguishes itself by integrating vector-based indexing, which enables semantic similarity search alongside traditional geospatial and lexical queries.
This is a general-purpose document database that has evolved to include native vector indexing and similarity search capabilities, making it a viable option for managing high-dimensional embeddings alongside traditional data.
Hub is a multimodal AI data lake and vector database designed for storing and querying embeddings, text, audio, and images. It functions as a dataset version-control system and a machine learning data-streaming engine that supports large-scale model training. The system uses a serverless PostgreSQL vector store to index high-dimensional embeddings for semantic search. It provides a visual interface for inspecting multimodal datasets and viewing annotations such as bounding boxes and masks. The platform handles cloud-agnostic storage synchronization and implements lazy, compressed data streaming…
This repository functions as a multimodal data lake and vector database that supports high-dimensional indexing and similarity search, making it a specialized tool for managing AI-ready datasets and embeddings.
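Lazy, compressed streaming generally means two things: data is stored in independently compressed chunks, and a chunk is decompressed only when a consumer actually reads from it. The sketch below illustrates that pattern with `zlib` and JSON. The chunk layout and function names are hypothetical and say nothing about Hub's real storage format.

```python
import json
import zlib


def write_chunks(samples, chunk_size):
    # Group samples into fixed-size chunks and compress each chunk independently.
    return [zlib.compress(json.dumps(samples[i:i + chunk_size]).encode())
            for i in range(0, len(samples), chunk_size)]


def stream(chunks):
    # Generator: a chunk is decompressed only when the consumer reaches it.
    for blob in chunks:
        yield from json.loads(zlib.decompress(blob))


samples = [{"id": i, "label": i % 2} for i in range(5)]
chunks = write_chunks(samples, chunk_size=2)   # 3 compressed chunks
batch = stream(chunks)
print(next(batch))   # {'id': 0, 'label': 0}
```

Chunks are compressed independently, so a training loop can start consuming the first chunk before later ones are fetched or decoded.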
PostgresML is a machine learning database extension for PostgreSQL that integrates model training and inference directly into the database. It functions as an in-database AI platform and vector database, running large language models and natural language processing tasks on stored records without exporting data to external services. The system distinguishes itself by using GPU acceleration to reduce prediction latency and a hybrid storage engine that keeps relational data alongside high-dimensional vectors. It allows for the building and fine-tuning…
PostgresML functions as a relational vector database by extending PostgreSQL to support high-dimensional indexing, vector similarity search, and integrated embedding generation, making it a capable tool for managing vector data alongside traditional relational records.
Neo4j is a native graph database management system designed to store and query highly connected data using a property-graph model. It provides an ACID-compliant transaction engine that ensures data integrity, supported by a distributed cluster architecture that maintains causal consistency across nodes. Users interact with the system through a declarative query language (Cypher) that supports complex pattern matching and path traversal without hand-written traversal logic. The platform distinguishes itself through a hybrid approach to data retrieval, combining traditional graph-based queries…
Neo4j is a graph database that natively supports high-dimensional vector indexing and similarity search, allowing you to combine graph pattern queries with vector-based retrieval in a single system.
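A common hybrid pattern is to find seed nodes through vector similarity and then follow their relationships for context. The two steps are sketched below over in-memory dictionaries. In Neo4j this would be written in Cypher against a vector index, which is not reproduced here. All node names and data are invented.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_then_expand(embeddings, edges, query, k=1):
    # Step 1 (vector index): pick the k nodes whose embeddings are most similar to the query.
    seeds = sorted(embeddings, key=lambda n: -cosine(embeddings[n], query))[:k]
    # Step 2 (graph traversal): collect each seed's direct neighbours as extra context.
    neighbours = {nb for s in seeds for nb in edges.get(s, ())}
    return seeds, sorted(neighbours - set(seeds))


embeddings = {"doc:a": (1.0, 0.0), "doc:b": (0.0, 1.0), "doc:c": (0.7, 0.7)}
edges = {"doc:a": ["author:x", "doc:c"], "doc:b": ["author:y"]}
print(vector_then_expand(embeddings, edges, (0.9, 0.1), k=1))  # (['doc:a'], ['author:x', 'doc:c'])
```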
Redisson is a Java library and Redis client that functions as a distributed Java object mapper, caching provider, and locking framework. It maps Java collections and concurrency primitives to distributed implementations backed by Redis and Valkey, providing synchronous, asynchronous, and reactive APIs for these data stores. The project distinguishes itself through a comprehensive suite of distributed coordination tools, including a locking framework that manages semaphores and countdown latches across multiple application nodes. It also serves as a distributed messaging…
Redisson is a Java client and framework that provides an interface for Redis-based vector similarity search, making it a functional tool for interacting with vector data even though it acts as a client library rather than a standalone database engine.
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that partitions data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous, event-driven engine to get the most out of the hardware.
ScyllaDB is a high-performance distributed NoSQL database that has integrated native support for vector similarity search and high-dimensional indexing, making it a capable engine for vector-based workloads alongside its primary storage functions.
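In a shard-per-core design, every key is routed to the single core that owns it, so cores never contend for a shared lock. The sketch below models that routing in one Python process. The blake2b hash and modulo mapping are stand-ins, not ScyllaDB's partitioner or token-to-shard algorithm. A real engine would run each shard on its own thread and pass messages between them.

```python
import hashlib


def shard_for(key, num_shards):
    # Hash the partition key to a stable 64-bit token, then map the token to one shard (core).
    token = int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")
    return token % num_shards


class ShardedStore:
    # One private dict per shard: each key has exactly one owner, so no lock is shared.
    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", {"score": i})
print([len(s) for s in store.shards])   # per-shard key counts; they sum to 100
```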
zvec is an embedded vector database engine and indexing library designed for high-dimensional similarity search. It functions as a hybrid search engine and a retrieval-augmented generation (RAG) knowledge base, storing and retrieving both dense and sparse vectors. The system is distinguished by its hybrid retrieval pipeline, which fuses vector similarity, full-text keyword matching, and scalar metadata filtering into a single query operation. It supports a plugin-based model integration system for registering custom embedding models and rerankers, as well as language bindings for native…
This is a specialized embedded vector database engine that provides high-dimensional indexing, similarity search, and embedding model integration, making it a direct fit for your requirements.
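A single query that combines vector similarity, keyword matching, and a metadata filter can be sketched as filter, score, blend, rank. The weighted-sum blend, the `alpha` weight, the naive term-overlap keyword score, and the field names below are all assumptions for illustration. The description does not specify zvec's fusion method or API.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_score(terms, text):
    # Fraction of query terms present in the document (a crude stand-in for BM25).
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def hybrid_query(docs, qvec, terms, where, alpha=0.7, k=2):
    # One pass: scalar filter, then a weighted blend of vector and keyword scores, then top-k.
    hits = [
        (alpha * cosine(d["vec"], qvec) + (1 - alpha) * keyword_score(terms, d["text"]), doc_id)
        for doc_id, d in docs.items()
        if where(d["meta"])
    ]
    return [doc_id for _, doc_id in sorted(hits, reverse=True)[:k]]


docs = {
    "a": {"vec": (1.0, 0.0), "text": "rust vector index", "meta": {"lang": "en"}},
    "b": {"vec": (0.9, 0.1), "text": "python notebook", "meta": {"lang": "en"}},
    "c": {"vec": (1.0, 0.0), "text": "rust vector index", "meta": {"lang": "de"}},
}
print(hybrid_query(docs, (1.0, 0.0), ["rust", "index"], where=lambda m: m["lang"] == "en"))  # ['a', 'b']
```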
Redis is an in-memory key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system processes requests on an event-driven, single-threaded loop and maintains durability through append-only persistence logs and asynchronous snapshotting. What distinguishes Redis is its ability to handle complex data structures, including strings, hashes, lists, sets, and sorted sets, alongside…
Redis is a versatile, high-performance in-memory data platform that natively supports vector similarity search and high-dimensional indexing, making it a capable solution for vector database requirements.
AliSQL is Alibaba's fork of MySQL that extends the relational database management system with enhancements for performance, scalability, and enterprise-grade availability. It retains MySQL's core identity as a SQL database for storing, organizing, and retrieving structured data, while adding optimizations for large-scale transactional and analytical workloads. The project differentiates itself through a set of Alibaba-specific improvements, including a columnar engine that accelerates analytical queries directly on MySQL tables, and a distributed, shared-nothing NDB Cluster engine…
AliSQL is a relational database that includes support for vector similarity search via HNSW indexing, making it a viable option for users who need vector capabilities within a traditional SQL environment.
Annoy is a C++ library for approximate nearest neighbor search in high-dimensional vector spaces. It functions as a vector similarity search engine that builds static, disk-based index structures for fast lookups. By mapping identifiers to vector data and persisting these structures to disk, the library enables efficient memory-mapped access to large datasets. The project distinguishes itself through random projection trees and distance-metric-based partitioning, which organize data into hierarchical binary trees to balance search precision against computational cost.
This is a specialized library for approximate nearest neighbor search and indexing rather than a full-featured vector database, as it lacks built-in CRUD operations, native API services, and distributed storage management.
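The random-projection-tree idea works as follows. At each node, pick two points, split the set by the hyperplane equidistant from them, and recurse until leaves are small. A query then descends to a single leaf. This is a simplified, single-tree illustration in plain Python, not Annoy's code. Annoy builds a forest of such trees and searches several leaves to improve recall, and `leaf_size` is an illustrative parameter.

```python
import random


def side_of(normal, offset, v):
    # Signed position of v relative to the hyperplane normal . x = offset.
    return sum(n * x for n, x in zip(normal, v)) - offset


def build_tree(ids, vectors, leaf_size=2, rng=None):
    if rng is None:
        rng = random.Random(0)
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    a, b = rng.sample(ids, 2)
    va, vb = vectors[a], vectors[b]
    normal = [x - y for x, y in zip(va, vb)]
    offset = sum(n * (x + y) / 2 for n, x, y in zip(normal, va, vb))  # plane equidistant from a and b
    left = [i for i in ids if side_of(normal, offset, vectors[i]) > 0]
    right = [i for i in ids if side_of(normal, offset, vectors[i]) <= 0]
    if not left or not right:   # degenerate split (e.g. duplicate points): stop here
        return {"leaf": ids}
    return {"normal": normal, "offset": offset,
            "left": build_tree(left, vectors, leaf_size, rng),
            "right": build_tree(right, vectors, leaf_size, rng)}


def query_leaf(node, q):
    # Follow the query's side of each hyperplane down to one leaf of candidate ids.
    while "leaf" not in node:
        node = node["left"] if side_of(node["normal"], node["offset"], q) > 0 else node["right"]
    return node["leaf"]


vectors = {0: (0.0, 0.0), 1: (0.2, 0.1), 2: (9.0, 9.0), 3: (9.1, 8.8), 4: (-5.0, 6.0)}
tree = build_tree(list(vectors), vectors)
print(query_leaf(tree, (9.05, 8.9)))   # candidate ids; exact distances are computed afterwards
```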
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations.
Redis is a high-performance key-value store that includes robust vector similarity search and indexing capabilities, making it a capable, multi-purpose solution for managing high-dimensional embeddings alongside traditional data.
Meilisearch is a Rust-based search engine providing typo-tolerant full-text and vector-based semantic search with real-time conversational capabilities.
Meilisearch is a full-text search engine that has added support for vector-based semantic search, making it a capable tool for similarity search even though its primary focus remains on traditional document-based retrieval.
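Typo tolerance is commonly built on edit distance, with a per-word typo budget that grows with word length. The sketch below uses plain Levenshtein distance. Its thresholds (one typo from 5 characters, two from 9) follow what we understand to be Meilisearch's default word-size settings. The code is an illustration, not Meilisearch's implementation.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insertions, deletions, substitutions).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def allowed_typos(word, one_typo=5, two_typos=9):
    # Longer query words tolerate more typos; short words must match exactly.
    if len(word) >= two_typos:
        return 2
    return 1 if len(word) >= one_typo else 0


def matches(query_word, indexed_word):
    return levenshtein(query_word, indexed_word) <= allowed_typos(query_word)


print(matches("vectr", "vector"), matches("cat", "car"))  # True False
```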
Typesense is a distributed search engine designed to provide sub-millisecond query latency across massive datasets. It functions both as a high-performance indexing and retrieval engine and as a complete search-experience platform, offering built-in typo tolerance and tools for managing relevance through synonym configuration, result curation, and complex filtering. The platform distinguishes itself by using in-memory indexing to sustain high-throughput retrieval and by integrating vector database capabilities for semantic similarity search. It ensures data consistency and…
Typesense is a distributed search engine that natively integrates vector similarity search and high-dimensional indexing alongside its traditional full-text capabilities, making it a capable choice for applications requiring both keyword and semantic retrieval.
SurrealDB is a multi-model database engine designed to store and query document, graph, relational, and vector data within a single ACID-compliant platform. It functions as an AI-native data store, integrating vector search, graph traversal, and machine learning model execution directly into its query layer. By providing a unified declarative query language, the platform removes the need for external middleware to synchronize data across different storage models. The platform distinguishes itself through its ability to manage agent memory and complex workflows natively. It allows developers…
SurrealDB is a multi-model database that natively supports vector storage and similarity search alongside relational and graph capabilities, making it a capable choice for applications requiring integrated vector operations.
ClickHouse is a high-performance columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of data, and also provides an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built for high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through…
ClickHouse is a high-performance analytical database that includes native support for vector similarity search and high-dimensional indexing, making it a capable engine for vector-based workloads despite its broader focus on general-purpose OLAP.
Dragonfly is a high-performance, multi-model in-memory data store designed as a drop-in replacement for existing in-memory stores such as Redis and Memcached. By using a multi-threaded, shared-nothing architecture and a fiber-based concurrency model, it maximizes CPU utilization and minimizes latency for read and write operations. The system supports a wide range of data structures, including strings, hashes, lists, sets, sorted sets, and JSON documents, while maintaining full compatibility with standard wire protocols and client libraries. What distinguishes Dragonfly is its focus on efficiency…
Dragonfly is a high-performance, multi-model in-memory data store that includes native support for vector similarity search and indexing, making it a capable engine for vector-based workloads despite its broader focus as a general-purpose database.