The visitor is looking for a specialized database engine designed to store, index, and perform similarity searches on high-dimensional vector embeddings.

facebookresearch/faiss is the closest match: it is a high-performance library for similarity search and indexing that provides the core engine capabilities required for vector databases, though it functions as a foundational component rather than a full-featured, standalone database server with built-in CRUD APIs. Other strong matches: milvus-io/milvus, lancedb/lancedb, unum-cloud/usearch, chroma-core/chroma.

Vector Databases for Machine Embeddings

High-performance database systems designed for storing, indexing, and querying high-dimensional vector embeddings for machine learning.

facebookresearch/faiss (40,302 on GitHub)
This project is a high-performance library designed for the similarity search and clustering of dense vectors across massive datasets. It functions as a vector similarity search engine, providing the necessary tools to organize complex numerical data into specialized structures that facilitate rapid retrieval and efficient querying of millions of records. The library distinguishes itself through a variety of advanced indexing and compression techniques, including hierarchical navigable small worlds for logarithmic time complexity and inverted file indexing to partition vector spaces into mana
This is a high-performance library for similarity search and indexing that provides the core engine capabilities required for vector databases, though it functions as a foundational component rather than a full-featured, standalone database server with built-in CRUD APIs.
C++, Vector Indices, Vector Similarity Search, Vector Search Engines
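As a point of reference for what similarity-search libraries of this kind accelerate, exact nearest-neighbor search can be sketched as a brute-force scan over every stored vector. The following is a minimal pure-Python illustration, not the Faiss API; the function names `cosine_similarity` and `brute_force_search` are hypothetical and chosen only for this example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Score every stored vector against the query and return the top k.

    This is the exact O(n) baseline (hypothetical helper for illustration);
    index structures such as inverted files or proximity graphs exist to
    avoid scanning all n vectors.
    """
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Calling `brute_force_search([1.0, 0.1], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2)` returns the two stored vectors closest in direction to the query, as `(position, similarity)` pairs. The cost grows linearly with the number of stored vectors, which is the motivation for the indexing techniques the description mentions.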
milvus-io/milvus (44,804 on GitHub)
Milvus is a specialized vector database engine designed for the indexing, management, and high-speed similarity retrieval of high-dimensional vector embeddings. It functions as a similarity search engine capable of identifying nearest neighbors within large-scale vector spaces, supporting the storage and retrieval of billions of data points while maintaining consistent performance. The system utilizes a distributed architecture that decouples storage, query, and coordination into independent services, allowing for horizontal scaling across clusters. It employs a global indexing mechanism that
Milvus is a purpose-built, distributed vector database engine that provides comprehensive support for high-dimensional indexing, similarity search, and scalable storage, making it a flagship solution for managing large-scale embedding data.
Go, Similarity Search Engines, Vector Search Engines
lancedb/lancedb (9,031 on GitHub)
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
LanceDB is a purpose-built vector database that provides high-dimensional indexing, similarity search, and embedding integration, making it a comprehensive solution for managing and querying vector data at scale.
HTML, Data Upsert Operations, Embedding Models, Vector Indexing
unum-cloud/USearch (3,888 on GitHub)
USearch is a high-performance vector similarity search engine and approximate nearest neighbor index designed for dense embeddings. It functions as a low-level vector database core and high-dimensional vector indexer, providing the primitives necessary to store and retrieve vectors across massive datasets. The engine distinguishes itself through hardware-level SIMD acceleration for distance kernels and a proximity-graph indexing system that enables fast retrieval across billions of vectors. It supports multi-precision vector quantization to balance memory usage and accuracy, and utilizes memo
USearch is a high-performance vector search engine that provides the core indexing and similarity search primitives required for a vector database, though it functions more as a specialized library for embedding management than a full-featured, standalone database server with built-in API networking.
C++, Vector Indexing, Vector Similarity Search, Similarity Search Engines
chroma-core/chroma (26,198 on GitHub)
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema
Chroma is a purpose-built vector database that provides high-dimensional indexing, embedding model integration, and hybrid search capabilities, making it a comprehensive solution for managing and querying vector embeddings.
Rust, Vector Indexing
cozodb/cozo (3,880 on GitHub)
Cozo is a logic-based database engine that functions as a relational data store, an embedded graph database, and a temporal vector database. It utilizes a Datalog-inspired query language to execute relational, recursive, and graph queries. The system distinguishes itself through specialized indexing for high-dimensional vector similarity searches and near-duplicate detection using locality sensitive hashing. It also provides built-in temporal versioning, allowing for historical state retrieval and time-travel queries to access data as it existed at specific points in time. Its broader capabi
Cozo is a multi-model database engine that includes native support for high-dimensional vector similarity search and indexing, making it a capable choice for vector-heavy workloads despite its broader focus on relational and graph data.
Rust, Data Upsert Operations, Vector Search Indexes, Vector Similarity Search
RyanCodrai/turbovec (11,738 on GitHub)
TurboVec is a high-performance Rust vector database and quantized search index designed for storing and retrieving high-dimensional embeddings. It functions as a pluggable vector store for large language model orchestration frameworks, providing a memory-efficient alternative to standard in-memory storage. The project distinguishes itself through a high-dimensional vector compressor that utilizes random rotation and data-oblivious scalar quantization to reduce memory footprints. Retrieval is accelerated via SIMD kernels that process distance calculations and search operations for increased th
TurboVec is a specialized vector database engine that provides high-performance similarity search and memory-efficient quantization, though it is primarily designed as a pluggable store for orchestration frameworks rather than a full-scale distributed database system.
Python, Vector Embedding Indexes, Vector Search Indexes
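The memory savings from scalar quantization mentioned above can be illustrated with a small sketch. It assumes a fixed value range of [-1, 1] and 8-bit codes, so the grid does not depend on the data (one reading of "data-oblivious"); it omits the random rotation step entirely and is not TurboVec's actual scheme. The names `quantize` and `dequantize` are hypothetical.

```python
from typing import List, Sequence


def quantize(
    vector: Sequence[float], lo: float = -1.0, hi: float = 1.0, bits: int = 8
) -> List[int]:
    """Map each float onto a fixed uniform grid of 2**bits levels.

    The grid is defined by (lo, hi) alone, not by the data, which is the
    assumption this sketch makes about "data-oblivious" quantization.
    Values outside the range are clipped.
    """
    levels = (1 << bits) - 1
    scale = levels / (hi - lo)
    codes = []
    for x in vector:
        clipped = min(max(x, lo), hi)
        codes.append(round((clipped - lo) * scale))
    return codes


def dequantize(
    codes: Sequence[int], lo: float = -1.0, hi: float = 1.0, bits: int = 8
) -> List[float]:
    """Reconstruct approximate floats from the integer codes."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + c * step for c in codes]
```

With 8-bit codes each component needs one byte instead of the four bytes of a 32-bit float, at the price of a reconstruction error of at most half a grid step for in-range values.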
pgvector/pgvector (21,787 on GitHub)
Vector similarity search extension for PostgreSQL.
This is a specialized extension that transforms PostgreSQL into a vector database, providing robust similarity search and indexing capabilities while allowing you to combine vector operations with traditional relational data.
C, Vector Indexing, Vector Similarity Search
MariaDB/server (7,196 on GitHub)
This project is an open source relational database management system and SQL database designed for storing and managing structured data. It functions as a relational database for ensuring consistency and reliability, while also operating as a vector database for storing and querying high-dimensional vector embeddings. The system incorporates a columnar storage engine to optimize analytical query processing and large-scale data aggregation. It further enables vector similarity search, allowing users to find similar items by querying vector embeddings. The software covers a broad capability su
MariaDB is a mature relational database that has integrated vector storage and similarity search capabilities, making it a viable option for users who need to combine traditional SQL data management with vector search functionality.
C++, Vector Embedding Indexes, Vector Similarity Search
mongodb/mongo (28,158 on GitHub)
This project is a distributed, document-oriented database system designed to store information in flexible, hierarchical structures. It supports horizontal scaling through automated sharding and maintains high availability across global clusters using a multi-node replication protocol. By executing multi-document operations as atomic units, the system ensures data integrity and consistency across distributed environments. The platform distinguishes itself by integrating advanced vector-based indexing, which enables semantic similarity searches alongside traditional geospatial and lexical quer
This is a general-purpose document database that has evolved to include native vector indexing and similarity search capabilities, making it a viable option for managing high-dimensional embeddings alongside traditional data.
C++, Horizontal Database Scaling, Vector Indexing, Sharding Strategies
activeloopai/Hub (9,177 on GitHub)
Hub is a multimodal AI data lake and vector database designed for storing and querying embeddings, text, audio, and images. It functions as a dataset version control system and a machine learning data streaming engine to support large-scale model training. The system utilizes a serverless PostgreSQL vector store to index high-dimensional embeddings for semantic search. It provides a visual interface for inspecting multimodal datasets and viewing annotations such as bounding boxes and masks. The platform handles cloud-agnostic storage synchronization and implements lazy, compressed data strea
This repository functions as a multimodal data lake and vector database that supports high-dimensional indexing and similarity search, making it a specialized tool for managing AI-ready datasets and embeddings.
C++, Vector Embedding Indexes, Vector Indexing
postgresml/postgresml (6,801 on GitHub)
PostgresML is a machine learning database extension for PostgreSQL that integrates model training and inference directly into the database. It functions as an in-database AI platform and vector database, enabling the execution of large language models and natural language processing tasks on stored records without exporting data to external services. The system distinguishes itself by utilizing GPU acceleration to minimize latency during model predictions and employing a hybrid storage engine that maintains relational data alongside high-dimensional vectors. It allows for the building and fin
PostgresML functions as a relational vector database by extending PostgreSQL to support high-dimensional indexing, vector similarity search, and integrated embedding generation, making it a capable tool for managing vector data alongside traditional relational records.
Rust, Vector Indexing, Vector Similarity Search
neo4j/neo4j (15,928 on GitHub)
Neo4j is a native graph database management system designed to store and query highly connected data using a property-graph model. It provides an ACID-compliant transaction engine that ensures data integrity, supported by a distributed cluster architecture that maintains causal consistency across nodes. Users interact with the system through a declarative query language, which allows for complex pattern matching and path traversal without requiring manual traversal logic. The platform distinguishes itself through its hybrid approach to data retrieval, combining traditional graph-based queries
Neo4j is a graph database that natively supports high-dimensional vector indexing and similarity search, allowing you to combine relational graph queries with vector-based retrieval in a single system.
Java, Vector Embedding Indexes, Vector Indexing, Vector Similarity Search
redisson/redisson (24,355 on GitHub)
Redisson is a Java library and Redis client that functions as a distributed Java object mapper, caching provider, and locking framework. It maps Java collections and concurrency primitives to distributed implementations backed by Redis and Valkey, providing synchronous, asynchronous, and reactive APIs for interacting with these data stores. The project distinguishes itself by providing a comprehensive suite of distributed coordination tools, including a locking framework for managing semaphores and countdown latches across multiple application nodes. It also serves as a distributed messaging
Redisson is a Java client and framework that provides an interface for Redis-based vector similarity search, making it a functional tool for interacting with vector data even though it acts as a client library rather than a standalone database engine.
Java, Vector Embedding Indexes, Vector Similarity Search
scylladb/scylladb (15,355 on GitHub)
ScyllaDB is a distributed NoSQL database engine designed for high-throughput data storage and low-latency performance at scale. It functions as a shard-aware platform that manages large-scale datasets across distributed clusters, providing a foundation for real-time applications that require consistent availability and operational stability. The system distinguishes itself through a shared-nothing architecture that distributes data across independent CPU cores to eliminate lock contention. It incorporates a user-space networking stack and an asynchronous event-driven engine to maximize hardwa
ScyllaDB is a high-performance distributed NoSQL database that has integrated native support for vector similarity search and high-dimensional indexing, making it a capable engine for vector-based workloads alongside its primary storage functions.
C++, Vector Embedding Indexes, Sharding Architectures
alibaba/zvec (5,198 on GitHub)
zvec is an embedded vector database engine and indexing library designed for high-dimensional similarity search. It functions as a hybrid search engine and a retrieval-augmented generation knowledge base, allowing for the storage and retrieval of dense and sparse vectors. The system is distinguished by its hybrid retrieval pipeline, which fuses vector similarity, full-text keyword matching, and scalar metadata filtering into single query operations. It supports a plugin-based model integration system for registering custom embedding models and rerankers, as well as language bindings for nativ
This is a specialized embedded vector database engine that provides high-dimensional indexing, similarity search, and embedding model integration, making it a direct fit for your requirements.
C++, Vector Indexing, Vector Similarity Search
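The idea of fusing vector similarity, keyword matching, and scalar metadata filtering into one query can be sketched as follows. This is an illustrative toy and not zvec's API or ranking formula: the weighted sum with `alpha`, the `year` metadata field, and the function names are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query_terms: Sequence[str], text: str) -> float:
    """Fraction of query terms that appear as whole words in the text."""
    if not query_terms:
        return 0.0
    words = set(text.lower().split())
    hits = sum(1 for term in query_terms if term.lower() in words)
    return hits / len(query_terms)


def hybrid_search(
    docs: List[Dict],
    query_vec: Sequence[float],
    query_terms: Sequence[str],
    min_year: int,
    alpha: float = 0.5,  # hypothetical weight between vector and keyword scores
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Filter on scalar metadata, then rank by a weighted sum of both scores."""
    results = []
    for doc in docs:
        if doc["year"] < min_year:  # scalar metadata filter
            continue
        score = alpha * cosine(query_vec, doc["vector"]) + (1 - alpha) * keyword_score(
            query_terms, doc["text"]
        )
        results.append((doc["id"], score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:k]
```

Each document here is a plain dictionary with `id`, `vector`, `text`, and `year` keys. A production engine would apply the filter and both scoring paths through indexes rather than a linear loop, and may combine the scores differently.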
redis/redis (74,906 on GitHub)
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to process requests efficiently, while maintaining data durability through append-only persistence logs and asynchronous snapshotting mechanisms. What distinguishes Redis is its ability to handle complex data structures—including strings, hashes, lists, sets, and sorted sets—alongsid
Redis is a versatile, high-performance in-memory data platform that natively supports vector similarity search and high-dimensional indexing, making it a capable solution for vector database requirements.
C, Vector Embedding Indexes, Sharding Strategies
alibaba/AliSQL (5,706 on GitHub)
AliSQL is a fork of MySQL by Alibaba that extends the relational database management system with enhancements for high performance, scalability, and enterprise-grade availability. It retains the core MySQL identity as a SQL-based database for storing, organizing, and retrieving structured data, while adding optimizations for large-scale transactional and analytical workloads. The project differentiates itself through a set of Alibaba-specific improvements, including a columnar engine for accelerating analytical queries directly on MySQL tables, and a distributed, shared-nothing NDB Cluster en
AliSQL is a relational database that includes support for vector similarity search via HNSW indexing, making it a viable option for users who need vector capabilities within a traditional SQL environment.
C++, Horizontal Database Scaling, Vector Similarity Search, Database REST APIs
spotify/annoy (14,157 on GitHub)
Annoy is a C++ library designed for approximate nearest neighbor search in high-dimensional vector spaces. It functions as a vector similarity search engine that constructs static, disk-based data structures to facilitate fast lookups. By mapping identifiers to vector data and persisting these structures to disk, the library enables efficient, memory-mapped access to large datasets. The project distinguishes itself through the use of random projection trees and distance-metric-based partitioning, which organize data into hierarchical binary trees to balance search precision against computatio
This is a specialized library for approximate nearest neighbor search and indexing rather than a full-featured vector database, as it lacks built-in CRUD operations, native API services, and distributed storage management.
C++, Vector Search Indexes, Vector Similarity Search, Similarity Search Engines
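The random projection tree idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Annoy's implementation or API: each node picks two random points, splits the data by the hyperplane equidistant from them, and recurses until a leaf is small enough. The names `build_tree`, `side`, and `query_leaf`, and the dictionary-based node layout, are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def side(v: Sequence[float], normal: Sequence[float], offset: float) -> bool:
    """True if v lies on the 'left' side of the hyperplane normal . x = offset."""
    return sum(n * x for n, x in zip(normal, v)) >= offset


def build_tree(
    ids: List[int],
    vectors: Dict[int, Sequence[float]],
    rng: random.Random,
    leaf_size: int = 2,
) -> dict:
    """Recursively partition ids with random hyperplanes into a binary tree."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    a, b = rng.sample(ids, 2)  # two random points define the split
    normal = [x - y for x, y in zip(vectors[a], vectors[b])]
    midpoint = [(x + y) / 2 for x, y in zip(vectors[a], vectors[b])]
    offset = sum(n * m for n, m in zip(normal, midpoint))
    left = [i for i in ids if side(vectors[i], normal, offset)]
    right = [i for i in ids if not side(vectors[i], normal, offset)]
    if not left or not right:  # degenerate split (e.g. identical points)
        return {"leaf": ids}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(left, vectors, rng, leaf_size),
        "right": build_tree(right, vectors, rng, leaf_size),
    }


def query_leaf(tree: dict, v: Sequence[float]) -> List[int]:
    """Descend to the leaf whose region contains v and return candidate ids."""
    while "leaf" not in tree:
        go_left = side(v, tree["normal"], tree["offset"])
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]
```

A single tree can separate true neighbors that fall on opposite sides of a split, so approaches in this family generally build several trees with different random splits and merge their candidates, trading search precision against computation.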
tporadowski/redis (9,987 on GitHub)
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
Redis is a high-performance key-value store that includes robust vector similarity search and indexing capabilities, making it a capable, multi-purpose solution for managing high-dimensional embeddings alongside traditional data.
C, Vector Indexing, Vector Similarity Search
meilisearch/meilisearch (58,118 on GitHub)
Meilisearch is a Rust-based search engine providing typo-tolerant full-text and vector-based semantic search with real-time conversational capabilities.
Meilisearch is a full-text search engine that has added support for vector-based semantic search, making it a capable tool for similarity search even though its primary focus remains on traditional document-based retrieval.
Rust, Developer-Focused Search Tools, Document Indexing Engines, Finite State Transducers
typesense/typesense (25,254 on GitHub)
Typesense is a distributed search engine designed to provide sub-millisecond query latency across massive datasets. It functions as both a high-performance indexing and retrieval engine and a comprehensive search experience platform, offering built-in typo tolerance and tools for managing relevance through synonym configuration, result curation, and complex filtering. The platform distinguishes itself by utilizing in-memory indexing to maintain high-throughput data retrieval and integrating vector database capabilities to support semantic similarity searches. It ensures data consistency and h
Typesense is a distributed search engine that natively integrates vector similarity search and high-dimensional indexing alongside its traditional full-text capabilities, making it a capable choice for applications requiring both keyword and semantic retrieval.
C++, Distributed Search Engines, Search Engines, Search Experience Platforms
surrealdb/surrealdb (32,397 on GitHub)
SurrealDB is a multi-model database engine designed to store and query document, graph, relational, and vector data within a single ACID-compliant platform. It functions as an AI-native data store, integrating vector search, graph traversal, and machine learning model execution directly into its query layer. By providing a unified declarative query language, the platform eliminates the need for external middleware to synchronize data across different storage models. The platform distinguishes itself through its ability to manage agent memory and complex workflows natively. It allows developer
SurrealDB is a multi-model database that natively supports vector storage and similarity search alongside relational and graph capabilities, making it a capable choice for applications requiring integrated vector operations.
Rust, Multi-Model Databases, Access Control Systems, ACID Transactional Cores
ClickHouse/ClickHouse (48,229 on GitHub)
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
ClickHouse is a high-performance analytical database that includes native support for vector similarity search and high-dimensional indexing, making it a capable engine for vector-based workloads despite its broader focus on general-purpose OLAP.
C++, Access Control Systems, Agent Analytics, Agentic Architectures
dragonflydb/dragonfly (30,688 on GitHub)
Dragonfly is a high-performance, multi-model in-memory data store designed to serve as a drop-in replacement for existing database infrastructures. By utilizing a multi-threaded, shared-nothing architecture and a fiber-based concurrency model, it maximizes CPU utilization and minimizes latency for read and write operations. The system supports a wide range of data structures, including strings, hashes, lists, sets, sorted sets, and JSON documents, while maintaining full compatibility with standard industry wire protocols and client libraries. What distinguishes Dragonfly is its focus on effic
Dragonfly is a high-performance, multi-model in-memory data store that includes native support for vector similarity search and indexing, making it a capable engine for vector-based workloads despite its broader focus as a general-purpose database.
C++, Access Control Systems, Cluster Management, Concurrency Models