Faiss

This project is a high-performance library for similarity search and clustering of dense vectors in very large datasets. It works as a vector similarity search engine, providing index structures that organize numerical vectors for fast retrieval and efficient querying over millions of records.

The library offers a range of indexing and compression techniques. These include hierarchical navigable small world (HNSW) graphs, whose search time grows roughly logarithmically with dataset size, and inverted file (IVF) indexing, which partitions the vector space into clusters so that a query scans only a few of them. For large-scale data, it uses product quantization (PQ) to compress vectors and reduce the memory footprint, and SIMD (hardware-level vector) instructions to accelerate distance computations. When exact results are required, it also supports exhaustive brute-force search.
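To make the IVF idea concrete, here is a minimal pure-Python sketch, assuming a plain k-means coarse quantizer and L2 distance. The names `ToyIVF`, `train_centroids`, and `nearest_centroid` are hypothetical; `nlist` and `nprobe` echo Faiss's IVF terminology, but this is not Faiss's implementation, and it leaves out product quantization and SIMD.

```python
import random


def l2_sq(a, b):
    # Squared Euclidean distance; it ranks neighbors the same way as L2.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v, centroids):
    # Index of the coarse centroid closest to v.
    return min(range(len(centroids)), key=lambda j: l2_sq(v, centroids[j]))


def train_centroids(vectors, nlist, iters=10, seed=0):
    # Plain Lloyd's k-means: learns the nlist cells that partition the space.
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        cells = [[] for _ in range(nlist)]
        for v in vectors:
            cells[nearest_centroid(v, centroids)].append(v)
        for j, members in enumerate(cells):
            if members:  # an empty cell keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ToyIVF:
    """Hypothetical inverted-file index for illustration; not the Faiss API."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one inverted list per cell
        self.ntotal = 0

    def add(self, vectors):
        # Each vector is stored only in the list of its nearest centroid.
        for v in vectors:
            self.lists[nearest_centroid(v, self.centroids)].append((self.ntotal, v))
            self.ntotal += 1

    def search(self, query, k, nprobe=1):
        # Scan only the nprobe cells whose centroids are closest to the query.
        order = sorted(range(len(self.centroids)),
                       key=lambda j: l2_sq(query, self.centroids[j]))
        candidates = [item for j in order[:nprobe] for item in self.lists[j]]
        candidates.sort(key=lambda item: l2_sq(query, item[1]))
        return [i for i, _ in candidates[:k]]
```

Raising `nprobe` scans more cells, which improves recall at the cost of speed; with `nprobe` equal to the number of cells, the search degenerates into an exact brute-force scan.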

Beyond its core indexes, the library supports the end-to-end vector search workflow, from preparing floating-point data as row-major matrices (one vector per row) to running nearest-neighbor queries. It supports memory-mapped index storage, which makes it possible to work with datasets larger than physical memory, and serves as a foundation for feature retrieval in machine learning systems.
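The row-major layout and memory-mapped access described above can be illustrated with the Python standard library alone. This is a sketch under stated assumptions: vectors are stored as a raw native-endian float32 file (a hypothetical format, not Faiss's index serialization), and the search is an exact brute-force scan. Faiss's own Python API works with `float32` arrays of shape `(n, d)`; `write_row_major` and `knn_mmap` are illustrative names only.

```python
import array
import heapq
import mmap
import os
import struct
import tempfile


def write_row_major(path, rows):
    # Flatten n x d vectors into one contiguous float32 buffer, row after row.
    flat = array.array("f", [x for row in rows for x in row])
    with open(path, "wb") as f:
        flat.tofile(f)


def knn_mmap(path, d, query, k):
    # Exact k-NN by brute force over a memory-mapped float32 matrix.
    row_bytes = struct.calcsize(f"{d}f")  # bytes occupied by one row
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n = len(mm) // row_bytes
        heap = []  # max-heap via negated distances: the k best (dist, id) so far
        for i in range(n):
            # Row-major: vector i occupies bytes [i * row_bytes, (i + 1) * row_bytes).
            row = struct.unpack_from(f"{d}f", mm, i * row_bytes)
            dist = sum((a - b) ** 2 for a, b in zip(row, query))
            if len(heap) < k:
                heapq.heappush(heap, (-dist, i))
            elif dist < -heap[0][0]:
                heapq.heapreplace(heap, (-dist, i))
        return [i for _, i in sorted(heap, reverse=True)]
```

Because the file is memory-mapped, the operating system pages rows in as the scan touches them instead of loading the whole matrix up front.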

Features

Vector Search Engines - Provides a high-performance library for efficient similarity search and clustering of dense vectors across massive datasets.
Approximate Nearest Neighbor Search - Trades some accuracy for significantly faster lookups when querying extremely large vector databases.
Vector Similarity Search - Identifies nearest neighbors in large datasets quickly and accurately, using distance metrics such as Euclidean (L2) distance or inner product.
Similarity Search - Finds the most relevant items in massive datasets by comparing mathematical representations of data points based on their proximity.
Graph-Based Indexing - Builds a multi-layered graph structure that supports approximately logarithmic search time for finding nearest neighbors in high-dimensional space.
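The graph-based search in the list above can be sketched as best-first traversal of a single proximity-graph layer, the routine HNSW applies within each layer. This is a minimal illustration under stated assumptions: the graph is supplied by hand rather than built by HNSW's insertion procedure, there is only one layer, and `greedy_graph_search` and `ef` (a candidate-list size named after the HNSW parameter) are illustrative, not Faiss's API.

```python
import heapq


def l2_sq(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_graph_search(vectors, neighbors, entry, query, ef=4):
    """Best-first search over one proximity-graph layer; returns up to ef ids."""
    visited = {entry}
    d0 = l2_sq(query, vectors[entry])
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    results = [(-d0, entry)]    # max-heap (negated): the ef best nodes seen
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # nearest candidate is worse than our worst result: stop
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2_sq(query, vectors[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return [n for _, n in sorted(results, reverse=True)]


# Toy graph: points 0..9 on a line, linked to their immediate neighbors,
# plus one long-range "small world" edge between nodes 0 and 5.
vectors = [[float(i)] for i in range(10)]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
neighbors[0].append(5)
neighbors[5].append(0)
result = greedy_graph_search(vectors, neighbors, entry=0, query=[7.2], ef=2)
```

The long-range edge lets the search skip ahead instead of walking every node, and the loop stops once no unexplored candidate is closer than the worst of the `ef` results kept.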

unum-cloud/USearch

3,888 stars

USearch is a high-performance vector similarity search engine and approximate nearest neighbor index designed for dense embeddings. It functions as a low-level vector database core and high-dimensional vector indexer, providing the primitives necessary to store and retrieve vectors across massive datasets. The engine distinguishes itself through hardware-level SIMD acceleration for distance kernels and a proximity-graph indexing system that enables fast retrieval across billions of vectors. It supports multi-precision vector quantization to balance memory usage and accuracy, and utilizes memo…

spotify/annoy

14,157 stars

Annoy is a C++ library designed for approximate nearest neighbor search in high-dimensional vector spaces. It functions as a vector similarity search engine that constructs static, disk-based data structures to facilitate fast lookups. By mapping identifiers to vector data and persisting these structures to disk, the library enables efficient, memory-mapped access to large datasets. The project distinguishes itself through the use of random projection trees and distance-metric-based partitioning, which organize data into hierarchical binary trees to balance search precision against computatio…
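Annoy's README describes choosing each split hyperplane by sampling two points and taking the hyperplane equidistant from them. The sketch below builds one such tree in pure Python and answers a query by descending to a single leaf. The function names are hypothetical, and real Annoy builds a forest of trees and explores several branches with a priority queue, which this sketch omits.

```python
import random


def l2_sq(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(ids, vectors, leaf_size, rng):
    # Recursively split ids until each leaf holds at most leaf_size points.
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    a, b = rng.sample(ids, 2)
    pa, pb = vectors[a], vectors[b]
    # Hyperplane equidistant from pa and pb: normal = pa - pb, through their midpoint.
    normal = [x - y for x, y in zip(pa, pb)]
    mid = [(x + y) / 2 for x, y in zip(pa, pb)]
    offset = sum(n * m for n, m in zip(normal, mid))
    left, right = [], []
    for i in ids:
        side = sum(n * x for n, x in zip(normal, vectors[i])) - offset
        (left if side > 0 else right).append(i)  # side > 0: closer to pa
    if not left or not right:  # degenerate split, e.g. duplicate points
        return {"leaf": ids}
    return {"normal": normal, "offset": offset,
            "left": build_tree(left, vectors, leaf_size, rng),
            "right": build_tree(right, vectors, leaf_size, rng)}


def query_tree(node, vectors, q, k):
    # Descend to the leaf on q's side of each hyperplane, then rank that leaf exactly.
    while "leaf" not in node:
        side = sum(n * x for n, x in zip(node["normal"], q)) - node["offset"]
        node = node["left"] if side > 0 else node["right"]
    return sorted(node["leaf"], key=lambda i: l2_sq(q, vectors[i]))[:k]


def leaves(node):
    # Collect the id lists stored in every leaf.
    if "leaf" in node:
        return [node["leaf"]]
    return leaves(node["left"]) + leaves(node["right"])
```

A single tree can miss true neighbors that fall just across a split, which is why Annoy combines many trees; the `leaf_size` parameter here controls how many points are compared exactly at the end of a query.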

lancedb/lancedb

9,031 stars

LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters…

alibaba/zvec

5,198 stars

zvec is an embedded vector database engine and indexing library designed for high-dimensional similarity search. It functions as a hybrid search engine and a retrieval-augmented generation knowledge base, allowing for the storage and retrieval of dense and sparse vectors. The system is distinguished by its hybrid retrieval pipeline, which fuses vector similarity, full-text keyword matching, and scalar metadata filtering into single query operations. It supports a plugin-based model integration system for registering custom embedding models and rerankers, as well as language bindings for nativ…

facebookresearch/faiss