# spotify/annoy

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/spotify-annoy).**

14,157 stars · 1,218 forks · C++ · apache-2.0

## Links

- GitHub: https://github.com/spotify/annoy
- awesome-repositories: https://awesome-repositories.com/repository/spotify-annoy.md

## Topics

`approximate-nearest-neighbor-search` `c-plus-plus` `golang` `locality-sensitive-hashing` `lua` `nearest-neighbor-search` `python`

## Description

Annoy is a C++ library designed for approximate nearest neighbor search in high-dimensional vector spaces. It functions as a vector similarity search engine that constructs static, disk-based data structures to facilitate fast lookups. By mapping identifiers to vector data and persisting these structures to disk, the library enables efficient, memory-mapped access to large datasets.
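To make the idea of a disk-persisted, memory-mapped mapping from identifiers to vectors concrete, here is a minimal sketch using only the Python standard library. It is not Annoy's file format: the fixed-size record layout and the names `DIM`, `RECORD`, `save_vectors`, and `load_vector` are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # vector dimensionality (illustrative assumption)
RECORD = struct.Struct(f"<{DIM}f")  # one little-endian float32 vector per item


def save_vectors(path, vectors):
    """Write vectors back to back, so item i starts at byte i * RECORD.size."""
    with open(path, "wb") as fh:
        for vec in vectors:
            fh.write(RECORD.pack(*vec))


def load_vector(path, item_id):
    """Memory-map the file read-only and decode only the requested record."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return list(RECORD.unpack_from(view, item_id * RECORD.size))


# Hypothetical data: 1000 vectors whose first component equals the item id.
vectors = [[float(i), 0.5, 1.0, 2.0] for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
save_vectors(path, vectors)

third = load_vector(path, 3)
print(third)
```

Because the mapping is read-only and backed by the file, the operating system pages data in on demand, and several processes mapping the same file can share those pages rather than each holding a private copy.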

Annoy builds a forest of random projection trees: each tree recursively splits the vector space with hyperplanes chosen according to the configured distance metric, producing hierarchical binary trees. The number of trees and the number of nodes inspected per query trade search precision against build time, index size, and query cost. Because the resulting indices are immutable and memory-mapped, multiple independent processes can share one index file without each loading the entire dataset into its own memory.
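The random projection trees described above can be sketched in a few lines of pure Python. This is an illustrative simplification, not Annoy's implementation: the split rule (a hyperplane equidistant from two sampled points), the `leaf_size` parameter, and the helper names `build_tree`, `candidates`, and `query_forest` are all assumptions, and the query simply descends to one leaf per tree instead of using a priority-ordered search.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def margin(node, vec):
    """Signed score telling which side of the node's hyperplane vec lies on."""
    return dot(node["normal"], vec) - node["offset"]


def build_tree(ids, vectors, rng, leaf_size=8):
    """Recursively split ids with a hyperplane equidistant from two sampled points."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    a, b = (vectors[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    midpoint = [(x + y) / 2 for x, y in zip(a, b)]
    node = {"normal": normal, "offset": dot(normal, midpoint)}
    left = [i for i in ids if margin(node, vectors[i]) <= 0]
    right = [i for i in ids if margin(node, vectors[i]) > 0]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return {"leaf": ids}
    node["left"] = build_tree(left, vectors, rng, leaf_size)
    node["right"] = build_tree(right, vectors, rng, leaf_size)
    return node


def candidates(node, query):
    """Follow the query down one tree and return the ids in the leaf it reaches."""
    while "leaf" not in node:
        node = node["right"] if margin(node, query) > 0 else node["left"]
    return node["leaf"]


def query_forest(forest, vectors, query, k):
    """Pool one leaf per tree, then rank the pooled ids by exact squared distance."""
    pool = set()
    for tree in forest:
        pool.update(candidates(tree, query))

    def sq_dist(i):
        return sum((x - y) ** 2 for x, y in zip(vectors[i], query))

    return sorted(pool, key=sq_dist)[:k]


rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
forest = [build_tree(list(range(500)), vectors, rng) for _ in range(10)]
neighbours = query_forest(forest, vectors, vectors[42], k=5)
print(neighbours)
```

Adding trees enlarges the candidate pool and so tends to improve recall, at the cost of a larger structure and more distance computations per query. That is the precision versus overhead trade-off in miniature.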

The library can build an index directly on disk, which lets it scale to datasets that exceed available system memory. Language bindings and standard build system support make the C++ core usable from other programming environments, including Python, Lua, and Go.

## Tags

### Data & Databases

- [Approximate Nearest Neighbor Search](https://awesome-repositories.com/f/data-databases/approximate-nearest-neighbor-search.md) — Provides a high-performance library for approximate nearest neighbor search using random projection trees and memory-mapped indices.
- [Memory-Mapped Indexing](https://awesome-repositories.com/f/data-databases/memory-mapped-indexing.md) — Enables efficient access to large datasets by persisting search structures to disk and mapping them directly into process memory.
- [Vector Search Indexes](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing/vector-search-indexes.md) — Constructs memory-mapped vector search indexes that enable fast, memory-efficient similarity queries. ([source](https://github.com/spotify/annoy#readme))
- [Similarity Search Engines](https://awesome-repositories.com/f/data-databases/similarity-search-engines.md) — Constructs static, disk-based data structures to perform high-dimensional nearest neighbor lookups with configurable precision.
- [Vector Similarity Search](https://awesome-repositories.com/f/data-databases/vector-similarity-search.md) — Performs similarity queries by applying distance metrics to find nearest neighbors in high-dimensional space. ([source](https://github.com/spotify/annoy#readme))
- [K-Nearest Neighbor Retrieval](https://awesome-repositories.com/f/data-databases/k-nearest-neighbor-retrieval.md) — Retrieves the closest items to a query vector using pre-built, memory-mapped data structures. ([source](https://github.com/spotify/annoy/blob/master/README_Lua.md))
- [Memory-Disk Layering](https://awesome-repositories.com/f/data-databases/persistent-storage-providers/memory-disk-layering.md) — Persists search structures to disk, allowing large datasets to be shared across processes without requiring full memory residency. ([source](https://github.com/spotify/annoy/blob/main/setup.py))
- [Search Index Management](https://awesome-repositories.com/f/data-databases/search-index-management.md) — Enables saving and loading of static search indices to disk for distribution and reuse across system environments. ([source](https://github.com/spotify/annoy#readme))
- [Search and Indexing](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing.md) — Supports scaling indexing to disk to accommodate datasets that exceed available system memory. ([source](https://github.com/spotify/annoy/blob/main/README.rst))
- [Static Content Indexing](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing/static-content-indexing.md) — Constructs immutable, pre-computed binary search structures that remain fixed after creation for fast, shared lookups.

### Software Engineering & Architecture

- [Trees](https://awesome-repositories.com/f/software-engineering-architecture/trees.md) — Organizes high-dimensional data into hierarchical random projection trees to enable rapid approximate nearest neighbor lookups.

### Part of an Awesome List

- [Recommender Systems](https://awesome-repositories.com/f/awesome-lists/ai/recommender-systems.md) — Listed in the “Recommender Systems” section of the Awesome Python awesome list.
- [Vector Databases](https://awesome-repositories.com/f/awesome-lists/data/vector-databases.md) — Listed in the “Vector Databases” section of the LLM Course awesome list.

### Programming Languages & Runtimes

- [Native Library Integrations](https://awesome-repositories.com/f/programming-languages-runtimes/language-interoperability/foreign-function-interfaces/native-library-integrations.md) — Exposes high-performance search capabilities to multiple programming environments through native bindings and build system support.
- [Language Bindings](https://awesome-repositories.com/f/programming-languages-runtimes/language-interoperability/language-bindings.md) — Generates interface wrappers to allow core search functionality to be utilized from various programming languages. ([source](https://github.com/spotify/annoy/blob/main/tox.ini))

### Scientific & Mathematical Computing

- [Distance Metrics](https://awesome-repositories.com/f/scientific-mathematical-computing/numerical-mathematical-foundations/mathematical-libraries-and-utilities/core-mathematical-concepts/distance-metrics.md) — Provides configurable distance metrics to partition vector spaces for efficient similarity search.
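
As a rough illustration of how the choice of distance metric changes which item counts as nearest, the sketch below implements three common metrics in plain Python. The angular form, sqrt(2(1 - cos(u, v))), is the Euclidean distance between the normalized vectors. The function names, the `METRICS` table, and the brute-force `nearest` helper are assumptions for illustration, not Annoy's API.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def angular(u, v):
    """Euclidean distance between the unit-normalized vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))  # clamp guards rounding error


METRICS = {"euclidean": euclidean, "manhattan": manhattan, "angular": angular}


def nearest(points, query, metric):
    """Brute-force index of the closest point under the named metric."""
    dist = METRICS[metric]
    return min(range(len(points)), key=lambda i: dist(points[i], query))


# Hypothetical data: point 0 shares the query's direction but is far away,
# point 1 is close by but points in a different direction.
points = [[10.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]

by_euclidean = nearest(points, query, "euclidean")
by_angular = nearest(points, query, "angular")
print(by_euclidean, by_angular)
```

Under the Euclidean metric the nearby point wins, while under the angular metric the point with the same direction wins, so the metric should match what "similar" means for the data being indexed.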
