# alibaba/zvec

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/alibaba-zvec).**

5,198 stars · 297 forks · C++ · apache-2.0

## Links

- GitHub: https://github.com/alibaba/zvec
- Homepage: https://zvec.org/en/
- awesome-repositories: https://awesome-repositories.com/repository/alibaba-zvec.md

## Topics

`ann-search` `embedded-database` `rag` `vector-search` `vectordb`

## Description

zvec is an embedded vector database engine and indexing library for high-dimensional similarity search. It stores and retrieves both dense and sparse vectors, and can serve as a hybrid search engine or as the knowledge base behind a retrieval-augmented generation (RAG) pipeline.

Its hybrid retrieval pipeline fuses vector similarity, full-text keyword matching, and scalar metadata filtering into a single query operation. A plugin-based model integration system lets users register custom embedding models and rerankers, and language bindings support native application integration.
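
To make the idea of fusing several retrieval signals concrete, here is a minimal sketch in plain Python. It assumes Reciprocal Rank Fusion (RRF) as the fusion step, which is a common choice but is not confirmed here as zvec's method. The names `reciprocal_rank_fusion` and `hybrid_query` are hypothetical and are not part of the zvec API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking (RRF)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_query(vector_hits, keyword_hits, metadata, predicate):
    """Apply a scalar filter, then fuse the vector and keyword rankings.

    Hypothetical helper: the two hit lists stand in for the output of a
    vector index and a full-text index respectively.
    """
    allowed = {doc_id for doc_id, fields in metadata.items() if predicate(fields)}
    filtered = [
        [doc_id for doc_id in hits if doc_id in allowed]
        for hits in (vector_hits, keyword_hits)
    ]
    return reciprocal_rank_fusion(filtered)


if __name__ == "__main__":
    metadata = {
        "a": {"year": 2023},
        "b": {"year": 2024},
        "c": {"year": 2024},
        "d": {"year": 2020},
    }
    ranked = hybrid_query(
        vector_hits=["a", "b", "c"],
        keyword_hits=["b", "d", "c"],
        metadata=metadata,
        predicate=lambda fields: fields["year"] >= 2023,
    )
    print(ranked)
```

Documents that rank well in both lists rise to the top, while the scalar predicate removes ineligible documents before any scoring takes place.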

Data management covers isolated local persistence for each collection, write-ahead logging, and dynamic schema mapping. Search covers approximate nearest neighbor search at billion scale, multimodal semantic search, and result reranking, with performance optimized through memory-mapped I/O and vector index compression.
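
Write-ahead logging, mentioned above, can be illustrated with a small sketch: every mutation is appended to a log and flushed to disk before it is applied, so the in-memory state can be rebuilt by replaying the log after a crash. The class `TinyWalStore` and its JSON-lines log format are assumptions made for illustration only and do not reflect zvec's on-disk format.

```python
import json
import os


class TinyWalStore:
    """Hypothetical key-value store that logs each mutation before applying it."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild state from the log and drop a torn final record, if any."""
        if not os.path.exists(self.path):
            return
        valid_bytes = 0
        with open(self.path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # incomplete record left behind by a crash
                try:
                    record = json.loads(raw)
                except ValueError:
                    break  # corrupted record: stop replaying here
                self._apply(record)
                valid_bytes += len(raw)
        # Discard anything after the last complete record.
        os.truncate(self.path, valid_bytes)

    def _apply(self, record):
        if record["op"] == "put":
            self.data[record["id"]] = record["value"]
        else:
            self.data.pop(record["id"], None)

    def _write(self, record):
        # Log first and force it to disk, then mutate the in-memory state.
        self.log.write(json.dumps(record) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self._apply(record)

    def put(self, doc_id, value):
        self._write({"op": "put", "id": doc_id, "value": value})

    def delete(self, doc_id):
        self._write({"op": "delete", "id": doc_id})

    def close(self):
        self.log.close()
```

Reopening the same path replays the log, so a process that crashed after `put` returned still sees the document on restart. A production engine would also checkpoint and truncate the log periodically, which this sketch omits.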

For AI agents, the engine exposes database interfaces and reusable skill sets of operations that connect agents to structured data stores.

## Tags

### Data & Databases

- [Hybrid Search Engines](https://awesome-repositories.com/f/data-databases/hybrid-search-engines.md) — Functions as a hybrid search engine integrating vector-based semantic retrieval with traditional keyword-based indexing. ([source](https://cdn.jsdelivr.net/gh/alibaba/zvec@main/README.md))
- [Vector Storage](https://awesome-repositories.com/f/data-databases/local-first-storage/vector-storage.md) — Provides a specialized embedded storage engine for high-dimensional vectors with configurable numeric formats and similarity metrics. ([source](https://zvec.org/en/docs/db/collections/create/schema/))
- [Hybrid Retrieval](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/hybrid-retrieval.md) — Fuses vector similarity, full-text keyword matching, and scalar filtering into a single query operation.
- [Vector Storage](https://awesome-repositories.com/f/data-databases/vector-storage.md) — Provides a high-performance storage engine for persisting high-dimensional dense embeddings used in semantic search. ([source](https://zvec.org/en/docs/db/concepts/vector-embedding/))
- [Approximate Nearest Neighbor Search](https://awesome-repositories.com/f/data-databases/approximate-nearest-neighbor-search.md) — Implements approximate nearest neighbor indexing to retrieve similar embeddings without scanning the entire dataset.
- [Attribute Filtering](https://awesome-repositories.com/f/data-databases/attribute-filtering.md) — Refines vector queries using specific scalar attributes to pinpoint precise results. ([source](https://zvec.org/en/docs/db/data-operations/query/fts/))
- [Document Upserts](https://awesome-repositories.com/f/data-databases/data-collections-datasets/collection-lifecycle-management/document-storage-managers/document-upserts.md) — Supports adding new documents to a collection or overwriting existing ones based on a unique identifier. ([source](https://zvec.org/en/docs/db/data-operations/upsert/))
- [Schema Definition](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-modeling-schemas/data-schemas/schema-definition.md) — Defines document structures using dynamic schemas that allow field modifications without database rebuilds. ([source](https://zvec.org/en/docs/db/collections/create/schema/))
- [Document Lifecycle Management](https://awesome-repositories.com/f/data-databases/data-integration-synchronization/local-document-indexing/document-indexing/document-lifecycle-management.md) — Manages the full lifecycle of documents, including insertion, updates, and deletions via unique identifiers. ([source](https://zvec.org/en/docs/db/data-operations/))
- [Document Deletion Operations](https://awesome-repositories.com/f/data-databases/data-management/document-record-handling/document-deletion-operations.md) — Provides functionality to permanently remove specific records using identifiers or conditional filters. ([source](https://zvec.org/en/docs/db/quickstart/))
- [Vector Document Indexing](https://awesome-repositories.com/f/data-databases/database-management-systems/database-engines/vector-databases/vector-document-indexing.md) — Provides workflows for indexing single or batch documents containing dense and sparse vectors. ([source](https://zvec.org/en/docs/db/data-operations/insert/))
- [Schema Modification](https://awesome-repositories.com/f/data-databases/database-management-systems/database-systems-management/database-management/schema-designers/table-schemas/schema-modification.md) — Allows updating scalar field names, data types, or structures after creation without downtime. ([source](https://zvec.org/en/docs/db/collections/schema-evolution/))
- [Document Ingestion Pipelines](https://awesome-repositories.com/f/data-databases/document-ingestion-pipelines.md) — Implements pipelines for storing documents containing both scalar metadata and high-dimensional vector embeddings. ([source](https://zvec.org/en/docs/db/quickstart/))
- [Dynamic Schema Storage](https://awesome-repositories.com/f/data-databases/dynamic-schema-storage.md) — Enables adding or removing scalar fields and vectors from a collection without recreating the database. ([source](https://zvec.org/en/docs/db/concepts/data-modeling/))
- [Full Text Search](https://awesome-repositories.com/f/data-databases/full-text-search.md) — Implements full-text search capabilities for retrieving documents by matching text content through specialized indexing. ([source](https://zvec.org/en/docs/db/data-operations/query/))
- [Document Retrieval by Identifier](https://awesome-repositories.com/f/data-databases/full-text-search/documentation-search/document-retrieval-by-identifier.md) — Enables direct retrieval of specific documents from the database using their unique identifiers. ([source](https://zvec.org/en/docs/db/quickstart/))
- [Hybrid Vector-Keyword Indexing](https://awesome-repositories.com/f/data-databases/hybrid-vector-keyword-indexing.md) — Combines dense vector embeddings with inverted keyword indices to enable fast, relevance-ranked lexical retrieval. ([source](https://zvec.org/en/docs/db/concepts/))
- [Local Vector Store Backends](https://awesome-repositories.com/f/data-databases/in-memory-data-stores/vector-stores/local-vector-store-backends.md) — Manages isolated local data stores for organizing different sets of vector embeddings with local persistence. ([source](https://zvec.org/en/docs/db/quickstart/))
- [Embedded Persistence](https://awesome-repositories.com/f/data-databases/in-memory-data-stores/vector-stores/local-vector-store-backends/embedded-persistence.md) — Functions as a self-contained data store that persists collections locally using write-ahead logging and memory-mapped I/O.
- [Local Data Persistence](https://awesome-repositories.com/f/data-databases/local-data-persistence.md) — Saves collections into dedicated, self-contained directories to ensure isolation and local persistence. ([source](https://zvec.org/en/docs/db/concepts/data-modeling/))
- [Sparse](https://awesome-repositories.com/f/data-databases/local-first-storage/vector-storage/sparse.md) — Implements a dedicated storage layer for sparse vectors to enable efficient lexical and hybrid retrieval. ([source](https://zvec.org/en/docs/db/concepts/vector-embedding/))
- [Metadata Filtering](https://awesome-repositories.com/f/data-databases/metadata-filtering.md) — Restricts search results using scalar field conditions to isolate specific subsets of data. ([source](https://zvec.org/en/docs/db/data-operations/query/))
- [Search and Indexing](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing.md) — Integrates vector index construction for similarity search and inverted scalar indexes for efficient filtering. ([source](https://zvec.org/en/docs/db/concepts/data-modeling/))
- [Index Definitions](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing/index-definitions.md) — Defines the indexing strategy for storing document chunks and their associated metadata. ([source](https://zvec.org/en/docs/ai/skills/))
- [Vector Collection Management](https://awesome-repositories.com/f/data-databases/vector-collection-management.md) — Provides tools for initializing and configuring vector collections with specific dimensions and scalar schemas. ([source](https://zvec.org/en/docs/db/collections/create/))
- [Vector Databases](https://awesome-repositories.com/f/data-databases/vector-databases.md) — Provides a specialized database engine for storing and querying high-dimensional dense and sparse vector embeddings.
- [Vector Indexing](https://awesome-repositories.com/f/data-databases/vector-indexing.md) — Offers a toolset for building IVF and graph-based indexes to enable approximate nearest neighbor search.
- [Inverted File Indexes](https://awesome-repositories.com/f/data-databases/vector-indexing/inverted-file-indexes.md) — Implements IVF vector indexing to partition vector spaces into clusters for faster similarity searches. ([source](https://zvec.org/en/docs/db/concepts/vector-index/ivf-index/))
- [Embedding Generation](https://awesome-repositories.com/f/data-databases/vector-search/embedding-generation.md) — Transforms text into high-dimensional vectors using local models or cloud APIs for semantic similarity search. ([source](https://zvec.org/en/docs/ai/embedding/))
- [Vector Similarity Search](https://awesome-repositories.com/f/data-databases/vector-similarity-search.md) — Implements high-dimensional similarity search using nearest neighbor indexes to find relevant documents or images.
- [Filtered Similarity Searches](https://awesome-repositories.com/f/data-databases/vector-similarity-search/filtered-similarity-searches.md) — Implements techniques for constraining vector similarity results using relational or full-text metadata filters. ([source](https://zvec.org/en/docs/db/quickstart/))
- [Exclusive Write Access](https://awesome-repositories.com/f/data-databases/acid-transactional-cores/concurrent-read-write-transactions/exclusive-write-access.md) — Maintains exclusive write access to data collections while allowing multiple concurrent read operations. ([source](https://cdn.jsdelivr.net/gh/alibaba/zvec@main/README.md))
- [Billion-Scale Vector Search](https://awesome-repositories.com/f/data-databases/approximate-nearest-neighbor-search/billion-scale-vector-search.md) — Executes approximate nearest neighbor searches across billions of vectors by optimizing memory and disk usage. ([source](https://zvec.org/en/docs/db/concepts/vector-index/diskann-index/))
- [Inverted Index Engines](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage/specialized-storage-engines/inverted-index-engines.md) — Utilizes inverted indexing to map terms and scalar values to document identifiers for fast keyword search.
- [Scalar Attribute Filtering](https://awesome-repositories.com/f/data-databases/data-type-definitions/scalar-types/scalar-attribute-filtering.md) — Provides high-performance filtering of documents using primitive scalar attributes and optimized indices. ([source](https://zvec.org/en/docs/db/data-operations/query/filter/))
- [Inverted Scalar Indexes](https://awesome-repositories.com/f/data-databases/field-scoped-indexing/inverted-scalar-indexes.md) — Creates inverted indices on structured scalar data fields to enable fast filtering and retrieval. ([source](https://zvec.org/en/docs/db/collections/create/schema/))
- [Tokenized Text Indexes](https://awesome-repositories.com/f/data-databases/field-scoped-indexing/tokenized-text-indexes.md) — Configures string fields for text search using customizable tokenizers for different languages. ([source](https://zvec.org/en/docs/db/data-operations/query/fts/))
- [Concurrent Read Access](https://awesome-repositories.com/f/data-databases/high-concurrency-database-access/concurrent-read-access.md) — Allows multiple processes to read from the same data collection simultaneously for high-throughput retrieval. ([source](https://zvec.org/en/docs/db/))
- [Index Construction](https://awesome-repositories.com/f/data-databases/index-construction.md) — Provides utilities to create permanent indexes from staged vectors to optimize similarity search speed. ([source](https://zvec.org/en/docs/db/quickstart/))
- [Memory-Mapped Storage](https://awesome-repositories.com/f/data-databases/memory-mapped-storage.md) — Uses memory-mapped I/O to map files directly into the virtual address space for accelerated data retrieval.
- [Range and Exact Match Filters](https://awesome-repositories.com/f/data-databases/query-condition-builders/pattern-matching-filters/range-and-exact-match-filters.md) — Accelerates retrieval using exact matches and range queries to isolate specific records. ([source](https://zvec.org/en/docs/db/concepts/inverted-index/))
- [Search Index Management](https://awesome-repositories.com/f/data-databases/search-index-management.md) — Provides tools for configuring and managing indexes on both scalar and vector fields. ([source](https://zvec.org/en/docs/db/collections/schema-evolution/))
- [Background Index Buffering](https://awesome-repositories.com/f/data-databases/search-indexing/automatic-background-indexing/background-index-buffering.md) — Implements background buffering for vector index construction to maintain retrieval speed during updates. ([source](https://zvec.org/en/docs/db/collections/optimize/))
- [Search Result Fusion Algorithms](https://awesome-repositories.com/f/data-databases/search-result-fusion-algorithms.md) — Refines the sequence of retrieved items using models or fusion algorithms to improve relevance. ([source](https://zvec.org/en/docs/ai/reranker/))
- [Multimodal Search](https://awesome-repositories.com/f/data-databases/semantic-search/multimodal-search.md) — Locates visually similar images or code snippets using natural language descriptions and semantic embeddings. ([source](https://zvec.org/en/))
- [Similarity Search](https://awesome-repositories.com/f/data-databases/similarity-search.md) — Finds the most similar items using either exact matching or approximate indexing. ([source](https://zvec.org/en/docs/db/))
- [Vector Index Compression](https://awesome-repositories.com/f/data-databases/vector-indexing/vector-index-compression.md) — Converts high-precision vectors into compact forms to reduce memory usage and lower query latency. ([source](https://zvec.org/en/docs/db/concepts/vector-index/quantization/))
- [Vector Search](https://awesome-repositories.com/f/data-databases/vector-search.md) — Executes semantic similarity searches directly on the host device to eliminate network latency. ([source](https://zvec.org/en/blog/))
- [Multi-Vector Fusion Search](https://awesome-repositories.com/f/data-databases/vector-search/multi-vector-fusion-search.md) — Combines results from multiple independent vector embeddings and uses a re-ranker to fuse them. ([source](https://zvec.org/en/docs/db/data-operations/query/multi-vector/))
- [Similarity Thresholds](https://awesome-repositories.com/f/data-databases/vector-similarity-search/similarity-thresholds.md) — Filters retrieval results by rejecting candidates that fall outside a specified distance radius from the query vector. ([source](https://zvec.org/en/docs/db/concepts/vector-index/ivf-index/))
- [Text Vectorizers](https://awesome-repositories.com/f/data-databases/vector-storage/text-vectorizers.md) — Integrates embedding models to transform raw textual data into high-dimensional vector representations for similarity search. ([source](https://zvec.org/en/docs/ai/))
- [Write-Ahead Logging](https://awesome-repositories.com/f/data-databases/write-ahead-logging.md) — Ensures data durability and recoverability by recording all mutations to a persistent log before applying them. ([source](https://cdn.jsdelivr.net/gh/alibaba/zvec@main/README.md))
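
Several entries above refer to IVF indexing and similarity thresholds. The following sketch shows the general technique under simplifying assumptions: centroids are supplied by the caller rather than trained with k-means, distances are Euclidean, and the class `TinyIVF` with its `nprobe` and `radius` parameters is hypothetical rather than taken from zvec.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TinyIVF:
    """Hypothetical inverted-file index over a fixed set of centroids."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one posting list per cluster

    def add(self, doc_id, vec):
        # Store the vector in the posting list of its nearest centroid.
        nearest = min(
            range(len(self.centroids)),
            key=lambda i: l2(vec, self.centroids[i]),
        )
        self.lists[nearest].append((doc_id, vec))

    def search(self, query, top_k=3, nprobe=1, radius=None):
        # Probe only the nprobe clusters whose centroids are closest.
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: l2(query, self.centroids[i]),
        )
        candidates = []
        for i in order[:nprobe]:
            for doc_id, vec in self.lists[i]:
                dist = l2(query, vec)
                # Optional threshold: drop candidates beyond the radius.
                if radius is None or dist <= radius:
                    candidates.append((dist, doc_id))
        candidates.sort()
        return [doc_id for _, doc_id in candidates[:top_k]]


if __name__ == "__main__":
    index = TinyIVF(centroids=[(0.0, 0.0), (10.0, 10.0)])
    index.add("a", (0.0, 1.0))
    index.add("b", (2.0, 0.0))
    index.add("c", (9.0, 10.0))
    print(index.search((6.0, 6.0), nprobe=1))
    print(index.search((6.0, 6.0), nprobe=2))
```

Raising `nprobe` scans more clusters and improves recall at the cost of more distance computations, which is the usual accuracy and speed trade-off of an IVF index.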

### Artificial Intelligence & ML

- [Hybrid Search Retrievers](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-rag-development/knowledge-base-retrieval/hybrid-search-retrievers.md) — Combines vector similarity with scalar filtering and weighted ranking to refine retrieval for RAG and AI agents. ([source](https://zvec.org/en/docs/db/))
- [RAG Document Retrieval](https://awesome-repositories.com/f/artificial-intelligence-ml/documentation-retrieval-engines/rag-document-retrieval.md) — Fetches relevant document snippets from a local knowledge base to provide grounded context for LLMs.
- [RAG Context Retrieval](https://awesome-repositories.com/f/artificial-intelligence-ml/rag-context-retrieval.md) — Retrieves relevant documents from a knowledge base to ground large language model responses in factual information. ([source](https://zvec.org/en/))
- [Sparse Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-embeddings/sparse-embeddings.md) — Generates sparse vector representations to enable efficient lexical matching and hybrid retrieval. ([source](https://zvec.org/en/docs/ai/embedding/))
- [Knowledge Base Retrieval](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-rag-development/knowledge-base-retrieval.md) — Serves as a persistence layer that provides relevant document context to large language models via semantic retrieval.
- [AI Agent Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-agent-integrations.md) — Exposes database interfaces and standardized protocols to connect AI agents to structured data stores.
- [AI Agent Servers](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-agent-servers.md) — Provides a standardized server interface to expose database operations as tools for AI agents. ([source](https://zvec.org/en/docs/ai/))
- [AI Agent Skills](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-agent-skills.md) — Allows the definition of reusable operation sets that AI agents use to perform tasks over stored data. ([source](https://zvec.org/en/docs/ai/))
- [Custom Model Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-integrations.md) — Allows the integration of proprietary or third-party embedding models via protocol classes for specialized representations. ([source](https://zvec.org/en/docs/ai/embedding/))
- [Custom Reranker Backends](https://awesome-repositories.com/f/artificial-intelligence-ml/reranking-parameters/automated-scoring-rerankers/custom-reranker-backends.md) — Integrates external models or proprietary logic into the retrieval pipeline using a custom reranker extension system. ([source](https://zvec.org/en/docs/ai/reranker/))
- [Result Reranking](https://awesome-repositories.com/f/artificial-intelligence-ml/result-reranking.md) — Provides mechanisms to re-score and re-order search hits to improve the precision of top results. ([source](https://zvec.org/en/docs/ai/))
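
The custom model integration entries above mention protocol classes. As a rough illustration of that pattern, the sketch below defines a structural interface with Python's `typing.Protocol` and a toy embedder that satisfies it. The names `Embedder`, `HashingEmbedder`, and `index_texts` are hypothetical, and the hashing scheme is a stand-in for a real model, so none of this should be read as zvec's actual interface.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class Embedder(Protocol):
    """Hypothetical interface: anything with a dimension and an embed method."""

    dimension: int

    def embed(self, text: str) -> list[float]:
        ...


class HashingEmbedder:
    """Toy embedder that hashes tokens into buckets (not a real model)."""

    def __init__(self, dimension=8):
        self.dimension = dimension

    def embed(self, text):
        vec = [0.0] * self.dimension
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[digest[0] % self.dimension] += 1.0
        # L2-normalize so that dot products behave like cosine similarity.
        norm = sum(x * x for x in vec) ** 0.5
        return [x / norm for x in vec] if norm else vec


def index_texts(embedder: Embedder, texts):
    """Embed each text with whichever embedder was plugged in."""
    return {text: embedder.embed(text) for text in texts}


if __name__ == "__main__":
    embedder = HashingEmbedder(dimension=4)
    print(isinstance(embedder, Embedder))
    print(index_texts(embedder, ["vector search", "keyword search"]))
```

Because the protocol is structural, `HashingEmbedder` does not inherit from `Embedder`. Any class with a matching `dimension` attribute and `embed` method can be swapped in, which is what makes this pattern convenient for plugging in third-party models.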

### Operating Systems & Systems Programming

- [Memory-Mapped I/O](https://awesome-repositories.com/f/operating-systems-systems-programming/memory-mapped-i-o.md) — Utilizes memory-mapped I/O to accelerate data retrieval and optimize memory usage for caching. ([source](https://zvec.org/en/docs/db/collections/create/options/))
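
To show what memory-mapped access to vector data can look like, the sketch below stores fixed-width float32 vectors in a flat file and reads one back through Python's standard `mmap` module. The file layout, the `DIM` constant, and the helper names are assumptions for illustration and do not describe zvec's storage format.

```python
import mmap
import struct

DIM = 4  # assumed vector dimension for this example
RECORD = struct.Struct(f"<{DIM}f")  # little-endian float32 values


def write_vectors(path, vectors):
    """Write vectors back to back as fixed-width binary records."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(RECORD.pack(*vec))


def read_vector(path, index):
    """Read one vector by position through a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages touched by this record need to be loaded.
            return RECORD.unpack_from(mm, index * RECORD.size)


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.bin")
        write_vectors(path, [(1.0, 2.0, 3.0, 4.0), (0.5, 0.25, 0.125, 0.0)])
        print(read_vector(path, 1))
```

Because records have a fixed size, the offset of any vector is a simple multiplication, and the operating system's page cache decides which parts of the file stay in memory.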

### Software Engineering & Architecture

- [Data Type Validation](https://awesome-repositories.com/f/software-engineering-architecture/data-schema-validation/data-type-validation.md) — Validates that every ingested field strictly conforms to the declared scalar or vector data types. ([source](https://zvec.org/en/docs/db/concepts/data-modeling/))
- [Multi-Process Data Stores](https://awesome-repositories.com/f/software-engineering-architecture/multi-process-data-stores.md) — Enables multiple processes to read from a single data collection simultaneously with exclusive write access.
- [Model Adapter Plugins](https://awesome-repositories.com/f/software-engineering-architecture/software-architecture/architectural-patterns/plugin-module-systems/modular-plugin-architectures/plugin-based-architectures/plugin-based-architectures/model-adapter-plugins.md) — Provides a plugin-based system to register third-party embedding models and rerankers through standardized adapters.

### Part of an Awesome List

- [Database Systems](https://awesome-repositories.com/f/awesome-lists/data/database-systems.md) — Lightweight, in-process vector database.
