Milvus

Milvus is a vector database built for indexing, managing, and searching high-dimensional vector embeddings. It performs nearest-neighbor similarity search over large vector spaces and is designed to store and retrieve billions of vectors.
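
To make "nearest neighbors" concrete, the sketch below ranks a handful of stored vectors by cosine similarity to a query using an exhaustive scan. It is an illustration only, not Milvus code: the function names and the toy embeddings are hypothetical, and a production engine would use an index rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_cosine(query, vectors, k):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vec_id for _, vec_id in scored[:k]]


# Hypothetical three-dimensional embeddings, for illustration only.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
nearest = top_k_cosine([1.0, 0.0, 0.0], embeddings, k=2)
```

An exhaustive scan costs time proportional to the number of stored vectors for every query, which is the cost that vector indexes are meant to avoid at large scale.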

Milvus uses a distributed architecture that separates storage, query, and coordination into independent services, so a cluster can scale horizontally. Data is partitioned into immutable segments, and each segment is indexed independently with data structures specialized for vector search. Because the services work against shared storage, compute and storage resources can scale independently in cloud environments, while a log-based persistence layer provides data durability and state recovery.
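
The segment idea can be pictured with a small sketch: each immutable segment is searched on its own, and the partial results are merged into one global top-k. This is a simplified model under stated assumptions, not Milvus's implementation. The function names are hypothetical, the per-segment search is a plain scan rather than an index lookup, and the loop runs sequentially where a real system could search segments in parallel.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_segment(segment, query, k):
    """Return the k closest (distance, id) pairs within one immutable segment."""
    scored = ((squared_l2(query, vec), vec_id) for vec_id, vec in segment)
    return heapq.nsmallest(k, scored)


def search_all_segments(segments, query, k):
    """Search every segment independently, then merge the partial results."""
    partial = []
    for segment in segments:
        partial.extend(search_segment(segment, query, k))
    return [vec_id for _, vec_id in heapq.nsmallest(k, partial)]


# Tuples stand in for immutable segments; the ids and vectors are made up.
segments = (
    ((1, (0.0, 0.0)), (2, (5.0, 5.0))),
    ((3, (1.0, 1.0)), (4, (9.0, 9.0))),
)
result = search_all_segments(segments, (0.9, 0.9), k=2)
```

Since every segment returns its own k best candidates, the merged list is guaranteed to contain the global k best, which is what lets segments be searched independently.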

The platform supports a wide range of data retrieval patterns, including retrieval-augmented generation, hybrid search, and multimodal data retrieval for text, images, and graphs. Deployment options range from lightweight local instances for rapid prototyping to robust standalone setups and fully managed distributed clusters. Documentation includes sizing tools to assist in estimating hardware requirements based on specific data volumes and operational patterns.
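
As an illustration of what hybrid search involves, the sketch below merges a vector-similarity ranking with a keyword ranking using reciprocal rank fusion (RRF). RRF is one common fusion technique chosen here as an assumption; the text above does not say which method Milvus applies. The document ids and the constant `k=60` are likewise illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into a single list, best first.

    Each id earns 1 / (k + rank) from every ranking it appears in,
    where rank starts at 1 for the top hit.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from two retrieval methods.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that rank well in both lists rise to the top, while documents found by only one method remain in the fused result at a lower position.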

Features

Similarity Search Engines - Provides high-performance similarity search capabilities for high-dimensional vector data.
Vector Databases - Functions as a specialized storage engine optimized for indexing and high-speed similarity retrieval of vector embeddings.
Vector Search Engines - Provides high-speed similarity search capabilities across massive high-dimensional datasets.
Distributed Database Clusters - Supports distributed architecture to handle horizontal scaling across clusters for large-scale production needs.
Retrieval-Augmented Generation Frameworks - Provides the foundational infrastructure for building retrieval-augmented generation applications.
Indexing Engines - Implements specialized indexing structures to enable high-performance similarity searches across massive vector datasets.
Distributed Data Architectures - Implements a distributed architecture that supports horizontal scaling and high availability across clusters.
Distributed Data Infrastructure - Manages and scales complex data storage systems across multiple server nodes for production environments.
Retrieval-Augmented Generation - Enhances AI models by providing contextually relevant data retrieved from vector-based knowledge bases.
Vector Databases - Scalable open-source vector database for similarity search.
Data Management - Vector similarity search engine for embeddings.
Data Storage Systems - Manages embedding vectors for ML and neural networks.
Database Engines - Vector database for embedding management and search.
Database Systems - Vector database for scalable similarity search and AI tasks.
Database Tools - Vector database.
Databases and RAG - Cloud-native vector database for AI.
Vector Databases - Cloud-native vector database for scalable similarity search.
Infrastructure and Serving - Vector database for similarity search.
Backend and Infrastructure - Vector database for AI and machine learning.
Storage Decoupling - Separates compute and storage nodes to allow independent scaling of processing power and data capacity.
Hybrid Search Systems - Implements hybrid search capabilities to combine vector similarity with other retrieval methods.
Multimodal Databases - Acts as a unified storage environment for organizing and retrieving complex data types like text and images.
Microservice Architectures - Decouples storage, query, and coordination into independent services to enable horizontal scaling.
Multimodal Retrieval Systems - Enables searching across diverse media types by utilizing shared vector representations.
Multimodal Search Engines - Supports multimodal search patterns to query across diverse data types.
Data Partitioning - Partitions data into immutable segments to optimize memory usage and parallel search performance.
Standalone Databases - Provides a standalone configuration for single-machine environments.
Write-Ahead Logs - Ensures data durability and consistent state recovery by recording all mutations in a distributed message log.
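
The write-ahead log entry can be illustrated with a minimal sketch in which every mutation is appended to a log before it is applied, so state can be rebuilt by replaying the log. This is a toy model under stated assumptions: the class `LoggedStore` and its record format are hypothetical, and an in-memory list stands in for the distributed message log named above.

```python
import json


class LoggedStore:
    """Toy key-to-vector store that records every mutation before applying it."""

    def __init__(self):
        self.log = []    # append-only list of serialized mutations
        self.state = {}  # in-memory view that can be rebuilt from the log

    def _apply(self, record):
        if record["op"] == "upsert":
            self.state[record["id"]] = record["vector"]
        elif record["op"] == "delete":
            self.state.pop(record["id"], None)

    def write(self, record):
        self.log.append(json.dumps(record))  # record the mutation first
        self._apply(record)                  # then apply it to the state

    @classmethod
    def recover(cls, log):
        """Rebuild a store by replaying a log in order."""
        store = cls()
        for line in log:
            store.log.append(line)
            store._apply(json.loads(line))
        return store


store = LoggedStore()
store.write({"op": "upsert", "id": "a", "vector": [0.1, 0.2]})
store.write({"op": "upsert", "id": "b", "vector": [0.3, 0.4]})
store.write({"op": "delete", "id": "a"})

# Simulate a restart: only the log survives, and the state is replayed from it.
recovered = LoggedStore.recover(store.log)
```

Replaying the same log in the same order always produces the same state, which is the property that makes log-based recovery consistent.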

Name: milvus-io/milvus
Author: milvus-io

Stars: 44,804
Forks: 4,068
Language: Go
License: Apache-2.0
Views: 16
Website: milvus.io

Open-source alternatives to Milvus

Similar open-source projects, ranked by how many features they share with Milvus.

qdrant/qdrant (32,372 stars)
Qdrant is a high-performance vector similarity database designed to store, index, and search high-dimensional vectors alongside structured metadata. It functions as a distributed search engine that manages large-scale data clusters, providing low-latency retrieval and complex filtering capabilities. The system is built to serve as a specialized middleware layer, connecting machine learning pipelines and AI agents to persistent storage for intelligent information retrieval and recommendation tasks. The platform distinguishes itself through advanced retrieval techniques, including support for h…
Tags: Rust, ai-search, ai-search-engine, embeddings-similarity

chroma-core/chroma (26,198 stars)
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema…
Tags: Rust, ai, database, document-retrieval

Frequently asked questions

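What does milvus-io/milvus do?

Milvus is a vector database for indexing, managing, and searching high-dimensional vector embeddings, with deployment options ranging from lightweight local instances to distributed clusters.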

What are the main features of milvus-io/milvus?

The main features of milvus-io/milvus are: Similarity Search Engines, Vector Databases, Vector Search Engines, Distributed Database Clusters, Retrieval-Augmented Generation Frameworks, Indexing Engines, Distributed Data Architectures, Distributed Data Infrastructure.

What are some open-source alternatives to milvus-io/milvus?

Open-source alternatives to milvus-io/milvus include:

qdrant/qdrant - Qdrant is a high-performance vector similarity database designed to store, index, and search high-dimensional vectors…
chroma-core/chroma - Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for…
weaviate/weaviate - Weaviate is an AI-native vector database designed to store and index high-dimensional vector embeddings alongside…
lancedb/lancedb - LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector…
infiniflow/infinity - Infinity is a distributed vector database and multimodal vector store designed to manage large-scale datasets for…
activeloopai/deeplake - DeepLake is AI data infrastructure consisting of a multimodal data lake, a hybrid search engine, and a serverless…