What are the best open-source GitHub repositories for an open-source vector embeddings database?

vdaas/vald is the closest match: Vald is a distributed, cloud-native approximate nearest neighbor search engine for high-dimensional vectors, fitting the vector database category, though explicit hybrid search and ML framework integration are not highlighted. Other strong matches: semi-technologies/weaviate, oceanbase/oceanbase, qdrant/qdrant, pgvector/pgvector.

Why does vdaas/vald match “open-source database for vector embeddings”?

Vald is a distributed, cloud-native approximate nearest neighbor search engine for high-dimensional vectors, fitting the vector database category, though explicit hybrid search and ML framework integration are not highlighted.

Why does semi-technologies/weaviate match “open-source database for vector embeddings”?

Weaviate is a cloud-native, distributed vector database that stores high-dimensional vectors alongside structured data and combines vector similarity with keyword and metadata filtering, making it a comprehensive fit for this search.

Why does oceanbase/oceanbase match “open-source database for vector embeddings”?

OceanBase is a distributed SQL database that doubles as a vector database engine with hybrid vector-keyword indexing and scalable distributed architecture, directly matching your need for storing and querying embeddings in an AI/ML context.

Why does qdrant/qdrant match “open-source database for vector embeddings”?

Qdrant is a purpose-built open-source vector database that stores and searches high-dimensional vectors alongside metadata, supporting approximate nearest neighbor search, multiple index types, hybrid filtering, and distributed scaling — exactly what this search asks for.

Why does pgvector/pgvector match “open-source database for vector embeddings”?

pgvector adds vector storage and similarity search to PostgreSQL, supporting ANN indices, hybrid filtering, and integration with ML workflows — making it a direct fit for your search as a database-integrated vector solution, though it relies on PostgreSQL for distribution and SQL rather than a nati…

Vector Databases

We carefully curate open-source GitHub repositories that match “open source vector database”. Results are ranked by relevance to your search; use the filters below to narrow them down, or refine the search with AI.


vdaas/vald
1,706 · View on GitHub
Vald is a distributed, cloud-native search engine designed for high-dimensional vector data. It functions as an approximate nearest neighbor search platform, enabling the identification of similar data points across massive datasets through horizontal scaling and distributed indexing. The system is built for container orchestration environments, utilizing custom resource controllers to automate cluster lifecycle management and infrastructure state. It employs graph-based indexing to perform rapid similarity lookups and supports zero-downtime operations by decoupling index construction from qu
Vald is a distributed, cloud-native approximate nearest neighbor search engine for high-dimensional vectors, fitting the vector database category, though explicit hybrid search and ML framework integration are not highlighted.
Go · Approximate Nearest Neighbor Search · Horizontal Scaling · Graph-Based Indexing
semi-technologies/weaviate
16,337 · View on GitHub
Weaviate is a cloud-native vector database and distributed vector store designed to save high-dimensional vectors alongside structured data. It functions as a hybrid search engine that combines vector similarity, keyword matching, and structured metadata filtering within a single query. The system is optimized for retrieval-augmented generation, integrating vector search with generative AI and reranking to power question-and-answer workflows. It distinguishes itself through the ability to merge semantic search with traditional keyword queries and structured metadata filters to improve result
Weaviate is a cloud-native, distributed vector database that stores high-dimensional vectors alongside structured data and combines vector similarity with keyword and metadata filtering, making it a comprehensive fit for this search.
Go · Horizontal Scaling · Hybrid Search · Graph-Based Indexing
oceanbase/oceanbase
9,980 · View on GitHub
OceanBase is a distributed SQL database designed for high availability and strong consistency across multiple nodes and regions. It functions as a hybrid transactional and analytical processing engine, allowing real-time analytics and transactions to execute on a single data copy. The system also serves as a vector database engine for indexing and querying vector data to power semantic search and recommendation systems. The platform features native compatibility layers for MySQL and Oracle, enabling the migration of legacy workloads without rewriting SQL code. It utilizes a Paxos-based distri
OceanBase is a distributed SQL database that doubles as a vector database engine with hybrid vector-keyword indexing and scalable distributed architecture, directly matching your need for storing and querying embeddings in an AI/ML context.
C++ · Horizontal Database Scaling · Horizontal Scaling · Hybrid Search
qdrant/qdrant
32,372 · View on GitHub
Qdrant is a high-performance vector similarity database designed to store, index, and search high-dimensional vectors alongside structured metadata. It functions as a distributed search engine that manages large-scale data clusters, providing low-latency retrieval and complex filtering capabilities. The system is built to serve as a specialized middleware layer, connecting machine learning pipelines and AI agents to persistent storage for intelligent information retrieval and recommendation tasks. The platform distinguishes itself through advanced retrieval techniques, including support for h
Qdrant is a purpose-built open-source vector database that stores and searches high-dimensional vectors alongside metadata, supporting approximate nearest neighbor search, multiple index types, hybrid filtering, and distributed scaling — exactly what this search asks for.
Rust · gRPC Interfaces · Hybrid Search
pgvector/pgvector
21,787 · View on GitHub
Vector similarity search extension for PostgreSQL.
pgvector adds vector storage and similarity search to PostgreSQL, supporting ANN indices, hybrid filtering, and integration with ML workflows — making it a direct fit for your search as a database-integrated vector solution, though it relies on PostgreSQL for distribution and SQL rather than a native REST API.
C · Approximate Nearest Neighbor Search · Hybrid Search
weaviate/weaviate
15,620 · View on GitHub
Weaviate is an AI-native vector database designed to store and index high-dimensional vector embeddings alongside traditional data objects. It serves as a backend infrastructure for retrieval-augmented generation, enabling applications to ground language model responses in private, context-aware data. The platform distinguishes itself by combining vector similarity search with traditional keyword filtering through a hybrid storage architecture. It integrates directly with external machine learning models to automate the generation of embeddings and perform complex inference tasks during inges
Weaviate is an open-source, AI-native vector database that stores and indexes high-dimensional embeddings alongside objects, offers approximate nearest neighbor search with hybrid metadata filtering, multiple index types (e.g., HNSW), distributed scaling, gRPC/REST APIs, and direct ML integration for automated embeddings—directly matching every feature in your search for a scalable vector database with hybrid search and ML compatibility.
Go · Embedding Service Integrations
microsoft/SPTAG
5,004 · View on GitHub
SPTAG is a vector approximate nearest neighbor search library and distributed vector search engine. It provides a large-scale vector index designed to organize and retrieve similar vectors from massive datasets using high-performance similarity search and proximity queries. The system functions as a dynamic vector index manager, supporting incremental updates, insertions, and deletions of vectors without requiring a full index rebuild. It scales search operations across multiple machines to handle large-scale datasets and high volumes of online requests through distributed search request hand
SPTAG is an open-source library and distributed engine for approximate nearest neighbor search on high-dimensional vectors, covering core vector storage and similarity search with scalability and dynamic index updates, fitting the vector database search.
C++ · Approximate Nearest Neighbor Search · Vector Search Indexes
vespa-engine/vespa
6,961 · View on GitHub
Vespa is a distributed search engine, vector database, and machine learning ranking engine. It serves as an AI search platform designed to handle large-scale document indexing and complex query processing across a cluster of nodes, combining keyword retrieval with high-dimensional embedding storage for semantic similarity search. The platform distinguishes itself by integrating machine learning models directly into the search pipeline to perform real-time inference and ranking. It converts these models into ranking expressions to score and order results based on relevance, while providing a s
Vespa is a distributed AI search platform that stores and queries high-dimensional embeddings with approximate nearest neighbor search, supports hybrid keyword-vector retrieval, scales across clusters, and integrates ML models directly—matching all the key requirements of an open-source vector database.
Java · Vector Search Indexes
chroma-core/chroma
26,198 · View on GitHub
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema
Chroma is an open-source vector database purpose-built for semantic similarity search on high-dimensional embeddings, supporting hybrid search with dense vectors and sparse keyword/meta-filtering, all of which match your requirements for an AI/ML-oriented vector store with API access.
Rust · Vector Databases · Hybrid Search Engines · Vector Search
milvus-io/milvus
44,804 · View on GitHub
Milvus is a specialized vector database engine designed for the indexing, management, and high-speed similarity retrieval of high-dimensional vector embeddings. It functions as a similarity search engine capable of identifying nearest neighbors within large-scale vector spaces, supporting the storage and retrieval of billions of data points while maintaining consistent performance. The system utilizes a distributed architecture that decouples storage, query, and coordination into independent services, allowing for horizontal scaling across clusters. It employs a global indexing mechanism that
Milvus is a purpose-built open-source vector database with a distributed architecture for high-speed similarity search on billions of embeddings, supporting multiple index types, hybrid metadata filtering, and cloud-native scaling—exactly matching your need for storing and querying vector embeddings in AI/ML applications.
Go · Similarity Search Engines · Vector Databases · Vector Search Engines
activeloopai/deeplake
9,175 · View on GitHub
DeepLake is AI data infrastructure consisting of a multimodal data lake, a hybrid search engine, and a serverless vector database. It provides a PostgreSQL-based AI data runtime that combines multimodal storage with streaming pipelines to load and shuffle datasets from cloud storage directly into deep learning training pipelines. The system utilizes lazy indexing to store and slice images, audio, and video without loading entire files into memory. It enables retrieval-augmented generation by persisting high-dimensional embeddings in a serverless vector store and implementing hybrid search tha
DeepLake is a serverless vector database built on PostgreSQL that combines multimodal data lake storage with hybrid vector and metadata search, directly fitting the need for an open-source vector database with ANN search and ML framework integration.
C++ · Multimodal Data Storage · Serverless Vector Stores · AI Data Runtimes
activeloopai/Hub
9,177 · View on GitHub
Hub is a multimodal AI data lake and vector database designed for storing and querying embeddings, text, audio, and images. It functions as a dataset version control system and a machine learning data streaming engine to support large-scale model training. The system utilizes a serverless PostgreSQL vector store to index high-dimensional embeddings for semantic search. It provides a visual interface for inspecting multimodal datasets and viewing annotations such as bounding boxes and masks. The platform handles cloud-agnostic storage synchronization and implements lazy, compressed data strea
Hub is an open-source multimodal AI data lake and vector database that stores and indexes embeddings for semantic search, includes metadata filtering, supports large-scale streaming and cloud-agnostic storage, and integrates with ML training workflows — directly matching the full vector-database intent.
C++ · Data Lakes · Dataset Versioning Systems · Data Lineage
arangodb/arangodb
14,091 · View on GitHub
This project is a multi-model database system designed to store and manage information as documents, graphs, and key-value pairs within a single engine. It functions as a graph database and knowledge graph platform, providing the infrastructure to build, query, and visualize structured data models. By integrating vector search capabilities, the system serves as a vector database that supports retrieval-augmented generation for artificial intelligence applications. The platform distinguishes itself through a unified query language that allows users to perform document lookups, graph traversals
ArangoDB is an open-source multi-model database with built-in vector search capabilities, supporting dense embeddings, approximate nearest neighbor search, and hybrid queries with metadata filtering, making it a full-featured vector database for similarity search and AI/ML workloads.
C++ · Graph Databases · Multi-Model Databases · AI Grounding Services
lancedb/lancedb
9,031 · View on GitHub
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
LanceDB is a vector database designed for storing and querying high-dimensional embeddings, with built-in hybrid search combining vector similarity with full-text and metadata filtering, directly matching the core requirements.
HTML · Approximate Nearest Neighbor Search · Horizontal Scaling · Hybrid Search
alibaba/zvec
5,198 · View on GitHub
zvec is an embedded vector database engine and indexing library designed for high-dimensional similarity search. It functions as a hybrid search engine and a retrieval-augmented generation knowledge base, allowing for the storage and retrieval of dense and sparse vectors. The system is distinguished by its hybrid retrieval pipeline, which fuses vector similarity, full-text keyword matching, and scalar metadata filtering into single query operations. It supports a plugin-based model integration system for registering custom embedding models and rerankers, as well as language bindings for nativ
zvec is an embedded vector database engine that supports dense and sparse vectors, hybrid search with full-text and metadata filtering, and pluggable model integrations, making it a solid fit for building AI/ML similarity search applications, though its embedded nature means it scales differently than distributed server-based solutions.
C++ · Approximate Nearest Neighbor Search
surrealdb/surrealdb
32,397 · View on GitHub
SurrealDB is a multi-model database engine designed to store and query document, graph, relational, and vector data within a single ACID-compliant platform. It functions as an AI-native data store, integrating vector search, graph traversal, and machine learning model execution directly into its query layer. By providing a unified declarative query language, the platform eliminates the need for external middleware to synchronize data across different storage models. The platform distinguishes itself through its ability to manage agent memory and complex workflows natively. It allows developer
SurrealDB is a multi-model database that natively supports vector search and embeddings within its ACID-compliant query layer, making it a valid open-source vector database for similarity search and AI workloads.
Rust · Hybrid Search
typesense/typesense
25,254 · View on GitHub
Typesense is a distributed search engine designed to provide sub-millisecond query latency across massive datasets. It functions as both a high-performance indexing and retrieval engine and a comprehensive search experience platform, offering built-in typo tolerance and tools for managing relevance through synonym configuration, result curation, and complex filtering. The platform distinguishes itself by utilizing in-memory indexing to maintain high-throughput data retrieval and integrating vector database capabilities to support semantic similarity searches. It ensures data consistency and h
Typesense is a distributed search engine that includes full vector database capabilities for semantic similarity search, supporting dense embeddings, approximate nearest neighbor, and hybrid search with metadata filtering, making it a solid fit for storing and querying vector embeddings.
C++ · Distributed Search Engines · Search Engines · Search Experience Platforms
RediSearch/RediSearch
6,161 · View on GitHub
RediSearch is a Redis module that adds secondary indexing, full-text search, aggregation, and vector similarity search directly into the in-memory data store. It operates as an in-process search engine, extending the core key-value store with capabilities for indexing hash and JSON documents, enabling fast field-level lookups beyond primary key access. The module provides a full-text search engine built on inverted indexes, supporting stemming, fuzzy matching, and relevance scoring via tf-idf. It also includes a vector similarity search engine using a Hierarchical Navigable Small World graph
RediSearch is a Redis module that adds vector similarity search (using HNSW) and full-text indexing directly into Redis, giving you a vector database that supports hybrid filtering — it fits the search as an open-source vector storage and retrieval solution, though it runs as a Redis plugin rather than as a standalone server.
C · In-Process Search Engines · Search Modules · Aggregation Pipelines
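
Many of the entries above describe "hybrid search": ranking stored embeddings by vector similarity while restricting results with a metadata filter. The sketch below illustrates that idea only and does not use the API of any listed project. The names `cosine_similarity`, `hybrid_search`, and the record layout are hypothetical, and it scans every record exactly, whereas the listed engines rely on approximate nearest neighbor indexes (such as HNSW) to avoid a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(records, query_vector, metadata_filter, top_k=3):
    """Keep records whose metadata matches every filter key, then rank by similarity.

    Hypothetical helper: an exact, brute-force stand-in for what a vector
    database does with an ANN index plus a filter.
    """
    candidates = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    scored = [
        (cosine_similarity(r["vector"], query_vector), r["id"]) for r in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: two-dimensional "embeddings" with a language tag as metadata.
records = [
    {"id": "doc-a", "vector": [1.0, 0.0], "metadata": {"lang": "en"}},
    {"id": "doc-b", "vector": [0.0, 1.0], "metadata": {"lang": "en"}},
    {"id": "doc-c", "vector": [1.0, 0.1], "metadata": {"lang": "ar"}},
]

results = hybrid_search(records, [1.0, 0.0], {"lang": "en"}, top_k=2)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

Here `doc-c` is close to the query vector but is excluded by the `lang` filter, which is the behavior the "hybrid filtering" descriptions refer to. A production system would replace the linear scan with an index so that query time does not grow linearly with the number of stored vectors.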