4 repositories
Horizontally scalable systems for managing large-scale vector embeddings with replication.
Distinct from Vector Memory Stores: focuses on distributed architecture rather than on AI agent memory specifically.
Explore 4 GitHub repositories in Data & Databases · Distributed Vector Stores.
Weaviate is a cloud-native vector database and distributed vector store designed to save high-dimensional vectors alongside structured data. It functions as a hybrid search engine that combines vector similarity, keyword matching, and structured metadata filtering within a single query. The system is optimized for retrieval-augmented generation, integrating vector search with generative AI and reranking to power question-and-answer workflows. It distinguishes itself through the ability to merge semantic search with traditional keyword queries and structured metadata filters to improve result
Implements a horizontally scalable, replicated data system for managing large-scale vector embeddings.
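The hybrid query described above (vector similarity, keyword matching, and a structured metadata filter in one request) can be illustrated with a small, dependency-free sketch. This is not Weaviate's API or its ranking algorithm: the function names, the `alpha` weighting, the term-overlap keyword score, and the sample `DOCS` are all assumptions chosen only to show the idea of fusing two scores after filtering.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query_text, text):
    """Fraction of query terms that appear in the document text (toy stand-in for BM25)."""
    terms = query_text.lower().split()
    words = set(text.lower().split())
    if not terms:
        return 0.0
    return sum(1 for t in terms if t in words) / len(terms)

def hybrid_search(docs, query_vec, query_text, where=None, alpha=0.5, k=3):
    """Filter by exact-match metadata, then rank by a weighted sum of both scores.

    alpha=1.0 is pure vector search, alpha=0.0 is pure keyword search (assumed convention).
    """
    results = []
    for doc in docs:
        # Structured filter: skip documents whose metadata does not match every condition.
        if where and any(doc["meta"].get(key) != val for key, val in where.items()):
            continue
        score = (alpha * cosine(query_vec, doc["vec"])
                 + (1 - alpha) * keyword_score(query_text, doc["text"]))
        results.append((score, doc["id"]))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:k]

# Hypothetical sample data, for illustration only.
DOCS = [
    {"id": "a", "vec": [1.0, 0.0], "text": "vector database replication", "meta": {"lang": "en"}},
    {"id": "b", "vec": [0.0, 1.0], "text": "keyword search engine", "meta": {"lang": "en"}},
    {"id": "c", "vec": [1.0, 0.0], "text": "vector database replication", "meta": {"lang": "de"}},
]

hits = hybrid_search(DOCS, [1.0, 0.0], "vector database", where={"lang": "en"})
print(hits)
```

Document `c` is excluded by the metadata filter before scoring, and `a` outranks `b` because it matches on both the vector and the keyword side. A production engine would use an inverted index and an approximate vector index rather than a linear scan.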
DeepLake is AI data infrastructure consisting of a multimodal data lake, a hybrid search engine, and a serverless vector database. It provides a PostgreSQL-based AI data runtime that combines multimodal storage with streaming pipelines to load and shuffle datasets from cloud storage directly into deep learning training pipelines. The system utilizes lazy indexing to store and slice images, audio, and video without loading entire files into memory. It enables retrieval-augmented generation by persisting high-dimensional embeddings in a serverless vector store and implementing hybrid search tha
Provides a serverless vector database for storing high-dimensional embeddings to enable scalable retrieval for language models.
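SPTAG is a library for approximate nearest neighbor search over vectors and a distributed vector search engine. It provides a large-scale vector index designed to organize and retrieve similar vectors from massive datasets through high-performance similarity search and proximity queries. The system acts as a dynamic vector index manager, supporting incremental updates, insertions, and deletions of vectors without a full index rebuild. It scales search operations across multiple machines, handling large-scale datasets and highly concurrent online requests through distributed search request processing. The project implements its search and indexing capabilities with space-partition trees and relative neighborhood graphs. It performs approximate nearest neighbor search through iterative graph traversal and distance metric computation to locate the vectors closest to the query point.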
Scales vector search operations across multiple machines to handle extremely large datasets and online requests.
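The iterative graph traversal mentioned in the SPTAG description can be sketched as a best-first walk over a neighborhood graph. This is a simplified illustration, not SPTAG's code or API: the names `graph_search`, `budget`, `VECTORS`, and `NEIGHBORS` are hypothetical, the entry node is passed in by hand (the description says tree structures are part of the index, and seeding from them is omitted here), and squared Euclidean distance is an assumed metric.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def graph_search(vectors, neighbors, query, entry, k=1, budget=16):
    """Best-first traversal: always expand the unvisited node closest to the query.

    `budget` caps how many nodes are expanded, which is what makes the search
    approximate on large graphs.
    """
    visited = {entry}
    frontier = [(sq_dist(vectors[entry], query), entry)]  # min-heap keyed by distance
    found = []
    while frontier and len(found) < budget:
        dist, node = heapq.heappop(frontier)
        found.append((dist, node))
        for nxt in neighbors[node]:
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(frontier, (sq_dist(vectors[nxt], query), nxt))
    # Return the ids of the k closest nodes among those expanded.
    return [node for _, node in sorted(found)[:k]]

# Hypothetical toy index: five points on a line, each linked to its adjacent points.
VECTORS = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [2.0, 0.0], 3: [3.0, 0.0], 4: [4.0, 0.0]}
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(graph_search(VECTORS, NEIGHBORS, query=[3.2, 0.0], entry=0, k=2))
```

Starting from node 0, the walk moves along the chain toward the query and returns nodes 3 and 4 as the two closest. With a small `budget` the walk may stop before reaching the true neighbors, which is the usual accuracy versus latency trade-off in graph-based search.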
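This repository is a technical documentation site and a collection of guides and references for implementing networking, security, and cloud infrastructure services. It serves as a statically generated portal and headless content platform that separates source files from the presentation layer to allow flexible rendering. The project uses Markdown documents stored in a version-controlled Git repository. It provides specialized technical content, including AI platform documentation for building agents and managing inference, cloud infrastructure guides for DNS and CDN configuration, edge computing references for serverless deployment, and network security documentation for Zero Trust and firewall management.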
Provides globally distributed SQL and key-value stores for direct querying from serverless functions.