4 repositories
Storage systems that combine vector embeddings with structured and graph data.
Distinguishing note: Focuses on the integration of vectors into a multi-model database.
SurrealDB is a multi-model database engine designed to store and query document, graph, relational, and vector data within a single ACID-compliant platform. It functions as an AI-native data store, integrating vector search, graph traversal, and machine learning model execution directly into its query layer. By providing a unified declarative query language, the platform eliminates the need for external middleware to synchronize data across different storage models. The platform distinguishes itself through its ability to manage agent memory and complex workflows natively. It allows developer
Keeps vector embeddings alongside structured data and graph relationships within a single database to simplify data management.
PostgresML is a machine learning database extension for PostgreSQL that integrates model training and inference directly into the database. It functions as an in-database AI platform and vector database, enabling the execution of large language models and natural language processing tasks on stored records without exporting data to external services. The system distinguishes itself by utilizing GPU acceleration to minimize latency during model predictions and employing a hybrid storage engine that maintains relational data alongside high-dimensional vectors. It allows for the building and fin
Provides low-latency storage that combines vectors, text, and numeric data to serve as model inputs.
Cocoindex is an incremental data processing engine that builds and maintains live indexes for AI agents, with a core focus on codebase indexing and knowledge graph extraction. The engine uses a function-graph execution model where user-defined Python functions are composed into a directed acyclic graph, and it processes data incrementally so only changed source records or code paths are re-computed, avoiding full recomputation at any scale. It supports automatic schema inference from transformation pipeline type annotations and provides full data lineage tracing, tagging every output record wi
Natively handles typed multi-dimensional vectors from simple arrays to multi-vector embeddings for multimodal AI pipelines.
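The incremental processing described for Cocoindex, where only changed source records are re-computed, can be illustrated with a small fingerprint cache. This is a minimal sketch of the general technique, not Cocoindex's API: the `IncrementalIndex` class, its `update` method, and the `recomputed` counter are all hypothetical names chosen for illustration.

```python
import hashlib
from typing import Callable, Dict, Tuple


class IncrementalIndex:
    """Hypothetical helper: re-runs a transform only for records whose content changed."""

    def __init__(self, transform: Callable[[str], object]):
        self.transform = transform
        # Maps record key -> (content fingerprint, cached transform output).
        self._cache: Dict[str, Tuple[str, object]] = {}
        # Counts how many times the transform actually ran.
        self.recomputed = 0

    def update(self, records: Dict[str, str]) -> Dict[str, object]:
        for key, text in records.items():
            fingerprint = hashlib.sha256(text.encode("utf-8")).hexdigest()
            cached = self._cache.get(key)
            # Recompute only when the record is new or its content differs.
            if cached is None or cached[0] != fingerprint:
                self._cache[key] = (fingerprint, self.transform(text))
                self.recomputed += 1
        # Drop outputs for records that no longer exist in the source.
        for key in list(self._cache):
            if key not in records:
                del self._cache[key]
        return {key: output for key, (_, output) in self._cache.items()}


index = IncrementalIndex(str.upper)
index.update({"a.py": "x = 1", "b.py": "y = 2"})
# Only b.py changed, so only one additional transform call is made.
result = index.update({"a.py": "x = 1", "b.py": "y = 3"})
```

Hashing the content rather than comparing timestamps keeps the sketch deterministic; a real engine would also need to track changes in the transform code itself, which this example does not attempt.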
Chonkie is a text chunking library designed for retrieval-augmented generation (RAG) pipelines. It works as a semantic text splitter and RAG ingestion pipeline, transforming raw text into embedded segments for storage in vector databases. The project distinguishes itself through specialized splitting strategies, including an AST-based code splitter that preserves logical boundaries in source code and a semantic text splitter that uses embedding models to determine meaning-based boundaries. It also provides a vector database ingestor that automates embedding generation and export to various stores. The library covers a broad range of capabilities, including document parsing via OCR and markdown extraction, a variety of splitting methods such as token counting and hierarchical segmentation, and workflow orchestration through reusable pipelines. It supports a wide range of vector store integrations, including Qdrant, Milvus, Weaviate, and Elasticsearch, as well as data export to JSON and Hugging Face datasets. Users can run these operations through a command-line interface or deploy the system as a containerized API service.
Automatically selects and instantiates embedding providers based on model names through a registered handler system.
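A registered handler system of this kind can be sketched as a prefix-to-factory registry. The example below is an illustration of the pattern only and does not reproduce Chonkie's API: `register`, `auto_embedder`, the `Embedder` class, and the `hosted-` and `local-` model-name prefixes are hypothetical, and the `embed` method returns placeholder numbers rather than real embeddings.

```python
from typing import Callable, Dict, List


class Embedder:
    """Hypothetical stand-in for an embedding provider."""

    def __init__(self, model: str, dim: int):
        self.model = model
        self.dim = dim

    def embed(self, text: str) -> List[float]:
        # Placeholder values so the sketch runs without any model installed.
        return [float((len(text) + i) % 7) for i in range(self.dim)]


# Hypothetical registry: model-name prefix -> factory that builds a provider.
_HANDLERS: Dict[str, Callable[[str], Embedder]] = {}


def register(prefix: str):
    """Decorator that registers a factory for model names starting with prefix."""
    def decorator(factory: Callable[[str], Embedder]):
        _HANDLERS[prefix] = factory
        return factory
    return decorator


@register("hosted-")
def make_hosted(model: str) -> Embedder:
    return Embedder(model, dim=8)


@register("local-")
def make_local(model: str) -> Embedder:
    return Embedder(model, dim=4)


def auto_embedder(model: str) -> Embedder:
    """Pick the handler with the longest prefix matching the model name."""
    matches = [prefix for prefix in _HANDLERS if model.startswith(prefix)]
    if not matches:
        raise ValueError(f"no embedding handler registered for {model!r}")
    return _HANDLERS[max(matches, key=len)](model)


embedder = auto_embedder("local-mini")
vector = embedder.embed("hello")
```

Choosing the longest matching prefix lets a more specific handler override a general one without changing the lookup code.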