4 Repos
Storage systems that combine vector embeddings with structured and graph data.
Distinguishing note: Focuses on the integration of vectors into a multi-model database.
Explore 4 GitHub repositories matching Data & Databases · Multi-Model Vector Storage.
SurrealDB is a multi-model database engine designed to store and query document, graph, relational, and vector data within a single ACID-compliant platform. It functions as an AI-native data store, integrating vector search, graph traversal, and machine learning model execution directly into its query layer. By providing a unified declarative query language, the platform eliminates the need for external middleware to synchronize data across different storage models. The platform distinguishes itself through its ability to manage agent memory and complex workflows natively. It allows developer
Keeps vector embeddings alongside structured data and graph relationships within a single database to simplify data management.
PostgresML is a machine learning database extension for PostgreSQL that integrates model training and inference directly into the database. It functions as an in-database AI platform and vector database, enabling the execution of large language models and natural language processing tasks on stored records without exporting data to external services. The system distinguishes itself by utilizing GPU acceleration to minimize latency during model predictions and employing a hybrid storage engine that maintains relational data alongside high-dimensional vectors. It allows for the building and fin
Provides low-latency storage that combines vectors, text, and numeric data to serve as model inputs.
Cocoindex is an incremental data processing engine that builds and maintains live indexes for AI agents, with a core focus on codebase indexing and knowledge graph extraction. The engine uses a function-graph execution model where user-defined Python functions are composed into a directed acyclic graph, and it processes data incrementally so only changed source records or code paths are re-computed, avoiding full recomputation at any scale. It supports automatic schema inference from transformation pipeline type annotations and provides full data lineage tracing, tagging every output record wi
Natively handles typed multi-dimensional vectors from simple arrays to multi-vector embeddings for multimodal AI pipelines.
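The incremental behavior described for this entry (only changed source records are re-computed) can be illustrated with a small sketch. This is not the project's API: the `IncrementalIndex` class, its `update` method, and the content-hash comparison are hypothetical names and an assumed technique chosen for illustration only.

```python
import hashlib
from typing import Callable, Dict, List


class IncrementalIndex:
    """Hypothetical helper: re-runs `transform` only for records whose content changed."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.fingerprints: Dict[str, str] = {}  # record key -> content hash
        self.outputs: Dict[str, str] = {}       # record key -> derived value

    def update(self, records: Dict[str, str]) -> List[str]:
        """Sync the index with `records` and return the keys that were re-computed."""
        recomputed: List[str] = []
        for key, content in records.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            # Skip the transform when the stored fingerprint still matches.
            if self.fingerprints.get(key) != digest:
                self.outputs[key] = self.transform(content)
                self.fingerprints[key] = digest
                recomputed.append(key)
        # Drop derived values whose source record no longer exists.
        for key in list(self.outputs):
            if key not in records:
                del self.outputs[key]
                del self.fingerprints[key]
        return recomputed
```

Calling `update` twice with the same records re-computes nothing on the second call. A real engine would also track dependencies between transformation steps, which this sketch leaves out.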
Chonkie is a text-chunking library designed for retrieval-augmented generation (RAG) pipelines. It functions as a semantic text splitter and RAG ingestion pipeline, transforming raw text into embedded segments for storage in vector databases. The project distinguishes itself through specialized splitting strategies, including an AST-based code splitter that preserves logical boundaries in source code and a semantic text splitter that uses embedding models to determine boundaries based on meaning. It also provides a vector database ingestor that automates embedding generation and export to various stores. The library covers a broad range of features, including document parsing via OCR and Markdown extraction, a variety of splitting methods such as token-count and hierarchical segmentation, and workflow orchestration through reusable pipelines. It supports a wide range of vector store integrations, including Qdrant, Milvus, Weaviate, and Elasticsearch, as well as data export to JSON and Hugging Face datasets. Users can run these operations through a command-line interface or deploy the system as a containerized API service.
Automatically selects and instantiates embedding providers based on model names through a registered handler system.
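A registered handler system of this kind can be sketched as a prefix-keyed registry. The code below is an illustration under stated assumptions, not the library's actual API: `register`, `get_embedder`, the `Embedder` base class, and the two toy providers with their `toy-length/` and `toy-vowel/` prefixes are all hypothetical.

```python
from typing import Callable, Dict, List, Type


class Embedder:
    """Hypothetical base class for embedding providers."""

    def __init__(self, model: str) -> None:
        self.model = model

    def embed(self, text: str) -> List[float]:
        raise NotImplementedError


# Registry mapping a model-name prefix to the provider class that handles it.
_HANDLERS: Dict[str, Type[Embedder]] = {}


def register(prefix: str) -> Callable[[Type[Embedder]], Type[Embedder]]:
    """Class decorator that registers a provider for model names starting with `prefix`."""
    def decorator(cls: Type[Embedder]) -> Type[Embedder]:
        _HANDLERS[prefix] = cls
        return cls
    return decorator


@register("toy-length/")
class LengthEmbedder(Embedder):
    # Toy provider: a one-dimensional "embedding" equal to the text length.
    def embed(self, text: str) -> List[float]:
        return [float(len(text))]


@register("toy-vowel/")
class VowelEmbedder(Embedder):
    # Toy provider: a one-dimensional "embedding" equal to the vowel count.
    def embed(self, text: str) -> List[float]:
        return [float(sum(ch in "aeiou" for ch in text.lower()))]


def get_embedder(model: str) -> Embedder:
    """Pick the provider whose registered prefix matches `model`, longest prefix first."""
    for prefix in sorted(_HANDLERS, key=len, reverse=True):
        if model.startswith(prefix):
            return _HANDLERS[prefix](model)
    raise ValueError(f"no embedding provider registered for model {model!r}")
```

With this layout, `get_embedder("toy-length/v1")` returns a `LengthEmbedder` without the caller naming the class, and supporting a new provider only requires another decorated class.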