30 open-source projects similar to activeloopai/deeplake, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Deeplake alternative.
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
zvec is an embedded vector database engine and indexing library designed for high-dimensional similarity search. It functions as a hybrid search engine and a retrieval-augmented generation knowledge base, allowing for the storage and retrieval of dense and sparse vectors. The system is distinguished by its hybrid retrieval pipeline, which fuses vector similarity, full-text keyword matching, and scalar metadata filtering into single query operations. It supports a plugin-based model integration system for registering custom embedding models and rerankers, as well as language bindings for nativ
Hub is a multimodal AI data lake and vector database designed for storing and querying embeddings, text, audio, and images. It functions as a dataset version control system and a machine learning data streaming engine to support large-scale model training. The system utilizes a serverless PostgreSQL vector store to index high-dimensional embeddings for semantic search. It provides a visual interface for inspecting multimodal datasets and viewing annotations such as bounding boxes and masks. The platform handles cloud-agnostic storage synchronization and implements lazy, compressed data strea
Lance is a versioned columnar data format and storage engine designed as a multimodal AI lakehouse. It serves as a vector database storage engine and a cloud object store dataset manager, organizing images, video, audio, and embeddings into a unified format optimized for machine learning workflows. The project distinguishes itself by combining a columnar layout for structured data with a specialized blob store for large multimodal tensors. It implements a hybrid search engine that integrates vector similarity search, full-text search, and SQL analytics on a single dataset, supported by a stor
Infinity is a distributed vector database and multimodal vector store designed to manage large-scale datasets for retrieval and similarity search. It serves as a backend for large language model applications and retrieval-augmented generation pipelines by storing and retrieving dense vectors, sparse vectors, and full-text data. The system functions as a hybrid search engine, combining vector embeddings and full-text search with reranking algorithms to identify the most relevant documents. It supports multimodal data storage, allowing the maintenance of diverse data types including tensors, st
RedisInsight is a graphical user interface and management tool for browsing, analyzing, and administering Redis databases. It provides a visual environment for exploring key-value data structures, managing database instances, and performing data analysis across different operating systems and deployments. The tool distinguishes itself by providing dedicated visual managers for complex operations, including a vector database manager for configuring embeddings and similarity searches, a query workbench for executing raw commands and Lua scripts, and a performance monitoring dashboard for tracki
ParadeDB is a database extension that integrates full-text search, vector database capabilities, and real-time analytics directly into a relational engine. It functions as a plugin that adds new storage and query execution capabilities to an existing database architecture. The project distinguishes itself by supporting hybrid search workflows that combine lexical keyword matching with dense and sparse vector similarity in a single query. It utilizes reciprocal rank fusion to merge these ranked result sets and employs logical replication to synchronize data from external instances, removing th
Weaviate is a cloud-native vector database and distributed vector store designed to save high-dimensional vectors alongside structured data. It functions as a hybrid search engine that combines vector similarity, keyword matching, and structured metadata filtering within a single query. The system is optimized for retrieval-augmented generation, integrating vector search with generative AI and reranking to power question-and-answer workflows. It distinguishes itself through the ability to merge semantic search with traditional keyword queries and structured metadata filters to improve result
Orama is a search engine and vector database that provides full-text indexing, geospatial calculations, and semantic vector storage. It functions as an LLM retrieval engine designed to provide grounded context to language models for conversational interfaces. The project implements hybrid search by combining dense vector embeddings with inverted keyword indices to retrieve documents based on both semantic meaning and exact text matches. It utilizes a WebAssembly module to execute search logic across different JavaScript environments and platforms. The system covers a broad range of retrieval
Verba is a retrieval-augmented generation interface and chatbot that uses Weaviate to provide factual answers based on private datasets. It functions as a vector database knowledge base, combining a hybrid search engine with an orchestration interface to connect various large language model providers and embedding services. The system differentiates itself through a RAG pipeline manager for adjusting text chunking rules and retrieval settings, alongside a 3D vector space visualization tool for analyzing the spatial organization and clustering of high-dimensional embeddings. It employs a modul
Manticoresearch is a high-performance search engine and database designed for indexing and retrieving large datasets. It functions as a full-text search engine, a vector search database, and a SQL-based search database, providing a distributed search cluster architecture. The system provides an alternative to the Elasticsearch stack, offering a compatible API for indexing and searching structured and unstructured data. It distinguishes itself by supporting multiple retrieval methods, including vector matching for similarity search, geospatial queries, and traditional full-text ranking. The p
Qdrant is a high-performance vector similarity database designed to store, index, and search high-dimensional vectors alongside structured metadata. It functions as a distributed search engine that manages large-scale data clusters, providing low-latency retrieval and complex filtering capabilities. The system is built to serve as a specialized middleware layer, connecting machine learning pipelines and AI agents to persistent storage for intelligent information retrieval and recommendation tasks. The platform distinguishes itself through advanced retrieval techniques, including support for h
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
llmware is a Python framework for AI agent orchestration and model management, designed to coordinate multi-model workflows and autonomous agents. It provides a unified model catalog and standardized interface to execute specialized language models for complex research, analysis, and structured data generation. The project distinguishes itself through its heavy emphasis on local execution and quantized inference, allowing models to run on private infrastructure using CPU, GPU, and NPU acceleration via runtimes like ONNX and OpenVINO. It features a specialized ability to translate natural lang
sqlite-vec is a C-based vector library and SQLite extension that adds virtual tables for storing and querying high-dimensional embeddings. It functions as a database plugin for performing nearest neighbor searches using distance metrics such as L2, cosine, and Hamming distance. The project provides a portable embedding store that supports deployment across Android, iOS, desktop environments, and web browsers via WebAssembly. It distinguishes itself by converting numerical arrays into compact binary formats and utilizing quantization to reduce the memory footprint and storage size of vector in
LibSQL is a high-performance, distributed SQL database engine that extends SQLite to support remote network access, edge computing, and real-time synchronization. It functions as an embedded database library that integrates directly into application processes while providing the infrastructure to maintain consistency across multiple geographic regions. The platform distinguishes itself by enabling database interaction over standard HTTP protocols, allowing applications to query remote data sources in serverless and edge environments without requiring local filesystem access. It includes nativ
Weaviate is an AI-native vector database designed to store and index high-dimensional vector embeddings alongside traditional data objects. It serves as a backend infrastructure for retrieval-augmented generation, enabling applications to ground language model responses in private, context-aware data. The platform distinguishes itself by combining vector similarity search with traditional keyword filtering through a hybrid storage architecture. It integrates directly with external machine learning models to automate the generation of embeddings and perform complex inference tasks during inges
Cozo is a logic-based database engine that functions as a relational data store, an embedded graph database, and a temporal vector database. It utilizes a Datalog-inspired query language to execute relational, recursive, and graph queries. The system distinguishes itself through specialized indexing for high-dimensional vector similarity searches and near-duplicate detection using locality sensitive hashing. It also provides built-in temporal versioning, allowing for historical state retrieval and time-travel queries to access data as it existed at specific points in time. Its broader capabi
Lance is a columnar data format and storage layer designed for high-performance random access and the persistence of multimodal data. It functions as a vector database storage system, a multimodal data store, and a versioned dataset manager. The project distinguishes itself as a hybrid search engine that combines vector similarity search and full-text indexing on a single dataset. It provides unified storage for diverse data types including images, audio, and video, utilizing a system that lazy-loads large binary objects only when requested. The system manages dataset evolution through schem
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema
USearch is a high-performance vector similarity search engine and approximate nearest neighbor index designed for dense embeddings. It functions as a low-level vector database core and high-dimensional vector indexer, providing the primitives necessary to store and retrieve vectors across massive datasets. The engine distinguishes itself through hardware-level SIMD acceleration for distance kernels and a proximity-graph indexing system that enables fast retrieval across billions of vectors. It supports multi-precision vector quantization to balance memory usage and accuracy, and utilizes memo
This project is a full text search engine and enterprise search infrastructure designed for indexing and retrieving large sets of documents. It provides a comprehensive framework for information discovery using ranked results and linguistic analysis. The system integrates high-dimensional vector similarity search for semantic retrieval alongside traditional full-text capabilities. It distinguishes itself through support for geospatial data retrieval, multilingual text processing, and a search suggestion workflow that includes typo-tolerant query completion and spellchecking. The platform cov
This project is a software development kit and cluster management tool for PHP. It serves as a full-text search SDK and vector search interface, enabling applications to perform lexical, fuzzy, and semantic searches against indexed data. The library implements a PSR-7 HTTP client to ensure cross-environment compatibility through standardized messaging interfaces. It provides a specialized interface for retrieving embeddings and performing semantic retrieval workflows using vector data. Its capability surface covers a wide range of administrative and operational tasks, including search index
This project is a containerized local AI infrastructure stack designed to deploy large language models and vector databases on private hardware. It functions as an orchestration platform that combines AI runners, knowledge graphs, and a visual workflow builder for creating agentic chatflows and automating tasks via tool integration. The platform distinguishes itself through a low-code approach to agent orchestration, utilizing a visual interface to design complex sequences and connect agents to external tools and search engines. It includes a dedicated local observability stack to track promp
This project is an open source relational database management system and SQL database designed for storing and managing structured data. It functions as a relational database for ensuring consistency and reliability, while also operating as a vector database for storing and querying high-dimensional vector embeddings. The system incorporates a columnar storage engine to optimize analytical query processing and large-scale data aggregation. It further enables vector similarity search, allowing users to find similar items by querying vector embeddings. The software covers a broad capability su
AgentMemory is a persistent knowledge store and memory server designed to provide AI coding agents with long-term memory. It functions as a knowledge graph engine and vector database store that saves and recalls project context, architectural decisions, and patterns across different sessions. The system distinguishes itself by using a tiered-memory consolidation pipeline that compresses raw observations into episodic, semantic, and procedural layers to optimize token usage. It employs a hybrid retrieval strategy combining keyword matching, vector embeddings, and graph traversal to surface rel
RediSearch is a Redis module that adds secondary indexing, full-text search, aggregation, and vector similarity search directly into the in-memory data store. It operates as an in-process search engine, extending the core key-value store with capabilities for indexing hash and JSON documents, enabling fast field-level lookups beyond primary key access. The module provides a full-text search engine built on inverted indexes, supporting stemming, fuzzy matching, and relevance scoring via TF-IDF. It also includes a vector similarity search engine using a Hierarchical Navigable Small World graph
WeKnora is a multi-tenant retrieval-augmented generation (RAG) knowledge platform and autonomous AI agent framework. It transforms raw documents into queryable knowledge bases and integrates large language models with vector databases to provide grounded AI responses. The system also functions as a Model Context Protocol (MCP) tool server, exposing knowledge search and agentic capabilities to external AI clients. The platform distinguishes itself through an autonomous agent framework that utilizes iterative reasoning, tool calling, and web search to solve multi-step tasks. It implements a sta
PostgresML is a machine learning database extension for PostgreSQL that integrates model training and inference directly into the database. It functions as an in-database AI platform and vector database, enabling the execution of large language models and natural language processing tasks on stored records without exporting data to external services. The system distinguishes itself by utilizing GPU acceleration to minimize latency during model predictions and employing a hybrid storage engine that maintains relational data alongside high-dimensional vectors. It allows for the building and fin
Claude-context is a retrieval-augmented generation pipeline and semantic code search tool. It functions as an LLM codebase indexer and RAG context provider, designed to index local directories and retrieve relevant code files to provide context for large language models. The system operates as a hybrid search engine that combines keyword matching with dense vector search. This allows for the retrieval of code snippets and logic using natural language queries based on meaning rather than exact text matches. The project covers codebase indexing and search index management, utilizing asynchrono
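Several entries above describe hybrid search that merges a keyword ranking with a vector-similarity ranking, and the ParadeDB entry names reciprocal rank fusion as its merging method. The following is a minimal sketch of that general technique, not any listed project's actual implementation. The function name, the document IDs, and the smoothing constant `k = 60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword (e.g. BM25) query and a vector query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because the fusion uses only rank positions, it needs no normalization between keyword scores and vector distances, which sit on unrelated scales. Documents that appear in both lists (`doc_a`, `doc_c`) end up ahead of those found by only one retriever.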