30 open-source projects similar to elasticsearch/elasticsearch, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Elasticsearch alternative.
Manticoresearch is a high-performance search engine and database designed for indexing and retrieving large datasets. It functions as a full-text search engine, a vector search database, and a SQL-based search database, providing a distributed search cluster architecture. The system provides an alternative to the Elasticsearch stack, offering a compatible API for indexing and searching structured and unstructured data. It distinguishes itself by supporting multiple retrieval methods, including vector matching for similarity search, geospatial queries, and traditional full-text ranking. The p
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
RedisInsight is a graphical user interface and management tool for browsing, analyzing, and administering Redis databases. It provides a visual environment for exploring key-value data structures, managing database instances, and performing data analysis across different operating systems and deployments. The tool distinguishes itself by providing dedicated visual managers for complex operations, including a vector database manager for configuring embeddings and similarity searches, a query workbench for executing raw commands and Lua scripts, and a performance monitoring dashboard for tracki
RediSearch is a Redis module that adds secondary indexing, full-text search, aggregation, and vector similarity search directly into the in-memory data store. It operates as an in-process search engine, extending the core key-value store with capabilities for indexing hash and JSON documents, enabling fast field-level lookups beyond primary key access. The module provides a full-text search engine built on inverted indexes, supporting stemming, fuzzy matching, and relevance scoring via tf-idf. It also includes a vector similarity search engine using a Hierarchical Navigable Small World graph
Vespa is a distributed search engine, vector database, and machine learning ranking engine. It serves as an AI search platform designed to handle large-scale document indexing and complex query processing across a cluster of nodes, combining keyword retrieval with high-dimensional embedding storage for semantic similarity search. The platform distinguishes itself by integrating machine learning models directly into the search pipeline to perform real-time inference and ranking. It converts these models into ranking expressions to score and order results based on relevance, while providing a s
CouchDB is a NoSQL document database that stores data as flexible documents and exposes a RESTful API for data management over HTTP. It functions as a distributed document store, synchronizing and replicating data across multiple nodes to ensure high availability and consistency. The system includes a full-text search engine that transforms database records into queryable documents, supporting sorting and pagination. Data synchronization is handled via multi-master replication, which exchanges revision histories to maintain consistency across distributed nodes. The database utilizes multi-ve
Redis is a high-performance in-memory key-value store that functions as a distributed cache, message broker, and NoSQL database. It provides sub-millisecond read and write access to data stored in RAM and can operate as a vector database for indexing high-dimensional embeddings. The system supports a wide range of data storage and synchronization primitives, including the management of strings, hashes, lists, sets, and JSON documents. It enables real-time data operations through atomic transactions, hybrid persistence using snapshots and append-only logs, and high-availability configurations
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
OpenGrok is a Java-based source code search engine and indexer designed to process large source trees and binaries into a searchable index. It functions as a version control browser, allowing for the exploration and searching of revision histories integrated with version control systems. The system provides symbol-based cross-referencing to link code definitions and usages, enabling navigation across a codebase. It utilizes an inverted-index search engine to perform full-text retrieval of source code. The application supports periodic source synchronization and reindexing to keep local data
Nebula is a distributed graph database designed for storing and querying massive volumes of interconnected vertices and edges across a horizontally scalable cluster. It functions as a Kubernetes-native database and a distributed graph analytics engine, utilizing a Raft-based distributed store to ensure strong consistency and high availability. The system features an OpenCypher query engine for performing complex graph traversals and pattern matching. It distinguishes itself with a decoupled compute-storage architecture and a shared-nothing distributed design, allowing query processing and dat
lunr.js is a JavaScript full-text search library and client-side search engine. It creates in-memory search indexes for fast keyword retrieval and ranked document matching within browser or Node.js environments. The library utilizes a JSON serializable search index, allowing the search structure to be converted to and from JSON for storage and distribution of pre-built search data. This enables search functionality for static websites by indexing content into portable files. The system supports advanced querying capabilities, including fuzzy text matching to account for typos, field-scoped i
Orama is a search engine and vector database that provides full-text indexing, geospatial calculations, and semantic vector storage. It functions as an LLM retrieval engine designed to provide grounded context to language models for conversational interfaces. The project implements hybrid search by combining dense vector embeddings with inverted keyword indices to retrieve documents based on both semantic meaning and exact text matches. It utilizes a WebAssembly module to execute search logic across different JavaScript environments and platforms. The system covers a broad range of retrieval
Chroma is a specialized vector database designed to index and retrieve high-dimensional data representations for semantic similarity search. It functions as a comprehensive platform for information retrieval, enabling the storage and management of unstructured documents alongside structured metadata. By mapping data into numerical representations, the system facilitates rapid similarity lookups across large datasets. The platform distinguishes itself through a hybrid search infrastructure that combines dense vector embeddings with sparse keyword and regular expression matching to balance sema
Garnet is a multi-threaded in-memory database and distributed key-value store. It functions as a high-performance remote cache store that implements the RESP wire protocol to maintain compatibility with existing Redis clients and libraries. The project is distinguished by a shared-memory architecture that enables parallel request processing across multiple cores for sub-millisecond latency. It features a tiered storage system that automatically offloads colder data from system memory to SSD or cloud storage layers, and includes a specialized vector search database for high-dimensional similar
LanceDB is a vector database and columnar data store designed to function as a versioned dataset manager and vector search engine. It serves as a high-performance backend for indexing and retrieving high-dimensional embeddings, providing the foundation for machine learning data pipelines. The system distinguishes itself through a combination of cloud-native object storage and immutable version tracking, allowing for data time-travel and reproducible AI experiments. It integrates hybrid search capabilities, merging dense vector similarity with BM25 full-text search and SQL-like scalar filters
Elasticsearch is a distributed search engine and document store designed for the high-performance indexing and retrieval of massive volumes of unstructured data. It functions as a centralized analytics platform, providing a schema-flexible architecture that organizes information into searchable indices while maintaining global cluster state through a distributed consensus mechanism. The platform distinguishes itself through its integrated approach to observability, security, and advanced analytics. It combines full-text, vector, and hybrid search capabilities with machine learning-driven insi
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
Dragonfly is a high-performance, multi-model in-memory data store designed to serve as a drop-in replacement for existing database infrastructures. By utilizing a multi-threaded, shared-nothing architecture and a fiber-based concurrency model, it maximizes CPU utilization and minimizes latency for read and write operations. The system supports a wide range of data structures, including strings, hashes, lists, sets, sorted sets, and JSON documents, while maintaining full compatibility with standard industry wire protocols and client libraries. What distinguishes Dragonfly is its focus on effic
Tantivy is a library for building full-text search engines and indexing frameworks. It provides the core components necessary to organize large collections of text data into searchable structures, enabling the execution of complex queries and the retrieval of information across structured document sets. The engine utilizes an inverted index architecture to map terms to document identifiers, supported by a segment-based storage model that balances search performance with write throughput. It incorporates specialized data structures, including finite state transducers for term dictionaries and
This project is a multi-model database system designed to store and manage information as documents, graphs, and key-value pairs within a single engine. It functions as a graph database and knowledge graph platform, providing the infrastructure to build, query, and visualize structured data models. By integrating vector search capabilities, the system serves as a vector database that supports retrieval-augmented generation for artificial intelligence applications. The platform distinguishes itself through a unified query language that allows users to perform document lookups, graph traversals
OceanBase is a distributed SQL database designed for high availability and strong consistency across multiple nodes and regions. It functions as a hybrid transactional and analytical processing engine, allowing real-time analytics and transactions to execute on a single data copy. The system also serves as a vector database engine for indexing and querying vector data to power semantic search and recommendation systems. The platform features native compatibility layers for MySQL and Oracle, enabling the migration of legacy workloads without rewriting SQL code. It utilizes a Paxos-based distri
This project is an open source relational database management system and SQL database designed for storing and managing structured data. It functions as a relational database for ensuring consistency and reliability, while also operating as a vector database for storing and querying high-dimensional vector embeddings. The system incorporates a columnar storage engine to optimize analytical query processing and large-scale data aggregation. It further enables vector similarity search, allowing users to find similar items by querying vector embeddings. The software covers a broad capability su
This project is a reference implementation providing a collection of practical examples for data access patterns and repository abstractions within the Spring Data ecosystem. It serves as a comprehensive showcase for implementing consistent data layers across various relational and non-relational databases. The repository specifically demonstrates multi-store persistence by integrating relational, document, and vector databases within a single application. It includes implementations for vector search to manage high-dimensional embeddings and similarity searches across different database tech
Apache Druid is a real-time OLAP database and distributed analytics engine. It functions as a columnar time-series database designed for high-performance analytical queries and the real-time ingestion of streaming and batch datasets. The system provides a framework for high-concurrency analytics, allowing multiple simultaneous users to execute SQL and native queries across large-scale data. It supports mixed data ingestion, combining real-time streaming and batch loading into a single system for unified analysis. The platform includes capabilities for distributed cluster management, enabling
langchaingo is an LLM application framework for Go designed for building language model-powered applications and autonomous agents. It serves as an orchestration library and tool integration framework that allows developers to link prompt sequences and model calls into complex, multi-step workflows. The project provides a toolkit for implementing retrieval-augmented generation pipelines by processing unstructured documents and retrieving relevant context via vector search. It includes a dedicated integration layer for indexing high-dimensional embeddings and performing similarity searches acr
Databend is a cloud-native data warehouse and OLAP database designed for large-scale analytics. It functions as a SQL-compliant engine and serverless analytics platform that separates compute from storage to allow for independent scaling. The system integrates vector database capabilities, indexing high-dimensional embeddings to enable semantic, hybrid, and full-text searches across massive datasets. It further distinguishes itself through serverless compute management that automatically scales resources based on demand and shuts them down during idle periods. The platform covers a broad set
AgentMemory is a persistent knowledge store and memory server designed to provide AI coding agents with long-term memory. It functions as a knowledge graph engine and vector database store that saves and recalls project context, architectural decisions, and patterns across different sessions. The system distinguishes itself by using a tiered-memory consolidation pipeline that compresses raw observations into episodic, semantic, and procedural layers to optimize token usage. It employs a hybrid retrieval strategy combining keyword matching, vector embeddings, and graph traversal to surface rel
m3 is a distributed time series database designed for high-resolution metrics and high-cardinality data management. It functions as a scalable storage system and a multi-cluster query engine, providing a distributed metrics aggregator capable of downsampling and summarizing data before it is committed to storage. The project distinguishes itself through a coordinated cluster model using etcd for node membership and shard placement. It supports multiple ingestion protocols, including the Prometheus remote write protocol, InfluxDB line protocol, and Graphite Carbon plaintext protocol, and provi
TurboVec is a high-performance Rust vector database and quantized search index designed for storing and retrieving high-dimensional embeddings. It functions as a pluggable vector store for large language model orchestration frameworks, providing a memory-efficient alternative to standard in-memory storage. The project distinguishes itself through a high-dimensional vector compressor that utilizes random rotation and data-oblivious scalar quantization to reduce memory footprints. Retrieval is accelerated via SIMD kernels that process distance calculations and search operations for increased th
This project is a software development kit and cluster management tool for PHP. It serves as a full-text search SDK and vector search interface, enabling applications to perform lexical, fuzzy, and semantic searches against indexed data. The library implements a PSR 7 HTTP client to ensure cross-environment compatibility through standardized messaging interfaces. It provides a specialized interface for retrieving embeddings and performing semantic retrieval workflows using vector data. Its capability surface covers a wide range of administrative and operational tasks, including search index