30 open-source projects similar to quickwit-oss/tantivy, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Tantivy alternative.
Bleve is a search indexing engine library written in Go, designed to provide full-text search and document retrieval capabilities for embedded application data. It functions as a framework for indexing structured or unstructured information, allowing developers to build searchable collections that support complex query logic and data analysis. The engine distinguishes itself through a pluggable analysis pipeline that normalizes text before indexing, alongside support for vector similarity search to identify semantically related content. It utilizes finite-state transducer automata for efficie
Zincsearch is a high-performance, self-hosted full-text search engine and database written in Go. It provides a lightweight infrastructure for indexing and searching unstructured text data, specializing in log and event analysis through a schemaless indexing model. The system is designed as a resource-efficient alternative to heavier search infrastructure, featuring an API surface compatible with Elasticsearch for indexing and querying documents. It distinguishes itself by packaging the entire server and its built-in web search interface into a single statically linked binary. The engine cov
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
lunr.js is a JavaScript full-text search library and client-side search engine. It creates in-memory search indexes for fast keyword retrieval and ranked document matching within browser or Node.js environments. The library utilizes a JSON serializable search index, allowing the search structure to be converted to and from JSON for storage and distribution of pre-built search data. This enables search functionality for static websites by indexing content into portable files. The system supports advanced querying capabilities, including fuzzy text matching to account for typos, field-scoped i
Elasticsearch is a distributed search engine and NoSQL document store designed for full-text search and real-time data retrieval. It functions as a RESTful data indexer and vector database, allowing for the storage and management of structured JSON documents across multiple nodes. The system distinguishes itself through its ability to serve as a log analytics platform for monitoring system health and security events. It incorporates vector search implementation using mathematical embeddings to support generative AI and augmented generation applications. The platform covers a broad range of c
Zinc is a high-performance full-text search engine written in Go. It provides a schema-less document index that organizes arbitrary datasets into searchable structures without requiring a predefined data format. The engine features an API compatible with Elasticsearch for indexing and querying data, which facilitates the ingestion of single and bulk records. It is designed as an in-process search engine that embeds indexing and retrieval logic within a single binary to operate with minimal system resource overhead. The system includes a built-in web-based management interface for executing s
Apache Druid is a real-time OLAP database and distributed analytics engine. It functions as a columnar time-series database designed for high-performance analytical queries and the real-time ingestion of streaming and batch datasets. The system provides a framework for high-concurrency analytics, allowing multiple simultaneous users to execute SQL and native queries across large-scale data. It supports mixed data ingestion, combining real-time streaming and batch loading into a single system for unified analysis. The platform includes capabilities for distributed cluster management, enabling
Meilisearch is a Rust-based search engine providing typo-tolerant full-text and vector-based semantic search with real-time conversational capabilities.
Druid is a distributed columnar store and online analytical processing database designed for real-time analytics. It functions as a SQL analytics platform and a streaming data ingestion engine, allowing for the analysis of large datasets with low latency to support interactive dashboards and high-concurrency operational workloads. The system integrates a streaming data ingestion engine that loads information via batch or streaming processes to enable immediate analysis of arriving data. It provides high-performance analytical processing to execute slice-and-dice queries on massive data volume
This project is a community-curated directory of open-source software designed for deployment in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, enabling users to maintain full data ownership and control over their digital infrastructure. The directory is structured through a hierarchical taxonomy that organizes a vast collection of applications into logical categories, ranging from media management and data analytics to private communication and team productivity tools. It dis
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
OpenSearch is a distributed search and analytics engine designed for indexing, searching, and analyzing massive volumes of structured and unstructured data in real time. It functions as a comprehensive platform that integrates enterprise-grade search capabilities, a vector database for high-dimensional similarity lookups, and a unified observability suite for monitoring logs, metrics, and traces across complex distributed environments. The platform distinguishes itself through its support for agentic workflow automation, allowing users to orchestrate multi-agent tasks and integrate foundation
This project is a privacy-focused, self-hosted metasearch engine that aggregates results from a wide array of web, academic, and media sources into a single, unified interface. By acting as a proxy between the user and external search providers, it strips identifying headers and tracking parameters from requests, ensuring that search activity remains anonymous and protected from third-party profiling. The platform distinguishes itself through a modular, plugin-based architecture that allows for extensive customization of search behavior, result filtering, and interface branding. It supports a
Wagtail is an open-source content management system built on the Django web framework. It provides a structured, tree-based approach to content modeling, allowing developers to define custom page types and reusable content components that are managed through a highly customizable administrative interface. The platform distinguishes itself through its flexible, block-based content composition system, which enables editors to assemble complex page layouts dynamically. It also offers robust support for multi-site and multi-lingual environments, allowing organizations to manage distinct websites
Quickwit is a cloud-native, distributed search engine designed for observability data such as logs, traces, and metrics. It functions as an observability backend that decouples compute from storage by persisting indices directly in S3-compatible cloud object stores. The system is distinguished by its compatibility with the Elasticsearch REST API, allowing it to integrate with existing clients and log shippers without reconfiguration. It also serves as an OpenTelemetry data indexer, ingesting technical data via the OpenTelemetry Protocol using gRPC and HTTP. The engine utilizes a hybrid of co
Manticoresearch is a high-performance search engine and database designed for indexing and retrieving large datasets. It functions as a full-text search engine, a vector search database, and a SQL-based search database, providing a distributed search cluster architecture. The system provides an alternative to the Elasticsearch stack, offering a compatible API for indexing and searching structured and unstructured data. It distinguishes itself by supporting multiple retrieval methods, including vector matching for similarity search, geospatial queries, and traditional full-text ranking. The p
Orama is a search engine and vector database that provides full-text indexing, geospatial calculations, and semantic vector storage. It functions as an LLM retrieval engine designed to provide grounded context to language models for conversational interfaces. The project implements hybrid search by combining dense vector embeddings with inverted keyword indices to retrieve documents based on both semantic meaning and exact text matches. It utilizes a WebAssembly module to execute search logic across different JavaScript environments and platforms. The system covers a broad range of retrieval
Sonic is a high-performance, lightweight search backend designed to provide real-time full-text search and autocomplete capabilities for applications. It functions as a persistent indexing server that maps text terms to object identifiers, allowing developers to integrate rapid search functionality without storing raw document content directly within the search engine. The system distinguishes itself through a specialized graph-based index that enables real-time word prediction and typo correction. Communication is handled via a custom, low-latency binary protocol over raw TCP sockets, which
OpenGrok is a Java-based source code search engine and indexer designed to process large source trees and binaries into a searchable index. It functions as a version control browser, allowing for the exploration and searching of revision histories integrated with version control systems. The system provides symbol-based cross-referencing to link code definitions and usages, enabling navigation across a codebase. It utilizes an inverted-index search engine to perform full-text retrieval of source code. The application supports periodic source synchronization and reindexing to keep local data
NewsBlur is an RSS feed aggregator and social news reader that collects and organizes stories from feeds, newsletters, and websites into a single interface. It functions as a feed synchronization service that maintains read states and subscription data across multiple devices and third-party applications. The platform distinguishes itself with AI-powered content summarization to generate briefings and answer questions about articles, alongside a system for training content classifiers. These classifiers learn user preferences for authors and tags to automatically highlight preferred topics or
RediSearch is a Redis module that adds secondary indexing, full-text search, aggregation, and vector similarity search directly into the in-memory data store. It operates as an in-process search engine, extending the core key-value store with capabilities for indexing hash and JSON documents, enabling fast field-level lookups beyond primary key access. The module provides a full-text search engine built on inverted indexes, supporting stemming, fuzzy matching, and relevance scoring via tf-idf. It also includes a vector similarity search engine using a Hierarchical Navigable Small World graph
This project provides a system for managing agent context and session memory, featuring an agent context compactor, an AI session memory manager, and a tool output sandbox. It functions as a middleware layer and server extension for the Model Context Protocol to optimize context windows and reduce token usage. The system optimizes agent performance by sandboxing tool outputs and externalizing large data sets, replacing raw I/O with pointers and concise summaries. It employs a persistent knowledge base that indexes session history and tool outputs for retrieval via full-text search, ensuring s
GoldenDict-ng is a multi-source dictionary application and offline dictionary reader that enables users to search for word definitions across local files, DICT servers, and web sources in a single interface. It functions as a web-based definition browser, rendering entries using a browser engine to support HTML, CSS, and JavaScript for rich content presentation. The project distinguishes itself by integrating with Anki flashcard systems to facilitate language learning workflows and offering specialized translation tools that support clipboard monitoring and character set conversion. It also p
dn is a self-hosted personal web archiving system that automatically intercepts and stores web pages on a local device. It uses a proxy-based request interception model to capture browser traffic and save content for offline access without an internet connection. The system features a local full-text search engine that indexes all saved page content for information retrieval across the collection. It includes a dedicated browser interface that simulates online connectivity to serve archived files, mimicking the original live web environment. Administrative control is provided through a web-b
Mailpile is an encrypted email client and high-volume mail indexer that provides a web-based portal for managing electronic mail. It functions as a private email management system designed to protect user privacy and data control. The project features a search engine optimized for indexing and retrieving large volumes of email on consumer hardware. It includes a Bayesian email filter for automated message classification and tagging. The system supports secure communication through the integration of public-key encryption and signing for sending and receiving messages. Additional capabilities
TagSpaces is an offline-first file tagging and organization platform that lets you manage local files with portable metadata stored directly in filenames or sidecar JSON files, eliminating the need for a central database. It functions as a full-text file search engine, a Kanban board file organizer, a local AI file assistant, an S3-compatible cloud file manager, and a web clipper and bookmark manager, all within a single application. The project distinguishes itself through a local-first architecture where all file operations, indexing, and AI processing run entirely on the device, with cloud
WCDB is a cross-platform storage layer and embedded database engine that serves as a framework for SQLite. It functions as an object relational mapper, linking application classes to database tables to enable data operations via objects rather than raw queries. The project is distinguished by an integrated encryption layer for securing data at rest and a full-text search engine that uses language-specific tokenizers for text lookups. It also features transparent field compression to reduce storage footprints and a connection-pooling model to coordinate simultaneous read and write operations a
Karakeep is a self-hosted, open-source platform designed for personal knowledge management and web content archiving. It functions as a centralized repository where users can capture, organize, and preserve bookmarks, notes, and media files, ensuring long-term access to digital information even if original sources are removed or modified. The system distinguishes itself through its automated content processing and security-focused architecture. It utilizes headless browser crawling and optical character recognition to ingest and index web content, while a modular artificial intelligence pipel
This project is a comprehensive Java backend engineering guide and technical reference focused on high-concurrency design, distributed systems, and microservices architecture. It provides detailed strategies for decomposing monolithic applications, managing service discovery, and implementing the architectural patterns required for scalable backend environments. The repository distinguishes itself through an extensive collection of big data algorithmic references and database scaling strategies. It covers memory-efficient techniques for analyzing massive datasets, such as Top-K element extrac
Cozo is a logic-based database engine that functions as a relational data store, an embedded graph database, and a temporal vector database. It utilizes a Datalog-inspired query language to execute relational, recursive, and graph queries. The system distinguishes itself through specialized indexing for high-dimensional vector similarity searches and near-duplicate detection using locality sensitive hashing. It also provides built-in temporal versioning, allowing for historical state retrieval and time-travel queries to access data as it existed at specific points in time. Its broader capabi