ParadeDB

ParadeDB is a Postgres extension that integrates full-text search, vector search, and real-time analytics directly into the database. It runs as a plugin that adds new storage and query execution capabilities to an existing Postgres deployment.

The project distinguishes itself by supporting hybrid search workflows that combine lexical keyword matching with dense and sparse vector similarity in a single query. It uses reciprocal rank fusion to merge the ranked result sets from each retrieval method. It also uses logical replication to synchronize data from external Postgres instances, removing the need for manual ETL pipelines.
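As a rough illustration of how reciprocal rank fusion merges two ranked lists, here is a minimal Python sketch. It is not ParadeDB's implementation: the function name `reciprocal_rank_fusion`, the sample document ids, and the constant `k=60` (a commonly used default) are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword query and a vector query.
lexical = ["doc_a", "doc_b", "doc_c"]
vector = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([lexical, vector])
print(fused)
```

Because only rank positions are used, the two lists do not need comparable score scales, which is why this method suits combining keyword and vector results.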

Its capabilities include columnar indexing for fast aggregations and faceted search, search result highlighting with match offsets, and transactional consistency through multi-version concurrency control (MVCC).
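To make highlighting and match offsets concrete, the following Python sketch locates the character offsets of a term and wraps each match in tags. It is a simplified, hypothetical model: the helper names `match_offsets` and `highlight`, the `<b>` tags, and the whole-word regular expression are assumptions for illustration, whereas a real search index would work from its own tokenizer output.

```python
import re


def match_offsets(text, term):
    """Return (start, end) character offsets of whole-word, case-insensitive matches."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]


def highlight(text, term, open_tag="<b>", close_tag="</b>"):
    """Wrap every match of term in the given tags, leaving other text untouched."""
    pieces, last = [], 0
    for start, end in match_offsets(text, term):
        pieces.append(text[last:start])
        pieces.append(open_tag + text[start:end] + close_tag)
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


snippet = "Running shoes for running on trails"
offsets = match_offsets(snippet, "running")
marked = highlight(snippet, "running")
print(offsets)
print(marked)
```

Returning offsets separately from the highlighted string lets a client apply its own markup instead of relying on tags inserted by the server.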

The software can be deployed using Docker containers or through cloud platforms such as Railway.

Features

Hybrid Retrieval - Combines vector similarity and full-text keyword matching in a single query to improve retrieval accuracy.

Columnar Analytics - Calculates high-performance aggregates, buckets, and facets using specialized columnar storage.

Columnar Storage Engines - Stores non-text fields in a columnar format optimized for analytical workloads and high-speed filtering.

Data Analytics Engines - Provides a high-performance computational backend for executing complex analytical queries over massive datasets.

Analytical Operation Optimizers - Provides high-performance count and group-by operations using specialized scan nodes to optimize analytical queries.

Full-Text Search Integrations - Integrates advanced text search, highlighting, and ranking capabilities directly into a Postgres database.

Data Replication - Synchronizes data from external Postgres instances via logical replication to ensure consistency and availability.

Logical Replication Ingestion - Ingests data changes from external database instances using logical replication protocols to eliminate manual ETL processes.

Full Text Search - Executes advanced full-text search using tokenization to find relevant documents within a relational engine.

Postgres-Based Engines - Functions as a storage engine built upon a PostgreSQL backend to add search and analytical capabilities.

Query Aggregates - Executes JSON aggregate queries using a columnar index to accelerate data counting and summarization.

Real-Time Analytics - Provides high-performance aggregate queries and columnar scans for immediate operational insights without external analytical tools.

Columnar-Inverted Hybrid Indexes - Implements indexing strategies that combine inverted indices for text search with columnar storage for analytical aggregations.

Vector Databases - Implements storage and querying of dense and sparse vector embeddings directly within the database for semantic search.

Vector Search - Finds similar items using dense and sparse vector embeddings for semantic retrieval and similarity matching.

Predicate Pushdown Joins - Implements predicate pushdown during join operations to reduce latency by utilizing indexes before accessing base tables.

Indexed Predicate Filtering - Accelerates filter execution by including non-text columns in the search index to enable indexed predicate filtering.

Field Aggregates - Computes fast field aggregates in parallel to analyze data distributions across indexed fields.

Asynchronous Segment Merging - Buffers incoming writes and merges index segments asynchronously in the background to maintain high ingestion throughput.

Predicate Pushdown - Filters data at the storage layer during index scans to reduce data movement and processing overhead.

Search Index Synchronizers - Maintains real-time consistency between the primary relational data and search indexes using multi-version concurrency control.

Search Index Pushdowns - Reduces processing overhead by pushing search predicates and aggregate functions into the index during joins.

Faceted Search Engines - Provides the backend capability to aggregate results into categories using single-query counts for faceted search.

Faceted Search Implementation - Enables the development of category-based filtering and data distribution summaries using single-query aggregates.

Search Ranking Algorithms - Calculates relevance scores based on indexed fields to prioritize the most pertinent search results.

Search Result Filtering - Uses standard database clauses and operators to narrow search results based on text and non-text fields.

Search Result Fusion Algorithms - Combines multiple ranked result sets into a single unified list using Reciprocal Rank Fusion.

Asynchronous Indexing - Increases ingestion speed by buffering writes and merging index segments asynchronously in the background.

High Availability Clustering - Uses read replicas and logical replication to maintain service uptime and distribute read loads across a cluster.

Data Synchronization and Consistency - Ensures atomic changes and consistent query results by synchronizing indexes with multi-version concurrency control.

Rank Fusion Algorithms - Merges multiple ranked result sets into a single unified list by combining each result's rank position across the lists, as in Reciprocal Rank Fusion.

Database Systems - Postgres-based search and analytics engine.

Extensions and Plugins - Extension enabling full-text search using modern ranking algorithms.
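Several entries above describe pairing an inverted index for text with columnar storage for non-text fields, so that a search and its facet counts come from a single query. The toy Python sketch below illustrates that idea under stated assumptions: the sample rows, the whitespace tokenizer, and the function `search_with_facets` are hypothetical and do not reflect ParadeDB's actual data structures.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows.
docs = [
    {"id": 1, "description": "red running shoes", "category": "footwear"},
    {"id": 2, "description": "blue running jacket", "category": "apparel"},
    {"id": 3, "description": "trail running shoes", "category": "footwear"},
    {"id": 4, "description": "leather wallet", "category": "accessories"},
]

# Inverted index: term -> set of row positions whose text contains the term.
inverted = defaultdict(set)
# Columnar store: one contiguous list holding the non-text field for every row.
category_column = []

for pos, doc in enumerate(docs):
    for term in doc["description"].lower().split():
        inverted[term].add(pos)
    category_column.append(doc["category"])


def search_with_facets(term):
    """Return matching ids plus per-category counts from one index lookup."""
    hits = sorted(inverted.get(term, set()))
    # Facets read only the column, never the full rows.
    facets = Counter(category_column[pos] for pos in hits)
    return [docs[pos]["id"] for pos in hits], dict(facets)


ids, facets = search_with_facets("running")
print(ids, facets)
```

The facet counts are computed from the column values at the matching positions, which is the property that lets aggregations avoid touching the base table rows.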
