pg_textsearch is a full-text search integration for PostgreSQL that provides text indexing and BM25 relevance ranking at scale. Its indexing architecture buffers new entries in a memtable and spills them to disk segments, which allows it to process datasets too large to index in memory alone.
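The memtable-and-segment idea can be pictured with a small sketch. The Python below is an illustration under stated assumptions, not pg_textsearch's code: the class `TinyIndex`, the `max_postings` threshold, and the in-memory dictionaries that stand in for on-disk segments are all hypothetical.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: a memtable that spills to immutable segments."""

    def __init__(self, max_postings=4):
        self.max_postings = max_postings   # hypothetical spill threshold
        self.memtable = defaultdict(list)  # term -> [doc_id, ...]
        self.posting_count = 0
        self.segments = []                 # stand-ins for on-disk segments

    def add(self, doc_id, text):
        # One posting per distinct term in the document.
        for term in set(text.lower().split()):
            self.memtable[term].append(doc_id)
            self.posting_count += 1
        if self.posting_count >= self.max_postings:
            self.spill()

    def spill(self):
        # Freeze the memtable into a sorted, read-only segment and start over.
        if not self.memtable:
            return
        segment = {t: tuple(sorted(ids))
                   for t, ids in sorted(self.memtable.items())}
        self.segments.append(segment)
        self.memtable = defaultdict(list)
        self.posting_count = 0

    def lookup(self, term):
        # A query has to consult the memtable and every spilled segment.
        term = term.lower()
        hits = set(self.memtable.get(term, ()))
        for segment in self.segments:
            hits.update(segment.get(term, ()))
        return sorted(hits)
```

In this sketch a lookup merges results from the live memtable and all segments, which is why systems built this way typically also compact segments over time; whether and how pg_textsearch does so is not stated here.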
The project supports multilingual search through language-specific partial indexes, and it can index complex expressions such as JSONB fields or concatenated columns. For high availability, it relies on PostgreSQL's native streaming replication and write-ahead log (WAL) to keep search data synchronized between primary and standby nodes.
Its search capabilities include document chunking for oversized text, parallel index construction, and top-k query optimization. For partitioned data, it maintains statistics local to each partition so that scoring stays accurate, and it uses bitset-based tracking to prune deleted documents without a full index rebuild.
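To make the combination of top-k retrieval, partition-local statistics, and bitset-based deletion tracking concrete, here is a minimal Python sketch. It uses the standard BM25 formula with common default parameters (`k1=1.2`, `b=0.75`); the function name `bm25_top_k`, the use of a Python integer as the bitset, and the choice to compute statistics over live documents only are assumptions made for illustration, not a description of pg_textsearch internals.

```python
import heapq
import math
from collections import Counter


def bm25_top_k(query, docs, deleted_bits, k=3, k1=1.2, b=0.75):
    """Return up to k (score, doc_id) pairs, best first.

    docs: list of token lists for ONE partition; list position is the doc id.
    deleted_bits: int used as a bitset; bit i set means doc i is deleted.
    """
    # Prune deleted documents by testing their bit; no index rebuild needed.
    live = [i for i in range(len(docs)) if not (deleted_bits >> i) & 1]
    if not live:
        return []

    # Statistics local to this partition (assumption: live docs only).
    n = len(live)
    avgdl = sum(len(docs[i]) for i in live) / n
    df = Counter()
    for i in live:
        df.update(set(docs[i]))

    scored = []
    for i in live:
        tf = Counter(docs[i])
        dl = len(docs[i])
        score = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        if score > 0:
            scored.append((score, i))

    # Keep only the k best scores instead of sorting every candidate.
    return heapq.nlargest(k, scored)


docs = [["red", "apple"], ["green", "apple", "apple"], ["red", "pear"], ["apple"]]
results = bm25_top_k(["apple"], docs, deleted_bits=0b0010, k=2)
```

Here document 1 is marked deleted by bit 1 of `deleted_bits`, so it never appears in `results` even though it contains the query term most often. This sketch scores every live document; a real top-k optimization would usually also skip documents that cannot reach the current k-th best score.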
The system includes internal inspection tools to dump index structures and summarize statistics for performance analysis and debugging.