# timescale/pg_textsearch

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/timescale-pg-textsearch).**

3,118 stars · 85 forks · C · postgresql

## Links

- GitHub: https://github.com/timescale/pg_textsearch
- awesome-repositories: https://awesome-repositories.com/repository/timescale-pg-textsearch.md

## Topics

`bm25` `c-extension` `full-text-search` `postgresql`

## Description

pg_textsearch is a PostgreSQL extension for full-text search that provides large-scale text indexing and BM25 relevance ranking. Index updates are buffered in an in-memory memtable and spilled to disk segments, which lets the index grow beyond what fits in the in-memory buffer.
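
To make the memtable-and-spill idea concrete, here is a minimal Python sketch of an inverted index that buffers postings in memory and freezes them into read-only segments once a threshold is reached. It is an illustration only and not the extension's C implementation: the class name `TinyInvertedIndex`, the `spill_threshold` parameter, and the in-memory list standing in for disk segments are all assumptions made for this example.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy inverted index: buffer postings in memory, spill to frozen segments."""

    def __init__(self, spill_threshold=4):
        self.spill_threshold = spill_threshold  # hypothetical cap on buffered postings
        self.memtable = defaultdict(list)       # term -> [doc_id, ...]
        self.buffered = 0                       # postings currently in the memtable
        self.segments = []                      # stands in for immutable on-disk segments

    def add(self, doc_id, text):
        # Naive whitespace tokenization; one posting per distinct term.
        for term in set(text.lower().split()):
            self.memtable[term].append(doc_id)
            self.buffered += 1
        if self.buffered >= self.spill_threshold:
            self.spill()

    def spill(self):
        if not self.memtable:
            return
        # Freeze the memtable into a sorted, read-only segment, then reset it.
        segment = {t: tuple(sorted(ids)) for t, ids in sorted(self.memtable.items())}
        self.segments.append(segment)
        self.memtable = defaultdict(list)
        self.buffered = 0

    def lookup(self, term):
        # A query must consult the live memtable and every spilled segment.
        hits = set(self.memtable.get(term, ()))
        for segment in self.segments:
            hits.update(segment.get(term, ()))
        return sorted(hits)


index = TinyInvertedIndex(spill_threshold=4)
index.add(1, "postgres full text search")
index.add(2, "bm25 ranking in postgres")
index.add(3, "search ranking")
print(len(index.segments), index.lookup("postgres"), index.lookup("search"))
```

Because every lookup has to visit each segment, the number of segments directly affects query cost, which is why segment merging matters in designs of this kind.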

It supports multilingual search through language-specific partial indexes and can index expressions such as JSONB fields or concatenated columns. For high availability, it relies on PostgreSQL's native streaming replication and write-ahead log (WAL) to keep search data synchronized across primary and standby nodes.

Other capabilities include document chunking for oversized text, parallel index construction, and top-k query optimization. For partitioned tables, it maintains partition-local statistics for accurate scoring, and it uses bitset-based tracking to prune deleted documents without a full index rebuild.

The system includes internal inspection tools to dump index structures and summarize statistics for performance analysis and debugging.

## Tags

### Data & Databases

- [Full Text Search](https://awesome-repositories.com/f/data-databases/full-text-search.md) — Provides high-performance full-text search integration for PostgreSQL with BM25 relevance ranking.
- [Segment Spilling](https://awesome-repositories.com/f/data-databases/ordered-data-structures/skiplists/skiplist-memtables/segment-spilling.md) — Uses a memtable system to buffer index updates and spill them to immutable disk segments to handle massive datasets.
- [BM25 Full-Text Indices](https://awesome-repositories.com/f/data-databases/search-indexing-engines/bm25-full-text-indices.md) — Implements specialized indices using the BM25 ranking function for efficient keyword-based retrieval from text columns. ([source](https://github.com/timescale/pg_textsearch/blob/main/README.md))
- [Memtable-Based Segment Spilling](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/memtable-based-segment-spilling.md) — Builds scalable indices using a memtable architecture that spills to disk segments to handle massive datasets. ([source](https://github.com/timescale/pg_textsearch/blob/main/CLAUDE.md))
- [Log-Structured Merge-Trees](https://awesome-repositories.com/f/data-databases/storage-engines/b-tree/log-structured-merge-trees.md) — Implements a log-structured merge architecture to combine index segments, reducing scan overhead and improving query performance.
- [Large Scale Indexing](https://awesome-repositories.com/f/data-databases/storage-scaling/index-scaling/large-scale-indexing.md) — Builds and maintains search indexes for massive datasets using parallel workers and memory-to-disk spilling.
- [Multilingual Search Engines](https://awesome-repositories.com/f/data-databases/table-indexing-systems/search-backends/multilingual-search-engines.md) — Provides built-in support for multilingual search through language-specific tokenization, stemming, and stop-word removal. ([source](https://github.com/timescale/pg_textsearch#readme))
- [WAL-Based Standby Replication](https://awesome-repositories.com/f/data-databases/backup-and-recovery/standby-management/wal-based-standby-replication.md) — Utilizes the PostgreSQL write-ahead log to synchronize search index data across primary and standby nodes.
- [Partition-Local Statistics](https://awesome-repositories.com/f/data-databases/data-partitioning/partitioned-synchronization/partition-local-statistics.md) — Maintains term frequencies and document counts per partition to ensure accurate BM25 scoring across distributed data.
- [Database Performance Tuning](https://awesome-repositories.com/f/data-databases/database-performance-tuning.md) — Optimizes query speeds through segment consolidation, memory management, and skipping irrelevant data blocks.
- [Expression Indexes](https://awesome-repositories.com/f/data-databases/expression-indexes.md) — Supports indexing computed results from stable functions to enable search on JSONB fields and concatenated columns.
- [Asynchronous Segment Merging](https://awesome-repositories.com/f/data-databases/high-throughput-indexing/asynchronous-segment-merging.md) — Merges multiple internal index segments into a single segment to reduce scan overhead and increase query speed. ([source](https://github.com/timescale/pg_textsearch#readme))
- [Index Memory Management](https://awesome-repositories.com/f/data-databases/in-memory-caches/index-memory-management.md) — Controls shared memory usage for in-memory caches and manages thresholds for data spilling to disk. ([source](https://github.com/timescale/pg_textsearch/blob/main/CLAUDE.md))
- [Language-Specific Partial Indexing](https://awesome-repositories.com/f/data-databases/index-construction/language-specific-partial-indexing.md) — Creates smaller partial indexes for subsets of data to apply unique language configurations for multilingual tables. ([source](https://github.com/timescale/pg_textsearch/blob/main/README.md))
- [Parallel Construction](https://awesome-repositories.com/f/data-databases/index-construction/parallel-construction.md) — Accelerates index creation for large tables using parallel worker processes based on available system memory.
- [Top-k Retrieval Optimization](https://awesome-repositories.com/f/data-databases/k-nearest-neighbor-retrieval/top-k-vector-similarity-retrievals/top-k-retrieval-optimization.md) — Improves retrieval speed by skipping irrelevant data blocks when queries include order-by clauses and limits. ([source](https://github.com/timescale/pg_textsearch#readme))
- [Cascading Replication](https://awesome-repositories.com/f/data-databases/primary-replica-replication/cascading-replication.md) — Synchronizes search data across nodes to support cascading replication and point-in-time recovery. ([source](https://github.com/timescale/pg_textsearch/tree/main/test))
- [Search Index Recovery](https://awesome-repositories.com/f/data-databases/search-index-recovery.md) — Automatically restores search index integrity and availability after crashes or unexpected shutdowns. ([source](https://github.com/timescale/pg_textsearch/tree/main/test))
- [Index Memory Management](https://awesome-repositories.com/f/data-databases/search-indexing/index-memory-management.md) — Manages search indexes with configurable memory budgets and per-index limits to optimize system performance. ([source](https://github.com/timescale/pg_textsearch/tree/main/test))
- [Search Result Filtering](https://awesome-repositories.com/f/data-databases/search-result-filtering.md) — Combines search scoring with pre-filtering via separate indexes or post-filtering for non-indexed columns. ([source](https://github.com/timescale/pg_textsearch/blob/main/README.md))
- [Local Indexes](https://awesome-repositories.com/f/data-databases/secondary-indexes/local-indexes.md) — Maintains partition-local statistics for document counts and term frequencies to ensure accurate scoring. ([source](https://github.com/timescale/pg_textsearch#readme))
- [Build Acceleration](https://awesome-repositories.com/f/data-databases/table-indexing-systems/database-indexes/index-accelerated-querying/build-acceleration.md) — Accelerates the creation of indexes for large tables and partitions using parallel worker processes. ([source](https://github.com/timescale/pg_textsearch/blob/main/README.md))
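
The Top-k Retrieval Optimization entry above describes skipping data blocks that cannot affect the result of a limited, ordered query. One common way to do this is to store an upper bound on the score of each block and skip any block whose bound cannot beat the current k-th best score. The sketch below shows that general technique in Python; the block layout and the function name `top_k_with_block_skipping` are assumptions for illustration, and the extension's actual pruning strategy may differ.

```python
import heapq


def top_k_with_block_skipping(blocks, k):
    """Return the k highest-scoring (score, doc_id) pairs and the number of skipped blocks.

    blocks: list of (max_score, [(doc_id, score), ...]) where max_score is an
    upper bound on every score inside that block (hypothetical layout).
    """
    heap = []      # min-heap holding the current top-k as (score, doc_id)
    skipped = 0
    for max_score, entries in blocks:
        # Once k results are held, a block whose best possible score cannot
        # beat the weakest kept result is skipped without being read.
        if len(heap) == k and max_score <= heap[0][0]:
            skipped += 1
            continue
        for doc_id, score in entries:
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), skipped


blocks = [
    (5.0, [(1, 5.0), (2, 3.0)]),
    (2.0, [(3, 2.0), (4, 1.0)]),
    (6.0, [(5, 6.0), (6, 0.5)]),
]
results, skipped = top_k_with_block_skipping(blocks, k=2)
print(results, skipped)
```

The saving depends on how tight the per-block bounds are: loose bounds force more blocks to be scanned, while tight bounds let most low-scoring blocks be skipped.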

### User Interface & Experience

- [BM25+ Scoring](https://awesome-repositories.com/f/user-interface-experience/search-result-ranking/relevance-scoring/bm25-scoring.md) — Calculates BM25 scores using term frequency and inverse document frequency to return the most relevant documents. ([source](https://github.com/timescale/pg_textsearch#readme))
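
As a rough illustration of how term frequency and inverse document frequency combine into a BM25 score, the following Python sketch scores a small in-memory corpus. It uses the widely used Okapi BM25 form with a smoothed IDF and the conventional defaults `k1=1.2` and `b=0.75`. These choices, the whitespace tokenizer, and the function name `bm25_scores` are assumptions for the example; the extension's exact formula, defaults, and score sign convention are not stated here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in a non-empty list against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Smoothed IDF: rarer terms contribute more.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "postgres full text search",
    "bm25 ranking for postgres search",
    "unrelated cooking recipe",
]
scores = bm25_scores("bm25 ranking", docs)
print(scores)
```

Only the second document contains the query terms, so it is the only one with a non-zero score.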

### Artificial Intelligence & ML

- [Document Chunking Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-orchestration/retrieval-augmented-generation/document-chunking-strategies.md) — Splits documents exceeding token limits into smaller pieces to ensure consistent tokenization and ranking. ([source](https://github.com/timescale/pg_textsearch#readme))
- [Memory-Efficient Chunking](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/text-tokenization/text-chunks/memory-efficient-chunking.md) — Splits oversized text bodies into smaller pieces during tokenization to maintain memory efficiency and consistency.

### Development Tools & Productivity

- [Bitset-Based Deletion Tracking](https://awesome-repositories.com/f/development-tools-productivity/search-indexing-tools/local-file-indexers/on-demand-indexers/full-index-rebuilds/bitset-based-deletion-tracking.md) — Uses per-segment bitsets to prune deleted documents during vacuuming without requiring expensive full index rebuilds. ([source](https://github.com/timescale/pg_textsearch/blob/main/CLAUDE.md))
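
The idea of per-segment deletion bitsets can be sketched as follows: a segment's postings stay immutable, and a deletion only sets one bit that later reads consult. This Python example is a simplified model rather than the extension's code; the `Segment` class, its slot layout, and the use of a Python integer as the bitset are assumptions for illustration.

```python
class Segment:
    """Immutable postings plus a mutable bitset marking deleted document slots."""

    def __init__(self, docs):
        # docs: list of (doc_id, text); a document's slot is its position in the list.
        self.doc_ids = [doc_id for doc_id, _ in docs]
        self.postings = {}
        for slot, (_, text) in enumerate(docs):
            for term in set(text.lower().split()):
                self.postings.setdefault(term, []).append(slot)
        self.deleted = 0  # bit i set means slot i is dead

    def mark_deleted(self, doc_id):
        # Deleting touches only one bit; the postings are never rewritten.
        slot = self.doc_ids.index(doc_id)
        self.deleted |= 1 << slot

    def search(self, term):
        # Filter dead slots at read time using the bitset.
        return [
            self.doc_ids[slot]
            for slot in self.postings.get(term, ())
            if not (self.deleted >> slot) & 1
        ]

    def live_count(self):
        return len(self.doc_ids) - bin(self.deleted).count("1")


seg = Segment([(10, "postgres search"), (11, "postgres bm25"), (12, "search ranking")])
seg.mark_deleted(11)
print(seg.search("postgres"), seg.search("bm25"), seg.live_count())
```

A design like this trades a small read-time check for cheap deletes; dead entries can be dropped physically whenever segments are next rewritten.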

### DevOps & Infrastructure

- [Database High Availability](https://awesome-repositories.com/f/devops-infrastructure/high-availability-clustering/database-high-availability.md) — Ensures data availability across primary and standby nodes through index replication and automatic recovery.
