# phiresky/ripgrep-all

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/phiresky-ripgrep-all).**

9,695 stars · 215 forks · Rust · License: NOASSERTION

## Links

- GitHub: https://github.com/phiresky/ripgrep-all
- awesome-repositories: https://awesome-repositories.com/repository/phiresky-ripgrep-all.md

## Description

ripgrep-all is a command-line utility that extends ripgrep to run regular expression searches across binary files, compressed archives, and media formats. It acts as a universal text extractor, converting non-plain-text formats such as PDFs, e-books, and Office documents into searchable text.

The tool uses adapters to transform binary data into plain text and caches the extracted text in a local database, which speeds up repeated searches. It identifies file types by inspecting header magic bytes rather than relying on file extensions.
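As an illustration of the magic-byte approach described above, here is a minimal Python sketch. The byte signatures are well-known file headers, but the adapter labels and the function names `detect_adapter` and `detect_adapter_for_path` are hypothetical and chosen for this example only; they are not ripgrep-all's actual adapter names or API.

```python
from typing import Optional

# Well-known file signatures. The labels on the right are hypothetical
# names for this sketch, not ripgrep-all's actual adapter names.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
    (b"SQLite format 3\x00", "sqlite"),
]


def detect_adapter(header: bytes) -> Optional[str]:
    """Return the label whose signature prefixes the header, or None."""
    for magic, adapter in SIGNATURES:
        if header.startswith(magic):
            return adapter
    return None


def detect_adapter_for_path(path: str) -> Optional[str]:
    """Read only the first bytes of a file and classify it by content."""
    with open(path, "rb") as f:
        return detect_adapter(f.read(16))
```

Because the decision depends only on the file's leading bytes, a PDF saved with a misleading extension such as `.dat` would still be routed to the `pdf` label in this sketch.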

The project also recurses into nested zip and tar archives to search their contents, and extracts metadata, subtitles, and chapters from video and audio containers. Custom extraction scripts can be integrated as adapters for proprietary file formats.
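Recursive archive traversal can be sketched with Python's standard `zipfile` module. This is an illustrative example under simplifying assumptions, not the project's implementation: it handles only zip archives held in memory, omits tar, and applies no depth or size limit. The function name `walk_zip` is hypothetical.

```python
import io
import zipfile
from typing import Iterator, Tuple


def walk_zip(data: bytes, prefix: str = "") -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) for every non-archive file, descending into nested zips."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member = archive.read(info)
            path = f"{prefix}{info.filename}"
            # Decide by content, not by file name: a nested zip is
            # descended into even if its extension says otherwise.
            if zipfile.is_zipfile(io.BytesIO(member)):
                yield from walk_zip(member, prefix=path + "/")
            else:
                yield path, member
```

Each yielded path includes the chain of enclosing archives, so a match inside `inner.zip` within an outer archive would be reported as `inner.zip/notes.txt`. A production version would likely stream members instead of reading them fully into memory and would cap recursion depth.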

## Tags

### Development Tools & Productivity

- [Text Pattern Search](https://awesome-repositories.com/f/development-tools-productivity/text-pattern-search.md) — Performs high-performance regular expression searches across a mix of plain text, PDFs, e-books, and Office documents.
- [Binary Format Text Extraction](https://awesome-repositories.com/f/development-tools-productivity/text-pattern-search/binary-format-text-extraction.md) — Provides regular expression searching across binary formats like PDFs and Office documents by extracting them into plain text. ([source](https://github.com/phiresky/ripgrep-all/blob/master/rust-toolchain.toml))
- [Recursive Archive Traversers](https://awesome-repositories.com/f/development-tools-productivity/archive-management/archive-importers/recursive-archive-traversers.md) — Recursively descends into nested archives to identify and search internal contents based on MIME types.
- [Magic Byte File Identification](https://awesome-repositories.com/f/development-tools-productivity/magic-byte-file-identification.md) — Identifies the correct text extractor by analyzing file header magic bytes instead of relying on file extensions. ([source](https://github.com/phiresky/ripgrep-all/blob/master/README.md))
- [Text Extraction Caches](https://awesome-repositories.com/f/development-tools-productivity/search-indexing-tools/local-file-indexers/text-extraction-caches.md) — Maintains a local database of extracted text from binary files to speed up repeated search operations.
- [Universal Text Extractors](https://awesome-repositories.com/f/development-tools-productivity/universal-text-extractors.md) — Implements a system of adapters to convert various binary formats into searchable plain text.
- [Signature-to-Adapter Mappings](https://awesome-repositories.com/f/development-tools-productivity/file-extension-language-mappings/signature-to-adapter-mappings.md) — Implements a flexible mapping system that links file signatures to specific extraction tools via configuration.
- [Media Metadata Retrievers](https://awesome-repositories.com/f/development-tools-productivity/integration-metadata-retrievers/media-metadata-retrievers.md) — Retrieves and searches embedded subtitles, chapters, and metadata from video and audio containers.
- [Subprocess Management](https://awesome-repositories.com/f/development-tools-productivity/subprocess-management.md) — Orchestrates external CLI tools via subprocesses to normalize diverse binary formats into searchable text streams.

### Data & Databases

- [Compressed Data Searching](https://awesome-repositories.com/f/data-databases/compressed-data-searching.md) — Enables regular expression searching directly within compressed archives by analyzing internal MIME types. ([source](https://github.com/phiresky/ripgrep-all/blob/master/CHANGELOG.md))
- [Persistent Binary Caches](https://awesome-repositories.com/f/data-databases/data-caching/persistent-binary-caches.md) — Utilizes a local database to cache extracted text from binary files, preventing redundant processing.
- [Binary](https://awesome-repositories.com/f/data-databases/data-integration-synchronization/local-document-indexing/document-indexing/binary.md) — Converts binary documents into searchable formats and indexes them for high-performance querying.
- [Regex-Based File Search](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing/regex-based-file-search.md) — Extends ripgrep's regex search capabilities to include binary files like PDFs and Office documents.
- [Text Extraction](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction.md) — Provides utilities to extract plain text from various binary document formats for search processing.
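
The persistent cache of extracted text mentioned in this list can be illustrated with a small SQLite-backed sketch. Everything here is an assumption made for the example: the class name `ExtractionCache`, the table layout, and keying entries by a SHA-256 hash of the file content. The source does not describe how ripgrep-all keys or stores its cache.

```python
import hashlib
import sqlite3
from typing import Callable


class ExtractionCache:
    """Hypothetical cache: maps a content hash to previously extracted text."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS extracted "
            "(key TEXT PRIMARY KEY, text TEXT NOT NULL)"
        )

    def get_or_extract(self, data: bytes, extract: Callable[[bytes], str]) -> str:
        # Key by content hash (an assumption of this sketch) so identical
        # files share one cache entry regardless of their path.
        key = hashlib.sha256(data).hexdigest()
        row = self.db.execute(
            "SELECT text FROM extracted WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: skip the expensive extraction
        text = extract(data)
        self.db.execute(
            "INSERT INTO extracted (key, text) VALUES (?, ?)", (key, text)
        )
        self.db.commit()
        return text
```

Passing a file path instead of `":memory:"` would make the cache survive between runs, which is the property that makes repeated searches cheaper.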

### Security & Cryptography

- [Archive Content Scanning](https://awesome-repositories.com/f/security-cryptography/secrets-scanning/archive-content-scanning.md) — Scans the contents of compressed archives like zip and tar files without requiring manual extraction.

### Graphics & Multimedia

- [Content-Based Metadata Extraction](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing/video-analysis-processing/video-metadata-extraction/content-based-metadata-extraction.md) — Retrieves and searches embedded subtitles, lyrics, and chapter titles within video and audio files. ([source](https://github.com/phiresky/ripgrep-all/blob/master/README.md))

### Software Engineering & Architecture

- [Custom Format Decoders](https://awesome-repositories.com/f/software-engineering-architecture/custom-format-decoders.md) — Allows the integration of custom scripts and external tools as adapters for proprietary file formats. ([source](https://github.com/phiresky/ripgrep-all/blob/master/README.md))

### Part of an Awesome List

- [Text Processing](https://awesome-repositories.com/f/awesome-lists/devtools/text-processing.md) — Search tool for PDFs, e-books, and Office documents.
