# rapidfuzz/rapidfuzz

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/rapidfuzz-rapidfuzz).**

3,731 stars · 151 forks · Python · MIT

## Links

- GitHub: https://github.com/rapidfuzz/RapidFuzz
- Homepage: https://rapidfuzz.github.io/RapidFuzz/
- awesome-repositories: https://awesome-repositories.com/repository/rapidfuzz-rapidfuzz.md

## Topics

`cpp` `levenshtein` `levenshtein-distance` `python` `string-comparison` `string-matching` `string-similarity`

## Description

RapidFuzz is a C++-accelerated Python library for high-performance string comparison and similarity calculation. It is a fuzzy string matching toolkit that quantifies the difference between text sequences using Levenshtein distance and other edit distance metrics.
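To make the metric concrete, here is a minimal plain-Python sketch of Levenshtein distance using the textbook Wagner-Fischer dynamic program. It is not RapidFuzz's C++ implementation; in the library the corresponding call is `rapidfuzz.distance.Levenshtein.distance`, and the function name `levenshtein` below is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions
    needed to turn `a` into `b` (Wagner-Fischer dynamic programming)."""
    # prev[j] holds the distance between the previous prefix of `a` and b[:j];
    # keeping only one previous row needs O(len(b)) memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Each cell depends only on its left, upper, and upper-left neighbours, which is what allows row-by-row evaluation and, in optimized libraries, bit-parallel evaluation.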

The library focuses on scalable approximate text matching: identifying and ranking similar strings within large datasets. It provides dedicated utilities for finding the best matches in a collection and for generating pairwise similarity matrices.
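Both ideas can be sketched in a few lines: score every query against every choice to build a matrix, then take the highest-scoring column in each row. The scorer is the standard-library `difflib.SequenceMatcher` ratio (Ratcliff/Obershelp) on lowercased strings, swapped in as an assumption so the sketch runs without RapidFuzz; the library's own batch routines are `rapidfuzz.process.cdist` and `rapidfuzz.process.extractOne`, and the helper names here are hypothetical.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    # Stand-in scorer (assumption): difflib's Ratcliff/Obershelp ratio on
    # lowercased strings, scaled to 0-100.
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(queries, choices, scorer=score):
    """Full pairwise matrix: one row per query, one column per choice."""
    return [[scorer(q, c) for c in choices] for q in queries]


def best_matches(queries, choices, scorer=score):
    """For each query, the (choice, score) pair with the highest score."""
    matrix = similarity_matrix(queries, choices, scorer)
    return [max(zip(choices, row), key=lambda pair: pair[1]) for row in matrix]


queries = ["new york", "los angles"]
choices = ["Los Angeles", "new york", "New Jersey"]
print(best_matches(queries, choices))
```

Computing the whole matrix first is what makes this approach quadratic; it pays off when many queries share one candidate list.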

The project covers a broad range of text-processing capabilities, including string similarity scoring, edit distance calculation, and text preprocessing. Typical applications include data deduplication, typo correction, and search query suggestions.
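Preprocessing and suggestion can be combined in a short sketch: normalize both the query and the vocabulary, then return the closest entries. The normalization only loosely resembles `rapidfuzz.utils.default_process`, and the matching uses the standard-library `difflib.get_close_matches` (Ratcliff/Obershelp, not Levenshtein) as a stand-in; `preprocess` and `suggest` are hypothetical names.

```python
import difflib
import re


def preprocess(text: str) -> str:
    # Lowercase, turn runs of non-alphanumeric characters (and "_") into one space, trim.
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def suggest(query: str, vocabulary: list[str], n: int = 3) -> list[str]:
    # Map normalized form -> original entry (entries that normalize identically collapse).
    cleaned = {preprocess(v): v for v in vocabulary}
    # difflib's ratio stands in for a Levenshtein-based scorer here.
    hits = difflib.get_close_matches(preprocess(query), list(cleaned), n=n, cutoff=0.6)
    return [cleaned[h] for h in hits]


print(suggest("  APPEL!! ", ["Ape", "Apple", "Peach", "Puppy"]))  # ['Apple', 'Ape']
```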

## Tags

### Data & Databases

- [Fuzzy Matching](https://awesome-repositories.com/f/data-databases/fuzzy-matching.md) — Provides a comprehensive suite of algorithms for fuzzy string matching to identify close matches despite typos or variations.
- [Intra-Dataset Deduplication](https://awesome-repositories.com/f/data-databases/intra-dataset-deduplication.md) — Enables the identification and merging of nearly identical records to deduplicate datasets.
- [Matching and Ranking Logic](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/matching-ranking-logic.md) — Implements matching and ranking logic to order a list of candidates by similarity to a target string. ([source](https://cdn.jsdelivr.net/gh/rapidfuzz/rapidfuzz@main/README.md))
- [Fuzzy Query Suggestions](https://awesome-repositories.com/f/data-databases/search-suggestions/fuzzy-query-suggestions.md) — Supports the ranking of candidates to generate fuzzy search query suggestions based on user input.
- [Vector Distance Kernels](https://awesome-repositories.com/f/data-databases/vectorized-arithmetic/simd-accelerated-arithmetic/vector-distance-kernels.md) — Employs SIMD vector distance kernels to process multiple characters simultaneously for accelerated string distance calculations.
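Intra-dataset deduplication can be sketched as a greedy pass that keeps a record only if it is not too similar to any record already kept. The scorer is the standard-library `difflib.SequenceMatcher.ratio`, swapped in so the example runs without RapidFuzz; the 90-point threshold and the helper names are assumptions. The pass is quadratic in the number of records, the kind of workload where a C++ batch scorer such as `rapidfuzz.process.cdist` helps.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in scorer (assumption): case-insensitive difflib ratio scaled to 0-100.
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(records, threshold=90.0):
    """Keep a record only if it scores below `threshold` against every record kept so far."""
    kept = []
    for record in records:
        if all(similarity(record, existing) < threshold for existing in kept):
            kept.append(record)
    return kept


rows = ["Acme Corp", "ACME Corp.", "Globex Inc", "Globex, Inc", "Initech"]
print(deduplicate(rows))  # ['Acme Corp', 'Globex Inc', 'Initech']
```

The first spelling seen wins; a production pipeline might instead cluster near-duplicates and pick a canonical record per cluster.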

### Artificial Intelligence & ML

- [Text Similarity Scoring](https://awesome-repositories.com/f/artificial-intelligence-ml/semantic-analysis-tools/semantic-similarity-calculation/text-similarity-scoring.md) — Calculates numerical similarity scores between pieces of text using structural and token-based algorithmic comparison.
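The difference between structural and token-based scoring can be illustrated with two small functions. `ratio` computes a normalized Indel similarity from the longest common subsequence, which, as we understand the documentation, is what `fuzz.ratio` measures; `token_sort_ratio` sorts whitespace-separated tokens first, in the spirit of `fuzz.token_sort_ratio`. Both are hypothetical plain-Python sketches that skip any preprocessing, not the library's implementations.

```python
def ratio(a: str, b: str) -> float:
    """Normalized Indel similarity in [0, 100]: 100 * 2 * LCS(a, b) / (len(a) + len(b))."""
    if not a and not b:
        return 100.0
    # Longest common subsequence length, keeping one DP row at a time.
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return 100.0 * 2 * prev[-1] / (len(a) + len(b))


def token_sort_ratio(a: str, b: str) -> float:
    """Sort whitespace-separated tokens before comparing, so word order is ignored."""
    return ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


s1, s2 = "fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"
print(round(ratio(s1, s2), 1), token_sort_ratio(s1, s2))  # 90.9 100.0
```

Reordered words lower the structural score but leave the token-sorted score at 100, which is why token-based variants suit names and addresses whose word order varies.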

### Programming Languages & Runtimes

- [C++ Implementations](https://awesome-repositories.com/f/programming-languages-runtimes/c-implementations.md) — Ships a core engine written in C++ to provide high-performance string matching and similarity calculations.
- [High-Performance C++ Libraries](https://awesome-repositories.com/f/programming-languages-runtimes/high-performance-c-libraries.md) — Provides a high-performance C++ library core exposed to Python for scalable string comparison.
- [Edit Distance Calculators](https://awesome-repositories.com/f/programming-languages-runtimes/programming-utilities/string-utilities/string-manipulators/edit-distance-calculators.md) — Implements high-performance edit distance calculators to measure the minimum operations required to transform one string into another.
- [String Similarity Metrics](https://awesome-repositories.com/f/programming-languages-runtimes/programming-utilities/string-utilities/string-manipulators/edit-distance-calculators/string-similarity-metrics.md) — Implements a wide range of string similarity metrics, including Levenshtein, Hamming, and Jaro-Winkler distances.
- [Raw Pointer Access](https://awesome-repositories.com/f/programming-languages-runtimes/array-reductions/array-pointer-arithmetic/raw-pointer-access.md) — Uses raw pointer access to manage string buffers and score arrays, eliminating Python object overhead in tight loops.
- [Bit-Parallel Implementations](https://awesome-repositories.com/f/programming-languages-runtimes/programming-utilities/string-utilities/string-manipulators/edit-distance-calculators/bit-parallel-implementations.md) — Implements high-performance edit distance calculations using bit-parallelism, packing many cells of the dynamic-programming matrix into machine words so that a few word-level operations update them together.
- [Similarity Matrices](https://awesome-repositories.com/f/programming-languages-runtimes/programming-utilities/string-utilities/string-manipulators/edit-distance-calculators/string-similarity-metrics/similarity-matrices.md) — Provides utilities to calculate distance or similarity between two collections of strings as a full matrix or direct mapping. ([source](https://rapidfuzz.github.io/RapidFuzz/Usage/process.html))
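The bit-parallel idea can be sketched with a standard bit-vector recurrence for the length of the longest common subsequence, from which an Indel distance follows. This is an illustration under assumptions, not RapidFuzz's code: the function names are hypothetical, and a real implementation would split the bit vector into fixed-width machine words (and possibly use SIMD) rather than rely on Python's arbitrary-precision integers.

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length via a bit-parallel recurrence.

    Bit i of each integer stands for position i of `a`. Python ints have
    arbitrary precision, so one int plays the role of a vector of machine words.
    """
    if not a:
        return 0
    mask = (1 << len(a)) - 1
    # Pattern-match masks: bit i is set in peq[ch] when a[i] == ch.
    peq = {}
    for i, ch in enumerate(a):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    v = mask  # all ones: nothing matched yet
    for ch in b:
        u = v & peq.get(ch, 0)
        # One add, one subtract and one OR update the whole DP column at once.
        v = ((v + u) | (v - u)) & mask
    # Every zero bit left in v contributes one to the LCS length.
    return len(a) - bin(v).count("1")


def indel_distance(a: str, b: str) -> int:
    """Insertions plus deletions needed to turn `a` into `b`."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(lcs_length("kitten", "sitting"), indel_distance("kitten", "sitting"))  # 4 5
```

The inner loop runs once per character of `b` regardless of `len(a)` (as long as `a` fits in a word), which is where the speedup over the cell-by-cell dynamic program comes from.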

### Software Engineering & Architecture

- [Approximate Matching Tools](https://awesome-repositories.com/f/software-engineering-architecture/string-matching-algorithms/approximate-matching-tools.md) — Ships a toolkit for approximate matching to identify the most similar strings from candidate sets using distance metrics.
- [Approximate String Searching](https://awesome-repositories.com/f/software-engineering-architecture/string-matching-algorithms/approximate-string-searching.md) — Performs approximate string searching to find the closest textual matches for a query within a collection. ([source](https://rapidfuzz.github.io/RapidFuzz/Usage/index.html))
- [Best Match Extraction](https://awesome-repositories.com/f/software-engineering-architecture/string-matching-algorithms/best-match-extraction.md) — Provides utilities to extract the single most similar string from a list of candidate choices. ([source](https://rapidfuzz.github.io/RapidFuzz/Usage/process.html))
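Best-match extraction can be sketched as a single pass that keeps the highest-scoring choice at or above a cutoff. RapidFuzz's own entry point is `rapidfuzz.process.extractOne`, which also accepts a `score_cutoff`; the helpers below (`levenshtein`, `normalized_similarity`, `extract_one`) are hypothetical pure-Python stand-ins, and the `(choice, score, index)` return shape is an assumption modeled loosely on that function.

```python
def levenshtein(a: str, b: str) -> int:
    # Single-row Wagner-Fischer dynamic program.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """100 * (1 - distance / max(len(a), len(b))); two empty strings score 100."""
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else 100.0 * (1 - levenshtein(a, b) / longest)


def extract_one(query, choices, score_cutoff=0.0):
    """Return (choice, score, index) for the best choice scoring >= score_cutoff, else None."""
    best = None
    for index, choice in enumerate(choices):
        score = normalized_similarity(query, choice)
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (choice, score, index)
    return best


fruits = ["apple", "banana", "cherry"]
print(extract_one("bananna", fruits, score_cutoff=80))
```

Returning `None` when nothing clears the cutoff lets callers distinguish "no good match" from "a weak best match".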

### Operating Systems & Systems Programming

- [C-Bindings](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/memory-allocation-libraries/low-level-system-operations/c-bindings.md) — Provides a high-performance binding layer that translates Python string objects into C-style arrays for low-latency execution.
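To make the idea of a C-style character array concrete, the sketch below packs a Python `str` into the narrowest fixed-width unsigned array from the standard-library `array` module, similar in spirit to how CPython stores strings with 1, 2, or 4 bytes per character (PEP 393). It is a hypothetical illustration of the data layout a C++ kernel could consume, not RapidFuzz's binding code; `to_codepoint_array` is an invented name.

```python
from array import array


def to_codepoint_array(text: str) -> array:
    """Pack `text` into the narrowest unsigned array that holds all its code points."""
    widest = max(map(ord, text), default=0)
    if widest <= 0xFF:
        typecode = "B"  # 1 byte per character
    elif widest <= 0xFFFF:
        typecode = "H"  # 2 bytes per character
    else:
        # "I" is usually 4 bytes but only guaranteed 2; "L" is guaranteed at least 4.
        typecode = "I" if array("I").itemsize >= 4 else "L"
    return array(typecode, map(ord, text))


for sample in ("naïve", "日本", "😀"):
    packed = to_codepoint_array(sample)
    print(sample, packed.typecode, packed.itemsize, packed.tolist())
```

A contiguous buffer of same-width integers is what lets compiled code walk characters by pointer arithmetic instead of touching Python objects.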
