# huggingface/datatrove

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/huggingface-datatrove).**

3,092 stars · 273 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/huggingface/datatrove
- awesome-repositories: https://awesome-repositories.com/repository/huggingface-datatrove.md

## Description

DataTrove frees data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks.
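To make the idea of composable "pipeline processing blocks" concrete, here is a minimal, standard-library-only sketch. It does not use datatrove's actual API: the names `Block`, `min_length_filter`, `exact_dedup`, and `run_pipeline` are hypothetical and exist only for illustration, as does the assumption that a document is a plain `dict` with a `text` key.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption: a "block" is any callable that turns one stream of documents
# into another. This is an illustrative stand-in, not datatrove's own types.
Block = Callable[[Iterable[dict]], Iterator[dict]]


def min_length_filter(min_chars: int) -> Block:
    """Hypothetical block: keep documents with at least `min_chars` characters."""
    def block(docs: Iterable[dict]) -> Iterator[dict]:
        for doc in docs:
            if len(doc["text"]) >= min_chars:
                yield doc
    return block


def exact_dedup() -> Block:
    """Hypothetical block: drop documents whose text was already seen."""
    def block(docs: Iterable[dict]) -> Iterator[dict]:
        seen = set()
        for doc in docs:
            if doc["text"] not in seen:
                seen.add(doc["text"])
                yield doc
    return block


def run_pipeline(docs: Iterable[dict], blocks: List[Block]) -> List[dict]:
    """Chain the blocks in order; each block consumes the previous block's output."""
    stream: Iterable[dict] = docs
    for block in blocks:
        stream = block(stream)
    return list(stream)


docs = [
    {"id": 1, "text": "hello world"},
    {"id": 2, "text": "hi"},
    {"id": 3, "text": "hello world"},
]
result = run_pipeline(docs, [min_length_filter(5), exact_dedup()])
print([doc["id"] for doc in result])
```

Because every block shares the same stream-in, stream-out shape, blocks can be reordered, swapped, or reused without touching the code that runs them, which is the property the description points at. The real library's readers, filters, writers, and executors are documented in the GitHub repository linked above.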

## Tags

### Part of an Awesome List

- [Data Curation and Filtering](https://awesome-repositories.com/f/awesome-lists/ai/data-curation-and-filtering.md) — Library for building scalable, platform-agnostic text processing pipelines.
- [Data Pipelines](https://awesome-repositories.com/f/awesome-lists/data/data-pipelines.md) — Processes, filters, and deduplicates large-scale text data.
- [Data Processing](https://awesome-repositories.com/f/awesome-lists/data/data-processing.md) — Platform-agnostic pipeline blocks for data processing.
