# nvidia-nemo/curator

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/nvidia-nemo-curator).**

1,619 stars · 287 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/NVIDIA-NeMo/Curator
- awesome-repositories: https://awesome-repositories.com/repository/nvidia-nemo-curator.md

## Topics

`data` `data-curation` `data-prep` `data-preparation` `data-processing` `data-processing-pipelines` `data-quality` `datacuration` `datarecipes` `deduplication` `fast-data-processing` `fine-tuning` `large-language-models` `large-scale-data-processing` `llm` `llm-data-quality` `llmapps` `python` `semantic-deduplication`

## Description

Scalable data preprocessing and curation toolkit for LLMs

## Tags

### Part of an Awesome List

- [Training Datasets](https://awesome-repositories.com/f/awesome-lists/ai/training-datasets.md): listed in the “Training Datasets” section of the LLM Course awesome list.
