# allenai/dolma

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/allenai-dolma).**

1,410 stars · 166 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/allenai/dolma
- Homepage: https://allenai.github.io/dolma/
- awesome-repositories: https://awesome-repositories.com/repository/allenai-dolma.md

## Topics

`data-processing` `large-language-models` `llm` `machine-learning` `nlp`

## Tags

### Part of an Awesome List

- [Data Curation and Filtering](https://awesome-repositories.com/f/awesome-lists/ai/data-curation-and-filtering.md) — High-performance toolkit for tagging, deduplicating, and curating large text corpora.
