# opennmt/tokenizer

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/opennmt-tokenizer).**

333 stars · 83 forks · C++ · MIT

## Links

- GitHub: https://github.com/OpenNMT/Tokenizer
- Homepage: https://opennmt.net/
- awesome-repositories: https://awesome-repositories.com/repository/opennmt-tokenizer.md

## Topics

`bpe` `cpp` `icu` `machine-translation` `natural-language-processing` `python` `sentencepiece` `tokenization` `tokenizer` `unicode`

## Description

A fast, customizable text tokenization library with BPE and SentencePiece support.

## Tags

### Part of an Awesome List

- [Natural Language Processing](https://awesome-repositories.com/f/awesome-lists/ai/natural-language-processing.md) — Listed in the “Natural Language Processing” section of the FunNLP awesome list.
- [Lexical Analysis Tools](https://awesome-repositories.com/f/awesome-lists/devtools/lexical-analysis-tools.md) — Fast and customizable text tokenization library.
