1 repo
Tokenization methods that operate on raw byte sequences to handle diverse vocabularies.
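As a rough illustration of what operating on raw byte sequences means, here is a minimal sketch that assumes UTF-8 encoding and a fixed 256-entry vocabulary (one ID per byte value). The helper names `bytes_encode` and `bytes_decode` are hypothetical and are not taken from any listed repository; real byte-level tokenizers usually add learned merges on top of this base step.

```python
def bytes_encode(text: str) -> list[int]:
    """Map text to token IDs in 0..255, one ID per UTF-8 byte (assumed scheme)."""
    return list(text.encode("utf-8"))


def bytes_decode(ids: list[int]) -> str:
    """Invert bytes_encode; invalid byte sequences become U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")


# Non-ASCII characters expand to several byte tokens, so no input is
# ever out of vocabulary: "é" becomes the two bytes 195 and 169.
ids = bytes_encode("héllo")
print(ids)                 # [104, 195, 169, 108, 108, 111]
print(bytes_decode(ids))   # héllo
```

Because every possible string maps onto the same 256 base symbols, the vocabulary never needs an unknown-token fallback, at the cost of longer sequences for non-ASCII text.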
This project is a speech recognition and translation engine that uses a sequence-to-sequence transformer architecture to convert audio into text. It is built on a weakly supervised learning framework, which leverages large-scale, weakly labelled audio-transcript data to create generalized speech representations capabl