Omnilingual-ASR is a multilingual automatic speech recognition (ASR) framework and toolkit for transcribing speech in 1,600 languages. It provides an end-to-end speech-to-text pipeline and supports fine-tuning pre-trained speech models on specific languages or datasets with custom training recipes.
The system supports zero-shot speech recognition, so it can transcribe languages not seen during training without large amounts of labeled data in those languages. It also supports few-shot language guidance, in which a small number of in-context audio-text examples steer the model, and it accepts language codes that constrain the output to the intended language and script.
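As an illustration of how a language code can pin down both language and script, the sketch below validates and splits codes before they are passed to a model. It assumes codes of the form `<ISO 639-3 language>_<ISO 15924 script>`, such as `eng_Latn`; that format, the `LangCode` type, and the helper names are assumptions for this example, not a documented Omnilingual-ASR API.

```python
from typing import NamedTuple
import re

# Assumed code format: 3 lowercase letters (ISO 639-3 language), an
# underscore, then a 4-letter ISO 15924 script tag in title case ("eng_Latn").
_LANG_CODE = re.compile(r"([a-z]{3})_([A-Z][a-z]{3})")


class LangCode(NamedTuple):
    language: str  # e.g. "eng"
    script: str    # e.g. "Latn"


def parse_lang_code(code: str) -> LangCode:
    """Split a code into language and script, rejecting malformed input."""
    match = _LANG_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a <language>_<Script> code: {code!r}")
    return LangCode(language=match.group(1), script=match.group(2))


def check_requested_lang(code: str, supported: set[str]) -> LangCode:
    """Hypothetical guard: accept only codes a deployment is configured for."""
    parsed = parse_lang_code(code)
    if code not in supported:
        raise ValueError(f"unsupported language/script: {code}")
    return parsed
```

Separating the script from the language matters for languages written in more than one script: `srp_Cyrl` and `srp_Latn` would name the same language with different target orthographies.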
For high-throughput transcription, the framework processes inputs in parallelized batches, and its modular audio pipeline normalizes and resamples audio from diverse input formats. Resources are managed through asset cards, and a command-line interface retrieves metadata about models, datasets, and tokenizers.
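The preprocessing and batching steps described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the framework's own pipeline. It assumes a 16 kHz mono model input and zero-mean, unit-variance normalization. It uses plain linear interpolation for resampling, which skips the anti-aliasing filter a production resampler would apply. All function names are hypothetical.

```python
import numpy as np

TARGET_SR = 16_000  # assumed model input sample rate (Hz)


def to_mono(wave: np.ndarray) -> np.ndarray:
    """Average channels; accepts shape (samples,) or (samples, channels)."""
    wave = np.asarray(wave, dtype=np.float32)
    return wave.mean(axis=1) if wave.ndim == 2 else wave


def resample_linear(wave: np.ndarray, src_sr: int, dst_sr: int = TARGET_SR) -> np.ndarray:
    """Resample by linear interpolation (no anti-alias filter; illustration only)."""
    if src_sr == dst_sr:
        return wave
    n_out = int(round(len(wave) * dst_sr / src_sr))
    src_t = np.arange(len(wave)) / src_sr  # timestamps of input samples
    dst_t = np.arange(n_out) / dst_sr      # timestamps of output samples
    return np.interp(dst_t, src_t, wave).astype(np.float32)


def normalize(wave: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Scale to zero mean and unit variance (assumed normalization scheme)."""
    return (wave - wave.mean()) / np.sqrt(wave.var() + eps)


def preprocess(wave: np.ndarray, src_sr: int) -> np.ndarray:
    """Mono -> resample -> normalize, in that order."""
    return normalize(resample_linear(to_mono(wave), src_sr))


def make_batches(waves: list, batch_size: int):
    """Yield (padded, mask) arrays holding up to batch_size clips each."""
    for start in range(0, len(waves), batch_size):
        chunk = waves[start:start + batch_size]
        max_len = max(len(w) for w in chunk)
        padded = np.zeros((len(chunk), max_len), dtype=np.float32)
        mask = np.zeros((len(chunk), max_len), dtype=bool)  # True = real audio
        for i, w in enumerate(chunk):
            padded[i, :len(w)] = w
            mask[i, :len(w)] = True
        yield padded, mask
```

Padding each batch only to its longest clip keeps memory proportional to the batch contents; sorting clips by length before calling `make_batches` further reduces wasted padding when inputs vary widely in duration.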