2 repositories
Processing and normalizing speech data across thousands of different languages.
Distinct from Speech and Audio Processing: this topic covers only the processing and normalization of multilingual audio, not document classification or speech generation.
Explore 2 GitHub repositories matching Artificial Intelligence & ML · Multilingual Audio Processing.
This project is a framework for building local voice assistants and a real-time audio streaming server. It functions as a containerized inference engine and a multilingual speech pipeline that orchestrates speech-to-text, language models, and text-to-speech components to convert spoken input into spoken output. The system is distinguished by its use of WebSocket-based bidirectional streaming for low-latency interactions. It features a voice activity detection system that manages speech boundaries and handles user barge-in interruptions during assistant playback. It also supports custom voice
Features a multilingual audio processing chain that handles language detection and audio-to-audio conversion.
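As a rough illustration of the voice activity detection and barge-in handling described for this project, here is a minimal energy-based sketch. It is not the repository's implementation: the function names (`frame_rms`, `detect_speech`, `should_barge_in`), the RMS threshold, and the frame counts are all assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.05,      # hypothetical energy threshold
    min_speech_frames: int = 3,   # hypothetical debounce length
) -> List[bool]:
    """Flag a frame as speech once `min_speech_frames` consecutive
    frames have met the energy threshold."""
    flags: List[bool] = []
    run = 0
    for frame in frames:
        run = run + 1 if frame_rms(frame) >= threshold else 0
        flags.append(run >= min_speech_frames)
    return flags


def should_barge_in(flags: Sequence[bool], assistant_playing: bool) -> bool:
    """Interrupt playback only if the assistant is speaking and the
    user has produced confirmed speech."""
    return assistant_playing and any(flags)


silence = [0.0] * 160
speech = [0.5] * 160
frames = [silence, silence, speech, speech, speech, speech, silence]
flags = detect_speech(frames)
```

Requiring several consecutive loud frames before flagging speech keeps short noises, such as a click or a cough, from cutting off assistant playback. A production system would more likely use a trained VAD model than a fixed energy threshold.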
Omnilingual-ASR is a multilingual automatic speech recognition framework and toolkit designed to transcribe audio across 1,600 languages. It provides a complete pipeline for converting speech to text, including a toolkit for fine-tuning pre-trained speech models to specific languages or datasets using custom training recipes. The system supports zero-shot speech recognition, allowing the model to predict text in unseen languages without extensive training data. It further enables few-shot language guidance through in-context examples and uses language codes to constrain transcription output t
Manages and processes speech data across thousands of languages with tools for resampling and normalization.
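To make "resampling and normalization" concrete, the following is a minimal pure-Python sketch of both steps. It does not use or reproduce the Omnilingual-ASR toolkit's API: `resample_linear` and `peak_normalize` are hypothetical helper names, and linear interpolation stands in for the band-limited resamplers that real pipelines typically use.

```python
from typing import List, Sequence


def resample_linear(
    samples: Sequence[float], src_rate: int, dst_rate: int
) -> List[float]:
    """Resample by linear interpolation between neighboring samples.

    Assumption for illustration only: no anti-aliasing filter is applied,
    so downsampling with this function can alias high frequencies.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        if lo >= len(samples) - 1:
            # Past the last interval: hold the final sample.
            out.append(float(samples[-1]))
        else:
            frac = pos - lo
            out.append(samples[lo] * (1.0 - frac) + samples[lo + 1] * frac)
    return out


def peak_normalize(
    samples: Sequence[float], target_peak: float = 1.0
) -> List[float]:
    """Scale the signal so its largest absolute value equals `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return [0.0 for _ in samples]  # silent input stays silent
    gain = target_peak / peak
    return [s * gain for s in samples]


upsampled = resample_linear([0.0, 1.0, 0.0, -1.0], src_rate=4, dst_rate=8)
normalized = peak_normalize([0.25, -0.5])
```

Bringing every recording to one sample rate and a consistent peak level is a common preprocessing step before feeding audio from many sources to a single speech model.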