# abus-aikorea/voice-pro

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/abus-aikorea-voice-pro).**

6,255 stars · 673 forks · Python · gpl-3.0

## Links

- GitHub: https://github.com/abus-aikorea/voice-pro
- Homepage: https://www.wctokyoseoul.com
- awesome-repositories: https://awesome-repositories.com/repository/abus-aikorea-voice-pro.md

## Topics

`audiobook` `faster-whisper` `gradio` `karaoke` `podcasts` `speech-recognition` `speech-synthesis` `speech-to-text` `subtitles` `text-to-speech` `transcription` `translator` `tts` `voice-cloning` `voice-conversion` `webui` `whisper` `whisperx` `yt-dlp`

## Description

Voice Pro is a speech and audio processing toolkit that combines text-to-speech synthesis, voice cloning, speech recognition, and translation in a single application. It generates natural-sounding speech from text, clones voices from short audio samples without additional training, and performs real-time speech translation across more than 100 languages.

Its distinguishing feature is an integrated multimedia pipeline: users can download YouTube videos, extract audio, separate voice tracks, generate word-timed subtitles, and produce dubbed content in more than 100 languages. It supports several speech synthesis engines, including Edge-TTS, F5-TTS, E2-TTS, CosyVoice, and kokoro, and can train custom TTS models on user-provided datasets and export the trained models to ONNX format for deployment.

Beyond speech generation, the application transcribes speech to text with word-level subtitles, translates subtitle files while preserving their formatting, and performs real-time speech recognition and translation from customizable audio inputs. It can also extract audio from video, remove noise, and manage its own installation and dependencies through built-in cleanup utilities.
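To make "translating subtitle files while preserving formatting" concrete, here is a minimal sketch for the SRT case. It is not Voice Pro's implementation: the function name `translate_srt` and the `translate` callback are hypothetical, and the callback stands in for whatever translation engine is used. The idea is simply that cue indices and timing lines are copied through untouched and only the cue text is passed to the translator.

```python
import re
from typing import Callable

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the cue text of an SRT document.

    `translate` is a hypothetical callback (any str -> str function);
    cue indices and timing lines are copied through unchanged.
    """
    normalized = srt_text.replace("\r\n", "\n").strip()
    out_blocks = []
    for block in re.split(r"\n\s*\n", normalized):
        out_lines = []
        seen_timing = False
        for line in block.split("\n"):
            if seen_timing:
                # Everything after the timing line is cue text.
                out_lines.append(translate(line))
            else:
                # Index and timing lines are preserved verbatim.
                out_lines.append(line)
                if TIMING.match(line):
                    seen_timing = True
        out_blocks.append("\n".join(out_lines))
    return "\n\n".join(out_blocks) + "\n"
```

Any callable works as the translator, so the same function can be exercised with a stub such as `str.upper` before wiring in a real engine. ASS and SSA files use a different layout and would need their own parser.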

## Tags

### Artificial Intelligence & ML

- [AI Video Dubbing Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-video-generators/ai-video-dubbing-tools.md) — Provides an integrated AI dubbing pipeline that downloads, transcribes, translates, and dubs videos. ([source](https://github.com/abus-aikorea/voice-pro/))
- [Subtitle Translation](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/multilingual-transcription/subtitle-translation.md) — Converts subtitle files (ASS, SSA, SRT) into over 100 languages while preserving timing and formatting. ([source](https://github.com/abus-aikorea/voice-pro/blob/main/docs/README.por.md))
- [Multilingual Voice Cloning Synthesizers](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/speech-synthesis/voice-cloning-tools/multilingual-voice-cloning-synthesizers.md) — Generates speech using multiple cloning engines with support for celebrity voices and multilingual output. ([source](https://github.com/abus-aikorea/voice-pro/blob/main/docs/README.jpn.md))
- [Zero-Shot Voice Cloning](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/speech-synthesis/zero-shot-voice-cloning.md) — Provides zero-shot voice cloning from short audio samples without additional training. ([source](https://github.com/abus-aikorea/voice-pro/blob/main/docs/README.kor.md))
- [Real-Time Speech Translation](https://awesome-repositories.com/f/artificial-intelligence-ml/real-time-speech-translation.md) — Recognizes speech and translates it into multiple languages in real time. ([source](https://github.com/abus-aikorea/voice-pro/blob/main/docs/README.jpn.md))
- [Speech-to-Text Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-engines.md) — Converts spoken audio from files or streams into text using multiple recognition engines. ([source](https://github.com/abus-aikorea/voice-pro/))
- [Automated Video Subtitling](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-transcription/automated-video-subtitling.md) — Transcribes audio to word-level subtitles with noise removal and multilingual support. ([source](https://github.com/abus-aikorea/voice-pro/blob/main/docs/README.jpn.md))
- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Generates natural-sounding speech from text using conditional flow matching synthesis. ([source](https://github.com/abus-aikorea/voice-pro/))
- [ONNX Model Exporters](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/serialization-and-export-formats/onnx-model-exporters.md) — Exports trained TTS checkpoints to the ONNX format for cross-engine deployment. ([source](https://github.com/abus-aikorea/voice-pro/tree/main/third_party/Matcha-TTS))
- [ONNX Runtime Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-engines/onnx-runtime-inference.md) — Runs speech synthesis inference on exported ONNX graphs with GPU acceleration. ([source](https://github.com/abus-aikorea/voice-pro/tree/main/third_party/Matcha-TTS))
- [Custom TTS Model Training Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-frameworks/speech-model-training/custom-tts-model-training-pipelines.md) — Trains custom TTS models on user-provided datasets with file list preparation and normalization. ([source](https://github.com/abus-aikorea/voice-pro/tree/main/third_party/Matcha-TTS))
- [Word-Highlighted Subtitle Players](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-transcription/automated-video-subtitling/word-highlighted-subtitle-players.md) — Ships a video player that displays word-highlighted subtitles with noise removal and multilingual support. ([source](https://github.com/abus-aikorea/voice-pro/blob/main/docs/README.kor.md))

### Graphics & Multimedia

- [Speech Synthesis & TTS](https://awesome-repositories.com/f/graphics-multimedia/audio-music/speech-synthesis-tts.md) — Converts text to speech using multiple engines with support for 100+ languages and 400+ voices. ([source](https://github.com/abus-aikorea/voice-pro/blob/main/docs/README.eng.md))
- [Audio Track Extraction](https://awesome-repositories.com/f/graphics-multimedia/video-content-repurposing/video-clip-extraction/audio-track-extraction.md) — Extracts audio tracks from downloaded YouTube videos in multiple formats. ([source](https://github.com/abus-aikorea/voice-pro/blob/main/docs/README.kor.md))
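
The audio extraction step can be approximated with yt-dlp's Python API, which the project lists among its topics. The sketch below is an assumption about how such a step might look, not code taken from Voice Pro: `build_audio_opts` and `download_audio` are hypothetical helper names, and the output template is illustrative. The option keys themselves (`format`, `outtmpl`, `postprocessors` with `FFmpegExtractAudio`) are standard yt-dlp options, and the post-processor requires FFmpeg to be installed.

```python
from pathlib import Path


def build_audio_opts(out_dir: str, codec: str = "mp3") -> dict:
    """Build a yt-dlp options dict that keeps only the audio track.

    Hypothetical helper: the directory layout and defaults are assumptions.
    """
    return {
        # Prefer an audio-only stream, fall back to the best combined one.
        "format": "bestaudio/best",
        # Name the output file after the video title.
        "outtmpl": str(Path(out_dir) / "%(title)s.%(ext)s"),
        # Convert the download to the requested codec via FFmpeg.
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": codec}
        ],
    }


def download_audio(url: str, out_dir: str, codec: str = "mp3") -> None:
    """Download `url` and extract its audio (needs yt-dlp and FFmpeg)."""
    import yt_dlp  # third-party; imported lazily so the helper above has no dependencies

    with yt_dlp.YoutubeDL(build_audio_opts(out_dir, codec)) as ydl:
        ydl.download([url])
```

Changing `codec` to `"wav"` or `"flac"` is how a caller would select a different output format in this sketch.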

### Content Management & Publishing

- [Timestamped Subtitle Generators](https://awesome-repositories.com/f/content-management-publishing/media-management/subtitle-management-systems/timestamped-subtitle-generators.md) — Generates subtitle files with word-level timestamps from audio input. ([source](https://github.com/abus-aikorea/voice-pro/blob/main/docs/README.eng.md))
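
As an illustration of what word-level timestamps look like in a subtitle file, the following sketch turns a list of `(word, start, end)` tuples into SRT cues, one cue per word. It assumes the timings have already been produced by a speech recognizer; the function names `format_timestamp` and `words_to_srt` are hypothetical and do not come from the Voice Pro codebase.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: Iterable[Tuple[str, float, float]]) -> str:
    """Emit one SRT cue per (word, start_seconds, end_seconds) tuple."""
    cues = []
    for index, (word, start, end) in enumerate(words, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{word}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

A word-highlighting player would typically group several words into one cue and mark the active word; one cue per word is the simplest layout that still carries the timing information.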