Voicevox is text-to-speech software and an audio production environment that converts written text into spoken audio using synthetic character voices. It works both as an editor for voice design and as a standalone speech synthesis engine that generates audio through an API, so it can be integrated into external applications.
The project also provides a singing voice synthesizer with a piano-roll interface for composing melodic vocals, including humming. For speech, its prosody editing tools allow manual adjustment of pitch, inflection, and accent so that delivery sounds more natural.
Other capabilities include multi-track audio management, virtual character voice design, and generation of phonetic metadata for animation lip-syncing. It also supports custom pronunciation dictionaries, voice characteristic morphing, and hardware-accelerated inference to speed up audio generation.
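As a rough illustration of how phonetic metadata can drive lip-syncing, the sketch below turns per-phoneme durations into a timeline of (phoneme, start, end) entries. The field names (`accent_phrases`, `moras`, `pause_mora`, `consonant_length`, `vowel_length`) are assumptions modeled on the engine's audio query JSON, and `mora_timeline` is a hypothetical helper, not part of Voicevox. Speed scaling and leading silence are ignored here.

```python
def mora_timeline(query, start=0.0):
    """Return (phoneme, start_sec, end_sec) tuples from an audio-query-like dict.

    Assumed layout: query["accent_phrases"] is a list of phrases, each with a
    "moras" list and an optional "pause_mora"; each mora carries optional
    "consonant"/"consonant_length" and "vowel"/"vowel_length" fields.
    """
    timeline = []
    t = start
    for phrase in query.get("accent_phrases", []):
        moras = list(phrase.get("moras", []))
        # A trailing pause, if present, is treated like any other mora.
        if phrase.get("pause_mora"):
            moras.append(phrase["pause_mora"])
        for mora in moras:
            # Within a mora the consonant (if any) precedes the vowel.
            for key in ("consonant", "vowel"):
                phoneme = mora.get(key)
                length = mora.get(f"{key}_length")
                if phoneme is None or length is None:
                    continue
                timeline.append((phoneme, round(t, 3), round(t + length, 3)))
                t += length
    return timeline
```

An animation tool could map each phoneme in the returned list to a mouth shape and key it at the given start time.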
The synthesis engine can be deployed as a standalone executable or via Docker containers.
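Once the engine is running, an external application can call it over HTTP. The following is a minimal sketch, assuming the engine listens at `http://127.0.0.1:50021` and exposes a two-step flow: `POST /audio_query` to obtain synthesis parameters as JSON, then `POST /synthesis` to obtain WAV bytes. The address, the speaker ID `1`, and the helper names are assumptions for illustration; check the API documentation served by your engine version before relying on them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:50021"  # assumed local engine address


def build_audio_query_request(text, speaker, base_url=BASE_URL):
    """Build the request that asks the engine for synthesis parameters."""
    params = urllib.parse.urlencode({"text": text, "speaker": speaker})
    return urllib.request.Request(f"{base_url}/audio_query?{params}", method="POST")


def build_synthesis_request(query, speaker, base_url=BASE_URL):
    """Build the request that turns an audio query (dict) into WAV audio."""
    params = urllib.parse.urlencode({"speaker": speaker})
    return urllib.request.Request(
        f"{base_url}/synthesis?{params}",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, speaker=1, base_url=BASE_URL):
    """Run both steps against a live engine and return the WAV bytes."""
    with urllib.request.urlopen(build_audio_query_request(text, speaker, base_url)) as resp:
        query = json.load(resp)
    with urllib.request.urlopen(build_synthesis_request(query, speaker, base_url)) as resp:
        return resp.read()


# Example usage (requires a running engine):
#   wav = synthesize("こんにちは")
#   with open("hello.wav", "wb") as f:
#       f.write(wav)
```

Keeping request construction separate from the network calls makes it possible to inspect or adjust the intermediate query, for example its pitch or speed settings, before requesting audio.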