# voicevox/voicevox

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/voicevox-voicevox).**

3,025 stars · 351 forks · TypeScript · other

## Links

- GitHub: https://github.com/VOICEVOX/voicevox
- Homepage: https://voicevox.hiroshiba.jp/
- awesome-repositories: https://awesome-repositories.com/repository/voicevox-voicevox.md

## Description

Voicevox is text-to-speech software and an audio production environment that converts written text into spoken audio using synthetic character voices. It works both as an editor for voice design and as a standalone speech synthesis engine that generates audio through an API, so it can be integrated into external applications.

The project stands out for its singing voice synthesizer, which uses a piano-roll interface for melodic vocal composition and can also generate humming. Its prosody editing tools let users manually adjust pitch, inflection, and accent for more natural-sounding delivery.

Other capabilities include multi-track audio management, virtual character voice design, and generation of phonetic metadata for animation lip-syncing. It also supports custom pronunciation dictionaries, morphing between voice characteristics, and hardware-accelerated inference (switching between CPU and GPU) to speed up audio generation.

The synthesis engine can be deployed as a standalone executable or via Docker containers.
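
The API-based integration described above can be sketched as a two-step HTTP exchange: request a synthesis query for some text, then post that query back to receive audio. This is a minimal sketch, not the project's official client. It assumes an engine listening locally at `http://127.0.0.1:50021` and endpoints named `/audio_query` and `/synthesis`, as in the engine's published API; the function names and the speaker ID are hypothetical. Check endpoint names and parameters against the API documentation served by your engine version.

```typescript
// Assumption: default address of a locally running engine; adjust as needed.
const DEFAULT_ENGINE = "http://127.0.0.1:50021";

// Step 1 URL: text + speaker ID -> synthesis query (JSON).
function buildAudioQueryUrl(base: string, text: string, speaker: number): string {
  const url = new URL("/audio_query", base);
  url.searchParams.set("text", text);
  url.searchParams.set("speaker", String(speaker));
  return url.toString();
}

// Step 2 URL: synthesis query (JSON body) -> WAV audio.
function buildSynthesisUrl(base: string, speaker: number): string {
  const url = new URL("/synthesis", base);
  url.searchParams.set("speaker", String(speaker));
  return url.toString();
}

// Run both steps against a running engine and return the audio bytes.
async function synthesize(
  text: string,
  speaker: number,
  base: string = DEFAULT_ENGINE,
): Promise<ArrayBuffer> {
  const queryRes = await fetch(buildAudioQueryUrl(base, text, speaker), { method: "POST" });
  if (!queryRes.ok) throw new Error(`audio_query failed: ${queryRes.status}`);
  const query: unknown = await queryRes.json();

  const wavRes = await fetch(buildSynthesisUrl(base, speaker), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(query),
  });
  if (!wavRes.ok) throw new Error(`synthesis failed: ${wavRes.status}`);
  return wavRes.arrayBuffer();
}
```

Keeping URL construction separate from the network calls makes the request shape easy to inspect without a running engine. `synthesize` itself needs a reachable engine, whether started as a standalone executable or in a Docker container.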

## Tags

### Artificial Intelligence & ML

- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Converts written text into high-quality spoken audio using a variety of synthetic character voices.
- [Voice](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/integration-deployment/agent-frameworks/configuration-and-specifications/agent-persona-definitions/persona-assignments/voice.md) — Allows selection of different voice characters and styles to change the identity of synthesized audio. ([source](https://voicevox.hiroshiba.jp/how_to_use/))
- [Prosody Control](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/speech-synthesis/prosody-control.md) — Offers fine-grained control over pitch, speed, and intonation by adjusting parameters tied to specific phonemes.
- [Phoneme-Based Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-alignment-models/phoneme-based-alignment/phoneme-based-pipelines.md) — Uses a customizable dictionary-based system to translate written text into phonetic representations for accurate pronunciation.
- [Speech Synthesis Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-synthesis-engines.md) — Provides a core speech synthesis engine capable of being integrated into external applications or services.
- [Synthetic Voice Design](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning/voice-identity-conversions/synthetic-voice-design.md) — Provides tools for creating and fine-tuning unique synthetic vocal identities and personas.
- [Vocal Compositions](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-music-composition/vocal-compositions.md) — Enables musical vocal composition using a multi-track editor with pitch editing and MIDI/UST imports. ([source](https://voicevox.hiroshiba.jp/update_history/))
- [Hardware-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-accelerated-inference.md) — Supports switching between CPU and GPU processing to accelerate the speed of neural network audio generation.
- [Phonetic Pronunciation Overrides](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/phonetic-pronunciation-overrides.md) — Supports custom pronunciation dictionaries for explicit phoneme sequences to resolve ambiguity. ([source](https://voicevox.hiroshiba.jp/how_to_use/))
- [Speech Parameter Configuration](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/speech-parameter-configuration.md) — Allows configuration of auditory qualities like pitch, rate, and volume on a per-segment basis. ([source](https://voicevox.hiroshiba.jp/how_to_use/))
- [Expressive Prosody Controls](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/speech-synthesis-controls/expressive-prosody-controls.md) — Offers fine-grained controls for manipulating emotion, rhythm, and intonation. ([source](https://voicevox.hiroshiba.jp/nemo/))
- [Hybrid Voice Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning/voice-identity-conversions/hybrid-voice-synthesis.md) — Blends characteristics of multiple target speakers to create unique hybrid vocal identities. ([source](https://voicevox.hiroshiba.jp/how_to_use/))
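
The speech parameter configuration listed above can be illustrated with a small helper that overrides pitch, rate, and volume on a synthesis query before it is sent to the engine. This is a sketch under stated assumptions: the field names `speedScale`, `pitchScale`, `intonationScale`, and `volumeScale` are modeled on the engine's query JSON and should be verified against your engine version, and `VoiceSettings` and `applyVoiceSettings` are hypothetical names.

```typescript
// Assumption: a subset of the fields found in the engine's synthesis query JSON.
interface VoiceSettings {
  speedScale: number;      // speaking rate
  pitchScale: number;      // overall pitch
  intonationScale: number; // strength of intonation
  volumeScale: number;     // output volume
}

// Return a copy of the query with the given settings overridden.
// Other fields on the query are preserved and the input is not mutated.
function applyVoiceSettings<T extends VoiceSettings>(
  query: T,
  overrides: Partial<VoiceSettings>,
): T {
  return { ...query, ...overrides };
}

// Example: a reusable "preset" applied to a query-shaped object.
const baseQuery = { speedScale: 1, pitchScale: 0, intonationScale: 1, volumeScale: 1 };
const fastPreset: Partial<VoiceSettings> = { speedScale: 1.2 };
const adjusted = applyVoiceSettings(baseQuery, fastPreset);
```

Treating a preset as a plain partial object keeps it easy to save, load, and reapply across segments.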

### Graphics & Multimedia

- [Singing Voice Synthesis](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-synthesis/singing-voice-synthesis.md) — Provides a specialized singing voice synthesizer with a piano-roll interface for melodic vocal composition.
- [Speech Processing Libraries](https://awesome-repositories.com/f/graphics-multimedia/audio-music/speech-processing-libraries.md) — Provides a core library that allows speech generation capabilities to be embedded directly into external applications. ([source](https://voicevox.hiroshiba.jp/song/))
- [Multi-Track Audio Editors](https://awesome-repositories.com/f/graphics-multimedia/multi-track-audio-editors.md) — Ships a multi-track editor for organizing speech and singing segments with integrated synthesis.
- [Multi-Track Audio Sequencers](https://awesome-repositories.com/f/graphics-multimedia/multi-track-audio-sequencers.md) — Provides a multi-track sequencing environment to organize audio segments, timing, and character assignments on a project timeline.
- [Humming Generation](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-synthesis/singing-voice-synthesis/humming-generation.md) — Provides the ability to generate melodic humming sounds for characters to sing without lyrics. ([source](https://voicevox.hiroshiba.jp/product/aierutan/))

### Business & Productivity Software

- [Synthesis Project Management](https://awesome-repositories.com/f/business-productivity-software/synthesis-project-management.md) — Saves and loads work-in-progress sessions including text, character assignments, and timing settings. ([source](https://voicevox.hiroshiba.jp/update_history/))

### DevOps & Infrastructure

- [Docker Container Deployments](https://awesome-repositories.com/f/devops-infrastructure/container-orchestration/container-runtimes/runtime-configuration-interfaces/docker-socket-orchestrators/docker-target-configurators/docker-container-deployments.md) — Packages the synthesis runtime into Docker containers to ensure consistent environment dependencies across operating systems.
- [Standalone Service Deployments](https://awesome-repositories.com/f/devops-infrastructure/standalone-service-deployments.md) — Allows the synthesis engine to be deployed as a standalone executable or Docker container for external use. ([source](https://voicevox.hiroshiba.jp/))

### Software Engineering & Architecture

- [Client-Server Architectures](https://awesome-repositories.com/f/software-engineering-architecture/client-server-architectures.md) — Implements a client-server architecture that decouples the user interface from the synthesis engine via a REST API.
- [Voice Parameter Presets](https://awesome-repositories.com/f/software-engineering-architecture/project-management-governance/project-management/project-lifecycle-management/project-configuration-presets/configuration-presets/voice-parameter-presets.md) — Allows saving and applying groups of voice settings to maintain consistency across different segments of text. ([source](https://voicevox.hiroshiba.jp/how_to_use/))

### User Interface & Experience

- [Voice Character Presets](https://awesome-repositories.com/f/user-interface-experience/voice-interfaces/system-voice-managers/voice-catalogs/voice-character-presets.md) — Saves and loads collections of voice parameters to maintain consistent character delivery across different projects. ([source](https://voicevox.hiroshiba.jp/update_history/))
