EmotiVoice

Features

Speech Synthesis - Synthesizes seamless spoken audio from a mix of Chinese and English text.
Voice Cloning - Implements a framework for replicating specific speaker identities by training custom acoustic models on personal audio datasets.
Joint Acoustic-Vocoder Training - Employs a jointly trained acoustic model and vocoder pipeline to produce high-fidelity audio with fewer artifacts.
Expressive Synthesis - Uses style and emotional embeddings to control the timbre and expression of generated speech.
Grapheme-to-Phoneme Conversion - Transforms Chinese text into phonetic representations via number normalization and pinyin conversion.
Acoustic Model Trainers - Provides joint training for acoustic models and vocoders to improve the fidelity of synthesized audio.
Voice Model Trainers - Uses a deep learning architecture to align input text with emotionally expressive speech.
Synthetic Speech Generation - Produces high-fidelity synthetic speech by replicating vocal characteristics based on specific speaker profiles.
Voice Synthesizer Training - Processes custom audio datasets and transcriptions to train models on specific speaker characteristics.
Phoneme-Based Pipelines - Implements a pipeline to transform raw bilingual text into phonetic representations for synthesis.
Emotional Synthesis - Generates synthetic audio in English and Chinese with controllable emotional states such as happiness or sadness.
Bilingual Synthesizers - Acts as a bilingual synthesis engine that turns mixed Chinese and English text into seamless audio.
Emotional Modulation - Generates synthetic audio that conveys specific human emotions such as happiness or sadness.
Emotional TTS Engines - Generates synthetic audio in English and Chinese with controllable emotional states.
Model Fine-Tuning - Supports adapting pre-trained synthetic voices with custom datasets and alignment to improve their emotional expression.
Speech Synthesis Services - Operates as a containerized web server exposing speech synthesis capabilities through an HTTP interface.
Text-to-Speech - Provides a self-hosted web service via Docker for programmatic text-to-speech generation.
Local Speech Synthesis - Allows users to generate synthetic speech locally via a desktop application without an internet connection.
Training Dataset Preparation - Includes utilities to organize datasets and initialize model checkpoints specifically for voice model training.
Synthetic Voice Design - Provides a library of diverse speaker identities and gender profiles to define the characteristics of generated speech.
Command Line Interfaces - Provides a command-line interface for generating synthetic audio from text.
Speech API Hosting - Ships an HTTP interface to expose synthetic voice generation programmatically to external applications.
Text-to-Speech Engines - Provides a system to convert written text into spoken audio via remote server requests.
API Wrappers - Exposes the internal speech engine via a web server wrapper for remote programmatic use.
Web Dashboards - Ships an interactive browser-based dashboard for performing text-to-speech synthesis without writing code.
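The Grapheme-to-Phoneme entry above mentions number normalization as a step before pinyin conversion. EmotiVoice's own frontend code is not reproduced here; the following is a minimal standard-library Python sketch of what such a step typically does, assuming every run of ASCII digits is read as a cardinal integer below 10^12. The names `int_to_chinese` and `normalize_numbers` are hypothetical and chosen for illustration only.

```python
import re

# Hypothetical helper names; an illustrative sketch, not EmotiVoice's implementation.
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]   # place names inside a 4-digit group
BIG_UNITS = ["", "万", "亿"]     # group names for 10^0, 10^4, 10^8


def _four_digits(n: int) -> str:
    """Read 1..9999 as Chinese numerals, writing one 零 per internal run of zeros."""
    out = []
    zero_pending = False
    for power in (3, 2, 1, 0):
        d = (n // 10 ** power) % 10
        if d == 0:
            # Only zeros that follow a non-zero digit need to be voiced.
            if out:
                zero_pending = True
            continue
        if zero_pending:
            out.append("零")
            zero_pending = False
        out.append(DIGITS[d] + UNITS[power])
    return "".join(out)


def int_to_chinese(n: int) -> str:
    """Cardinal reading of an integer 0 <= n < 10^12 (assumption of this sketch)."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n == 0:
        return "零"
    groups = []  # 4-digit groups, least significant first
    while n > 0:
        groups.append(n % 10000)
        n //= 10000
    if len(groups) > len(BIG_UNITS):
        raise ValueError("number too large for this sketch")
    parts = []
    need_zero = False
    for i in range(len(groups) - 1, -1, -1):
        g = groups[i]
        if g == 0:
            # An all-zero group after a non-zero group becomes a single 零.
            if parts:
                need_zero = True
            continue
        # A group below 1000 that follows a higher group also starts with 零.
        if parts and (need_zero or g < 1000):
            parts.append("零")
        need_zero = False
        parts.append(_four_digits(g) + BIG_UNITS[i])
    text = "".join(parts)
    # Numbers that start with 10-19 are read 十..., not 一十...
    if text.startswith("一十"):
        text = text[1:]
    return text


def normalize_numbers(text: str) -> str:
    """Replace each run of ASCII digits with its Chinese cardinal reading."""
    return re.sub(r"[0-9]+", lambda m: int_to_chinese(int(m.group())), text)


if __name__ == "__main__":
    print(normalize_numbers("订单共有10086件"))  # 订单共有一万零八十六件
```

A production frontend would typically also distinguish years, phone numbers, decimals, and percentages (which are often read digit by digit or with special words), and a pinyin converter would then run on the normalized text.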

Open-source alternatives to EmotiVoice

Similar open-source projects, ranked by how many features they share with EmotiVoice.

OpenBMB/VoxCPM
29,985 GitHub stars
VoxCPM is a multilingual speech synthesis system and text-to-speech inference server. It functions as an AI voice cloning tool and a synthetic voice designer, generating natural speech across many languages and regional dialects with a GPU-accelerated audio generator. The project includes a speech model fine-tuning framework that supports both full-parameter updates and low-rank adaptation for customizing voice characteristics. It enables high-fidelity voice cloning from reference audio, including cross-lingual voice transfer and acoustic environment mimicry, as well as the …
Language: Python. Topics: audio, deeplearning, minicpm.
neonbjb/tortoise-tts
14,864 GitHub stars
Tortoise-tts is a neural text-to-speech engine and voice cloning toolkit designed for high-quality audio generation. It functions as a zero-shot synthesis system, meaning it can generate speech for unseen speakers without additional training or fine-tuning for each new voice. The system specializes in replicating human vocal characteristics from small sets of reference audio clips. It can extract voice latents to mimic specific speakers, generate random synthetic identities, and blend multiple voice profiles into hybrid vocal identities. …
Language: Jupyter Notebook.
bytedance/MegaTTS3
6,066 GitHub stars
MegaTTS3 is a bilingual speech synthesis system that generates natural-sounding speech in Chinese and English, including seamless code-switching within a single utterance. It functions as a text-to-speech engine, a voice cloning system, and a speech-to-text alignment tool, built around an acoustic latent compression model that encodes high-resolution audio into compact representations for efficient processing. The system distinguishes itself through accent intensity control, which adjusts a speaker's accent strength in generated speech, and voice cloning from short audio samples for …
Language: Python. Topics: research.
livekit/agents
9,379 GitHub stars
This project is a framework for developing multimodal AI agents that act as programmable participants in real-time communication rooms. It lets developers build agents that can see, hear, and speak by combining speech-to-text, large language model, and text-to-speech pipelines for low-latency, natural conversations. It orchestrates real-time media and conversational flow, with support for full-duplex speech, preemptive response generation, and interruption management. It further differentiates itself …
Language: Python. Topics: agents, ai, openai.

See all 30 alternatives to EmotiVoice

netease-youdao/EmotiVoice