So Vits Svc | Awesome Repository

This project is a singing voice conversion tool based on VITS generative modeling. It transforms the identity of a singing voice to a target speaker while preserving the original melody, lyrics, and intonation.

The system stands out for hybrid voice synthesis: it blends multiple speaker identities by linearly interpolating between models. It uses cluster-based feature retrieval to increase similarity to the target voice, and it applies a diffusion probabilistic model as a post-processor to remove electronic artifacts and improve vocal clarity.
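As a rough illustration of linear model interpolation, the sketch below blends several parameter sets as a weighted average. It is a minimal example under stated assumptions, not the project's own code: the `StateDict` type, the `interpolate_models` function, and the use of plain Python lists in place of real tensors are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def interpolate_models(models: List[StateDict], ratios: List[float]) -> StateDict:
    """Blend parameter sets as a weighted average (ratios are normalized to sum to 1)."""
    if not models or len(models) != len(ratios):
        raise ValueError("need at least one model and exactly one ratio per model")
    total = sum(ratios)
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")
    weights = [r / total for r in ratios]

    keys = models[0].keys()
    for model in models[1:]:
        if model.keys() != keys:
            raise ValueError("all models must share the same parameter names")

    blended: StateDict = {}
    for key in keys:
        length = len(models[0][key])
        if any(len(model[key]) != length for model in models):
            raise ValueError(f"parameter {key!r} has mismatched shapes")
        # Weighted sum of the same element across every model.
        blended[key] = [
            sum(w * model[key][i] for w, model in zip(weights, models))
            for i in range(length)
        ]
    return blended


# Toy example: a 1:3 blend leans toward the second speaker.
speaker_a = {"w": [0.0, 2.0]}
speaker_b = {"w": [4.0, 6.0]}
mixed = interpolate_models([speaker_a, speaker_b], [1.0, 3.0])
```

Averaging of this kind is only meaningful when the models share one architecture, which is why the sketch rejects mismatched parameter names and shapes.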

The software covers audio processing and model management, including fundamental frequency (F0) extraction, pitch normalization, and semitone adjustment. It provides a full training pipeline with audio dataset preprocessing, automatic mixed precision training, and generation of speaker-specific voice indices. For deployment, it supports weight compression and export to the ONNX format.
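Semitone adjustment can be pictured as scaling an F0 contour by a factor of 2^(n/12) for a shift of n semitones. The sketch below shows only that arithmetic. The `shift_f0` name and the convention that unvoiced frames are stored as 0 Hz are assumptions made for illustration, not details taken from the project.

```python
from typing import List


def shift_f0(f0_hz: List[float], semitones: float) -> List[float]:
    """Transpose an F0 contour by a number of semitones.

    Assumes unvoiced frames are encoded as 0 Hz and leaves them untouched.
    """
    factor = 2.0 ** (semitones / 12.0)  # 12 semitones doubles the frequency
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


# One octave up: voiced frames double, the unvoiced frame stays at 0.
contour = [220.0, 0.0, 440.0]
octave_up = shift_f0(contour, 12)
```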

Features

VITS Synthesis Models - Implements a VITS-based generative architecture that combines a variational autoencoder with normalizing flows for high-fidelity singing voice conversion.
Voice Identity Conversions - Transforms singing audio into a target voice identity while preserving the original melody, lyrics, and intonation.
Data Preparation - Resamples audio files, trims silence, and normalizes peak loudness to prepare vocal datasets for training.
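Two of the data preparation steps listed above, silence trimming and peak normalization, can be sketched on a plain list of samples in the range [-1, 1]. This is a simplified illustration under stated assumptions: the function names, the amplitude threshold, and the target peak are hypothetical, resampling is left out, and a real pipeline would more likely measure silence over frames than per sample.

```python
from typing import List


def trim_silence(samples: List[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below the threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    return samples[loud[0]: loud[-1] + 1]


def normalize_peak(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale the clip so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # nothing to scale in an empty or all-zero clip
    gain = target_peak / peak
    return [s * gain for s in samples]


# Toy clip: quiet edges are trimmed, then the remainder is scaled to a peak of 1.0.
clip = [0.0, 0.001, 0.5, -0.25, 0.0]
trimmed = trim_silence(clip)
prepared = normalize_peak(trimmed, target_peak=1.0)
```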
