# svc-develop-team/so-vits-svc

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/svc-develop-team-so-vits-svc).**

28,097 stars · 5,051 forks · Python · AGPL-3.0 · archived

## Links

- GitHub: https://github.com/svc-develop-team/so-vits-svc
- awesome-repositories: https://awesome-repositories.com/repository/svc-develop-team-so-vits-svc.md

## Description

This project is a singing voice conversion tool based on VITS generative modeling. It transforms the identity of a singing voice to a target speaker while preserving the original melody, lyrics, and intonation.

The system supports hybrid voice synthesis: multiple speaker identities can be blended through linear model interpolation. It uses cluster-based feature retrieval to increase similarity to the target voice, and it applies a diffusion probabilistic model as a post-processor to remove electronic artifacts and improve vocal clarity.
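As a rough illustration of linear model interpolation, the sketch below blends two sets of weights with normalized mixing ratios. It is not the project's code: the function name `interpolate_models`, the plain-dict weight layout, and the toy values are all assumptions made for the example, and real checkpoints would hold tensors rather than Python lists.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def interpolate_models(models: Sequence[Weights], ratios: Sequence[float]) -> Weights:
    """Return the weighted average of several models with identical parameter names."""
    if not models or len(models) != len(ratios):
        raise ValueError("need one ratio per model")
    total = sum(ratios)
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")
    # Normalize so the mixing coefficients sum to 1.
    norm = [r / total for r in ratios]

    keys = models[0].keys()
    for m in models[1:]:
        if m.keys() != keys:
            raise ValueError("models must share the same parameter names")

    blended: Weights = {}
    for key in keys:
        length = len(models[0][key])
        blended[key] = [
            sum(w * m[key][i] for w, m in zip(norm, models)) for i in range(length)
        ]
    return blended


# Toy example: 75% speaker A, 25% speaker B.
speaker_a = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
speaker_b = {"layer.weight": [4.0, 6.0], "layer.bias": [3.0]}
hybrid = interpolate_models([speaker_a, speaker_b], [3, 1])
print(hybrid)
```

Normalizing the ratios keeps the blended weights on the same scale as the inputs, so a ratio list such as `[3, 1]` behaves the same as `[0.75, 0.25]`.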

The software covers audio processing and model management, including fundamental frequency extraction, pitch normalization, and semitone adjustment. It provides a full training pipeline with audio dataset preprocessing, automatic mixed precision training, and generation of speaker-specific voice indices. For deployment, it supports weight compression and export to the ONNX format.
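Semitone adjustment can be sketched with the standard equal-temperament relation, where shifting by `n` semitones multiplies the fundamental frequency by `2 ** (n / 12)`. The function name `shift_f0` and the convention that unvoiced frames are stored as `0.0` are assumptions for this example, not details taken from the project.

```python
from typing import List


def shift_f0(f0_hz: List[float], semitones: float) -> List[float]:
    """Shift a fundamental-frequency contour by a number of semitones.

    Frames with f0 <= 0 are treated as unvoiced (an assumption) and left at 0.0.
    """
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


contour = [220.0, 0.0, 440.0]
print(shift_f0(contour, 12))   # one octave up
print(shift_f0(contour, -12))  # one octave down
```

A shift of 12 semitones doubles every voiced frequency, and a shift of -12 halves it, which is why the melody's shape is preserved while the key changes.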

## Tags

### Artificial Intelligence & ML

- [VITS Synthesis Models](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/speech-synthesis/neural-text-to-speech-engines/vits-synthesis-models.md) — Implements a VITS-based generative architecture combining variational autoencoders and flow-based decoders for high-fidelity singing voice conversion.
- [Voice Identity Conversions](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning/voice-identity-conversions.md) — Transforms singing audio into a target voice identity while preserving the original melody, lyrics, and intonation. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/requirements_win.txt))
- [Data Preparation](https://awesome-repositories.com/f/artificial-intelligence-ml/data-preparation.md) — Resamples audio files, trims silence, and normalizes peak loudness to prepare vocal datasets for training. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/resample.py))
- [Feature Extractors](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-extractors.md) — Predicts fundamental frequency and extracts content embeddings from audio to facilitate voice conversion.
- [Voice Model Trainers](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-trainers/voice-model-trainers.md) — Provides a framework for preprocessing audio datasets and training VITS models to capture specific vocal characteristics.
- [Voice Model Merging](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/voice-synthesis/modular-voice-configurations/voice-model-merging.md) — Blends multiple voice models or speaker identities to create unique hybrid vocal identities through linear combinations. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/webUI.py))
- [Voice Synthesizer Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/voice-synthesis/modular-voice-configurations/voice-synthesizer-training.md) — Processes audio datasets and trains deep learning models to replicate specific vocal characteristics and timbres.
- [Pitch-Guided Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/pitch-guided-synthesis.md) — Uses fundamental frequency predictors to map source pitch and maintain melodic accuracy during voice conversion.
- [Training Data Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/training-data-generation.md) — Converts raw audio files into spectrograms, speaker IDs, and text sequences for deep learning model training. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/data_utils.py))
- [Vocal Content Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/vocal-content-embeddings.md) — Extracts speaker-independent latent representations from raw audio to separate linguistic content from vocal identity.
- [AI Audio Enhancement](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-audio-enhancement.md) — Applies diffusion probabilistic models as post-processors to remove electronic artifacts and improve vocal clarity.
- [Dataset Scanning Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-scanning-tools.md) — Scans audio directories to generate file lists while filtering short clips and mapping speaker identities. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/preprocess_flist_config.py))
- [Vocal Denoising Post-processors](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-models/diffusion-models/autoregressive-audio-diffusion/vocal-denoising-post-processors.md) — Applies a diffusion probabilistic model as a post-processor to remove electronic artifacts and improve vocal clarity.
- [Voice Index Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/voice-synthesis/modular-voice-configurations/voice-synthesizer-training/voice-index-generators.md) — Extracts speaker-specific features from audio datasets to create index files that guide the conversion process. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/train_index.py))
- [Voice Identity Interpolators](https://awesome-repositories.com/f/artificial-intelligence-ml/model-weight-reconstruction/weight-interpolators/voice-identity-interpolators.md) — Blends multiple speaker identities via weighted averages of model weights or feature embeddings for hybrid voice creation.
- [Hybrid Voice Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning/voice-identity-conversions/hybrid-voice-synthesis.md) — Mixes multiple target speaker characteristics to create a hybrid vocal identity for output audio. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/inference_main.py))
- [Voice Model Loaders](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-model-loaders.md) — Enables the import of external model and configuration files to define specific target vocal characteristics. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/webUI.py))
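
The Data Preparation and Dataset Scanning Tools entries above mention peak normalization and filtering of short clips. A minimal sketch of both ideas follows, assuming audio is already decoded into a list of float samples in the range -1.0 to 1.0. The function names, the `target_peak` default, and the `min_seconds` threshold are illustrative choices, not values from the repository.

```python
from typing import List


def peak_normalize(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent or empty clip: nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def is_long_enough(num_samples: int, sample_rate: int, min_seconds: float = 0.3) -> bool:
    """Return True when a clip lasts at least min_seconds (hypothetical threshold)."""
    return num_samples / sample_rate >= min_seconds


clip = [0.1, -0.5, 0.25]
normalized = peak_normalize(clip, target_peak=1.0)
print(normalized)
print(is_long_enough(num_samples=4410, sample_rate=44100))
```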

### Graphics & Multimedia

- [Vocal Timbre Mixers](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-synthesis/timbre-morphing-tools/vocal-timbre-mixers.md) — Blends multiple speaker models to create hybrid voice identities through linear interpolation.
- [Audio Feature Extraction](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/audio-analysis-synthesis/audio-feature-extraction.md) — Extracts content embeddings, fundamental frequency, and volume from audio files to prepare for voice conversion. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/preprocess_hubert_f0.py))
- [Pitch Estimation](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/audio-analysis-synthesis/audio-feature-extraction/pitch-estimation.md) — Employs predictor algorithms to estimate the pitch of an audio signal for use in voice conversion. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/utils.py))
- [Time-Stretching and Pitch-Shifting](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-processing/time-stretching-and-pitch-shifting.md) — Shifts the pitch of converted audio by specific semitones to modify the musical key of the output. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/inference_main.py))
- [Vocal Pitch Normalization](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-processing/time-stretching-and-pitch-shifting/vocal-pitch-normalization.md) — Adjusts the fundamental frequency of voice recordings to a standard scale for consistent voice conversion. ([source](https://github.com/svc-develop-team/so-vits-svc/blob/4.1-Stable/utils.py))
- [Timbre Fidelity Controllers](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-synthesis/timbre-morphing-tools/timbre-fidelity-controllers.md) — Balances retrieval-based index files to reduce timbre leakage and increase similarity to the target voice. ([source](https://github.com/svc-develop-team/so-vits-svc#readme))
- [Vocal Artifact Removal](https://awesome-repositories.com/f/graphics-multimedia/vocal-artifact-removal.md) — Applies diffusion models and enhancers to reduce electronic artifacts and improve the clarity of converted vocals. ([source](https://github.com/svc-develop-team/so-vits-svc#readme))

### Data & Databases

- [Acoustic Feature Retrieval](https://awesome-repositories.com/f/data-databases/k-nearest-neighbor-retrieval/acoustic-feature-retrieval.md) — Uses cluster-based nearest-neighbor retrieval of acoustic features to improve target voice similarity and reduce timbre leakage.
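
The retrieval idea in the entry above can be sketched as follows: each content feature is matched to its nearest cluster center from the target speaker, then blended with that center by a mixing ratio. This is a simplified illustration under stated assumptions. The names `nearest_center` and `blend_with_retrieval`, the `ratio` parameter, and the two-dimensional toy features are hypothetical, and a real system would use high-dimensional embeddings and an optimized index.

```python
import math
from typing import List, Sequence


def nearest_center(feature: Sequence[float], centers: Sequence[Sequence[float]]) -> Sequence[float]:
    """Return the cluster center closest to the feature in Euclidean distance."""
    return min(centers, key=lambda c: math.dist(feature, c))


def blend_with_retrieval(
    feature: Sequence[float], centers: Sequence[Sequence[float]], ratio: float
) -> List[float]:
    """Mix a feature with its nearest center.

    ratio = 0.0 keeps the original feature; ratio = 1.0 replaces it with the center.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    target = nearest_center(feature, centers)
    return [ratio * t + (1.0 - ratio) * f for f, t in zip(feature, target)]


# Toy cluster centers standing in for target-speaker features.
centers = [[0.0, 0.0], [10.0, 10.0]]
feature = [1.0, 3.0]
print(blend_with_retrieval(feature, centers, ratio=0.5))
```

A higher ratio pulls features closer to the target speaker's clusters, which tends to increase timbre similarity at the cost of some articulation detail from the source.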

### Part of an Awesome List

- [Voice Processing](https://awesome-repositories.com/f/awesome-lists/media/voice-processing.md) — Framework for singing voice conversion using deep learning.
