# open-mmlab/amphion

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/open-mmlab-amphion).**

9,844 stars · 813 forks · Python · MIT

## Links

- GitHub: https://github.com/open-mmlab/Amphion
- Homepage: https://openhlt.github.io/amphion/
- awesome-repositories: https://awesome-repositories.com/repository/open-mmlab-amphion.md

## Topics

`audio-generation` `audio-synthesis` `audioldm` `audit` `emilia` `fastspeech2` `maskgct` `music-generation` `naturalspeech2` `singing-voice-conversion` `speech-synthesis` `text-to-audio` `text-to-speech` `vall-e` `vits` `vocoder` `voice-conversion`

## Description

Amphion is an audio generation toolkit designed for the research and development of models that synthesize speech, music, and environmental sound effects. It provides a standardized framework for reproducible audio synthesis, incorporating a text-to-speech engine and a voice conversion framework.

The project specializes in voice conversion: it can change a speaker's voice identity or accent while preserving the original rhythm and speaking style. It also supports singing voice synthesis and uses diffusion models to generate environmental soundscapes from text descriptions.

The toolkit covers a broad range of audio processing capabilities, including neural vocoding for waveform reconstruction, discrete token encoding, and zero-shot voice cloning. It also provides utilities that preprocess diverse open-source audio datasets into a unified format, tools for evaluating audio quality, and visualizations of how models work internally.

## Tags

### Part of an Awesome List

- [Audio Generation](https://awesome-repositories.com/f/awesome-lists/ai/audio-generation.md) — Provides a comprehensive toolkit for synthesizing speech, music, and environmental sound effects.
- [Text-to-Sound Effect Generation](https://awesome-repositories.com/f/awesome-lists/media/music-and-audio-generation/text-to-sound-effect-generation.md) — Generates high-fidelity environmental sounds and effects from text descriptions using diffusion models. ([source](https://github.com/open-mmlab/amphion#readme))

### Graphics & Multimedia

- [Audio Synthesis](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-synthesis.md) — Provides a standardized framework for reproducible synthesis of speech, music, and environmental audio signals. ([source](https://github.com/open-mmlab/amphion#readme))
- [Neural Vocoders](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-synthesis/neural-vocoders.md) — Includes neural vocoders to transform intermediate acoustic representations into high-resolution audio waveforms.
- [Audio Dataset Preprocessing](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-playback/raw-audio-streamers/audio-dataset-preprocessing.md) — Provides utilities to unify and clean diverse open-source audio datasets into consistent formats for model training.
- [Audio Signal Fidelity Metrics](https://awesome-repositories.com/f/graphics-multimedia/audio-signal-fidelity-metrics.md) — Offers objective metrics to evaluate the reconstruction quality, intelligibility, and speaker similarity of generated audio. ([source](https://github.com/open-mmlab/amphion#readme))
- [Singing Voice Synthesis](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-synthesis/singing-voice-synthesis.md) — Synthesizes melodic singing voice performances with precise control over melody and vocal style. ([source](https://github.com/open-mmlab/amphion#readme))

### Artificial Intelligence & ML

- [Audio Sample Reconstruction](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-generation-models/audio-sample-reconstruction.md) — Produces high-quality audio waveforms from intermediate representations using specialized neural vocoders. ([source](https://github.com/open-mmlab/amphion#readme))
- [Audio Tokenization](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-tokenization.md) — Implements discrete token encoding to decompose complex audio signals into efficient sequences for generative modeling.
- [End-to-End Speech Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/end-to-end-speech-synthesis.md) — Provides an integrated neural network architecture that maps text inputs directly to audio waveforms.
- [Zero-Shot Voice Cloning](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/speech-synthesis/zero-shot-voice-cloning.md) — Implements zero-shot voice cloning to synthesize speech from short reference clips without retraining.
- [Generative Audio Research](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-audio-research.md) — Provides a framework for building and evaluating reproducible generative audio models and soundscapes.
- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Synthesizes natural human speech from text inputs using end-to-end and zero-shot architectures. ([source](https://github.com/open-mmlab/amphion#readme))
- [Voice Identity Conversions](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning/voice-identity-conversions.md) — Modifies the identity of a speaker in audio clips while maintaining original style and rhythm. ([source](https://github.com/open-mmlab/amphion#readme))
- [Audio Dataset Preprocessing](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-preprocessing-tools/audio-dataset-preprocessing.md) — Unifies the cleaning and preparation of various open-source audio datasets and raw speech data. ([source](https://github.com/open-mmlab/amphion#readme))
- [Latent Diffusion Models](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-models/latent-diffusion-models.md) — Uses latent diffusion models to generate high-fidelity audio soundscapes by denoising within a compressed latent space.
- [Dataset Curation Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/speech-datasets/dataset-curation-tools.md) — Provides tools for cleaning, formatting, and preparing diverse audio and speech datasets for machine learning training.
- [Speech Accent Transformation](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning/voice-identity-conversions/speech-accent-transformation.md) — Transforms the accent of a speaker's voice to match a target accent without requiring additional training. ([source](https://github.com/open-mmlab/amphion#readme))