# ace-step/ace-step

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/ace-step-ace-step).**

4,088 stars · 514 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/ace-step/ACE-Step
- Homepage: https://ace-step.github.io/
- awesome-repositories: https://awesome-repositories.com/repository/ace-step-ace-step.md

## Description

ACE-Step is a diffusion-based audio synthesis system that generates high-fidelity music and vocals from text descriptions. It uses a diffusion transformer decoder to produce audio across multiple languages and genres.

The project provides tools for text-guided audio editing: extending the duration of tracks, regenerating specific song segments, and performing latent-space audio inpainting to modify lyrics or styles. It also includes a fine-tuning framework that uses low-rank adaptation (LoRA) to capture specific vocal characteristics and musical styles.
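As a rough illustration of what latent-space inpainting of a song segment means, the sketch below blends a regenerated latent into the original only inside a chosen time window. It is a toy under stated assumptions, not ACE-Step's implementation: the function names, the one-value-per-frame latent, and the `frames_per_second` figure are all hypothetical.

```python
# Toy sketch of masked latent blending (hypothetical names, not ACE-Step's API).

def segment_mask(num_frames, frames_per_second, start_s, end_s):
    """Return 1.0 for latent frames whose start time lies in [start_s, end_s), else 0.0."""
    mask = []
    for i in range(num_frames):
        t = i / frames_per_second  # start time of frame i in seconds
        mask.append(1.0 if start_s <= t < end_s else 0.0)
    return mask


def blend_latents(original, regenerated, mask):
    """Keep original frames where mask is 0 and take regenerated frames where mask is 1."""
    if not (len(original) == len(regenerated) == len(mask)):
        raise ValueError("all sequences must have the same number of frames")
    return [m * g + (1.0 - m) * o for o, g, m in zip(original, regenerated, mask)]


# Toy 1-D latent: one value per frame, 2 frames per second (assumed for illustration).
original = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
regenerated = [9.0, 9.0, 9.0, 9.0, 9.0, 9.0]  # stand-in for newly generated content
mask = segment_mask(num_frames=6, frames_per_second=2, start_s=1.0, end_s=2.0)
edited = blend_latents(original, regenerated, mask)
print(edited)
```

Only the frames between 1.0 s and 2.0 s are replaced; everything outside the window is carried over unchanged, which is the property that lets an edit preserve the rest of the track.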

The system also covers music production tasks such as synthesizing instrumental samples and loops, generating instrumental accompaniment from vocal recordings, and producing complementary instrument stems based on reference audio. It supports variable-length sequence generation, so audio can be synthesized at custom durations.

## Tags

### Part of an Awesome List

- [Audio and Speech Synthesis](https://awesome-repositories.com/f/awesome-lists/ai/audio-and-speech-synthesis.md) — Synthesizes high-fidelity music and vocals from text descriptions using a diffusion transformer decoder. ([source](https://ace-step.github.io/ace-step-v1.5.github.io/))
- [Text-to-Music Generators](https://awesome-repositories.com/f/awesome-lists/media/music-and-audio-generation/text-to-sound-effect-generation/text-to-music-generators.md) — Synthesizes full songs including composition, lyrics, and style from plain-language text descriptions.
- [AI Audio Segment Modification](https://awesome-repositories.com/f/awesome-lists/devtools/audio-editing/ai-audio-segment-modification.md) — Edits specific parts of a song to change lyrics or style while preserving original melody. ([source](https://cdn.jsdelivr.net/gh/ace-step/ace-step@main/README.md))
- [Generative Audio Extension](https://awesome-repositories.com/f/awesome-lists/devtools/audio-editing/generative-audio-extension.md) — Adds new musical content to the beginning or end of existing tracks to increase duration. ([source](https://cdn.jsdelivr.net/gh/ace-step/ace-step@main/README.md))
- [Music And Audio Generation](https://awesome-repositories.com/f/awesome-lists/media/music-and-audio-generation.md) — Provides capabilities for producing instrumental stems, sound effects, and conceptual loops for music production.
- [Musical Variation Synthesis](https://awesome-repositories.com/f/awesome-lists/media/music-and-audio-generation/musical-variation-synthesis.md) — Generates new versions of a track by adjusting noise ratios to control divergence. ([source](https://cdn.jsdelivr.net/gh/ace-step/ace-step@main/README.md))
- [Text-to-Sound Effect Generation](https://awesome-repositories.com/f/awesome-lists/media/music-and-audio-generation/text-to-sound-effect-generation.md) — Creates conceptual music production elements, loops, and sound effects from text descriptions. ([source](https://cdn.jsdelivr.net/gh/ace-step/ace-step@main/README.md))
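The noise-ratio control listed under Musical Variation Synthesis can be pictured with a small sketch: a source latent is mixed with random noise, and a higher ratio moves the starting point further from the source. This is only an assumed linear mixing scheme for illustration; the function name and the blending formula are hypothetical, and in a real system the mixed latent would then be denoised by the model.

```python
import random

# Toy sketch: mix a source latent with Gaussian noise (assumed scheme, not ACE-Step's code).

def make_variation(source, noise_ratio, seed=0):
    """Return (1 - noise_ratio) * source + noise_ratio * noise, element by element."""
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the same noise is drawn for every ratio
    noise = [rng.gauss(0.0, 1.0) for _ in source]
    return [(1.0 - noise_ratio) * s + noise_ratio * n for s, n in zip(source, noise)]


def mean_abs_diff(a, b):
    """Average absolute difference, used here as a crude measure of divergence."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


source = [0.5, -0.25, 1.0, 0.0]
subtle = make_variation(source, noise_ratio=0.2)
strong = make_variation(source, noise_ratio=0.8)
print(mean_abs_diff(source, subtle), mean_abs_diff(source, strong))
```

With a ratio of 0 the source is returned unchanged, and the measured divergence grows as the ratio approaches 1.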

### Artificial Intelligence & ML

- [Text-to-Music Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-music-composition/text-to-music-engines.md) — Synthesizes full songs with lyrics and style from plain-language text prompts across various genres. ([source](https://cdn.jsdelivr.net/gh/ace-step/ace-step@main/README.md))
- [AI Vocal Production](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-vocal-production.md) — Creates singing or rap audio from lyrics and adapts vocal styles for musical performances.
- [Audio](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-models/diffusion-models/audio.md) — Implements a diffusion transformer decoder for generating and editing musical tracks and vocal samples.
- [Audio Inpainting And Editing](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-audio-synthesis/audio-inpainting-and-editing.md) — Provides text-guided audio editing, including duration extension and latent-space inpainting to modify lyrics or styles.
- [Diffusion Transformers](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-architectures/diffusion-transformers.md) — Implements a diffusion transformer decoder to iteratively refine noise into high-fidelity audio signals.
- [AI Audio Regeneration](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-audio-regeneration.md) — Regenerates variations of songs or replaces specific segments and lyrics using AI. ([source](https://ace-step.github.io/))
- [LoRA Style Adapters](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-music-agents/style-based-music-generation/lora-style-adapters.md) — Uses low-rank adaptation to capture and reproduce specific musical styles and vocal characteristics.
- [Audio](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/audio.md) — Adapts pre-trained audio foundation models to custom vocal and musical styles using LoRA. ([source](https://ace-step.github.io/))
- [Low-Rank Adaptation](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/low-rank-adaptation.md) — Uses low-rank adaptation (LoRA) to efficiently fine-tune vocal characteristics and musical styles.
- [Variable-Length Audio Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/text-sequence-processing/sequence-length-constraints/output-length-modifiers/variable-length-audio-synthesis.md) — Synthesizes high-fidelity audio with adjustable durations instead of fixed-length outputs. ([source](https://ace-step.github.io/))
- [Variable-Length Audio Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/video-generation/infinite-length-generators/variable-length-audio-generation.md) — Decouples the generation process from fixed windows to synthesize audio of custom durations.
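The low-rank adaptation entries above refer to the general LoRA technique, in which a frozen weight matrix `W` is adjusted by a trainable low-rank product, `W' = W + (alpha / r) * B A`. The sketch below shows that arithmetic on a tiny matrix in plain Python. It illustrates the standard LoRA update only; the helper names are hypothetical and nothing here is taken from ACE-Step's training code.

```python
# Minimal sketch of the standard LoRA weight update (hypothetical helper names).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_b, lora_a, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the number of rows of A."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen 3x3 base weight and a rank-1 adapter: B is 3x1, A is 1x3.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_b = [[1.0], [0.0], [2.0]]
lora_a = [[0.5, 0.0, 1.0]]
adapted = lora_effective_weight(w, lora_b, lora_a, alpha=2.0)

full_params = len(w) * len(w[0])                                        # 9 values in W
lora_params = len(lora_b) * len(lora_b[0]) + len(lora_a) * len(lora_a[0])  # 6 values in B and A
print(adapted, full_params, lora_params)
```

The saving is small on a 3x3 toy, but for large layers the adapter holds `r * (d_in + d_out)` values instead of `d_in * d_out`, which is why only the adapter needs to be trained and stored per style.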

### Graphics & Multimedia

- [Singing Voice Synthesis](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-synthesis/singing-voice-synthesis.md) — Synthesizes melodic singing and rap performances from provided lyrics. ([source](https://cdn.jsdelivr.net/gh/ace-step/ace-step@main/README.md))
- [Vocal Synthesizers](https://awesome-repositories.com/f/graphics-multimedia/vocal-synthesizers.md) — Synthesizes high-fidelity singing and rap audio from lyrics with support for style adaptation.
- [Audio Extension and Variation](https://awesome-repositories.com/f/graphics-multimedia/audio-extension-and-variation.md) — Extends a track with new content or creates new versions based on audio references.
- [Audio Inpainting and Editing](https://awesome-repositories.com/f/graphics-multimedia/audio-inpainting-and-editing.md) — Modifies specific segments of a song to change lyrics or styles while preserving the rest of the track.
- [AI Instrument Stem Synthesis](https://awesome-repositories.com/f/graphics-multimedia/instrument-routing/instrument-track-creation/ai-instrument-stem-synthesis.md) — Produces individual instrument tracks that complement and match a provided reference track. ([source](https://cdn.jsdelivr.net/gh/ace-step/ace-step@main/README.md))
- [Reference-Driven Synthesis](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-synthesis/reference-driven-synthesis.md) — Synthesizes complementary instrument stems by conditioning the model on reference audio latent features.
- [Audio Latent Inpainting](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/face-portrait-manipulation/image-masking/face-mask-generation/latent-inpainting-masks/audio-latent-inpainting.md) — Provides latent-space audio inpainting to modify lyrics or styles within specific song segments.
- [Generative Vocal Accompaniment](https://awesome-repositories.com/f/graphics-multimedia/vocal-artifact-removal/vocal-to-instrumental-converters/generative-vocal-accompaniment.md) — Creates a full instrumental backing track based on an input vocal recording. ([source](https://cdn.jsdelivr.net/gh/ace-step/ace-step@main/README.md))