ACE Step 1.5 is a local text-to-music generation and audio editing system that runs on consumer hardware. It transforms plain-language descriptions into full-length songs with lyrics, and can edit existing audio through cover generation, vocal removal, track separation, and selective repainting. The system supports multilingual prompts and lyrics in over 50 languages, and provides precise control over musical structure including duration, BPM, key, and time signature. The project distinguishes itself through a dual-stream diffusion architecture that processes separate latent streams for vocal
Jukebox is a generative audio model and AI music synthesis tool designed to create high-fidelity music samples and singing voices. It is a deep learning system that synthesizes raw audio conditioned on genre and artist metadata, using a neural audio codec to convert raw audio into discrete codes for generative modeling and reconstruction. The system enables musical style steering and AI music composition by conditioning generation on specific artists, genres, and lyrics. It supports audio priming, allowing existing wave files to guide the creation of new musical sequences, and p
This Python SDK provides a comprehensive toolkit for synthetic audio generation, voice cloning, and the development of conversational AI agents. It enables the creation of lifelike spoken audio from text, the replication of human voices through custom cloning, and the deployment of real-time voice agents capable of interacting with external large language models. The library distinguishes itself through deep integration of conversational AI capabilities, including the design of agent personas and the execution of real-time actions via APIs. It supports professional-grade audio production thro
Genkit is an open-source framework for building AI-powered applications. It provides a unified interface for connecting to hundreds of generative AI models from multiple providers, enabling text, image, audio, and video generation through a single API. It structures multi-step AI interactions, including chat, retrieval-augmented generation, tool use, and agentic workflows, as composable, traceable flows with built-in streaming and state management. The framework distinguishes itself through a comprehensive developer toolkit that includes a command-line interface and a local developer