# 2noise/chattts

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/2noise-chattts).**

39,464 stars · 4,246 forks · Python · AGPL-3.0

## Links

- GitHub: https://github.com/2noise/ChatTTS
- Homepage: https://2noise.com
- awesome-repositories: https://awesome-repositories.com/repository/2noise-chattts.md

## Topics

`agent` `chat` `chatgpt` `chattts` `chinese` `chinese-language` `english` `english-language` `gpt` `llm` `llm-agent` `natural-language-inference` `python` `text-to-speech` `torch` `torchaudio` `tts`

## Description

ChatTTS is a conversational text-to-speech generative model that converts written dialogue into natural-sounding audio. It supports multiple languages (the repository topics list Chinese and English) and multiple speaker profiles.

Its distinguishing feature is interactive dialogue with realistic vocal nuances. A speech nuance controller inserts control tokens into the text, and these tokens trigger non-verbal elements such as laughter, pauses, and interjections during synthesis.
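
As a rough illustration of token-based nuance control, the sketch below inserts control tokens into a sentence before it would be handed to the model. The helper `add_nuance` and its parameters are hypothetical, and the token spellings `[laugh]` and `[uv_break]` are assumptions that should be checked against the upstream README. Nothing here calls ChatTTS itself.

```python
from typing import Collection, Sequence

# Assumed token spellings; verify against the upstream README before use.
LAUGH_TOKEN = "[laugh]"
PAUSE_TOKEN = "[uv_break]"


def add_nuance(
    words: Sequence[str],
    pause_after: Collection[int] = (),
    laugh_after: Collection[int] = (),
) -> str:
    """Return the words joined by spaces, with control tokens inserted.

    pause_after / laugh_after hold zero-based word indices; a token is
    placed directly after each listed word (pause first, then laugh).
    """
    out = []
    for index, word in enumerate(words):
        out.append(word)
        if index in pause_after:
            out.append(PAUSE_TOKEN)
        if index in laugh_after:
            out.append(LAUGH_TOKEN)
    return " ".join(out)


line = add_nuance(
    ["Well", "that", "was", "funny"],
    pause_after={0},
    laugh_after={3},
)
print(line)
```

The annotated string is what a caller would then pass to the model as input text. Keeping token placement in a separate helper like this makes it easy to swap token spellings if the upstream vocabulary differs from the assumption above.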

The project includes a streaming audio generator that delivers speech incrementally to reduce latency. It further supports multi-speaker embeddings to maintain consistent vocal characteristics throughout a conversation.
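
The incremental delivery described above can be sketched with a plain Python generator. This is a minimal illustration under stated assumptions, not the project's implementation: `fake_synth` is a hypothetical stand-in for a model that produces samples one at a time, and `stream_chunks` is a hypothetical helper that hands fixed-size chunks to the caller as soon as they are full.

```python
from typing import Iterable, Iterator, List


def stream_chunks(samples: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield chunks of chunk_size samples as soon as each one fills up.

    The final chunk may be shorter if the stream ends mid-chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buffer: List[float] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == chunk_size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer


def fake_synth(n_samples: int) -> Iterator[float]:
    """Hypothetical stand-in for a model emitting one sample at a time."""
    for i in range(n_samples):
        yield i / n_samples


# A player could start on the first chunk while later ones are still pending.
chunks = list(stream_chunks(fake_synth(10), chunk_size=4))
chunk_lengths = [len(chunk) for chunk in chunks]
print(chunk_lengths)
```

Because the generator yields before the source is exhausted, the time to first audio depends on the chunk size rather than on the length of the whole utterance, which is the latency benefit the description refers to.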

## Tags

### Artificial Intelligence & ML

- [Conversational Audio Streams](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/conversational-audio-streams.md) — Provides a generative model for natural, multi-speaker interactive dialogue and conversational audio streams. ([source](https://github.com/2noise/chattts#readme))
- [Audio Tokenization](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-tokenization.md) — Converts raw audio waveforms into discrete numerical codes for processing by the language model.
- [Autoregressive Transformers](https://awesome-repositories.com/f/artificial-intelligence-ml/autoregressive-transformers.md) — Implements an autoregressive transformer architecture to predict audio tokens for sequential speech generation.
- [Prosody Control Tokens](https://awesome-repositories.com/f/artificial-intelligence-ml/latent-conditioning-mechanisms/prosody-control-tokens.md) — Inserts specialized control tokens to trigger non-verbal vocal behaviors like laughter and pauses.
- [Speaker Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-embeddings.md) — Uses learned vector representations to maintain consistent vocal characteristics across different speakers.
- [Multilingual Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-synthesis-models/multilingual-synthesis.md) — Provides a generative framework capable of producing human-like speech across multiple languages.
- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Synthesizes natural human speech from written dialogue with human-like rhythms and nuances.
- [Latent Acoustic Mapping](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/latent-acoustic-mapping.md) — Maps natural language input to a latent space to guide the generation of acoustic features.
- [Vocal](https://awesome-repositories.com/f/artificial-intelligence-ml/performance-tuning/vocal.md) — Refines audio output by adding human-like non-verbal elements such as laughter and interjections.
- [Speech Synthesis Markup Controls](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/speech-synthesis-markup-controls.md) — Uses markup-like tokens to control prosody and insert fine-grained vocal elements like laughter. ([source](https://github.com/2noise/chattts#readme))
- [Vocal Nuance Controllers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/vocal-nuance-controllers.md) — Implements a controller that inserts vocal tokens to trigger specific audio elements like laughter.

### Development Tools & Productivity

- [Spoken Dialogue Generation](https://awesome-repositories.com/f/development-tools-productivity/interactive-execution-interfaces/dialogue-interaction-engines/spoken-dialogue-generation.md) — Creates spoken audio for multi-speaker interactions including realistic pauses and emotional cues.

### Graphics & Multimedia

- [Chunked Audio Streaming](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-streaming-engines/audio-playback-engines/chunked-audio-streaming.md) — Streams audio output in chunks incrementally to minimize latency during speech generation. ([source](https://github.com/2noise/chattts#readme))
- [Generative Audio Chunking](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-streaming-engines/audio-playback-engines/chunked-audio-streaming/generative-audio-chunking.md) — Yields audio waveform chunks sequentially during generation to enable immediate low-latency playback.

### Part of an Awesome List

- [Additional AI Tools](https://awesome-repositories.com/f/awesome-lists/ai/additional-ai-tools.md) — Generative TTS model optimized for natural, expressive daily dialogue with fine-grained prosody control.
- [Core Models](https://awesome-repositories.com/f/awesome-lists/ai/core-models.md) — The official repository for the core model implementation.
