# supertone-inc/supertonic

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/supertone-inc-supertonic).**

2,626 stars · 234 forks · C++ · MIT

## Links

- GitHub: https://github.com/supertone-inc/supertonic
- Homepage: https://huggingface.co/spaces/Supertone/supertonic-2
- awesome-repositories: https://awesome-repositories.com/repository/supertone-inc-supertonic.md

## Topics

`cpp` `csharp` `go` `ios` `java` `lightweight` `nodejs` `on-device` `python` `rust` `swift` `text-to-speech` `tts` `web`

## Description

Supertonic is an on-device neural text-to-speech engine that runs entirely locally without cloud dependencies or GPU acceleration. It converts written text into natural-sounding speech across 31 languages with automatic language detection and a fallback model for unsupported locales.

The engine provides expressive speech control through inline prosody tags that dynamically adjust pitch, rate, and tone during synthesis. It supports voice cloning from a short reference audio clip by extracting a speaker embedding vector, and offers a selection of pre-built voices tuned for different use cases. Synthesis quality and playback speed are configurable, allowing trade-offs between latency and audio fidelity.

Supertonic includes a local HTTP server that exposes an endpoint compatible with the OpenAI Audio Speech API specification for drop-in integration with external tools and applications. A command-line interface is also available for generating audio files directly from text with configurable voice, quality, language, and style parameters. The engine applies rule-based and ML-based text normalization for numbers, dates, currencies, and units across supported languages.
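Because the local server is described as compatible with the OpenAI Audio Speech API, a client request can be sketched with the Python standard library alone. The following is a minimal sketch, not Supertonic's documented usage: the JSON fields (`model`, `input`, `voice`, `response_format`, `speed`) and the `/v1/audio/speech` path come from the OpenAI specification, while the host, port, model name, and voice name are placeholder assumptions that would need to be replaced with the values the running server actually accepts.

```python
import json
import urllib.request

# Assumption: host and port are placeholders, not values documented by Supertonic.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, voice="example-voice", model="example-model",
                         response_format="wav", speed=1.0, base_url=BASE_URL):
    """Build (but do not send) a POST request shaped like the OpenAI Audio Speech API.

    The voice and model defaults are hypothetical names used for illustration only.
    """
    payload = {
        "model": model,                      # model identifier expected by the server
        "input": text,                       # text to synthesize
        "voice": voice,                      # voice identifier
        "response_format": response_format,  # audio container, e.g. "wav" or "mp3"
        "speed": speed,                      # playback speed multiplier
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path, **kwargs):
    """Send the request to the local server and write the returned audio bytes to disk."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as audio_file:
        audio_file.write(response.read())
```

With a server running at the assumed address, `synthesize_to_file("Hello from a local model.", "hello.wav")` would save the response body as an audio file. Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.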

## Tags

### Artificial Intelligence & ML

- [Command-Line Speech Synthesizers](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/speech-synthesis/command-line-speech-synthesizers.md) — Generates audio files from text directly in the terminal with options for voice selection, quality, and language.
- [Compact Neural TTS Models](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models/compact-neural-tts-models.md) — Ships a compact neural TTS model that runs entirely on-device without cloud or GPU dependencies.
- [Text-to-Speech Conversions](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-and-text-conversion/text-to-speech-conversions.md) — Converts written text into natural-sounding speech using a compact on-device model supporting 31 languages. ([source](https://supertone-inc.github.io/supertonic-py))
- [Multilingual Text-to-Speech Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-and-text-conversion/text-to-speech-conversions/multilingual-text-to-speech-engines.md) — Converts written text into natural-sounding speech across 31 languages with automatic language detection.
- [CLI Speech Synthesizers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/cli-speech-synthesizers.md) — Generates audio files from text via a command-line interface with configurable voice, quality, and language. ([source](https://supertone-inc.github.io/supertonic-py/))
- [Multilingual Speech Synthesizers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/cli-speech-synthesizers/multilingual-speech-synthesizers.md) — Converts written text into natural-sounding speech across 31 languages with automatic language detection.
- [On-Device Text-to-Speech Synthesizers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/on-device-text-to-speech-synthesizers.md) — Provides a compact neural TTS model that runs entirely on-device without cloud dependencies or GPU acceleration.
- [Expressive Prosody Controls](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/speech-synthesis-controls/expressive-prosody-controls.md) — Applies inline tags to alter tone, pitch, or rhythm for natural human expression in generated audio.
- [Zero-Shot Voice Cloning](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/speech-synthesis/zero-shot-voice-cloning.md) — Loads a voice style from a JSON file, enabling zero-shot voice cloning from a short reference clip. ([source](https://supertone-inc.github.io/supertonic-py/))
- [Voice Identity Selections](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/voice-synthesis/modular-voice-configurations/voice-identity-selections.md) — Offers 10 distinct male and female voices tuned for specific tonal qualities and use cases. ([source](https://supertone-inc.github.io/supertonic-py/voices/))
- [Voice Cloning](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning.md) — Extracts speaker embeddings from short audio clips to clone voice characteristics without fine-tuning.

### Networking & Communication

- [OpenAI-Compatible Audio Servers](https://awesome-repositories.com/f/networking-communication/local-http-servers/openai-compatible-audio-servers.md) — Exposes a local HTTP server implementing the OpenAI Audio API specification for drop-in TTS integration. ([source](https://supertone-inc.github.io/supertonic-py))

### User Interface & Experience

- [Inline-Tag Styling](https://awesome-repositories.com/f/user-interface-experience/inline-styling-systems/inline-tag-styling.md) — Parses embedded tags in input text to dynamically adjust pitch, rate, and tone during synthesis.
- [Prosody Tag Parsers](https://awesome-repositories.com/f/user-interface-experience/inline-styling-systems/inline-tag-styling/prosody-tag-parsers.md) — Parses inline prosody tags to dynamically adjust pitch, rate, and tone during speech synthesis.

### Part of an Awesome List

- [Voice Embedding Precomputations](https://awesome-repositories.com/f/awesome-lists/media/voice-processing/voice-embedding-precomputations.md) — Extracts a speaker embedding vector from a short audio clip to clone voice characteristics without fine-tuning.
- [AI Tools](https://awesome-repositories.com/f/awesome-lists/ai/ai-tools.md) — High-speed local TTS engine.

### Data & Databases

- [Multilingual Normalization Pipelines](https://awesome-repositories.com/f/data-databases/text-normalization/multilingual-normalization-pipelines.md) — Applies rule-based and ML-based normalization for numbers, dates, currencies, and units across 31 languages.

### Graphics & Multimedia

- [Speed-Quality Tradeoffs](https://awesome-repositories.com/f/graphics-multimedia/game-graphics-upscalers/quality-tuning/speed-quality-tradeoffs.md) — Offers configurable quality tiers and playback speed parameters that trade off latency against audio fidelity.
