# hexgrad/kokoro

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/hexgrad-kokoro).**

5,729 stars · 658 forks · JavaScript · apache-2.0

## Links

- GitHub: https://github.com/hexgrad/kokoro
- awesome-repositories: https://awesome-repositories.com/repository/hexgrad-kokoro.md

## Description

Kokoro is a lightweight neural text-to-speech engine that converts written text into spoken audio using a compact model designed for fast inference. It supports multiple languages through language-specific grapheme-to-phoneme conversion pipelines, and offers voice profile selection to change the character of the generated speech.

On Apple Silicon, GPU acceleration is enabled by setting a single environment variable (the upstream README uses `PYTORCH_ENABLE_MPS_FALLBACK=1`), which speeds up inference on M-series Macs. Kokoro also supports pattern-based text segmentation, which splits input text at user-defined delimiters to produce separate audio segments, and a speed multiplier parameter that controls the speaking rate.
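As a rough illustration of pattern-based segmentation and a speed multiplier, the sketch below uses only the Python standard library. It is not Kokoro's own code: `split_text`, `scaled_duration`, and the default pattern `\n+` are hypothetical names and values chosen for this example.

```python
import re
from typing import Iterator


def split_text(text: str, split_pattern: str = r"\n+") -> Iterator[str]:
    """Yield non-empty segments of `text`, split at a regex delimiter.

    Hypothetical helper: the default pattern (one or more newlines) is an
    assumption for this sketch, not a documented Kokoro default.
    Avoid capture groups in the pattern, since re.split would then return
    the delimiters as well.
    """
    for chunk in re.split(split_pattern, text):
        chunk = chunk.strip()
        if chunk:  # skip empty pieces from leading/trailing delimiters
            yield chunk


def scaled_duration(seconds: float, speed: float) -> float:
    """Return the duration of a clip after applying a speed multiplier.

    A multiplier above 1 shortens the clip; below 1 lengthens it.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return seconds / speed


segments = list(split_text("First line.\n\nSecond line.\nThird line."))
for index, segment in enumerate(segments):
    print(index, segment)
```

Each yielded segment would then be synthesized separately, so a long document produces several short clips rather than one large one.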

Generated speech can be exported directly to WAV files for offline storage and further processing. The repository contains a Python inference library alongside a JavaScript package (`kokoro.js`), and together they provide a complete text-to-speech pipeline.
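The WAV export step can be sketched with Python's standard `wave` module. This is a minimal example under stated assumptions, not the project's implementation: `write_wav` is a hypothetical helper, the 24 kHz mono 16-bit format is assumed, and a sine tone stands in for synthesized speech.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24_000  # assumption for this sketch: 24 kHz mono output


def write_wav(path: str, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM (hypothetical helper)."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overflow
        frames += struct.pack("<h", int(clipped * 32767))  # little-endian int16
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for generated speech: half a second of a 440 Hz tone.
tone = [
    0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
out_path = os.path.join(tempfile.gettempdir(), "segment_0.wav")
write_wav(out_path, tone)
```

Writing one file per segment keeps each clip independently playable and makes later processing (concatenation, resampling, transcoding) straightforward.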

## Tags

### Artificial Intelligence & ML

- [Neural Text-to-Speech Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/speech-synthesis/neural-text-to-speech-engines.md) — Ships a lightweight neural text-to-speech engine that converts text into natural-sounding speech.
- [Multi-Language Speech Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/speech-synthesis/cross-lingual-speech-generators/multi-language-speech-generators.md) — Produces speech in multiple languages using language-specific pipelines and grapheme-to-phoneme conversion. ([source](https://github.com/hexgrad/kokoro#readme))
- [Grapheme To Phoneme Conversion](https://awesome-repositories.com/f/artificial-intelligence-ml/grapheme-to-phoneme-conversion.md) — Transforms written text into phonetic representations for accurate pronunciation across multiple languages.
- [Voice Identity Selections](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/voice-synthesis/modular-voice-configurations/voice-identity-selections.md) — Chooses from multiple voice profiles to change the character of the spoken audio. ([source](https://github.com/hexgrad/kokoro#readme))
- [Speech Synthesis Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-synthesis-engines.md) — Provides a compact neural TTS engine designed for fast inference on CPU and Apple Silicon.
- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Converts plain text into natural-sounding speech using a lightweight neural model. ([source](https://github.com/hexgrad/kokoro#readme))
- [Multi-Language Speech Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/speech-to-speech-models/speech-to-speech-frameworks/speech-integration-engines/vision-language-speech-integrations/multi-language-speech-generators.md) — Produces speech in multiple languages with language-specific pipelines and grapheme-to-phoneme conversion.
- [Multi-Language Voice Profiles](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-profile-management/multi-language-voice-profiles.md) — Supports multiple languages with separate voice profiles and language-specific phoneme mappings.
- [Voice Profile Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/conversational-voice-interaction/voice-agents/voice-profile-managers.md) — Offers multiple voice profiles to change the character and tone of generated speech output.
- [Apple Silicon GPU Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/apple-hardware-acceleration/apple-silicon-gpu-accelerators.md) — Accelerates speech synthesis inference on Apple M-series hardware by enabling GPU acceleration.
- [Apple Silicon GPU Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-accelerated-inference/apple-silicon-gpu-accelerators.md) — Provides GPU acceleration for speech synthesis inference on Apple Silicon via a single environment variable.
- [Speech-to-Speech Models](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/speech-to-speech-models.md) — Writes generated speech output directly to a WAV file on disk for later use. ([source](https://github.com/hexgrad/kokoro#readme))

### Software Engineering & Architecture

- [Grapheme-to-Phoneme Pipelines](https://awesome-repositories.com/f/software-engineering-architecture/infrastructure-configuration-languages/multi-language-support/multi-language-pipeline-orchestration/grapheme-to-phoneme-pipelines.md) — Implements language-specific grapheme-to-phoneme conversion pipelines for multi-language speech generation.

### Part of an Awesome List

- [Audio Exporters](https://awesome-repositories.com/f/awesome-lists/devtools/audio-file-handling/audio-exporters.md) — Saves generated speech directly to WAV files for offline storage and further processing.
