# remsky/kokoro-fastapi

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/remsky-kokoro-fastapi).**

4,422 stars · 736 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/remsky/Kokoro-FastAPI
- awesome-repositories: https://awesome-repositories.com/repository/remsky-kokoro-fastapi.md

## Topics

`fastapi` `huggingface-spaces` `kokoro` `kokoro-tts` `onnx` `onnxruntime` `openai-compatible-api` `openwebui` `pytorch` `sillytavern` `tts` `tts-api` `uv`

## Description

Kokoro-FastAPI is a text-to-speech server that generates spoken audio from text through a FastAPI-based, OpenAI-compatible REST interface. It is packaged for orchestrated deployment on Kubernetes.
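
As a rough illustration of the REST interface, the sketch below builds (but does not send) a speech request in the shape of the OpenAI `/v1/audio/speech` endpoint. The base URL `http://localhost:8880`, the model name `kokoro`, and the voice name `af_bella` are assumptions for illustration only; check the project's README for the values your deployment actually uses.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    voice: str,
    base_url: str = "http://localhost:8880",  # assumed local address
    model: str = "kokoro",                    # assumed model name
    response_format: str = "mp3",
) -> urllib.request.Request:
    """Build a POST request shaped like OpenAI's /v1/audio/speech call."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "af_bella" is a hypothetical voice name used only for this example.
request = build_speech_request("Hello world", voice="af_bella")
```

Passing `request` to `urllib.request.urlopen` and writing the response bytes to a file would save the audio, provided a server is listening at the assumed address.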

The system includes a voice blending engine that creates unique vocal profiles by mixing multiple existing voices using custom weight ratios.
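
Blending by weight ratio can be pictured as a weighted average of voice embedding vectors. The following is a minimal sketch of that idea, not the project's implementation: the function name `blend_voices`, the plain-list embeddings, and the choice to normalise weights so they sum to 1 are all assumptions made for illustration.

```python
from typing import Dict, List


def blend_voices(
    voices: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Return the weighted average of the named voice embeddings.

    Weights are normalised to sum to 1 (an assumption of this sketch),
    so {"a": 2, "b": 1} means roughly 67% of "a" and 33% of "b".
    """
    if not weights:
        raise ValueError("at least one voice weight is required")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    names = list(weights)
    dim = len(voices[names[0]])
    blended = [0.0] * dim
    for name in names:
        vector = voices[name]
        if len(vector) != dim:
            raise ValueError("all voice embeddings must have the same length")
        share = weights[name] / total
        for i, value in enumerate(vector):
            blended[i] += share * value
    return blended


# Toy two-dimensional embeddings; real voice embeddings are much larger.
toy_voices = {"voice_a": [1.0, 0.0], "voice_b": [0.0, 1.0]}
mixed = blend_voices(toy_voices, {"voice_a": 2, "voice_b": 1})
```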

The service streams audio in real time to reduce latency and generates word-level timestamps for synchronizing speech with text. It conserves VRAM by loading models on demand and unloading them when idle, and it includes system resource monitoring for tracking CPU and GPU state.
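
One way to picture on-demand model loading is a small holder that loads the model on first use and drops it after an idle period. This is a sketch under stated assumptions rather than the project's code: the class name `OnDemandModel`, the `loader` callable, and the 300-second default are hypothetical, and a real GPU service would usually need extra steps beyond dropping the reference to release VRAM.

```python
import time
from typing import Any, Callable, Optional


class OnDemandModel:
    """Load a model lazily and release it after a period of inactivity."""

    def __init__(
        self,
        loader: Callable[[], Any],
        idle_seconds: float = 300.0,  # hypothetical default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._model: Optional[Any] = None
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, loading it first if it is not in memory."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self) -> bool:
        """Drop the model if it has been unused for idle_seconds or longer."""
        if self._model is None:
            return False
        if self._clock() - self._last_used >= self._idle_seconds:
            self._model = None
            return True
        return False
```

A background task could call `unload_if_idle()` periodically, while request handlers call `get()` so the model is reloaded transparently on the next request.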

Helm charts are provided for installing the service on Kubernetes clusters.

## Tags

### Artificial Intelligence & ML

- [Text-to-Speech Conversions](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-and-text-conversion/text-to-speech-conversions.md) — Provides a high-quality text-to-speech API for converting written text into spoken audio.
- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Functions as a high-fidelity generative synthesis server that converts written text into spoken audio. ([source](https://cdn.jsdelivr.net/gh/remsky/kokoro-fastapi@master/README.md))
- [GPU Memory Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-memory-optimizers.md) — Manages VRAM consumption to prevent exhaustion by dynamically reloading models during request processing.
- [Grapheme To Phoneme Conversion](https://awesome-repositories.com/f/artificial-intelligence-ml/grapheme-to-phoneme-conversion.md) — Transforms raw input text into phonetic representations and token IDs before passing them to the synthesis engine. ([source](https://cdn.jsdelivr.net/gh/remsky/kokoro-fastapi@master/README.md))
- [Model API Gateways](https://awesome-repositories.com/f/artificial-intelligence-ml/model-api-gateways.md) — Exposes the underlying synthesis model and monitoring tools through a FastAPI-based REST gateway.
- [Voice Identity Interpolators](https://awesome-repositories.com/f/artificial-intelligence-ml/model-weight-reconstruction/weight-interpolators/voice-identity-interpolators.md) — Synthesizes unique vocal profiles by interpolating voice embedding vectors based on custom weight ratios.
- [Speech Synthesis Services](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-synthesis-services.md) — Serves as a backend synthesis server that transforms text to phonemes and high-fidelity audio.
- [Phoneme-Based Speech Processors](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/phoneme-based-speech-processors.md) — Uses a phoneme-based pipeline to convert raw text into phonetic representations for consistent speech synthesis.
- [Hybrid Voice Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning/voice-identity-conversions/hybrid-voice-synthesis.md) — Includes a specialized engine for blending multiple speaker characteristics into a unique hybrid voice.
- [Synthetic Voice Design](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning/voice-identity-conversions/synthetic-voice-design.md) — Creates specialized vocal identities by blending multiple existing voices using specific weight ratios.
- [VRAM Offloading](https://awesome-repositories.com/f/artificial-intelligence-ml/vram-offloading.md) — Implements VRAM optimization by unloading models to system memory during idle periods. ([source](https://cdn.jsdelivr.net/gh/remsky/kokoro-fastapi@master/README.md))
- [OpenAI-Compatible APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/model-integration-serving/model-integration-interfaces/ai-integration-apis/openai-compatible-apis.md) — Implements a standardized external interface for text-to-speech generation compatible with the OpenAI API specification. ([source](https://github.com/remsky/Kokoro-FastAPI/wiki/Integrations-SillyTavern))
- [Word-Level Timestamps](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/word-level-timestamps.md) — Generates precise word-level timing metadata to synchronize spoken audio with on-screen text or animations. ([source](https://cdn.jsdelivr.net/gh/remsky/kokoro-fastapi@master/README.md))
- [Speech Synthesis Markup](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/speech-emphasis-controls/speech-synthesis-markup.md) — Provides inline markup tags to control pacing, pauses, and specific pronunciations within synthesized speech. ([source](https://cdn.jsdelivr.net/gh/remsky/kokoro-fastapi@master/README.md))

### Development Tools & Productivity

- [Model Weight Offloading](https://awesome-repositories.com/f/development-tools-productivity/package-managers/dependency/optional-dependency-managers/on-demand-library-loading/model-weight-offloading.md) — Optimizes GPU memory efficiency by unloading model weights from VRAM during idle periods and reloading them on demand.

### Graphics & Multimedia

- [Real-time Synthesis Streaming](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-streaming-engines/audio-playback-engines/chunked-audio-streaming/real-time-synthesis-streaming.md) — Delivers synthesized speech as a continuous audio stream to minimize the time to first byte.

### Web Development

- [Response Streaming](https://awesome-repositories.com/f/web-development/backend-development/request-response-handling/http-response-handling/response-streaming.md) — Provides real-time audio streaming by sending synthesized speech chunks incrementally to reduce latency.
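
The incremental delivery described above can be sketched as a generator that yields one audio chunk per sentence, so the first bytes are available before the whole text has been synthesized. This is an illustrative sketch only: `stream_speech`, the sentence-splitting rule, and the `synthesize` callable are hypothetical stand-ins, not the project's chunking logic.

```python
import re
from typing import Callable, Iterator


def stream_speech(
    text: str,
    synthesize: Callable[[str], bytes],
) -> Iterator[bytes]:
    """Yield audio bytes sentence by sentence instead of all at once."""
    # Split after ., ! or ? followed by whitespace (a simplifying assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for sentence in sentences:
        # Each chunk is yielded as soon as it is ready.
        yield synthesize(sentence)


def fake_synthesize(sentence: str) -> bytes:
    """Stand-in for a real synthesis call; returns the text as bytes."""
    return sentence.encode("utf-8")


chunks = list(stream_speech("Hi there. How are you? Fine!", fake_synthesize))
```

In a FastAPI application, a generator like this could be wrapped in `fastapi.responses.StreamingResponse` to send each chunk to the client as it is produced.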

### Part of an Awesome List

- [Speech Boundary Timestamps](https://awesome-repositories.com/f/awesome-lists/media/audio-and-speech-models/speech-boundary-timestamps.md) — Generates precise word-level timestamps to synchronize spoken audio with text or animations.

### DevOps & Infrastructure

- [Helm Chart Deployment](https://awesome-repositories.com/f/devops-infrastructure/helm-chart-management/helm-chart-deployment.md) — Ships predefined Helm charts to automate the deployment and configuration of the synthesis service on Kubernetes.
- [Kubernetes Application Deployments](https://awesome-repositories.com/f/devops-infrastructure/kubernetes-deployments/kubernetes-application-deployments.md) — Provides automated workflows for deploying scalable speech synthesis services via Helm charts in Kubernetes.
