Kokoro-FastAPI is a text-to-speech (TTS) server that generates spoken audio from text through a REST API, and it can serve as the speech synthesis backend for LLM applications. It is designed for Kubernetes-native deployment, so speech synthesis can run as an orchestrated cluster workload.
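A client call to the REST interface might look like the following minimal sketch, which uses only the Python standard library. It assumes an OpenAI-compatible `POST /v1/audio/speech` route at `http://localhost:8880`, a model identifier of `"kokoro"`, and a voice named `af_bella`. These are assumptions for illustration, so check them against your deployment. `build_speech_request` and `synthesize` are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions: adjust both values to match your deployment.
BASE_URL = "http://localhost:8880"  # hypothetical server address
SPEECH_PATH = "/v1/audio/speech"    # assumed OpenAI-style speech route


def build_speech_request(text: str, voice: str, response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a POST request asking the server to synthesize `text`."""
    payload = {
        "model": "kokoro",  # assumed model identifier
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        BASE_URL + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to `out_path`."""
    req = build_speech_request(text, voice)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

For example, `synthesize("Hello from Kokoro.", "af_bella", "hello.mp3")` would write the returned audio to `hello.mp3`, assuming the server is reachable. Keeping request construction separate from sending makes the payload easy to inspect and test offline.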
The system includes a voice blending engine that creates new vocal profiles by mixing two or more existing voices according to user-specified weight ratios.
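One plausible way to think about weighted blending is as a normalized weighted average of voice embedding vectors. The sketch below models each voice as a plain list of floats and uses a hypothetical `voice_a(2)+voice_b(1)` notation for weights. Both the vector representation and the notation are assumptions for illustration, not the server's documented internals or syntax; `parse_blend_spec` and `blend_voices` are hypothetical names.

```python
import re
from typing import Dict, List

# Hypothetical blend notation: "voice_a(2)+voice_b(1)"; a bare name means weight 1.
_TERM = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(?:\(\s*([0-9]*\.?[0-9]+)\s*\))?\s*$")


def parse_blend_spec(spec: str) -> Dict[str, float]:
    """Parse a blend string into a {voice_name: weight} mapping."""
    weights: Dict[str, float] = {}
    for term in spec.split("+"):
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"invalid blend term: {term!r}")
        name = match.group(1)
        weight = float(match.group(2)) if match.group(2) else 1.0
        # Repeating a voice adds to its share rather than overwriting it.
        weights[name] = weights.get(name, 0.0) + weight
    return weights


def blend_voices(voices: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Mix voice vectors as a weighted average, with weights normalized to sum to 1."""
    total = sum(weights.values())
    if not weights or total <= 0:
        raise ValueError("weights must be non-empty and sum to a positive number")
    lengths = {len(voices[name]) for name in weights}  # KeyError for an unknown voice
    if len(lengths) != 1:
        raise ValueError("all blended voice vectors must have the same length")
    blended = [0.0] * lengths.pop()
    for name, weight in weights.items():
        share = weight / total
        for i, value in enumerate(voices[name]):
            blended[i] += share * value
    return blended
```

With two toy 2-D voices, `blend_voices({"a": [1.0, 0.0], "b": [0.0, 1.0]}, parse_blend_spec("a(3)+b(1)"))` gives `[0.75, 0.25]`: a 3:1 ratio becomes shares of 0.75 and 0.25. Normalizing keeps the blended vector on the same scale as its inputs regardless of how large the raw weights are.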
The service streams audio in real time, so playback can begin before synthesis of the full text finishes, which reduces perceived latency. It can also generate word-level timestamps for synchronizing speech with text, such as captions. To use hardware efficiently, it loads models on demand to limit VRAM usage, and it includes system resource monitoring that tracks CPU and GPU usage.
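Word-level timestamps are what make synchronization possible: given a playback position, a client can find the word currently being spoken, for example to highlight it in a caption. The sketch below assumes each timestamp is a word with start and end times in seconds. The `WordTimestamp` and `CaptionTrack` names are hypothetical, and the server's actual response format may differ.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class WordTimestamp:
    """One word and its time span in the audio, in seconds (assumed shape)."""
    word: str
    start: float
    end: float


class CaptionTrack:
    """Looks up the word being spoken at a given playback time."""

    def __init__(self, words: Iterable[WordTimestamp]) -> None:
        # Sort once so each lookup is a binary search over start times.
        self._words: List[WordTimestamp] = sorted(words, key=lambda w: w.start)
        self._starts: List[float] = [w.start for w in self._words]

    def word_at(self, t: float) -> Optional[str]:
        """Return the word spoken at time t, or None in a gap between words.

        Assumes word spans do not overlap; `end` is treated as exclusive.
        """
        i = bisect_right(self._starts, t) - 1  # last word starting at or before t
        if i >= 0 and t < self._words[i].end:
            return self._words[i].word
        return None
```

A player could call `track.word_at(position_seconds)` on each UI tick to highlight the current word; the binary search keeps each lookup at O(log n) even for long passages.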
Helm charts are provided for installing the service on Kubernetes clusters.