KittenTTS is a neural text-to-speech engine that converts written text into spoken audio using lightweight models. It can also write the synthesized speech to audio files for offline playback.
A text normalization step expands numbers and abbreviations into full spoken words, which makes the synthesized speech sound more natural. Several voices are available, and the playback speed can be adjusted.
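
To make the normalization step concrete, here is a minimal sketch of how numbers and abbreviations might be expanded before synthesis. It is not KittenTTS's actual implementation: the names `normalize`, `number_to_words`, and `ABBREVIATIONS`, the abbreviation table, and the one-million limit are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table (an assumption, not the project's real list).
ABBREVIATIONS = {
    "dr.": "doctor",
    "mr.": "mister",
    "st.": "street",
    "etc.": "et cetera",
}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer below one million in English."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("this sketch only handles 0 <= n < 1,000,000")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


# Match any known abbreviation at the start of a word, ignoring case.
_ABBREV_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in ABBREVIATIONS) + r")",
    re.IGNORECASE,
)


def _expand_number(match: re.Match) -> str:
    value = int(match.group(0))
    # Leave numbers outside the supported range untouched.
    if value >= 1_000_000:
        return match.group(0)
    return number_to_words(value)


def normalize(text: str) -> str:
    """Expand abbreviations first, then runs of digits."""
    text = _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(0).lower()], text)
    return re.sub(r"\d+", _expand_number, text)


print(normalize("Dr. Smith lives at 42 Main St."))
```

A production normalizer would also need to handle cases this sketch ignores, such as decimals, ordinals, currency, dates, and abbreviations whose expansion depends on context ("St." as "street" versus "saint").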