Abogen is a text-to-speech audiobook generator that converts digital documents and subtitle files into audiobooks. It uses language models to normalize the text, rewriting contractions and punctuation so that the synthesized speech sounds more natural.
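To make the idea of contraction rewriting concrete, here is a minimal rule-based sketch. It is not Abogen's method: the project is described as using language models for this step, and the lookup table, the pattern, and the function name `expand_contractions` are all hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical lookup table (an assumption for this sketch).
# A model-based normalizer would cover far more cases and resolve
# ambiguous ones such as "it's" ("it is" vs. "it has") from context.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "i'm": "I am",
}

# One alternation that matches any known contraction, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace known contractions with their expanded forms."""

    def _replace(match: re.Match) -> str:
        original = match.group(0)
        expanded = CONTRACTIONS[original.lower()]
        # Preserve a leading capital, e.g. "It's" -> "It is".
        if original[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)


print(expand_contractions("It's late and I can't stay."))
```

A fixed table like this handles only straight apostrophes and always picks one expansion per contraction, which is the kind of limitation a context-aware model is meant to avoid.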
The system features a voice profile mixer that blends multiple voice models using adjustable weight ratios to create personalized synthetic voices. It also includes an automated export system that sends completed audio files and metadata to a remote Audiobookshelf server via a web API.
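Blending voices by weight ratio can be pictured as a normalized weighted average. The sketch below assumes each voice is represented as a fixed-length list of floats (for example, a style embedding); that representation, the function name `blend_voices`, and the sample voice names are assumptions made for illustration, not Abogen's actual data model.

```python
def blend_voices(voices, weights):
    """Return the weighted average of the selected voice vectors.

    voices:  dict mapping voice name -> list of floats (assumed representation)
    weights: dict mapping voice name -> non-negative weight
    """
    if not weights:
        raise ValueError("at least one voice weight is required")
    if any(weight < 0 for weight in weights.values()):
        raise ValueError("weights must be non-negative")

    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    lengths = {len(voices[name]) for name in weights}
    if len(lengths) != 1:
        raise ValueError("all voice vectors must have the same length")

    blended = [0.0] * lengths.pop()
    for name, weight in weights.items():
        share = weight / total  # normalize so the shares sum to 1
        for index, value in enumerate(voices[name]):
            blended[index] += share * value
    return blended


# Hypothetical two-dimensional voices, mixed 3:1.
sample_voices = {"voice_a": [1.0, 0.0], "voice_b": [0.0, 1.0]}
print(blend_voices(sample_voices, {"voice_a": 3, "voice_b": 1}))
```

Normalizing by the total means the ratios 3:1 and 0.75:0.25 yield the same blend, so the weights can be entered on any convenient scale.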
The project manages the end-to-end audiobook production workflow, covering document-to-audio conversion, chapter segmentation, and the embedding of metadata such as titles, authors, and cover art. It supports batch processing through a queue-based model and can generate synchronized subtitle files that match the timing of the generated speech.
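One way to picture subtitle synchronization: if the duration of each synthesized segment is known, cue times follow from a running sum. The sketch below emits cues in the SubRip (SRT) layout. The input shape, a list of `(text, duration_seconds)` pairs, and the function names are assumptions for illustration; the source does not specify how Abogen tracks timing or which subtitle formats it writes.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(segments):
    """Build SRT text from ordered (text, duration_seconds) pairs.

    Each cue starts where the previous one ended, so the cue times
    track the cumulative length of the generated audio.
    """
    cues = []
    start = 0.0
    for index, (text, duration) in enumerate(segments, start=1):
        end = start + duration
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
        start = end
    # A blank line separates consecutive cues.
    return "\n".join(cues)


print(build_srt([("Hello.", 1.5), ("World.", 2.0)]))
```

Summing floating-point durations over a long book can drift slightly; accumulating sample counts or integer milliseconds instead would avoid that.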