Podcastfy is an AI content-to-podcast generator that converts text, URLs, PDFs, images, and videos into conversational audio podcasts. It integrates with over 100 language models for transcript generation and with multiple text-to-speech (TTS) engines for audio output.
The project distinguishes itself through a flexible architecture. Job submission is decoupled from result retrieval through asynchronous polling, heterogeneous inputs are normalized into uniform text, and that text is routed through pluggable LLM and TTS backends, with the dialogue assembled from templates. Users can customize conversation tone, speaker roles, dialogue structure, and creativity level through configuration files, and can generate transcripts with a local language model for greater privacy and offline use.
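The decoupled flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Podcastfy's actual code: `PodcastService`, `submit`, `poll`, `LLMBackend`, `TTSBackend`, `normalize`, the stub backends, and the config keys (`conversation_style`, `host`, `guest`) are hypothetical names chosen for the example. The stub "backends" only fill a string template and encode text instead of calling a real language model or speech engine.

```python
from __future__ import annotations

import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from string import Template
from typing import Protocol


class LLMBackend(Protocol):
    """Hypothetical pluggable interface: normalized text -> transcript."""

    def write_dialogue(self, text: str, config: dict) -> str: ...


class TTSBackend(Protocol):
    """Hypothetical pluggable interface: transcript -> audio bytes."""

    def synthesize(self, transcript: str) -> bytes: ...


# Hypothetical dialogue template; speaker roles and style come from config.
DIALOGUE_TEMPLATE = Template(
    "Style: $style\n"
    "$host: Welcome! Today we're discussing: $summary\n"
    "$guest: Thanks, glad to be here."
)


class StubLLM:
    """Toy backend: fills the template instead of calling a real model."""

    def write_dialogue(self, text: str, config: dict) -> str:
        return DIALOGUE_TEMPLATE.substitute(
            style=config.get("conversation_style", "casual"),
            host=config.get("host", "Host"),
            guest=config.get("guest", "Guest"),
            summary=text[:60],
        )


class StubTTS:
    """Toy backend: 'synthesizes' by UTF-8 encoding the transcript."""

    def synthesize(self, transcript: str) -> bytes:
        return transcript.encode("utf-8")


def normalize(source: str | bytes) -> str:
    """Reduce input to plain text; real URL/PDF/image/video extractors omitted."""
    if isinstance(source, bytes):
        return source.decode("utf-8", errors="replace")
    return " ".join(source.split())


class PodcastService:
    """Decouples job submission from result retrieval via polling."""

    def __init__(self, llm: LLMBackend, tts: TTSBackend) -> None:
        self._llm, self._tts = llm, tts
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs: dict[str, Future] = {}

    def submit(self, source: str | bytes, config: dict) -> str:
        """Queue a job and return its ID immediately, not the audio."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._run, source, config)
        return job_id

    def poll(self, job_id: str) -> tuple[str, bytes | None]:
        """Return ("pending", None) or ("done", audio); re-raises job errors."""
        future = self._jobs[job_id]
        if not future.done():
            return "pending", None
        return "done", future.result()

    def _run(self, source: str | bytes, config: dict) -> bytes:
        text = normalize(source)                             # uniform text
        transcript = self._llm.write_dialogue(text, config)  # LLM backend
        return self._tts.synthesize(transcript)              # TTS backend


if __name__ == "__main__":
    service = PodcastService(StubLLM(), StubTTS())
    job = service.submit("  Podcastfy turns   documents into audio. ",
                         {"conversation_style": "engaging"})
    status, audio = service.poll(job)
    while status == "pending":
        time.sleep(0.01)
        status, audio = service.poll(job)
    print(audio.decode())
```

Because `submit` hands back only an ID, a caller can return later and poll for the result instead of blocking on a long generation. Swapping `StubLLM` or `StubTTS` for any object with the same method changes the backend without touching `PodcastService`.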
Beyond core podcast generation, the system supports content extraction from websites, videos, images, and documents, multilingual audio generation, Q&A content generation from text, and topic-based podcast creation through real-time web search. It also offers transcript-only generation and the ability to produce audio from pre-written transcripts.
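Producing audio from a pre-written transcript starts with splitting it into speaker turns so that each turn can be voiced separately. The sketch below assumes a transcript in which every turn is wrapped in `<Person1>` ... `</Person1>`-style tags; that format, the `parse_transcript` and `assign_voices` helpers, and the voice names are assumptions for illustration rather than a documented Podcastfy interface. The synthesis call itself is omitted because it depends on which TTS engine is plugged in.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed transcript format: each turn wrapped in <PersonN>...</PersonN> tags.
TURN_PATTERN = re.compile(r"<(Person\d+)>(.*?)</\1>", re.DOTALL)


@dataclass(frozen=True)
class Turn:
    speaker: str
    text: str


def parse_transcript(transcript: str) -> list[Turn]:
    """Split a speaker-tagged transcript into ordered, non-empty turns."""
    return [
        Turn(speaker, " ".join(body.split()))  # collapse line breaks/spaces
        for speaker, body in TURN_PATTERN.findall(transcript)
        if body.strip()
    ]


def assign_voices(
    turns: list[Turn], voices: dict[str, str], default: str = "narrator"
) -> list[tuple[str, str]]:
    """Pair each turn with a (hypothetical) TTS voice name for its speaker."""
    return [(voices.get(t.speaker, default), t.text) for t in turns]


if __name__ == "__main__":
    script = (
        "<Person1>Welcome to the show.</Person1>\n"
        "<Person2>Glad to\nbe here.</Person2>"
    )
    for voice, line in assign_voices(parse_transcript(script),
                                     {"Person1": "host_voice"}):
        print(f"[{voice}] {line}")
```

Keeping parsing separate from synthesis means the same turn list can be sent to whichever TTS backend is configured, and unknown speakers fall back to a default voice rather than failing.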