This project is an AI voice assistant backend and gateway server that connects ESP32 hardware to large language models. It supports real-time spoken conversation by streaming audio through speech-to-text, passing the transcript to the model, and synthesizing the reply with text-to-speech, so that ESP32 devices can hold natural-language dialogue.
Two features set the system apart. The first is a modular plugin framework that loads custom feature extensions at runtime. The second is a retrieval-augmented generation engine that queries external knowledge bases to ground answers in factual information. The system also personalizes interactions by using voiceprint mapping to identify individual speakers and keep long-term contextual memory for them.
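The overview does not say how voiceprint mapping is implemented. One common approach compares a speaker embedding against enrolled reference embeddings using cosine similarity and accepts the best match above a threshold. The minimal sketch below shows that approach. The embedding format, the `identify_speaker` function, and the 0.7 threshold are all hypothetical, and the project's actual matching logic may differ.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    embedding: list[float],
    enrolled: dict[str, list[float]],
    threshold: float = 0.7,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the enrolled speaker most similar to `embedding`, or None.

    Assumption: each enrolled speaker is represented by one fixed-length
    voiceprint embedding produced by some upstream speaker-encoder model.
    """
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

The returned speaker name could then serve as the key for that speaker's stored conversation memory. If nobody clears the threshold, the function returns `None`, so an unknown voice is treated as a new or anonymous speaker rather than being forced onto the closest enrolled one.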
Its other capabilities include IoT hardware control through LLM function calling, device authentication with bearer tokens, and full-duplex communication over WebSockets. A web management interface handles system and device administration, including configuration and gateway traffic routing.
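To illustrate bearer-token device authentication, the minimal sketch below parses an `Authorization: Bearer <token>` header (the scheme defined in RFC 6750), as a server might during the WebSocket handshake. It then looks the token up in a table of known devices. The `authenticate_device` function and the token-to-device table are hypothetical. The project's real handshake may carry additional headers or use a different token store.

```python
import hmac
from typing import Optional


def extract_bearer_token(headers: dict[str, str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, if present."""
    for key, value in headers.items():
        # HTTP header names are case-insensitive.
        if key.lower() == "authorization":
            scheme, _, token = value.strip().partition(" ")
            token = token.strip()
            if scheme.lower() == "bearer" and token:
                return token
    return None


def authenticate_device(
    headers: dict[str, str],
    device_tokens: dict[str, str],  # hypothetical store: device id -> issued token
) -> Optional[str]:
    """Return the id of the device whose token matches, or None to reject."""
    token = extract_bearer_token(headers)
    if token is None:
        return None
    for device_id, expected in device_tokens.items():
        # Constant-time comparison avoids leaking token contents via timing.
        if hmac.compare_digest(token.encode(), expected.encode()):
            return device_id
    return None
```

If this check returns `None`, the server would refuse the WebSocket upgrade. Otherwise, the returned device id can be attached to the session so that later function-calling requests are routed to the correct hardware.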