Xiaozhi Esp32 Server | Awesome Repository

This project is an AI voice assistant backend and gateway server designed to connect ESP32 hardware to large language models. It enables real-time conversational AI by processing streaming speech-to-text and text-to-speech interactions, allowing hardware devices to engage in natural language dialogue.

The system is distinguished by a modular plugin framework that loads custom feature extensions at runtime and a retrieval-augmented generation engine that queries external knowledge bases so that answers are grounded in retrieved facts. It further personalizes interactions by using voiceprint mapping to identify individual speakers and maintain long-term contextual memory for each of them.
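
To make the retrieval-augmented step concrete, here is a minimal sketch of the idea: fetch the most relevant passage for a question, then place it in the prompt sent to the language model. Everything in it is an assumption for illustration. The `KNOWLEDGE_BASE` list, the `retrieve` and `build_prompt` helpers, and the naive token-overlap scoring are hypothetical stand-ins (a real engine would more likely use embedding search against an external store), and none of it is taken from the project's code.

```python
import re

# Hypothetical in-memory knowledge base; a real deployment would query an external store.
KNOWLEDGE_BASE = [
    "The living room lamp is connected to GPIO pin 5.",
    "The thermostat accepts temperatures between 16 and 30 degrees Celsius.",
    "Firmware updates are delivered over the air every month.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list, top_k: int = 1) -> list:
    """Rank documents by token overlap with the query (a naive stand-in for vector search)."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    # Drop documents that share no tokens with the query.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(query: str, documents: list) -> str:
    """Prepend the retrieved context to the user question before calling the LLM."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("Which pin is the lamp connected to?", KNOWLEDGE_BASE))
```

The retrieval step is kept separate from prompt construction so that the scoring function can be swapped for a real vector index without touching the rest of the pipeline.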

The platform covers a broad capability surface, including IoT hardware control via function calling, secure device authentication using bearer tokens, and full-duplex communication through WebSockets. It also provides a web interface for managing the system and its devices, including configuration and gateway traffic routing.
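
As an illustration of bearer-token device authentication, the sketch below parses an `Authorization: Bearer <token>` header value and compares the token against a stored one in constant time. The `DEVICE_TOKENS` table, the device ID, and both helper functions are hypothetical names introduced here. How the server actually stores and validates tokens is not described in this summary.

```python
import hmac
from typing import Optional

# Hypothetical device-id -> token table; real storage is an assumption.
DEVICE_TOKENS = {"device-001": "s3cr3t-token"}


def extract_bearer_token(header_value: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, or None."""
    if not header_value:
        return None
    scheme, _, token = header_value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(device_id: str, header_value: Optional[str]) -> bool:
    """Check the presented token against the stored one using a constant-time comparison."""
    token = extract_bearer_token(header_value)
    expected = DEVICE_TOKENS.get(device_id)
    if token is None or expected is None:
        return False
    return hmac.compare_digest(token.encode(), expected.encode())


if __name__ == "__main__":
    print(is_authorized("device-001", "Bearer s3cr3t-token"))
    print(is_authorized("device-001", "Bearer wrong-token"))
```

A check like this would typically run once during the WebSocket handshake, so an unauthenticated device is rejected before any audio is streamed.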

Features

Conversational Voice AI - Implements an end-to-end conversational AI system integrating STT, LLMs, and TTS for hardware devices.
Knowledge Base Retrieval - Integrates a retrieval-augmented generation engine that queries external knowledge bases to ground responses in retrieved facts.
LLM Orchestrators - Orchestrates the connection between large language models, external plugins, and IoT hardware for multi-modal interaction.
LLM Tool Calling - Maps natural language intents to executable functions for hardware control and data retrieval.
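
The tool-calling feature listed above can be sketched as a small registry plus a dispatcher: the language model emits a structured call naming a function and its arguments, and the server looks the function up and runs it. The `tool` decorator, the `set_light` and `get_temperature` functions, and the JSON shape with `name` and `arguments` keys are all assumptions made for this example, not the project's actual plugin API.

```python
import json
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by function name (hypothetical design).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the dispatcher can find it by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def set_light(room: str, on: bool) -> str:
    """Pretend to switch a light; real code would send a command to the device."""
    state = "on" if on else "off"
    return f"{room} light turned {state}"


@tool
def get_temperature(room: str) -> str:
    """Return a placeholder reading; a real tool would query a sensor."""
    return f"{room} temperature is 21 C"


def dispatch(tool_call_json: str) -> str:
    """Parse a tool call emitted by the LLM and execute the matching function."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    func = TOOLS.get(name)
    if func is None:
        return f"unknown tool: {name}"
    return func(**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "set_light", "arguments": {"room": "kitchen", "on": true}}'))
```

Returning a message for unknown tool names, rather than raising, lets the result be fed back to the model so it can recover within the same conversation.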
