# xinnan-tech/xiaozhi-esp32-server

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/xinnan-tech-xiaozhi-esp32-server).**

8,627 stars · 2,947 forks · JavaScript · MIT

## Links

- GitHub: https://github.com/xinnan-tech/xiaozhi-esp32-server
- Homepage: http://xiaozhi.biz
- awesome-repositories: https://awesome-repositories.com/repository/xinnan-tech-xiaozhi-esp32-server.md

## Topics

`dify` `esp32` `mcp-server` `xiaozhi` `xiaozhi-ai` `xiaozhi-esp32` `xiaozhi-server`

## Description

This project is an AI voice assistant backend and gateway server designed to connect ESP32 hardware to large language models. It enables real-time conversational AI by processing streaming speech-to-text and text-to-speech interactions, allowing hardware devices to engage in natural language dialogue.

Two features stand out: a modular plugin framework that loads custom feature extensions at runtime, and a retrieval-augmented generation engine that queries external knowledge bases to ground its answers. The system also personalizes interactions by using voiceprint mapping to identify individual speakers and by keeping long-term contextual memory.
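
As a rough illustration of what loading feature extensions at runtime can look like, the sketch below imports every Python file in a directory and collects a handler from each. The `load_plugins` function, the `handle` naming convention, and the directory layout are assumptions made for this example, not the project's actual plugin API.

```python
import importlib.util
from pathlib import Path
from typing import Callable, Dict


def load_plugins(plugin_dir: Path) -> Dict[str, Callable[..., str]]:
    """Import each *.py file in plugin_dir and collect its `handle` function.

    Hypothetical convention: a plugin is a module exposing a callable
    named `handle`. The file name (without .py) becomes the plugin name.
    """
    plugins: Dict[str, Callable[..., str]] = {}
    for path in sorted(plugin_dir.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue  # not importable, skip it
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin's top-level code
        handler = getattr(module, "handle", None)
        if callable(handler):
            plugins[path.stem] = handler
    return plugins
```

Calling `load_plugins(Path("plugins"))` again after dropping a new file into the directory would pick it up without a code change, which is the general idea behind runtime extension.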

Other capabilities include IoT hardware control via function calling, device authentication using bearer tokens, and full-duplex communication over WebSockets. A web interface handles system and device management, including configuration and gateway traffic routing.
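
To make the bearer-token step concrete, here is a minimal sketch of checking a device's credentials from its connection headers. The header names, the `DEVICE_TOKENS` table, and both function names are hypothetical and chosen only for illustration; the project's real handshake may differ.

```python
import hmac
from typing import Mapping, Optional

# Hypothetical device table: device id -> expected token (illustrative values).
DEVICE_TOKENS = {"esp32-demo-01": "s3cr3t-token"}


def extract_bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, if any."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(headers: Mapping[str, str]) -> bool:
    """Check the presented token against the one stored for the device id."""
    token = extract_bearer_token(headers)
    expected = DEVICE_TOKENS.get(headers.get("Device-Id", ""))
    if token is None or expected is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token.encode(), expected.encode())
```

A server would typically run a check like this once, before upgrading the connection to a WebSocket, and reject the device if it returns `False`. Note that this sketch treats header names as case-sensitive, which a real HTTP stack would not.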

## Tags

### Artificial Intelligence & ML

- [Conversational Voice AI](https://awesome-repositories.com/f/artificial-intelligence-ml/conversational-voice-ai.md) — Implements an end-to-end conversational AI system integrating STT, LLMs, and TTS for hardware devices.
- [Knowledge Base Retrieval](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-rag-development/knowledge-base-retrieval.md) — Integrates a retrieval-augmented generation engine that queries external knowledge bases to ensure factual accuracy.
- [LLM Orchestrators](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-orchestrators.md) — Orchestrates the connection between large language models, external plugins, and IoT hardware for multi-modal interaction.
- [LLM Tool Calling](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-tool-calling.md) — Maps natural language intents to executable functions for hardware control and data retrieval.
- [Full-Duplex Multimodal Interaction](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-processing/full-duplex-multimodal-interaction.md) — Maintains persistent bidirectional connections for real-time streaming of audio and control data.
- [Conversational Dialogue Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-interfaces/conversational-dialogue-systems.md) — Implements a system for natural, real-time conversational dialogue between users and AI-powered hardware. ([source](https://cdn.jsdelivr.net/gh/xinnan-tech/xiaozhi-esp32-server@main/README.md))
- [Personalized Assistants](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-assistants/personalized-assistants.md) — Implements voiceprint mapping to identify individual speakers and deliver personalized responses and memory.
- [Speaker Verification](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-embeddings/speaker-verification.md) — Matches incoming audio signatures against stored voiceprints to verify and identify the speaker.
- [Speaker Identification Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-identification-frameworks.md) — Identifies individual users by their voiceprints to deliver personalized AI responses. ([source](https://cdn.jsdelivr.net/gh/xinnan-tech/xiaozhi-esp32-server@main/README.md))
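
The speaker verification and identification entries above describe matching incoming audio against stored voiceprints. One common way to do this, shown here as a sketch rather than as the project's method, is to compare fixed-length speaker embeddings by cosine similarity. The embedding values, the `identify_speaker` name, and the 0.8 threshold are assumptions for illustration; producing the embeddings from audio is out of scope.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    embedding: Sequence[float],
    voiceprints: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed cut-off; real systems tune this on data
) -> Optional[str]:
    """Return the best-matching enrolled speaker, or None if no score reaches the threshold."""
    best_name, best_score = None, threshold
    for name, stored in voiceprints.items():
        score = cosine_similarity(embedding, stored)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` for low scores lets the caller fall back to a generic, non-personalized response when the speaker is not recognized.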

### Part of an Awesome List

- [LLM Gateways](https://awesome-repositories.com/f/awesome-lists/devtools/hardware-interaction/hardware-servers/llm-gateways.md) — Acts as a dedicated gateway server connecting ESP32 hardware to large language models for voice interactions.
- [IoT and Hardware Control](https://awesome-repositories.com/f/awesome-lists/devtools/iot-and-hardware-control.md) — Sends programmatic commands from a central server to control physical components on ESP32 devices.

### Graphics & Multimedia

- [Speech-to-Text Pipelines](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-processing/speech-to-text-pipelines.md) — Processes spoken audio in real-time chunks to minimize latency during voice-to-text conversion.
- [Streaming Pipelines](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-processing/text-to-speech-engines/streaming-pipelines.md) — Provides real-time streaming speech-to-text and text-to-speech processing for fluid voice interactions. ([source](https://cdn.jsdelivr.net/gh/xinnan-tech/xiaozhi-esp32-server@main/README.md))

### Hardware & IoT

- [Device Management](https://awesome-repositories.com/f/hardware-iot/connectivity-iot/internet-of-things/device-management.md) — Manages security keys and gateway routing to maintain secure connections for a fleet of IoT devices. ([source](https://cdn.jsdelivr.net/gh/xinnan-tech/xiaozhi-esp32-server@main/README.md))
- [Remote Hardware Controllers](https://awesome-repositories.com/f/hardware-iot/remote-hardware-controllers.md) — Enables control of physical ESP32 hardware states through function calling and communication protocols. ([source](https://cdn.jsdelivr.net/gh/xinnan-tech/xiaozhi-esp32-server@main/README.md))

### Networking & Communication

- [Communication Gateways](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-infrastructure-configuration/network-and-server-infrastructure/communication-gateways.md) — Routes device communication through MQTT and UDP gateways using dynamically delivered endpoints. ([source](https://github.com/xinnan-tech/xiaozhi-esp32-server/blob/main/docs/mqtt-gateway-integration.md))

### Security & Cryptography

- [Bearer Token Authentication](https://awesome-repositories.com/f/security-cryptography/bearer-token-authentication.md) — Uses unique bearer tokens and identifiers to secure the communication between ESP32 devices and the server.
- [Device and Connection Authorization](https://awesome-repositories.com/f/security-cryptography/identity-access-management/access-control/device-connection-authorization.md) — Authenticates hardware devices via tokens and unique IDs to ensure secure persistent connections. ([source](https://ccnphfhqs21z.feishu.cn/wiki/M0XiwldO9iJwHikpXD5cEx71nKh))

### Software Engineering & Architecture

- [Extensible Plugin Architectures](https://awesome-repositories.com/f/software-engineering-architecture/extensible-plugin-architectures.md) — Provides a modular framework for extending system functionality via runtime plugins. ([source](https://cdn.jsdelivr.net/gh/xinnan-tech/xiaozhi-esp32-server@main/README.md))
- [Intent-to-Skill Mappings](https://awesome-repositories.com/f/software-engineering-architecture/intent-based-coordination/intent-to-skill-mappings.md) — Translates natural language user intents into specific programmatic actions and tool executions. ([source](https://cdn.jsdelivr.net/gh/xinnan-tech/xiaozhi-esp32-server@main/README.md))
- [Runtime Capability Extensions](https://awesome-repositories.com/f/software-engineering-architecture/modular-extension-systems/runtime-capability-extensions.md) — Allows the dynamic loading of modular feature sets at runtime without restarting the server.
- [Modular Plugin Frameworks](https://awesome-repositories.com/f/software-engineering-architecture/modular-plugin-frameworks.md) — Features a modular framework that loads custom feature plugins at runtime to extend device capabilities.
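
The intent-to-skill mapping described in this section can be pictured as a lookup from a tool name chosen by the language model to a registered function. The sketch below assumes the model emits a JSON object with `name` and `arguments` fields; that shape, the `tool` decorator, and the `set_light` skill are hypothetical and not taken from the project.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function under the given tool name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = func
        return func
    return decorator


@tool("set_light")
def set_light(on: bool) -> str:
    # A real skill would send a command to the device; this one only reports.
    return "light on" if on else "light off"


def dispatch(tool_call_json: str) -> Any:
    """Parse a tool call (assumed shape: {"name": ..., "arguments": {...}}) and run it."""
    call = json.loads(tool_call_json)
    func = TOOLS.get(call.get("name"))
    if func is None:
        raise KeyError(f"unknown tool: {call.get('name')!r}")
    return func(**call.get("arguments", {}))
```

For example, `dispatch('{"name": "set_light", "arguments": {"on": true}}')` returns `"light on"`, while an unregistered name raises `KeyError` instead of silently doing nothing.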
