Ten Framework | Awesome Repository

Ten Framework is a framework for building low-latency, multimodal conversational agents on top of large language models. It integrates voice, text, and visual inputs in real time to support natural interaction between users and agents.

The project includes a real-time speech processing pipeline for streaming transcription, voice activity detection, and speaker diarization. It also features an avatar synchronization engine that coordinates character lip animations and visual outputs with synthesized speech.

The framework supports edge AI deployment through containerized packaging and direct integration with embedded hardware boards. Additional capabilities include a telephony gateway that connects agents to phone networks via the Session Initiation Protocol (SIP), and tools for real-time visual generation of sketches and doodles.

Features

Real-Time Conversational AI Frameworks - Provides a low-latency framework for building multimodal conversational agents that integrate speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS).
Voice Activity Detection - Identifies speech segments within audio streams to manage natural turn-taking and conversational flow.
Real-Time Transcription - Converts live audio input into text for downstream language model processing.
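
To make the voice activity detection feature above more concrete, here is a minimal sketch of a generic energy-based detector in plain Python. It is not Ten Framework's implementation or API, and real systems often use trained models instead. The function names, frame size, threshold, and hangover values are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_segments(
    samples: List[float],
    frame_size: int = 160,      # assumed: 10 ms at 16 kHz
    threshold: float = 0.1,     # assumed RMS level that counts as speech
    hangover_frames: int = 2,   # quiet frames tolerated inside a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of regions judged to be speech.

    A segment opens on the first frame whose RMS reaches the threshold and
    closes once more than `hangover_frames` consecutive quiet frames follow.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    quiet = 0
    n_frames = len(samples) // frame_size  # a trailing partial frame is ignored
    for i in range(n_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i * frame_size
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover_frames:
                # Close the segment at the end of the last loud frame.
                segments.append((start, (i - quiet + 1) * frame_size))
                start = None
                quiet = 0
    if start is not None:
        # The stream ended while a segment was still open.
        segments.append((start, (n_frames - quiet) * frame_size))
    return segments


# Synthetic input: silence, then a loud burst, then silence again.
audio = [0.0] * 320 + [0.5] * 480 + [0.0] * 800
print(detect_speech_segments(audio))  # [(320, 800)]
```

The hangover count keeps short pauses from splitting one utterance into several segments, which is the kind of behavior turn-taking logic depends on. A conversational agent could treat the end of a segment as a cue that the speaker has finished and hand the audio to transcription.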
