# ten-framework/ten-framework

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/ten-framework-ten-framework).**

10,043 stars · 1,191 forks · Python · other

## Links

- GitHub: https://github.com/TEN-framework/ten-framework
- Homepage: https://agent.theten.ai/
- awesome-repositories: https://awesome-repositories.com/repository/ten-framework-ten-framework.md

## Topics

`ai` `multi-modal` `real-time` `video` `voice`

## Description

TEN Framework is a framework for building low-latency, multimodal conversational agents on top of large language models. It combines voice, text, and visual inputs in real time so that agents can interact with human users.

The project includes a real-time speech processing pipeline for streaming transcription, voice activity detection, and speaker diarization. It also features an avatar synchronization engine that coordinates character lip animations and visual outputs with synthesized speech.
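To make the voice activity detection step concrete, here is a minimal sketch of an energy-threshold detector that marks speech segments in a stream of audio frames. It is not TEN Framework's implementation or API: the function names, the frame size, the threshold, and the hangover length are all assumptions chosen for illustration, and production detectors typically use trained models rather than a fixed energy threshold.

```python
from typing import List, Tuple


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def detect_speech_segments(
    samples: List[float],
    frame_size: int = 160,      # hypothetical: 10 ms at 16 kHz
    threshold: float = 0.01,    # hypothetical energy threshold
    hangover_frames: int = 2,   # silent frames tolerated inside a segment
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, with end_frame exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None       # index of the first frame of the open segment
    silent_run = 0     # consecutive silent frames seen inside the segment
    n_frames = len(samples) // frame_size

    for i in range(n_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        if frame_energy(frame) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            # Close the segment once silence outlasts the hangover window.
            if silent_run > hangover_frames:
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0

    # Close a segment that is still open when the audio ends.
    if start is not None:
        segments.append((start, n_frames - silent_run))
    return segments


if __name__ == "__main__":
    # Three silent frames, two loud frames, five silent frames.
    audio = [0.0] * 480 + [0.5] * 320 + [0.0] * 800
    print(detect_speech_segments(audio))
```

The hangover window keeps short pauses from splitting one utterance into several segments, which is the property a turn-taking agent relies on before deciding that the user has stopped speaking.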

The framework supports edge AI deployment through containerized packaging and direct integration with embedded hardware boards. Additional capabilities include a telephony gateway that connects agents to phone networks via the Session Initiation Protocol (SIP), and tools for generating sketches and doodles in real time.

## Tags

### Artificial Intelligence & ML

- [Real-Time Conversational AI Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/real-time-conversational-ai-frameworks.md) — Provides a low-latency framework for building multimodal conversational agents that integrate STT, LLMs, and TTS.
- [Voice Activity Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/conversational-voice-interaction/voice-agents/voice-activity-detection.md) — Identifies speech segments within audio streams to manage natural turn-taking and conversational flow. ([source](https://github.com/TEN-framework/ten-framework/blob/main/docs/README-CN.md))
- [Real-Time Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/real-time-transcription.md) — Converts live audio input into text representations in real time for downstream language model processing. ([source](https://github.com/TEN-framework/ten-framework/blob/main/docs/README-CN.md))
- [Multimodal AI Orchestrators](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-ai-orchestrators.md) — Coordinates multiple AI model types including vision, speech, and language into a unified context for agentic workflows.
- [Multimodal Conversational Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-conversational-interfaces.md) — Provides a framework for building conversational agents that integrate text, voice, and audio processing for real-time interaction. ([source](https://github.com/TEN-framework/ten-framework/blob/main/README.md))
- [Full-Duplex Multimodal Interaction](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-processing/full-duplex-multimodal-interaction.md) — Processes simultaneous audio and data flows to enable fluid, real-time conversational turns and natural interruptions.
- [Real-Time Speech Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/real-time-speech-processing.md) — Implements a real-time speech processing pipeline for streaming transcription, voice activity detection, and speaker diarization.
- [Unified Speech Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-integrations/unified-speech-pipelines.md) — Ships a unified speech pipeline integrating transcription and activity detection for bidirectional voice interaction.
- [Edge AI Model Deployment](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment.md) — Runs conversational logic and agent services on embedded hardware and edge devices for low-latency interaction.
- [Speaker Diarization](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization.md) — Analyzes real-time audio streams to detect, separate, and label multiple distinct speakers. ([source](https://github.com/TEN-framework/ten-framework/blob/main/docs/README-CN.md))

### DevOps & Infrastructure

- [Containerized Deployments](https://awesome-repositories.com/f/devops-infrastructure/containerized-deployments.md) — Packages customized agent services into images for consistent execution across cloud platforms and virtual machines. ([source](https://github.com/TEN-framework/ten-framework/blob/main/docs/README-CN.md))
- [Containerized Packaging](https://awesome-repositories.com/f/devops-infrastructure/containerized-packaging.md) — Bundles customized agent configurations into portable images for consistent deployment across cloud and edge environments.
- [Embedded Hardware Deployment](https://awesome-repositories.com/f/devops-infrastructure/deployment-management-strategies/execution-platforms-and-targets/hardware-profile-deployments/embedded-hardware-deployment.md) — Integrates agent logic directly onto embedded development boards for communication-driven physical device interaction. ([source](https://github.com/TEN-framework/ten-framework/blob/main/docs/README-ES.md))
- [Edge AI Deployment Pipelines](https://awesome-repositories.com/f/devops-infrastructure/infrastructure/infrastructure-as-code/provisioning-and-deployment/edge-ai-deployment-pipelines.md) — Provides tools for packaging conversational agent logic into containers and deploying them onto embedded hardware boards.

### Hardware & IoT

- [Hardware Integration](https://awesome-repositories.com/f/hardware-iot/integration-performance/hardware-interfacing-integration/hardware-integration.md) — Allows agent logic to run directly on embedded development boards for physical device interactions.

### Networking & Communication

- [Audio-Visual Signal Alignment](https://awesome-repositories.com/f/networking-communication/real-time-synchronization/audio-visual-signal-alignment.md) — Coordinates synthesized audio output with avatar lip movements to ensure precise temporal alignment.

### Software Engineering & Architecture

- [Modular Provider Interfaces](https://awesome-repositories.com/f/software-engineering-architecture/modular-provider-interfaces.md) — Implements a pluggable provider model to decouple multimodal capabilities like speech-to-text and avatar animation from core logic.
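
As a rough illustration of what a pluggable provider model can look like, the sketch below decouples an agent from its speech-to-text backend through a small interface and a registry. Every name here (`SpeechToText`, `ProviderRegistry`, `Agent`, and the two stand-in providers) is hypothetical and is not taken from TEN Framework's actual extension API.

```python
from typing import Dict, Protocol


class SpeechToText(Protocol):
    """Hypothetical interface that every speech-to-text provider satisfies."""

    def transcribe(self, audio: bytes) -> str: ...


class EchoSTT:
    """Stand-in provider: treats the audio bytes as UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8", errors="replace")


class UpperSTT:
    """Second stand-in provider, to show that backends are swappable."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8", errors="replace").upper()


class ProviderRegistry:
    """Maps provider names to instances so core logic never imports a backend."""

    def __init__(self) -> None:
        self._providers: Dict[str, SpeechToText] = {}

    def register(self, name: str, provider: SpeechToText) -> None:
        self._providers[name] = provider

    def get(self, name: str) -> SpeechToText:
        if name not in self._providers:
            raise KeyError(f"no provider registered under {name!r}")
        return self._providers[name]


class Agent:
    """Core logic depends only on the interface, not on a concrete provider."""

    def __init__(self, stt: SpeechToText) -> None:
        self._stt = stt

    def handle_audio(self, audio: bytes) -> str:
        return f"heard: {self._stt.transcribe(audio)}"


if __name__ == "__main__":
    registry = ProviderRegistry()
    registry.register("echo", EchoSTT())
    registry.register("upper", UpperSTT())
    print(Agent(registry.get("echo")).handle_audio(b"hello"))
    print(Agent(registry.get("upper")).handle_audio(b"hello"))
```

Swapping the provider name passed to the registry changes the backend without touching `Agent`, which is the decoupling the tag describes.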

### User Interface & Experience

- [Lip Synchronization Engines](https://awesome-repositories.com/f/user-interface-experience/avatars/realtime-avatar-renderers/lip-synchronization-engines.md) — Coordinates character avatar mouth movements with synthesized audio output for realistic speaking digital humans. ([source](https://github.com/TEN-framework/ten-framework/blob/main/README.md))
- [Conversational Avatar Animators](https://awesome-repositories.com/f/user-interface-experience/keyboard-input-visualizers/input-reactive-character-animators/conversational-avatar-animators.md) — Synchronizes character movements and facial expressions with LLM-driven conversational audio.
