# kyutai-labs/moshi

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/kyutai-labs-moshi).**

9,672 stars · 890 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/kyutai-labs/moshi
- awesome-repositories: https://awesome-repositories.com/repository/kyutai-labs-moshi.md

## Description

Moshi is a real-time voice foundation model and speech-to-speech framework designed for bidirectional, low-latency conversations. It functions as a full-duplex voice interface: it listens and speaks at the same time, processing audio and text concurrently in a single stream. This enables natural human-machine dialogue without the turn-by-turn delays of a sequential speech-recognition, language-model, and speech-synthesis pipeline.

The system uses a neural audio codec, Mimi, to compress high-fidelity audio into low-bitrate tokens for efficient transmission. To keep complex responses coherent, it employs internal monologue modeling: alongside the audible speech, the model generates a hidden stream of text tokens that track what it is saying.
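To make "low-bitrate tokens" concrete, the arithmetic can be sketched as below. The figures are assumptions taken from the numbers published for Moshi's Mimi codec (12.5 token frames per second, 8 codebooks, 2048 entries per codebook, 24 kHz input audio); the function name `codec_bitrate_bps` is hypothetical and is not part of the repository.

```python
import math


def codec_bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second for a codec that emits `num_codebooks` tokens per frame."""
    # Each token indexes one codebook entry, so it costs log2(codebook_size) bits.
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_token


# Assumed figures (not stated in this listing): 12.5 frames/s, 8 codebooks, 2048 entries each.
bitrate = codec_bitrate_bps(12.5, 8, 2048)

# Uncompressed reference for comparison: 24 kHz mono audio stored as 16-bit PCM.
pcm_bitrate = 24_000 * 16

print(f"{bitrate:.0f} bps tokens vs {pcm_bitrate} bps PCM")
```

Under these assumptions the token stream costs 1,100 bits per second, roughly 1.1 kbps, against 384,000 bits per second for the raw PCM signal.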

The project includes a quantized inference server and interchangeable backends for different environments: PyTorch for research, MLX for Apple silicon, and Rust for production GPUs. Operational capabilities cover multi-modal tokenization, asynchronous batch processing, and deployment options such as containerization, secure local tunneling, and a web-based interaction interface.

A command-line client is also provided for sending audio to, and receiving audio from, a running inference server.

## Tags

### Artificial Intelligence & ML

- [Full-Duplex Multimodal Interaction](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-processing/full-duplex-multimodal-interaction.md) — Provides a full-duplex architecture that processes simultaneous audio and text streams for real-time conversation.
- [Speech-to-Speech Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/speech-to-speech-models/speech-to-speech-frameworks.md) — Provides a full-duplex framework for bidirectional, low-latency speech-to-speech conversations.
- [Neural Audio Compression](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-tokenization/neural-audio-compression.md) — Uses neural codecs to compress high-fidelity audio into low-bitrate tokens for efficient real-time transmission. ([source](https://github.com/kyutai-labs/moshi/blob/main/moshi))
- [Streaming Tokenization](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-tokenization/streaming-tokenization.md) — Processes audio data incrementally to produce tokens in real-time for immediate interaction without waiting for full clips. ([source](https://github.com/kyutai-labs/moshi/blob/main/moshi/README.md))
- [Multi-Modal Tokenizers](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-modal-tokenizers.md) — Converts audio and text into a unified token stream for single-sequence processing by the model.
- [Internal Monologue Modeling](https://awesome-repositories.com/f/artificial-intelligence-ml/reasoning-models/internal-monologue-modeling.md) — Generates a hidden stream of text tokens alongside audible speech to keep complex responses coherent.
- [Asynchronous Batch Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/asynchronous-batch-processing.md) — Uses binary execution masks to process asynchronous data streams at varying rates while protecting internal state.
- [Local Model Inference Servers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/local-model-inference-servers.md) — Hosts quantized voice models on personal hardware with GPU acceleration for private, low-latency interactions.
- [Model Inference Servers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/engines-runtimes-servers/model-inference-servers.md) — Ships a model server with hardware acceleration and optimized memory management for real-time inference requests. ([source](https://github.com/kyutai-labs/moshi/blob/main/rust))
- [Backend-Agnostic Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-research/neural-network-toolkits/backend-agnostic-engines.md) — Decouples neural network operations from hardware to support Apple silicon, GPUs, and research environments.
- [Quantized Inference Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes.md) — Provides a backend runtime optimized for executing compressed models with reduced-precision weights on local hardware.
- [Weight Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization.md) — Employs int8 precision weight quantization to reduce memory footprints and accelerate local hardware inference.
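
The int8 weight quantization named in the list above can be illustrated with a minimal sketch. It uses symmetric per-tensor quantization in plain Python, which is an assumption made for illustration: the listing does not say which quantization scheme the project uses, and `quantize_int8` and `dequantize_int8` are hypothetical names, not functions from the repository.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> (int8-range integers, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the integers and the shared scale."""
    return [v * scale for v in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding moves each weight by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of four (for float32), at the cost of a reconstruction error bounded by half the scale.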

### Networking & Communication

- [Voice Interaction Engines](https://awesome-repositories.com/f/networking-communication/websocket-clients/real-time-interaction-engines/voice-interaction-engines.md) — Implements a low-latency architecture for bidirectional audio streams between the foundation model and users. ([source](https://github.com/kyutai-labs/moshi/blob/main/FAQ.md))

### Data & Databases

- [Execution Masking](https://awesome-repositories.com/f/data-databases/batch-processing/execution-masking.md) — Manages asynchronous data batches using execution masks to protect the internal state of ignored entries. ([source](https://github.com/kyutai-labs/moshi/blob/main/moshi))
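
The idea of an execution mask can be sketched in a few lines: each batch entry carries a binary flag, and only flagged entries have their state advanced on a given step. This is an illustrative sketch, not the project's implementation; `masked_step` and `accumulate` are hypothetical names, and the running-sum state stands in for whatever streaming state a real model would keep.

```python
def masked_step(states, inputs, mask, step_fn):
    """Advance only the entries whose mask bit is set; leave the others untouched."""
    return [step_fn(s, x) if m else s for s, x, m in zip(states, inputs, mask)]


# Hypothetical per-stream state: a running sum of the frames seen so far.
def accumulate(state, frame):
    return state + frame


states = [0.0, 0.0, 0.0]

# Step 1: all three streams deliver a frame.
states = masked_step(states, [1.0, 2.0, 3.0], [1, 1, 1], accumulate)

# Step 2: stream 1 has no new frame, so its mask bit is 0 and its
# placeholder input (99.0) must not leak into its state.
states = masked_step(states, [10.0, 99.0, 30.0], [1, 0, 1], accumulate)

print(states)
```

Streams that arrive at different rates can therefore share one batch: an idle stream occupies a slot on that step, but its state is exactly what it was before.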

### DevOps & Infrastructure

- [Multi-Backend Execution](https://awesome-repositories.com/f/devops-infrastructure/apple-silicon-deployment/multi-backend-execution.md) — Allows inference execution across different backend engines to support research, Apple silicon, and production GPUs. ([source](https://github.com/kyutai-labs/moshi#readme))

### Web Development

- [Headless Server Hosting](https://awesome-repositories.com/f/web-development/web-infrastructure-deployment/web-infrastructure-servers/web-server-hosting/headless-server-hosting.md) — Operates as a standalone backend server accessible via web interfaces or command-line clients. ([source](https://github.com/kyutai-labs/moshi/blob/main/README.md))