# blaizzy/mlx-audio

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/blaizzy-mlx-audio).**

5,994 stars · 446 forks · Python · MIT

## Links

- GitHub: https://github.com/Blaizzy/mlx-audio
- awesome-repositories: https://awesome-repositories.com/repository/blaizzy-mlx-audio.md

## Topics

`apple-silicon` `audio-processing` `mlx` `multimodal` `speech-recognition` `speech-synthesis` `speech-to-text` `text-to-speech` `transformers`

## Description

mlx-audio is an audio processing toolkit built on Apple MLX that provides speech transcription, text-to-speech synthesis, voice cloning, and audio source separation, with models running locally on Apple Silicon. It also exposes an OpenAI-compatible REST API and a web interface for audio generation and transcription, so existing tools written against OpenAI's audio endpoints can use it as a drop-in replacement.
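
A minimal sketch of what a drop-in client call could look like, assuming the server mirrors OpenAI's `/v1/audio/speech` route and its JSON body (`model`, `input`, `voice`, `response_format`). The base URL, model id, and voice name are hypothetical placeholders, and the request is only built, not sent, so the snippet runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: address of a locally running server


def build_speech_request(text: str, model: str, voice: str,
                         response_format: str = "wav") -> urllib.request.Request:
    """Build (but do not send) a POST shaped like OpenAI's /v1/audio/speech."""
    payload = {
        "model": model,                      # which TTS model to run
        "input": text,                       # text to synthesize
        "voice": voice,                      # voice names are model-specific
        "response_format": response_format,  # requested audio container
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model id and voice name, for illustration only.
req = build_speech_request("Hello from MLX!", model="example/tts-model", voice="example-voice")
print(req.get_method(), req.full_url)
# Sending it (requires a running server): audio = urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to swap the base URL between a local mlx-audio instance and any other OpenAI-compatible endpoint.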

The toolkit supports text-prompted audio source separation, allowing specific sounds to be isolated from mixed recordings based on natural language descriptions. It also provides voice cloning from a short reference audio sample, speech enhancement through noise reduction, and voice activity detection with speaker diarization to distinguish between different speakers in recordings.
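
Diarization output is typically a list of speaker-labelled time spans, and a common post-processing step is merging adjacent spans from the same speaker into turns. The sketch below assumes a hypothetical `Segment(speaker, start, end)` shape rather than mlx-audio's actual return type, and the 0.5 s gap threshold is an arbitrary illustrative choice.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # hypothetical label format, e.g. "SPEAKER_0"
    start: float  # seconds
    end: float    # seconds


def merge_turns(segments, max_gap=0.5):
    """Merge consecutive same-speaker segments separated by at most max_gap seconds."""
    merged = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = merged[-1] if merged else None
        if last is not None and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            last.end = max(last.end, seg.end)  # extend the current turn
        else:
            merged.append(Segment(seg.speaker, seg.start, seg.end))  # copy, so inputs stay unchanged
    return merged


segments = [
    Segment("SPEAKER_0", 0.0, 1.2),
    Segment("SPEAKER_0", 1.4, 2.5),
    Segment("SPEAKER_1", 2.7, 4.0),
    Segment("SPEAKER_0", 4.1, 5.0),
]
for turn in merge_turns(segments):
    print(f"{turn.speaker}: {turn.start:.1f}s to {turn.end:.1f}s")
```

With this input, the first two `SPEAKER_0` spans (0.2 s apart) collapse into a single 0.0 to 2.5 s turn, leaving three turns in total.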

Additional capabilities include speech-to-text transcription with word-level timestamp alignment, streaming audio generation that outputs results incrementally, and model weight quantization to reduce memory footprint and accelerate inference. The system manages multiple models through a unified interface and supports WebSocket audio transport for low-latency communication.
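
Word-level timestamps make it straightforward to build subtitle files. This sketch assumes the transcription has already been reduced to hypothetical `(word, start_seconds, end_seconds)` tuples (mlx-audio's real output structure may differ) and groups them into numbered SRT cues.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=3):
    """Group (word, start, end) tuples into numbered SRT cues of up to max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]  # cue spans first word start to last word end
        text = " ".join(w for w, _, _ in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Hypothetical transcription output: one (word, start_s, end_s) tuple per word.
words = [("Hello", 0.00, 0.42), ("from", 0.45, 0.60), ("MLX", 0.62, 1.05),
         ("audio", 1.10, 1.48), ("toolkit", 1.50, 2.01)]
print(words_to_srt(words))
```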

## Tags

### Artificial Intelligence & ML

- [Speech Processing Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-processing-toolkits.md) — An audio toolkit built on Apple MLX for speech transcription, text-to-speech, voice cloning, and source separation.
- [OpenAI-Compatible APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/model-integration-serving/model-integration-interfaces/ai-integration-apis/openai-compatible-apis.md) — Exposes audio processing capabilities through an OpenAI-compatible REST API for drop-in integration.
- [Audio Source Separation Models](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-source-separation-models.md) — Isolates specific sounds from mixed audio files using natural language text prompts.
- [Text-Prompted Separators](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-source-separation-models/source-separation-tools/text-prompted-separators.md) — Isolates specific sounds from mixed audio files using natural language text prompts.
- [Audio Model Hubs](https://awesome-repositories.com/f/artificial-intelligence-ml/model-abstraction-layers/model-abstraction-layers/audio-model-hubs.md) — Provides a unified interface for loading and switching between multiple audio processing models.
- [Speech to Text Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-transcription.md) — Transcribes spoken audio into written text using multilingual speech recognition with word-level timestamps. ([source](https://cdn.jsdelivr.net/gh/blaizzy/mlx-audio@main/README.md))
- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Provides multilingual text-to-speech synthesis with voice cloning and streaming capabilities. ([source](https://cdn.jsdelivr.net/gh/blaizzy/mlx-audio@main/README.md))
- [Voice Cloning](https://awesome-repositories.com/f/artificial-intelligence-ml/voice-cloning.md) — Replicates a speaker's voice from a short reference audio sample for personalized speech generation. ([source](https://cdn.jsdelivr.net/gh/blaizzy/mlx-audio@main/README.md))
- [Word-Level Timestamps](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/word-level-timestamps.md) — Provides word-level timestamp alignment for transcribed speech, mapping each word to precise audio positions. ([source](https://cdn.jsdelivr.net/gh/blaizzy/mlx-audio@main/README.md))
- [Speaker Diarizers](https://awesome-repositories.com/f/artificial-intelligence-ml/detection-error-handling/voice-activity-detection/speaker-diarizers.md) — Detects speech segments and distinguishes between different speakers in audio recordings.
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization.md) — Reduces model weight precision to decrease size and accelerate inference for audio processing models.
- [Weight Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization.md) — Loads model weights in reduced-precision formats at runtime to decrease memory footprint and accelerate inference.
- [Speech Enhancers](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/speech-enhancers.md) — Reduces background noise in audio recordings for clearer speech output.
- [Voice Activity Detectors](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/voice-activity-detectors.md) — Detects speech segments and distinguishes between different speakers in audio recordings. ([source](https://cdn.jsdelivr.net/gh/blaizzy/mlx-audio@main/README.md))
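
To see where the memory savings from weight quantization come from, here is a pure-Python sketch of group-wise affine quantization: each group of weights is stored as small integers plus one scale and one bias. It illustrates the general technique, the same family as MLX's `mx.quantize` (which takes `group_size` and `bits`), but it is not MLX's exact rounding scheme; the 16-bit scale/bias size and the 1B-parameter model are assumptions for the arithmetic.

```python
def quantize_group(values, bits=4):
    """Affine-quantize one group of floats to integers in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0  # guard against a constant group
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo  # lo serves as the per-group bias


def dequantize_group(q, scale, bias):
    """Reconstruct approximate floats: w ~= q * scale + bias."""
    return [x * scale + bias for x in q]


def quantized_megabytes(num_params, bits=4, group_size=64, scale_bits=16):
    """Rough storage estimate: packed weights plus one scale and one bias per group."""
    weight_bits = num_params * bits
    overhead_bits = (num_params / group_size) * 2 * scale_bits
    return (weight_bits + overhead_bits) / 8 / 1e6


group = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, scale, bias = quantize_group(group)
restored = dequantize_group(q, scale, bias)
print(q, max(abs(a - b) for a, b in zip(group, restored)))

n = 1_000_000_000  # hypothetical 1B-parameter model
print(f"fp16: {n * 2 / 1e6:.1f} MB, 4-bit: {quantized_megabytes(n):.1f} MB")
```

Under these assumptions the per-group scale and bias add 32 / 64 = 0.5 bits per weight, so 4-bit storage costs about 4.5 bits per parameter: roughly 562.5 MB for 1B parameters versus 2,000 MB at fp16. The reconstruction error per weight is bounded by half the group's scale.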

### Graphics & Multimedia

- [Streaming Audio Generators](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-streaming-engines/audio-playback-engines/chunked-audio-streaming/generative-audio-chunking/streaming-audio-generators.md) — Ships a streaming audio generation pipeline that outputs results incrementally for low-latency playback. ([source](https://cdn.jsdelivr.net/gh/blaizzy/mlx-audio@main/README.md))
- [Audio Streaming Pipelines](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-streaming-engines/audio-playback-engines/chunked-audio-streaming/generative-audio-chunking/audio-streaming-pipelines.md) — Processes audio in chunks through a chain of models for real-time generation and transcription.
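
Streaming generation means a consumer can handle audio chunk by chunk instead of waiting for the whole clip. The sketch below uses a hypothetical `fake_tts_stream` generator (a sine tone standing in for a real model, at an assumed 24 kHz sample rate) and shows only the chunked producer/consumer pattern, not a multi-model chain; the consumer appends each chunk to a WAV file with the standard-library `wave` module.

```python
import io
import math
import wave

SAMPLE_RATE = 24_000  # assumption: 24 kHz mono, 16-bit PCM


def fake_tts_stream(text, chunk_seconds=0.25):
    """Hypothetical stand-in for a streaming TTS model: yields 16-bit PCM chunks of a 440 Hz tone."""
    total = SAMPLE_RATE * len(text.split()) // 10  # made-up duration: 0.1 s per word
    step = int(SAMPLE_RATE * chunk_seconds)        # samples per chunk
    for start in range(0, total, step):
        n = min(step, total - start)
        samples = (round(8000 * math.sin(2 * math.pi * 440 * (start + i) / SAMPLE_RATE))
                   for i in range(n))
        yield b"".join(s.to_bytes(2, "little", signed=True) for s in samples)


def write_stream_to_wav(chunks):
    """Consume PCM chunks as they arrive and append each one to an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            wav.writeframes(chunk)  # a real client could start playback here
    return buf.getvalue()


wav_bytes = write_stream_to_wav(fake_tts_stream("streaming audio generation demo"))
print(len(wav_bytes), "bytes of WAV data")
```

Because `write_stream_to_wav` pulls from the generator lazily, each chunk is written as soon as it is produced; with four words and 0.25 s chunks, the sketch yields two chunks of 6,000 and 3,600 samples.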

### Web Development

- [Audio API Servers](https://awesome-repositories.com/f/web-development/audio-api-servers.md) — Provides an interactive web UI and OpenAI-compatible REST API for audio generation and transcription. ([source](https://cdn.jsdelivr.net/gh/blaizzy/mlx-audio@main/README.md))
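
On the transcription side, an OpenAI-style client uploads audio as `multipart/form-data`. A minimal sketch, assuming the server mirrors OpenAI's `/v1/audio/transcriptions` route with `file` and `model` form fields; the base URL, model id, and audio bytes are hypothetical, and the request is built but not sent.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # assumption: address of a locally running server


def build_transcription_request(audio: bytes, filename: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST shaped like OpenAI's /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex

    def part(name: str, value: bytes, file_name: str = "") -> bytes:
        # One form field: boundary line, headers, blank line, raw value.
        disposition = f'form-data; name="{name}"'
        extra = ""
        if file_name:
            disposition += f'; filename="{file_name}"'
            extra = "Content-Type: application/octet-stream\r\n"
        header = f"--{boundary}\r\nContent-Disposition: {disposition}\r\n{extra}\r\n"
        return header.encode() + value + b"\r\n"

    body = (
        part("model", model.encode())
        + part("file", audio, filename)
        + f"--{boundary}--\r\n".encode()  # closing boundary
    )
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Hypothetical audio bytes and model id, for illustration only.
req = build_transcription_request(b"RIFF....", "clip.wav", model="example/stt-model")
print(req.get_method(), req.full_url, len(req.data))
```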
