# argmaxinc/whisperkit

5,639 stars · 504 forks · Swift · MIT

## Links

- GitHub: https://github.com/argmaxinc/WhisperKit
- Homepage: http://argmaxinc.com/blog/whisperkit
- awesome-repositories: https://awesome-repositories.com/repository/argmaxinc-whisperkit.md

## Topics

`inference` `ios` `macos` `speech-recognition` `swift` `transformers` `visionos` `watchos` `whisper`

## Tags

### Artificial Intelligence & ML

- [On-Device Speech-to-Text SDKs](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models/on-device-speech-to-text-sdks.md) — Provides an on-device speech-to-text SDK using Core ML models for private, offline transcription without network connectivity.
- [Speech to Text Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-transcription.md) — Converts spoken audio into written text using on-device AI models with multi-language support. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [OpenAI-Compatible APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/model-integration-serving/model-integration-interfaces/ai-integration-apis/openai-compatible-apis.md) — Ships a local HTTP server that mirrors the OpenAI Audio API for transcription and translation.
- [Audio Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription.md) — Converts saved audio files into written text in a single pass after recording ends. ([source](https://app.argmaxinc.com/docs/models))
- [CLI Transcription Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/cli-transcription-tools.md) — Transcribes audio files or microphone input directly from the terminal without needing an Xcode project. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [Real-Time Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/real-time-transcription.md) — Processes audio input continuously as it arrives, producing low-latency transcription for live applications. ([source](https://app.argmaxinc.com/docs))
- [Speaker-Labeled Live Transcripts](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/real-time-transcription/speaker-labeled-live-transcripts.md) — Produces live transcripts that pair each text segment with the identified speaker in real time. ([source](https://app.argmaxinc.com/docs/wiki/open-source-vs-pro-sdk))
- [On-Device Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-clients/on-device-inference.md) — Runs neural network models directly on Apple hardware using Core ML for on-device inference.
- [On-Device Speech Recognizers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/automatic-speech-recognition/on-device-speech-recognizers.md) — Ships an on-device speech recognition SDK using Core ML models for private, offline transcription.
- [Speech-to-Text Translation](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/speech-datasets/english/speech-to-text-translation.md) — Transcribes and translates spoken audio from other languages into English text in one step. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [Voice Identity Selections](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/voice-synthesis/modular-voice-configurations/voice-identity-selections.md) — Ships multiple built-in voices and languages for customizing text-to-speech output. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [Real-Time Speech Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/real-time-speech-processing.md) — Streams audio input and outputs text as speech is spoken for live captioning. ([source](https://app.argmaxinc.com/docs/models))
- [Speaker Diarization](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization.md) — Separates and labels audio segments by speaker identity for per-speaker transcripts. ([source](https://app.argmaxinc.com/docs))
- [Transcription Merges](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/transcription-merges.md) — Combines speaker identification with transcribed text to produce speaker-attributed transcripts. ([source](https://app.argmaxinc.com/docs/wiki/open-source-vs-pro-sdk))
- [Transcription with Speaker Labels](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/transcription-with-speaker-labels.md) — Assigns transcribed words to individual speakers in real time with support for up to four speakers. ([source](https://app.argmaxinc.com/docs/examples/real-time-transcription))
- [Core ML Speech Model Galleries](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-modeling-toolkits/core-ml-speech-model-galleries.md) — A browsable gallery of ready-made Core ML models for speech-to-text, speaker diarization, and text-to-speech tasks.
- [Text-to-Speech](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech.md) — Generates spoken audio from written text using on-device models with voice customization. ([source](https://app.argmaxinc.com/docs/models))
- [Command-Line Speech Synthesizers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/local-speech-synthesis/command-line-speech-synthesizers.md) — Provides a Swift CLI tool for executing transcription, translation, text-to-speech, and diarization tasks directly from the terminal.
- [On-Device Text-to-Speech Synthesizers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/on-device-text-to-speech-synthesizers.md) — Provides an on-device text-to-speech engine with real-time streaming playback and customizable voices.
- [On-Device Transcriptions](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/on-device-transcriptions.md) — Transcribes audio into text using on-device inference without requiring a network connection. ([source](https://app.argmaxinc.com/docs/wiki/supported-platforms))
- [Streaming Generations](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/streaming-generations.md) — Produces spoken audio output from text as it is provided, without waiting for the full text. ([source](https://app.argmaxinc.com/docs/wiki/open-source-vs-pro-sdk))
- [Word-Level Timestamps](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/word-level-timestamps.md) — Generates precise start and end timestamps for each word in transcription output. ([source](https://app.argmaxinc.com/docs))
- [Spoken Language Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/language-detection-tools/spoken-language-detection.md) — Identifies the language of spoken audio automatically during transcription without manual selection. ([source](https://app.argmaxinc.com/docs))
- [Android AI Pack Delivery](https://awesome-repositories.com/f/artificial-intelligence-ml/ml-model-delivery-pipelines/android-ai-pack-delivery.md) — Delivers and manages model assets on Android devices through Google Play's AI Pack system with configurable delivery settings.
- [Model Galleries](https://awesome-repositories.com/f/artificial-intelligence-ml/model-galleries.md) — Provides a browsable gallery of ready-made Core ML models for speech-to-text, speaker diarization, and text-to-speech tasks. ([source](https://app.argmaxinc.com/docs/models))
- [Real-Time Audio Transcribers](https://awesome-repositories.com/f/artificial-intelligence-ml/real-time-audio-transcribers.md) — Captures and transcribes audio in real time from a device microphone via the command line. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [CLI Diarization Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/cli-diarization-tools.md) — Labels speakers in audio files directly from the terminal using a Swift CLI tool. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [Diarization-Transcription Mergers](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/diarization-transcription-mergers.md) — Combines speaker identification results with transcript text to label who spoke each segment. ([source](https://app.argmaxinc.com/docs/examples/file-transcription))
- [Real-Time Diarizations](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/real-time-diarizations.md) — Labels each segment of a live audio stream with the identity of the speaker who uttered it. ([source](https://app.argmaxinc.com/docs/models))
- [Real-Time Speaker Identifiers](https://awesome-repositories.com/f/artificial-intelligence-ml/speaker-diarization/real-time-speaker-identifiers.md) — Assigns transcribed words to speakers during a live audio stream with support for up to four speakers. ([source](https://app.argmaxinc.com/docs/wiki/open-source-vs-pro-sdk))
- [CLI Speech Synthesizers](https://awesome-repositories.com/f/artificial-intelligence-ml/text-to-speech/cli-speech-synthesizers.md) — Generates and optionally plays speech from text directly from the terminal using a Swift CLI tool. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [Custom Vocabularies](https://awesome-repositories.com/f/artificial-intelligence-ml/vocabulary-management/custom-vocabularies.md) — Improves transcription accuracy for domain-specific terms by adding custom words to the model. ([source](https://app.argmaxinc.com/docs))
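
The diarization-merge entries above describe combining speaker turns with transcribed text. A minimal sketch of one common way to do this is shown below: each timestamped word is given to the speaker turn it overlaps most, and consecutive words from the same speaker are joined. The tuple layouts and the function names `label_words` and `merge_runs` are hypothetical, chosen for illustration; this is not WhisperKit's API or its actual merging strategy.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, turns):
    """Attach a speaker label to each word.

    words: list of (start, end, text) tuples (hypothetical layout)
    turns: list of (start, end, speaker) tuples (hypothetical layout)
    """
    labeled = []
    for w_start, w_end, text in words:
        # Pick the speaker turn with the largest time overlap.
        best = max(
            turns,
            key=lambda t: overlap(w_start, w_end, t[0], t[1]),
            default=None,
        )
        if best is None or overlap(w_start, w_end, best[0], best[1]) == 0.0:
            speaker = "unknown"  # no turn covers this word
        else:
            speaker = best[2]
        labeled.append((speaker, text))
    return labeled


def merge_runs(labeled):
    """Join consecutive words spoken by the same speaker."""
    merged = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


words = [(0.0, 0.4, "Hi"), (0.5, 0.9, "there"), (1.1, 1.5, "Hello")]
turns = [(0.0, 1.0, "A"), (1.0, 2.0, "B")]
transcript = merge_runs(label_words(words, turns))
```

Maximum-overlap assignment is a simple baseline; a production system may also need to handle overlapping speech and words that straddle a turn boundary.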

### Development Tools & Productivity

- [Multi-Task Speech CLIs](https://awesome-repositories.com/f/development-tools-productivity/cli-speech-generators/multi-task-speech-clis.md) — Executes transcription, translation, text-to-speech, and diarization directly from the terminal using a Swift CLI tool. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [Speech Processing CLIs](https://awesome-repositories.com/f/development-tools-productivity/command-line-interfaces/speech-processing-clis.md) — Provides a Swift CLI tool for executing transcription, translation, text-to-speech, and diarization from the terminal.

### Graphics & Multimedia

- [Audio Streaming Pipelines](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-streaming-engines/audio-playback-engines/chunked-audio-streaming/generative-audio-chunking/audio-streaming-pipelines.md) — Processes audio in real-time through a chain of buffering, encoding, and inference stages for incremental transcription.
- [On-Device](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-processing/text-to-speech-engines/text-to-speech-engines/on-device.md) — Ships an on-device text-to-speech engine using Core ML models with real-time streaming playback and voice customization.
- [Real-time Synthesis Streaming](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-streaming-engines/audio-playback-engines/chunked-audio-streaming/real-time-synthesis-streaming.md) — Plays generated speech through device speakers frame-by-frame as it is produced. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [Instant Streaming Playback](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-processing/text-to-speech-engines/text-to-speech-engines/instant-streaming-playback.md) — Plays synthesized audio frame-by-frame as it is generated with configurable buffering strategies. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))

### Networking & Communication

- [OpenAI-Compatible Audio Servers](https://awesome-repositories.com/f/networking-communication/local-http-servers/openai-compatible-audio-servers.md) — Exposes SDK functionality through a local HTTP server that mirrors the OpenAI Audio API specification.

### Web Development

- [OpenAI-Compatible Servers](https://awesome-repositories.com/f/web-development/openai-compatible-servers.md) — Provides a local server implementing the OpenAI API specification for audio transcription and translation. ([source](https://cdn.jsdelivr.net/gh/argmaxinc/whisperkit@main/README.md))
- [Audio API Servers](https://awesome-repositories.com/f/web-development/openai-compatible-servers/audio-api-servers.md) — Runs a local HTTP server that mirrors the OpenAI Audio API for transcribing and translating audio with streaming support.
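
Because the local server is described as mirroring the OpenAI Audio API, a client can in principle talk to it with a plain multipart upload. The sketch below only builds such a request with the Python standard library and does not send it. The path `/v1/audio/transcriptions` and the `file` and `model` form fields come from the OpenAI Audio API; the base URL, port, and model name are placeholders, and it is an assumption that the local server accepts the same path unchanged.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename, model):
    """Build (but do not send) an OpenAI-style transcription request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Plain text form field carrying the model name.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File form field carrying the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Placeholder host, port, and model name; adjust to the running server.
request = build_transcription_request(
    "http://localhost:8000", b"RIFF", "clip.wav", "example-model"
)
```

With a server running, `urllib.request.urlopen(request)` would submit the upload; the shape of the response should be checked against the server's own documentation.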

### Part of an Awesome List

- [Voice Activity Detection](https://awesome-repositories.com/f/awesome-lists/more/speech-and-audio-processing/voice-activity-detection.md) — Implements voice activity detection to identify speech segments and silence in audio streams. ([source](https://app.argmaxinc.com/docs))

### Content Management & Publishing

- [Timestamped Subtitle Generators](https://awesome-repositories.com/f/content-management-publishing/media-management/subtitle-management-systems/timestamped-subtitle-generators.md) — Produces time-coded subtitle files in SRT and VTT formats from transcribed audio for video content. ([source](https://app.argmaxinc.com/docs))
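
To illustrate what a time-coded subtitle file looks like, here is a minimal sketch that turns timestamped segments into SRT text. The `(start, end, text)` segment layout and the function names are assumptions for illustration and are not taken from the SDK; only the SRT layout itself (a running index, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line) is standard.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_srt(segments):
    """Render (start, end, text) segments as SRT cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(blocks)


srt_text = format_srt([(0.0, 1.5, "Hello"), (1.5, 3.25, "world")])
```

WebVTT differs mainly in starting with a `WEBVTT` header line and using a period instead of a comma before the milliseconds.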

### Data & Databases

- [Transcription Term Boosts](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/query-interfaces-dsls/multi-term-search-processors/term-weighting-algorithms/transcription-term-boosts.md) — Accepts custom vocabulary terms to improve recognition of specialized names and domain-specific words. ([source](https://app.argmaxinc.com/docs))

### Mobile Development

- [Android AI Pack Configurations](https://awesome-repositories.com/f/mobile-development/android-ai-pack-configurations.md) — Controls which model families are generated, where model assets are sourced from, and whether generation is automatic or manual. ([source](https://app.argmaxinc.com/docs/faq))
