On-Device Speech-to-Text SDKs - Provides an on-device speech-to-text SDK using Core ML models for private, offline transcription without network connectivity.
Speech to Text Transcription - Converts spoken audio into written text using on-device AI models with multi-language support.
OpenAI-Compatible APIs - Ships a local HTTP server that mirrors the OpenAI Audio API for transcription and translation.
Audio Transcription - Converts saved audio files into written text in a single pass after recording ends.
CLI Transcription Tools - Transcribes audio files or microphone input directly from the terminal without needing an Xcode project.
Real-Time Transcription - Processes audio input continuously as it arrives, producing low-latency transcription for live applications.
Speaker-Labeled Live Transcripts - Produces live transcripts that pair each text segment with the identified speaker in real time.
On-Device Inference - Runs neural network models directly on Apple hardware through Core ML.
On-Device Speech Recognizers - Ships an on-device speech recognition SDK using Core ML models for private, offline transcription.
Speech-to-Text Translation - Transcribes and translates spoken audio from other languages into English text in one step.
Voice Identity Selections - Ships multiple built-in voices and languages for customizing text-to-speech output.
Speaker Diarization - Separates and labels audio segments by speaker identity for per-speaker transcripts.
Transcription Merges - Combines speaker identification with transcribed text to produce speaker-attributed transcripts.
Transcription with Speaker Labels - Assigns transcribed words to individual speakers in real time with support for up to four speakers.
Core ML Speech Model Galleries - Offers a browsable gallery of ready-made Core ML models for speech-to-text, speaker diarization, and text-to-speech tasks.
Text-to-Speech - Generates spoken audio from written text using on-device models with voice customization.
Command-Line Speech Synthesizers - Provides a Swift CLI tool for executing transcription, translation, text-to-speech, and diarization tasks directly from the terminal.
On-Device Transcriptions - Transcribes audio into text using on-device inference without requiring a network connection.
Streaming Generations - Produces spoken audio incrementally as text arrives, without waiting for the full input.
Multi-Task Speech CLIs - Executes transcription, translation, text-to-speech, and diarization directly from the terminal using a Swift CLI tool.
Speech Processing CLIs - Provides a Swift CLI tool for executing transcription, translation, text-to-speech, and diarization from the terminal.
Audio Streaming Pipelines - Processes audio in real time through a chain of buffering, encoding, and inference stages for incremental transcription.
On-Device Text-to-Speech - Ships an on-device text-to-speech engine using Core ML models with real-time streaming playback and voice customization.
OpenAI-Compatible Audio Servers - Exposes SDK functionality through a local HTTP server that mirrors the OpenAI Audio API specification.
OpenAI-Compatible Servers - Provides a local server implementing the OpenAI API specification for audio transcription and translation.
Word-Level Timestamps - Generates precise start and end timestamps for each word in transcription output.
Spoken Language Detection - Identifies the language of spoken audio automatically during transcription without manual selection.
Android AI Pack Delivery - Delivers and manages model assets on Android devices through Google Play's AI Pack system with configurable delivery settings.
Model Galleries - Provides a browsable gallery of ready-made Core ML models for speech-to-text, speaker diarization, and text-to-speech tasks.
Real-Time Audio Transcribers - Captures and transcribes audio in real time from a device microphone via the command line.
CLI Diarization Tools - Labels speakers in audio files directly from the terminal using a Swift CLI tool.
Instant Streaming Playback - Plays synthesized audio frame-by-frame as it is generated with configurable buffering strategies.
Android AI Pack Configurations - Controls which model families are generated, where model assets are sourced from, and whether generation is automatic or manual.
Audio API Servers - Runs a local HTTP server that mirrors the OpenAI Audio API for transcribing and translating audio with streaming support.
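
Because the local server mirrors the OpenAI Audio API, a client can in principle talk to it with a plain multipart POST. The sketch below builds such a request using only the Python standard library. The `/v1/audio/transcriptions` path and the `file`, `model`, and `language` form fields come from the OpenAI Audio API; the host, port, model name, and the helper name `build_transcription_request` are assumptions for illustration and are not taken from this project.

```python
import io
import uuid
import urllib.request


def build_transcription_request(base_url, audio_bytes, filename, model, language=None):
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions.

    Hypothetical helper: base_url and model are placeholders that depend on
    how the local server is configured.
    """
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if language is not None:
        fields["language"] = language

    body = io.BytesIO()
    # Plain text form fields.
    for name, value in fields.items():
        body.write(f"--{boundary}\r\n".encode())
        body.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        body.write(f"{value}\r\n".encode())
    # The audio file part.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes)
    # Closing boundary.
    body.write(f"\r\n--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed address and model name; adjust to the actual server settings.
    req = build_transcription_request(
        "http://localhost:8000", b"<audio bytes>", "clip.wav", "placeholder-model"
    )
    print(req.get_method(), req.full_url)
```

Passing the returned request to `urllib.request.urlopen` would send it to a running server. The OpenAI Audio API exposes translation at `/v1/audio/translations` with the same multipart shape, so a server that mirrors it would presumably accept the same body at that path.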