VoiceInk

VoiceInk is a system-wide speech-to-text dictation tool that converts spoken audio into text using local or cloud AI models. It functions as a local AI transcription engine and a context-aware voice assistant, allowing users to insert transcribed text directly into any active application on the operating system.

The project distinguishes itself through custom vocabulary management, which helps transcription engines recognize industry-specific technical terms, professional terminology, and personal names. It can also pass raw transcriptions to large language models that refine them into polished text, using context taken from the system clipboard and the active screen content.
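The exact prompt VoiceInk sends to a language model is not described here. The following is a minimal sketch, assuming the custom vocabulary, clipboard text, and on-screen text are plain strings that get merged into a single refinement prompt. `RefinementContext` and `buildRefinementPrompt` are hypothetical names used only for illustration.

```swift
import Foundation

/// Hypothetical context gathered when a recording stops.
/// VoiceInk's real data model and prompt wording may differ.
struct RefinementContext {
    var vocabulary: [String]      // user-defined terms and personal names
    var clipboardText: String?    // current clipboard contents, if any
    var screenText: String?       // text captured from the active window, if any
}

/// Builds one prompt asking an LLM to clean up a raw transcript.
/// Missing or empty context sections are left out instead of being sent blank.
func buildRefinementPrompt(transcript: String, context: RefinementContext) -> String {
    var sections: [String] = [
        "Rewrite the following dictated text into clean, correctly punctuated prose. Do not add new information."
    ]

    // Trim, drop empty entries, and deduplicate while keeping the user's order.
    var seen = Set<String>()
    let terms = context.vocabulary
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty && seen.insert($0).inserted }
    if !terms.isEmpty {
        sections.append("Spell these terms exactly as written: " + terms.joined(separator: ", "))
    }

    if let clip = context.clipboardText, !clip.isEmpty {
        sections.append("Clipboard context:\n" + clip)
    }
    if let screen = context.screenText, !screen.isEmpty {
        sections.append("Screen context:\n" + screen)
    }

    // The transcript goes last so the model reads the context first.
    sections.append("Transcript:\n" + transcript)
    return sections.joined(separator: "\n\n")
}
```

Omitting empty sections keeps the prompt short when no clipboard or screen context is available, which matters when the same prompt is sent for every dictation.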

The software includes a hybrid speech recognition mode that can run entirely offline for privacy or use remote servers for wider language support. Application-specific automation switches transcription models and dictation profiles based on the active window, and configurable keyboard shortcuts control recording.
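A minimal sketch of how per-application profiles and hybrid local/cloud routing could be modeled, assuming the active application is identified by its bundle identifier. On macOS that identifier is available from `NSWorkspace.shared.frontmostApplication?.bundleIdentifier`. The types and functions below (`DictationProfile`, `TranscriptionEngine`, `resolveProfile`, `chooseEngine`) are hypothetical and are not taken from VoiceInk's source.

```swift
import Foundation

/// Where audio is transcribed (hypothetical model of the hybrid mode).
enum TranscriptionEngine: Equatable {
    case local(model: String)
    case cloud(provider: String)
}

/// A dictation profile bundling engine, language, and enhancement prompt.
/// All fields are illustrative assumptions.
struct DictationProfile: Equatable {
    var name: String
    var engine: TranscriptionEngine
    var language: String
    var promptName: String?
}

/// Returns the profile mapped to the frontmost app's bundle identifier,
/// or the fallback profile when there is no app or no matching rule.
func resolveProfile(bundleID: String?,
                    rules: [String: DictationProfile],
                    fallback: DictationProfile) -> DictationProfile {
    guard let id = bundleID, let match = rules[id] else { return fallback }
    return match
}

/// Hybrid routing: use the local model when offline-only mode is on or when the
/// local model supports the language; otherwise send audio to the cloud provider.
func chooseEngine(language: String,
                  localLanguages: Set<String>,
                  offlineOnly: Bool,
                  localModel: String,
                  cloudProvider: String) -> TranscriptionEngine {
    if offlineOnly || localLanguages.contains(language) {
        return .local(model: localModel)
    }
    return .cloud(provider: cloudProvider)
}
```

Keeping the lookup as a plain dictionary from bundle identifier to profile makes the rule set easy to persist and test without any AppKit dependency; in this sketch offline-only mode always wins over language coverage.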

The application is written in Swift.

Features

Local Speech-to-Text - Functions as a system-wide utility for converting spoken audio into text using local or cloud AI models.

Context Injection - Provides mechanisms to inject clipboard and screen content into AI prompts to improve transcription and refinement accuracy.

Local Inference Engines - Executes speech-to-text models directly on the host machine to eliminate network latency and ensure data privacy.

Hybrid Speech Recognition - Offers a hybrid mode that processes audio either locally for privacy or via cloud APIs for expanded language support.

State-Aware Prompting - Uses the active state of the system clipboard and screen content to provide contextual awareness for AI prompts.

Custom Vocabularies - Allows users to define custom technical terms and personal names to improve the accuracy of the speech recognition engine.

Global Text Injection Tools - Writes processed text directly into any active application by simulating keyboard input or using the system clipboard.

AI Text Refinement Pipelines - Implements AI-driven pipelines to transform raw voice transcriptions into polished, professionally formatted text.

AI Text Transformers - Uses large language models to polish rough voice transcriptions into professional emails, chats, or social posts.

Local Data Processing - When local models are selected, keeps audio transcription and text processing on the local machine to maintain data privacy.

Global Shortcut Interceptors - Captures system-wide keyboard hotkeys and push-to-talk triggers to control audio recording across all applications.

Multilingual Transcription - Provides the ability to process audio in multiple languages and switch between them for transcription.

Cloud Speech Transcription - Supports remote cloud-based speech transcription for rare languages or hardware with limited local processing power.

Voice Assistants - Integrates a voice-activated AI assistant capable of executing system commands, summarizing text, and answering questions.

Dictation Profiles - Defines sets of transcription models, languages, and formatting rules to tailor recording behavior to specific tasks.

Contextual Transcription Automation - Automatically switches transcription models and enhancement prompts based on the active application or website.

Application-Context Profiles - Automatically toggles transcription models and AI prompts based on the currently active system window.

Recording Controls - Ships configurable keyboard shortcuts and push-to-talk triggers to control the voice capture process system-wide.

Desktop Applications - Open-source dictation and transcription utility for macOS.

Voice Dictation - Real-time speech-to-text application.

Voice To Text - Listed in the “Voice To Text” section of the Awesome Mac list.
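The push-to-talk triggers and recording shortcuts described in the feature list reduce to a small state machine. The sketch below assumes a global shortcut monitor delivers key-down and key-up events; `RecordingController`, `HotkeyEvent`, and `RecorderAction` are hypothetical names, and VoiceInk's actual hotkey handling may differ.

```swift
/// Hypothetical recording modes for a global hotkey.
enum RecordingMode { case toggle, pushToTalk }

/// Events from a global shortcut monitor.
enum HotkeyEvent { case pressed, released }

/// What the app should do in response to an event.
enum RecorderAction: Equatable { case start, stop, ignore }

/// Toggle mode flips recording on each key press;
/// push-to-talk records only while the key is held down.
struct RecordingController {
    let mode: RecordingMode
    private(set) var isRecording = false

    init(mode: RecordingMode) { self.mode = mode }

    mutating func handle(_ event: HotkeyEvent) -> RecorderAction {
        switch (mode, event) {
        case (.toggle, .pressed):
            isRecording.toggle()
            return isRecording ? .start : .stop
        case (.toggle, .released):
            return .ignore
        case (.pushToTalk, .pressed):
            // Repeated key-down events (auto-repeat) must not restart recording.
            guard !isRecording else { return .ignore }
            isRecording = true
            return .start
        case (.pushToTalk, .released):
            guard isRecording else { return .ignore }
            isRecording = false
            return .stop
        }
    }
}
```

Keeping this logic free of AppKit makes it easy to test; a macOS app could feed it events from a global event monitor such as `NSEvent.addGlobalMonitorForEvents(matching:handler:)`.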

Beingpax/VoiceInk
