Handy is a local speech-to-text automation tool designed to convert spoken audio into text and inject it directly into active desktop applications. By running machine learning models entirely on the host hardware, it provides a private, offline-first environment for dictation and command execution. The system functions as a background service that manages microphone input, transcription state, and text output, enabling hands-free typing across various software environments.
The project distinguishes itself through a modular pipeline that integrates local language models for post-transcription refinement. Users can configure custom prompts to automatically format, translate, or correct raw speech output before it is inserted into the target application. Event-driven automation hooks extend this workflow: the system can trigger custom scripts, keyboard shortcuts, or command sequences in response to transcription events.
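To make the shape of such a pipeline concrete, here is a minimal sketch of prompt-based refinement combined with event hooks. It is not Handy's actual implementation or API: the class `DictationPipeline`, the event names `transcription_finished` and `text_ready`, and the stand-in `fake_model` function are all hypothetical and chosen only for illustration. A real setup would replace `fake_model` with a call to a local language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical; they are not part of Handy's API.
Refiner = Callable[[str], str]  # stands in for a call to a local language model
Hook = Callable[[str], None]    # callback invoked when an event fires


@dataclass
class DictationPipeline:
    prompt_template: str  # must contain a "{transcript}" placeholder
    refine: Refiner
    hooks: Dict[str, List[Hook]] = field(default_factory=dict)

    def on(self, event: str, hook: Hook) -> None:
        """Register a hook for a named event."""
        self.hooks.setdefault(event, []).append(hook)

    def _emit(self, event: str, payload: str) -> None:
        # Run every hook registered for this event, in registration order.
        for hook in self.hooks.get(event, []):
            hook(payload)

    def process(self, raw_transcript: str) -> str:
        """Refine a raw transcript and return the text to insert."""
        self._emit("transcription_finished", raw_transcript)
        prompt = self.prompt_template.format(transcript=raw_transcript)
        refined = self.refine(prompt)
        self._emit("text_ready", refined)
        return refined


def fake_model(prompt: str) -> str:
    # Placeholder for a local model: takes the last line of the prompt,
    # capitalizes its first letter, and appends a period.
    text = prompt.rsplit("\n", 1)[-1].strip()
    return text[:1].upper() + text[1:] + "."


pipeline = DictationPipeline(
    prompt_template="Fix punctuation and capitalization:\n{transcript}",
    refine=fake_model,
)
pipeline.on("text_ready", lambda text: print("ready to insert:", text))
result = pipeline.process("hello world")
```

Keeping the refinement step and the hooks as plain callables means either can be swapped without touching the rest of the pipeline, which is one way to read the "modular" claim above.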
Beyond core dictation, the software offers control over the transcription environment, including hardware-aware audio management and real-time translation capabilities. It supports fine-grained tuning of transcription behavior, such as vocabulary correction for technical terminology and configurable input latency. The system also maintains a history of past sessions and provides tools for managing clipboard states and system memory usage.
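Vocabulary correction of this kind can be illustrated with a small fuzzy-matching sketch. This is an assumption about how such a feature might work, not a description of Handy's own algorithm: the function `correct_vocabulary` and its `cutoff` parameter are hypothetical, and the matching relies on Python's standard `difflib` similarity ratio rather than any phonetic model.

```python
import difflib
import re
from typing import List


def correct_vocabulary(text: str, vocabulary: List[str], cutoff: float = 0.8) -> str:
    """Replace words that closely resemble a known term with that term.

    Hypothetical helper: `vocabulary` holds the preferred spellings, and
    `cutoff` (0..1) sets how similar a word must be before it is replaced.
    """
    # Map lowercase forms back to the preferred spelling.
    preferred = {term.lower(): term for term in vocabulary}

    def fix(match: "re.Match[str]") -> str:
        word = match.group(0)
        close = difflib.get_close_matches(
            word.lower(), list(preferred), n=1, cutoff=cutoff
        )
        return preferred[close[0]] if close else word

    # Only alphanumeric runs are examined; punctuation and spacing are kept.
    return re.sub(r"[A-Za-z0-9_]+", fix, text)


corrected = correct_vocabulary(
    "deploy with kubernetis and postgress",
    ["Kubernetes", "Postgres"],
)
print(corrected)  # deploy with Kubernetes and Postgres
```

The cutoff is the main trade-off in this sketch: a lower value catches more misrecognitions but risks rewriting ordinary words that happen to resemble a technical term.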