Porcupine is an on-device wake word detection engine: it listens for a specific spoken phrase in real-time audio and triggers an action when it hears it, with all processing done locally and no cloud connectivity. It includes a custom wake word model creator that generates production-ready models in seconds from just a few spoken examples, with no need to collect a training dataset. Beyond wake word detection, Porcupine also provides on-device speech recognition for real-time transcription with custom vocabulary, an on-device audio content searcher that indexes and finds spoken phrases in audio files or streams, and a lightweight voice activity detector. For enhanced security, it offers a speaker-verified wake word detector that combines wake word detection with speaker recognition, so that a device responds only when the wake word is spoken by an enrolled user.
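As a rough illustration of the listen-and-trigger loop described above, the sketch below feeds fixed-size audio frames to a detector and invokes a callback on a match. The `run_detection` helper, the `on_detect` callback, and the `FakeEngine` stand-in are hypothetical names introduced only for this example. The one assumption about the real engine is that its `process` call takes a single frame of 16-bit samples and returns the index of the detected keyword, or -1 when nothing is detected, which is how the Porcupine Python binding documents it.

```python
from typing import Callable, Iterable, List, Sequence


class FakeEngine:
    """Hypothetical stand-in for a wake word engine so the sketch runs without audio hardware."""

    def __init__(self, trigger_value: int) -> None:
        self.trigger_value = trigger_value

    def process(self, frame: Sequence[int]) -> int:
        # Report keyword 0 if the marker sample appears in the frame, otherwise -1.
        return 0 if self.trigger_value in frame else -1


def run_detection(
    engine,
    frames: Iterable[Sequence[int]],
    on_detect: Callable[[int, int], None],
) -> List[int]:
    """Feed frames to the engine one at a time and call on_detect(frame_no, keyword_index) on a hit."""
    hits = []
    for frame_no, frame in enumerate(frames):
        keyword_index = engine.process(frame)
        if keyword_index >= 0:  # -1 means "no keyword in this frame"
            hits.append(frame_no)
            on_detect(frame_no, keyword_index)
    return hits
```

In a real integration the frames would come from a microphone stream and the engine object would be the SDK's detector; the loop itself stays the same, which is why it is written against a minimal `process` interface here.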
A key differentiator of Porcupine is its data-free custom model generation, which uses synthetic audio to create custom wake word models so that users do not need to collect training samples. The engine is built on a platform-agnostic C library, which enables integration across mobile, web, desktop, and embedded platforms through lightweight SDKs. It also provides an open-source accuracy benchmarking framework that ships with test models and audio files.
Real-time speech-to-text transcription, spoken content search across audio streams, and voice activity detection with minimal false alarms likewise run entirely on-device. Internally, the system transforms raw audio into Mel-scale spectrograms and feeds them to a quantized neural network with 8-bit weights, which keeps execution efficient on devices with limited compute and memory.
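The two ideas in this paragraph, Mel-scale spacing and 8-bit weights, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Porcupine's actual pipeline: the source does not say which Mel formula or quantization scheme the engine uses, so the example assumes the common HTK-style Mel formula and symmetric per-tensor quantization. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def hz_to_mel(hz: float) -> float:
    """Convert Hz to mels (HTK-style formula; an assumption, not Porcupine's documented choice)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_bands: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_bands filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127] plus one scale."""
    if not weights:
        return [], 1.0
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the 8-bit representation."""
    return [q * scale for q in quantized]
```

Even spacing on the Mel scale places filters densely at low frequencies and sparsely at high ones, and storing each weight as one byte plus a shared scale cuts model memory to roughly a quarter of 32-bit floats at the cost of a rounding error of at most half a quantization step per weight.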