# uberi/speech_recognition

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/uberi-speech-recognition).**

8,973 stars · 2,421 forks · Python · BSD-3-Clause

## Links

- GitHub: https://github.com/Uberi/speech_recognition
- Homepage: https://pypi.python.org/pypi/SpeechRecognition/
- awesome-repositories: https://awesome-repositories.com/repository/uberi-speech-recognition.md

## Topics

`audio` `python` `speech-recognition` `speech-to-text`

## Description

SpeechRecognition is a Python library that provides a unified interface for converting spoken audio into text. It bridges Python applications and a variety of speech-to-text engines, offering a consistent way to call both local and cloud-based recognition services.

The library wraps multiple online APIs and offline recognition backends behind a common calling convention, so recognition engines can be swapped without restructuring the calling code. It also supports multilingual transcription through language packs.
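
As a rough illustration of what interchangeable engines can look like in calling code, the sketch below dispatches to a recognizer method by engine name. The helper `transcribe` is hypothetical and not part of the library; it only assumes the recognizer exposes `recognize_<engine>` methods such as `recognize_google` and `recognize_sphinx`.

```python
from typing import Any


def transcribe(recognizer: Any, audio: Any, engine: str = "sphinx", **options: Any) -> str:
    """Call recognizer.recognize_<engine>(audio, **options) and return its result.

    Hypothetical helper: `recognizer` is expected to be a
    speech_recognition.Recognizer, or any object with the same method names.
    """
    method = getattr(recognizer, f"recognize_{engine}", None)
    if not callable(method):
        raise ValueError(f"unsupported engine: {engine!r}")
    return method(audio, **options)


# Typical use, assuming SpeechRecognition and the chosen engine are installed:
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.AudioFile("speech.wav") as source:  # "speech.wav" is a placeholder path
#       audio = r.record(source)
#   print(transcribe(r, audio, engine="google", language="en-US"))
```

With this shape, moving from an online to an offline engine is a change to the `engine` argument, provided the backend's own dependencies are present.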

The library handles both live microphone capture and transcription of recorded audio files. It includes ambient noise calibration that adjusts the recognizer's energy threshold, audio data manipulation for trimming or splitting recordings, and background listening that detects spoken phrases on a separate thread.
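
The file-based part of this workflow might look like the sketch below: read a recording, trim it to a time window, and pass the clip to an engine. The function `transcribe_clip` and its parameter names are illustrative assumptions; the code relies only on `Recognizer.record`, `AudioData.get_segment(start_ms, end_ms)`, and a `recognize_<engine>` method.

```python
from typing import Any, Optional


def transcribe_clip(
    recognizer: Any,
    source: Any,
    start_ms: Optional[float] = None,
    end_ms: Optional[float] = None,
    engine: str = "sphinx",
) -> str:
    """Record an audio source, keep only [start_ms, end_ms), and transcribe it.

    Hypothetical helper: `recognizer` and `source` are expected to behave like
    speech_recognition.Recognizer and speech_recognition.AudioFile.
    """
    with source as opened:
        audio = recognizer.record(opened)       # read the whole source into memory
    clip = audio.get_segment(start_ms, end_ms)  # None means "from start" / "to end"
    recognize = getattr(recognizer, f"recognize_{engine}")
    return recognize(clip)


# Typical use, assuming SpeechRecognition and PocketSphinx are installed:
#   import speech_recognition as sr
#   text = transcribe_clip(sr.Recognizer(), sr.AudioFile("speech.wav"), 0, 5000)
```

In real use the recognition call can raise `speech_recognition.UnknownValueError` when the speech is unintelligible and `speech_recognition.RequestError` when an engine or service is unavailable, so callers will usually want to handle both.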

## Tags

### Artificial Intelligence & ML

- [Speech-to-Text Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-integrations.md) — Provides a unified interface to transcribe spoken audio into text via multiple online and offline recognition engines. ([source](https://cdn.jsdelivr.net/gh/uberi/speech_recognition@master/README.md))
- [Speech-to-Text Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-libraries.md) — Provides a unified Python interface for converting spoken audio from microphones or files into written text.
- [Long Audio Chunk Transcribers](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/transcription-buffering/audio-segmenting/long-audio-chunk-transcribers.md) — Processes recorded audio files by segmenting them into manageable chunks for stable transcription. ([source](https://cdn.jsdelivr.net/gh/uberi/speech_recognition@master/README.md))
- [Unified Provider Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/cloud-ai-integrations/unified-provider-interfaces.md) — Implements a unified provider interface that standardizes communication across diverse cloud-based and local speech recognition engines.
- [On-Device Speech Recognizers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/automatic-speech-recognition/on-device-speech-recognizers.md) — Provides a unified interface to local recognition engines for offline speech-to-text transcription.
- [Speech Recognition Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/speech-recognition-libraries.md) — Functions as a comprehensive Python library providing a unified API for multiple speech-to-text engines.
- [Speech Recognition APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-recognition-apis.md) — Integrates with cloud-based speech recognition APIs to transcribe audio data via external web services. ([source](https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst))
- [Unified Recognition Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-transcription/unified-recognition-interfaces.md) — Bridges Python applications to various cloud APIs and local engines through a consistent interface.
- [Offline Media Transcribers](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-transcription/automated-video-transcribers/offline-media-transcribers.md) — Converts audio to text using local recognition engines to enable transcription without an internet connection. ([source](https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst))
- [Custom Phrase Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/conversational-voice-interaction/voice-agents/voice-activity-detection/wake-word-detection/custom-phrase-detection.md) — Automatically identifies the start and end points of spoken phrases within an audio source. ([source](https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst))
- [Multilingual Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/multilingual-transcription.md) — Supports transcription of spoken words across various languages and regional dialects.
- [Audio Trigger Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-trigger-detection.md) — Monitors audio streams in the background for spoken phrases to trigger specific system callbacks. ([source](https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst))
- [Multilingual Support](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/automatic-speech-recognition/on-device-speech-recognizers/multilingual-support.md) — Provides support for transcribing audio in multiple languages through various language packs. ([source](https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst))

### Graphics & Multimedia

- [Real-Time Microphone Captures](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-processing/real-time-audio-threading/live-audio-stream-processors/real-time-microphone-captures.md) — Captures live audio from a device microphone to detect and transcribe spoken phrases on the fly.
- [Audio Captures](https://awesome-repositories.com/f/graphics-multimedia/microphone-input-processors/audio-captures.md) — Captures live audio from the device microphone for real-time or background speech-to-text conversion. ([source](https://cdn.jsdelivr.net/gh/uberi/speech_recognition@master/README.md))
- [Camera and Microphone Recorders](https://awesome-repositories.com/f/graphics-multimedia/streaming-distribution/streaming-broadcasting/live-audio-recording-and-broadcasting/camera-and-microphone-recorders.md) — Captures live audio input from physical microphones with configurable sample rates and device indices. ([source](https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst))
- [Audio Processing Frameworks](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/audio-processing-systems/audio-processing-frameworks.md) — Implements a framework for capturing microphone input and managing audio file formats for transcription.
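
A minimal sketch of capturing one phrase from a microphone, assuming objects that behave like `speech_recognition.Recognizer` and `speech_recognition.Microphone`. The helper name `capture_phrase` and its default values are illustrative choices, not library defaults.

```python
from typing import Any, Optional


def capture_phrase(
    recognizer: Any,
    microphone: Any,
    calibrate_seconds: float = 0.5,
    timeout: Optional[float] = 5.0,
    phrase_time_limit: Optional[float] = 10.0,
) -> Any:
    """Calibrate against ambient noise, then block until one phrase is captured.

    Hypothetical helper: returns whatever recognizer.listen returns
    (an AudioData instance when used with the real library).
    """
    with microphone as source:
        # Sample background noise first so the energy threshold fits the room.
        recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
        return recognizer.listen(
            source, timeout=timeout, phrase_time_limit=phrase_time_limit
        )


# Typical use, assuming SpeechRecognition and PyAudio are installed:
#   import speech_recognition as sr
#   print(sr.Microphone.list_microphone_names())  # find a device index
#   mic = sr.Microphone(device_index=None, sample_rate=16000)
#   audio = capture_phrase(sr.Recognizer(), mic)
```

When `timeout` is set and no speech starts in time, the real `listen` call raises `speech_recognition.WaitTimeoutError`, which interactive programs typically catch and retry.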

### Software Engineering & Architecture

- [Transcription Engine Adapters](https://awesome-repositories.com/f/software-engineering-architecture/pluggable-backends/transcription-engine-adapters.md) — Features modular adapters that allow interchangeable use of diverse online and offline speech recognition engines.
- [Background Thread Dispatchers](https://awesome-repositories.com/f/software-engineering-architecture/background-thread-dispatchers.md) — Utilizes background thread dispatching to listen for speech and trigger callbacks without blocking the main execution.
- [Integration Adapters](https://awesome-repositories.com/f/software-engineering-architecture/integration-adapters.md) — Provides architectural abstraction layers to decouple various speech recognition engine integrations from the core application logic.
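
The background dispatch pattern named above can be sketched with the standard library alone. This is not the library's implementation: `listen_in_background_sketch` is a hypothetical stand-in that takes any iterable of phrases instead of a microphone, and it mirrors only the general shape of starting a worker thread and returning a stop function.

```python
import threading
from typing import Any, Callable, Iterable


def listen_in_background_sketch(
    phrases: Iterable[Any], callback: Callable[[Any], None]
) -> Callable[..., None]:
    """Deliver each phrase to `callback` on a worker thread; return a stopper."""
    stop_event = threading.Event()

    def worker() -> None:
        for phrase in phrases:
            if stop_event.is_set():
                break              # stop was requested; leave remaining phrases unread
            callback(phrase)       # runs on the worker thread, not the caller's

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    def stop(wait_for_stop: bool = True) -> None:
        stop_event.set()
        if wait_for_stop:
            thread.join()          # block until the worker has exited

    return stop
```

With the library itself, the corresponding call is `stop = recognizer.listen_in_background(source, callback)`, where the callback receives the recognizer and the captured audio. Because the callback runs off the main thread, shared state it touches should be protected or handed over through a thread-safe structure such as `queue.Queue`.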

### Web Development

- [Speech-to-Text API Wrappers](https://awesome-repositories.com/f/web-development/external-api-integrations/speech-to-text-api-wrappers.md) — Acts as a wrapper for various cloud-based speech-to-text APIs to provide a consistent transcription interface.

### Part of an Awesome List

- [Ambient Noise Calibration](https://awesome-repositories.com/f/awesome-lists/ai/recurrent-neural-networks/audio-noise-suppression/ambient-noise-calibration.md) — Analyzes background noise levels to dynamically optimize the sensitivity of voice activity detection thresholds.
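
To make the calibration idea concrete, the sketch below measures the RMS energy of 16-bit PCM chunks and moves a threshold toward a multiple of that energy with exponential smoothing. It is an illustration under stated assumptions, not the library's code: the function names, the starting threshold, the `ratio`, and the `damping` values are all chosen for this example.

```python
import math
from array import array
from typing import Iterable


def rms(chunk: bytes) -> float:
    """Root-mean-square energy of 16-bit signed PCM in native byte order."""
    samples = array("h")
    samples.frombytes(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(
    chunks: Iterable[bytes],
    seconds_per_chunk: float,
    threshold: float = 300.0,  # illustrative starting value
    ratio: float = 1.5,        # speech is assumed to be this much louder than noise
    damping: float = 0.15,     # fraction of the old threshold kept per second
) -> float:
    """Move the threshold toward ratio * ambient energy, one chunk at a time."""
    weight = damping ** seconds_per_chunk  # scale the damping to the chunk length
    for chunk in chunks:
        target = rms(chunk) * ratio
        threshold = threshold * weight + target * (1 - weight)
    return threshold
```

In the library, the equivalent step is a single call, `recognizer.adjust_for_ambient_noise(source, duration=1)`, which updates `recognizer.energy_threshold` from audio read off the source.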
