Speech Recognition

Features

Speech-to-Text Integrations - Provides a unified interface to transcribe spoken audio into text via multiple online and offline recognition engines.
Speech-to-Text Libraries - Provides a unified Python interface for converting spoken audio from microphones or files into written text.
Long Audio Chunk Transcribers - Processes recorded audio files by segmenting them into manageable chunks for stable transcription.
Unified Provider Interfaces - Implements a unified provider interface that standardizes communication across diverse cloud-based and local speech recognition engines.
On-Device Speech Recognizers - Provides a unified interface to local recognition engines for offline speech-to-text transcription.
Speech Recognition Libraries - Functions as a comprehensive Python library providing a unified API for multiple speech-to-text engines.
Speech Recognition APIs - Integrates with cloud-based speech recognition APIs to transcribe audio data via external web services.
Unified Recognition Interfaces - Bridges Python applications to various cloud APIs and local engines through a consistent interface.
Offline Media Transcribers - Converts audio to text using local recognition engines to enable transcription without an internet connection.
Real-Time Microphone Captures - Captures live audio from a device microphone to detect and transcribe spoken phrases on the fly.
Audio Captures - Captures live audio from the device microphone for real-time or background speech-to-text conversion.
Microphone Recorders - Captures live audio input from physical microphones with configurable sample rates and device indices.
Transcription Engine Adapters - Features modular adapters that allow interchangeable use of diverse online and offline speech recognition engines.
Speech-to-Text API Wrappers - Acts as a wrapper for various cloud-based speech-to-text APIs to provide a consistent transcription interface.
Custom Phrase Detection - Automatically identifies the start and end points of spoken phrases within an audio source.
Multilingual Transcription - Supports transcription of spoken words across various languages and regional dialects.
Audio Trigger Detection - Monitors audio streams in the background for spoken phrases to trigger specific system callbacks.
Multilingual Support - Provides support for transcribing audio in multiple languages through various language packs.
Ambient Noise Calibration - Analyzes background noise levels to dynamically optimize the sensitivity of voice activity detection thresholds.
Audio Processing Frameworks - Implements a framework for capturing microphone input and managing audio file formats for transcription.
Background Thread Dispatchers - Utilizes background thread dispatching to listen for speech and trigger callbacks without blocking the main execution.
Integration Adapters - Provides architectural abstraction layers to decouple various speech recognition engine integrations from the core application logic.
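
Several of the features above (unified provider interfaces, transcription engine adapters, integration adapters) describe one design: every engine sits behind the same call signature. The sketch below illustrates that adapter pattern in plain Python. It is not the library's code: `AudioClip`, `EngineAdapter`, `UnifiedRecognizer`, and the two fake adapters are hypothetical names, and the adapters return placeholder strings instead of calling a real service or model. The SpeechRecognition library itself exposes the idea as a single `Recognizer` class with one `recognize_*` method per engine (for example `recognize_google` and `recognize_sphinx`), each taking the same `AudioData` object.

```python
# Hypothetical adapter-pattern sketch; these classes are not SpeechRecognition's API.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioClip:
    """Mono PCM audio plus the metadata every engine adapter needs."""
    frames: bytes
    sample_rate: int   # samples per second
    sample_width: int  # bytes per sample

    @property
    def seconds(self) -> float:
        return len(self.frames) / (self.sample_rate * self.sample_width)


class EngineAdapter(ABC):
    """Common interface: one method with the same inputs for every engine."""

    @abstractmethod
    def transcribe(self, clip: AudioClip, language: str) -> str:
        raise NotImplementedError


class FakeCloudAdapter(EngineAdapter):
    """Placeholder for an online service; a real adapter would call a web API here."""

    def transcribe(self, clip: AudioClip, language: str) -> str:
        return f"[cloud/{language}] {clip.seconds:.1f}s of audio"


class FakeOfflineAdapter(EngineAdapter):
    """Placeholder for a local engine; a real adapter would run an on-device model here."""

    def transcribe(self, clip: AudioClip, language: str) -> str:
        return f"[offline/{language}] {clip.seconds:.1f}s of audio"


class UnifiedRecognizer:
    """Routes one call signature to whichever registered adapter is named."""

    def __init__(self) -> None:
        self._adapters: dict[str, EngineAdapter] = {}

    def register(self, name: str, adapter: EngineAdapter) -> None:
        self._adapters[name] = adapter

    def recognize(self, clip: AudioClip, engine: str, language: str = "en-US") -> str:
        try:
            adapter = self._adapters[engine]
        except KeyError:
            raise ValueError(f"unknown engine: {engine!r}") from None
        return adapter.transcribe(clip, language)


if __name__ == "__main__":
    recognizer = UnifiedRecognizer()
    recognizer.register("cloud", FakeCloudAdapter())
    recognizer.register("offline", FakeOfflineAdapter())
    one_second = AudioClip(frames=bytes(32000), sample_rate=16000, sample_width=2)
    print(recognizer.recognize(one_second, engine="offline"))
```

In this design, adding an engine means writing one new adapter class; calling code only changes the `engine` name it passes.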
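
Ambient noise calibration and phrase detection can be illustrated with a simple energy-based approach: measure the RMS level of background audio, set a threshold somewhat above it, then treat runs of louder frames as phrases. The following is a minimal sketch under stated assumptions, not the library's implementation. Audio is assumed to be mono, signed 16-bit little-endian PCM split into fixed frames; `make_frame`, `frame_rms`, `calibrate`, and `find_phrases` are hypothetical helpers; and the defaults (`ratio=1.5`, `damping=0.15`, 32 ms frames, three quiet frames to end a phrase) are illustrative values. In SpeechRecognition the corresponding entry points are `Recognizer.adjust_for_ambient_noise(source, duration=1)`, which updates the recognizer's `energy_threshold`, and `Recognizer.listen(source)`, which waits for a phrase to start and end.

```python
# Hypothetical helpers illustrating energy-based calibration; not SpeechRecognition's API.
import math
import struct
from typing import List, Sequence, Tuple


def make_frame(amplitude: int, n_samples: int = 512) -> bytes:
    """Synthetic 16-bit little-endian frame alternating +amplitude and -amplitude."""
    samples = [amplitude if i % 2 == 0 else -amplitude for i in range(n_samples)]
    return struct.pack(f"<{n_samples}h", *samples)


def frame_rms(frame: bytes) -> float:
    """Root-mean-square energy of a frame of signed 16-bit little-endian PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def calibrate(noise_frames: Sequence[bytes], threshold: float = 300.0,
              ratio: float = 1.5, damping: float = 0.15,
              frame_seconds: float = 0.032) -> float:
    """Move the threshold toward ratio * ambient RMS with exponential smoothing.

    damping is the fraction of the old threshold that survives one second of input.
    """
    weight = damping ** frame_seconds  # per-frame share kept from the old threshold
    for frame in noise_frames:
        threshold = threshold * weight + ratio * frame_rms(frame) * (1 - weight)
    return threshold


def find_phrases(frames: Sequence[bytes], threshold: float,
                 min_silence_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of runs above threshold."""
    phrases: List[Tuple[int, int]] = []
    start, quiet = None, 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_frames:  # enough silence: the phrase has ended
                phrases.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:  # audio ended mid-phrase
        phrases.append((start, len(frames) - quiet))
    return phrases


if __name__ == "__main__":
    threshold = calibrate([make_frame(100)] * 31)  # roughly one second of room noise
    stream = ([make_frame(100)] * 3 + [make_frame(2000)] * 4
              + [make_frame(100)] * 5 + [make_frame(2000)] * 2)
    print(round(threshold), find_phrases(stream, threshold))
```

Smoothing the threshold rather than jumping straight to the measured level keeps one loud burst during calibration from pushing the threshold too high.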
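
Long recordings are often transcribed by cutting them into shorter pieces, sending each piece to an engine, and joining the results. This sketch uses only the standard-library `wave` module to split an in-memory WAV file into fixed-length chunks; `split_wav` and `transcribe_long` are hypothetical names, and `transcribe_chunk` stands in for any engine call. With SpeechRecognition, a comparable loop typically opens the file as an `sr.AudioFile` source and calls `Recognizer.record(source, duration=...)` repeatedly, each call reading on from where the previous one stopped.

```python
# Hypothetical chunking helpers built on the standard library; not SpeechRecognition's API.
import io
import wave
from typing import Callable, Iterator, Tuple


def split_wav(data: bytes, chunk_seconds: float) -> Iterator[Tuple[float, bytes]]:
    """Yield (start_seconds, wav_bytes) pieces, each at most chunk_seconds long."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        bytes_per_frame = params.sampwidth * params.nchannels
        position = 0  # current offset, in frames
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                dst.setparams(params)    # same rate, width, and channels as the source
                dst.writeframes(frames)  # header frame count is corrected on close
            yield position / params.framerate, out.getvalue()
            position += len(frames) // bytes_per_frame


def transcribe_long(data: bytes, transcribe_chunk: Callable[[bytes], str],
                    chunk_seconds: float = 30.0) -> str:
    """Transcribe each piece separately and join the text."""
    return " ".join(transcribe_chunk(piece) for _, piece in split_wav(data, chunk_seconds))
```

Fixed-length cuts can split a word at a chunk boundary; cutting at detected silences, or overlapping neighboring chunks slightly, is a common refinement.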
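
Background listening amounts to running capture and recognition on a worker thread and handing each result to a callback, so the main program keeps running. The sketch below shows that dispatch pattern with `threading` and `queue`; `dispatch_in_background` is a hypothetical name, and the queue stands in for phrases that a capture loop would detect from a microphone. In SpeechRecognition, `Recognizer.listen_in_background(source, callback)` fills this role: it calls `callback(recognizer, audio)` from a background thread and returns a function that stops listening when called.

```python
# Hypothetical dispatcher illustrating background listening; not SpeechRecognition's API.
import queue
import threading
import time
from typing import Callable


def dispatch_in_background(phrases: "queue.Queue[bytes]",
                           callback: Callable[[bytes], None],
                           poll_seconds: float = 0.1) -> Callable[..., None]:
    """Deliver each queued phrase to callback on a daemon thread; return a stop function."""
    stop_event = threading.Event()

    def worker() -> None:
        while not stop_event.is_set():
            try:
                # Time out regularly so a stop request is noticed even when no audio arrives.
                phrase = phrases.get(timeout=poll_seconds)
            except queue.Empty:
                continue
            callback(phrase)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    def stop(wait_for_stop: bool = True) -> None:
        stop_event.set()
        if wait_for_stop:
            thread.join()

    return stop


if __name__ == "__main__":
    incoming: "queue.Queue[bytes]" = queue.Queue()
    stop = dispatch_in_background(incoming, lambda p: print("heard", len(p), "bytes"))
    for size in (3200, 6400):
        incoming.put(bytes(size))  # stands in for phrases captured from a microphone
    time.sleep(0.5)  # the main thread is free to do other work here
    stop()
```

Polling the queue with a timeout lets the worker notice a stop request within `poll_seconds` even when no audio arrives.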

Open-source alternatives to Speech Recognition

Similar open-source projects, ranked by how many features they share with Speech Recognition.

k2-fsa/sherpa-onnx (13,017 GitHub stars)
Sherpa-ONNX is an ONNX-based speech processing toolkit that provides a local speech recognition engine, an on-device voice synthesis tool, and a speaker identification framework. It is designed as a cross-platform speech API that runs speech-to-text, text-to-speech, and speaker verification locally on a device without network access. The project is distinguished by its ability to perform zero-shot voice cloning and speaker diarization on-device. It supports a wide range of hardware acceleration options, including GPUs and various NPU architectures, and provides a Web…
Language: C++. Topics: aarch64, android, arm32.

SevaSk/ecoute (6,036 GitHub stars)
Ecoute is a live transcription tool that shows real-time transcripts in a textbox for both the user's microphone input (You) and the user's speaker output (Speaker).
Language: Python. Topics: gpt-35-turbo, whisper-ai, windows.

nl8590687/ASRT_SpeechRecognition (8,375 GitHub stars)
This project is a Chinese automatic speech recognition framework and deep learning system designed to convert spoken Chinese audio into written text. It functions as a toolkit for training, evaluating, and deploying speech-to-text models, utilizing a specialized pinyin-to-text converter that transforms phonetic sequences into Chinese characters using a probability graph model. The system is distinguished by its deployment flexibility, offering a dockerized recognition server that provides transcription capabilities as a remote API. It supports high-performance streaming through a gRPC speech-…
Language: Python. Topics: asrt, chinese-speech-recognition, cnn.

alphacep/vosk-api (14,853 GitHub stars)
Vosk is an offline speech-to-text engine and API that converts spoken audio into text locally on a device. It provides a cross-platform speech toolkit with language bindings for integrating voice recognition into server environments, Android, iOS, and Raspberry Pi. The project includes a speaker identification tool to distinguish between different voices and an acoustic model trainer for building custom neural network models. These training tools enable speech feature extraction and model accuracy evaluation to improve recognition for specialized domains. The system supports real-time audio…
Language: Jupyter Notebook. Topics: android, asr, deep-learning.

See all 30 alternatives to Speech Recognition

Uberi/speech_recognition
