What are the best open-source alternatives to Speech Recognition?

30 open-source projects similar to uberi/speech_recognition, ranked by shared features. Top picks: k2-fsa/sherpa-onnx, SevaSk/ecoute, syedhali/EZAudio, nl8590687/ASRT_SpeechRecognition, alphacep/vosk-api, Zackriya-Solutions/meeting-minutes, KoljaB/RealtimeSTT, casibase/casibase, livekit/agents, gpac/gpac.

Is k2-fsa/sherpa-onnx a good alternative to Speech Recognition?

Sherpa-ONNX is an ONNX-based speech processing toolkit that provides a local speech recognition engine, an on-device voice synthesis tool, and a speaker identification framework. It is designed as a cross-platform speech API that enables speech-to-text, text-to-speech, and speaker verification task…

Is SevaSk/ecoute a good alternative to Speech Recognition?

Ecoute is a live transcription tool that shows real-time transcripts of both the user's microphone input (You) and the user's speaker output (Speaker) in a textbox.

Is syedhali/EZAudio a good alternative to Speech Recognition?

EZAudio is an audio library for Apple platforms that provides standardized interfaces for microphone capture, file playback, and hardware output. It functions as a low-latency audio processor and visualization framework designed to manipulate audio buffers and route signals with minimal delay. The…

Is nl8590687/ASRT_SpeechRecognition a good alternative to Speech Recognition?

This project is a Chinese automatic speech recognition framework and deep learning system designed to convert spoken Chinese audio into written text. It functions as a toolkit for training, evaluating, and deploying speech-to-text models, utilizing a specialized pinyin-to-text converter that transf…

Is alphacep/vosk-api a good alternative to Speech Recognition?

Vosk is an offline speech-to-text engine and API that converts spoken audio into text locally on a device. It provides a cross-platform speech toolkit with language bindings for integrating voice recognition into server environments, Android, iOS, and Raspberry Pi. The project includes a speaker i…

Is Zackriya-Solutions/meeting-minutes a good alternative to Speech Recognition?

This project is a self-hosted meeting transcription and summarization tool that converts audio recordings into text transcripts and structured notes using large language models. It functions as an enterprise meeting documentation manager, allowing for the organization and editing of timestamped rec…

Is KoljaB/RealtimeSTT a good alternative to Speech Recognition?

RealtimeSTT is a local speech-to-text engine and real-time automatic speech recognition server. It utilizes transformer-based recognition and omnilingual pipelines to convert live audio streams into text, providing a WebSocket-based streaming API for raw PCM audio transmission. The project is dist…

Is casibase/casibase a good alternative to Speech Recognition?

Casibase is an open-source platform that orchestrates multi-turn conversations with large language models and manages retrieval-augmented knowledge bases from a single interface. It provides a unified system for connecting to over 30 AI model providers, ingesting documents into vector embeddings fo…

Is livekit/agents a good alternative to Speech Recognition?

This project is a framework for developing multimodal AI agents that function as programmable participants in real-time communication rooms. It enables the construction of agents that can see, hear, and speak by integrating speech-to-text, large language models, and text-to-speech pipelines to faci…

Is gpac/gpac a good alternative to Speech Recognition?

GPAC is an open-source multimedia framework built around a pluggable filter graph pipeline, where modular processing units called filters connect into a directed graph to handle media workflows. At its core, the framework centers all media packaging and manipulation on the ISO Base Media File Forma…


Open-source alternatives to Speech Recognition

30 open-source projects similar to uberi/speech_recognition, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Speech Recognition alternative.

k2-fsa/sherpa-onnx
13,017 stars
Sherpa-ONNX is an ONNX-based speech processing toolkit that provides a local speech recognition engine, an on-device voice synthesis tool, and a speaker identification framework. It is designed as a cross-platform speech API that enables speech-to-text, text-to-speech, and speaker verification tasks to be executed locally on a device without requiring network access. The project is distinguished by its ability to perform zero-shot voice cloning and speaker diarization on-device. It supports a wide range of hardware accelerations, including GPU and various NPU architectures, and provides a Web…
C++, aarch64, android, arm32

SevaSk/ecoute
6,036 stars
Ecoute is a live transcription tool that shows real-time transcripts of both the user's microphone input (You) and the user's speaker output (Speaker) in a textbox.
Python, gpt-35-turbo, whisper-ai, windows

syedhali/EZAudio
4,991 stars
EZAudio is an audio library for Apple platforms that provides standardized interfaces for microphone capture, file playback, and hardware output. It functions as a low-latency audio processor and visualization framework designed to manipulate audio buffers and route signals with minimal delay. The project features a hardware-accelerated waveform renderer for drawing real-time audio amplitudes and rolling plots. It also includes a Fast Fourier Transform analyzer that converts time-domain audio samples into frequency-domain data for spectral analysis. The library covers a broad range of capabi…
Objective-C

nl8590687/ASRT_SpeechRecognition
8,375 stars
This project is a Chinese automatic speech recognition framework and deep learning system designed to convert spoken Chinese audio into written text. It functions as a toolkit for training, evaluating, and deploying speech-to-text models, utilizing a specialized pinyin-to-text converter that transforms phonetic sequences into Chinese characters using a probability graph model. The system is distinguished by its deployment flexibility, offering a dockerized recognition server that provides transcription capabilities as a remote API. It supports high-performance streaming through a gRPC speech-…
Python, asrt, chinese-speech-recognition, cnn

Open-source alternatives to Speech Recognition

k2-fsa/sherpa-onnx

SevaSk/ecoute

syedhali/EZAudio

nl8590687/ASRT_SpeechRecognition

alphacep/vosk-api

Zackriya-Solutions/meeting-minutes

KoljaB/RealtimeSTT

casibase/casibase

livekit/agents

gpac/gpac

jiaaro/pydub

xiph/opus

audacity/audacity

MusicPlayerDaemon/MPD

vercel/geist-font

livekit/livekit

pipecat-ai/pipecat

fonoster/fonoster

GetStream/Vision-Agents

mediar-ai/screenpipe

cmusphinx/pocketsphinx

vercel/ai

openai/whisper

wechaty/wechaty

QuentinFuxa/WhisperLiveKit

HaujetZhao/CapsWriter-Offline

openai/simple-evals

argmaxinc/WhisperKit

TalAter/annyang

enricoros/big-AGI