Silero Models

Silero Models - run speech-to-text and synthe… | Awesome Repos

Features

Automatic Speech Recognition - Provides pre-trained neural models for converting spoken audio into written text.
Text-to-Speech Synthesis - Provides pre-trained neural models that convert written text into natural sounding spoken audio.
ONNX Model Exporters - Exports neural models to the standardized ONNX format for high-performance execution across various hardware.
ONNX Model Exports - Delivers speech models exported to ONNX format for cross-engine compatibility and performance.
ONNX Runtime Inference - Enables the execution of audio models across different operating systems using the ONNX runtime.
TorchScript Exports - Packages models using TorchScript to enable high-performance inference without requiring a Python runtime.
Pre-trained Speech Models - Provides a comprehensive library of pre-trained neural models for speech recognition, synthesis, and activity detection.
Punctuation Restoration - Includes a pre-trained model that automatically inserts capitalization and punctuation into raw speech transcripts.
Multilingual Synthesis - Provides synthesis models capable of generating speech across a wide variety of regional and minority languages.
Speech to Text Transcription - Converts spoken audio into text using models optimized for performance and size.
Voice Activity Detection - Includes a model for detecting human speech boundaries to isolate voice from background noise.
Word-Level Timestamps - Provides precise start and end times for individual words during the audio transcription process.
Prosody Control - Allows adjustment of pitch, rate, and timing of synthesized speech to achieve natural inflection.
Speech Prosody Generation - Automatically applies word stress and resolves homographs to improve the natural pronunciation of spoken language.
SSML Conversions - Supports Speech Synthesis Markup Language (SSML) to provide precise control over pitch, timing, and prosody.
Phoneme-Based Speech Processors - Implements a synthesis pipeline that converts text into phonetic representations to ensure natural pronunciation.
Speech Synthesis Markup Controls - Integrates markup controls to refine the prosody and pronunciation of synthesized speech.

Open-source alternatives to Silero Models

Similar open-source projects, ranked by how many features they share with Silero Models.

modelscope/funasr
modelscope/FunASR
18,481View on GitHub
FunASR is an automatic speech recognition toolkit and multilingual speech-to-text engine designed to convert spoken audio into written text across more than fifty languages. It provides a framework for speaker diarization, an OpenAI-compatible transcription API for local server hosting, and speech models compatible with the ONNX format. The project distinguishes itself by supporting high-performance inference on edge hardware via self-contained binaries and portable model exports. It incorporates specialized capabilities for natural speech generation with adjustable timbre and emotional expre
Pythonasraudiochinese
View on GitHub18,481
elevenlabs/elevenlabs-python
elevenlabs/elevenlabs-python
2,873View on GitHub
This Python SDK provides a comprehensive toolkit for synthetic audio generation, voice cloning, and the development of conversational AI agents. It enables the creation of lifelike spoken audio from text, the replication of human voices through custom cloning, and the deployment of real-time voice agents capable of interacting with external large language models. The library distinguishes itself through deep integration of conversational AI capabilities, including the design of agent personas and the execution of real-time actions via APIs. It supports professional-grade audio production thro
Pythonartificial-intelligenceconversational-aitext-to-speech
View on GitHub2,873
livekit/agents
livekit/agents
9,379View on GitHub
This project is a framework for developing multimodal AI agents that function as programmable participants in real-time communication rooms. It enables the construction of agents that can see, hear, and speak by integrating speech-to-text, large language models, and text-to-speech pipelines to facilitate low-latency, natural conversations. The system is distinguished by its advanced orchestration of real-time media and conversational flow, including support for full-duplex speech, preemptive response generation, and sophisticated interruption management. It further differentiates itself throu
Pythonagentsaiopenai
View on GitHub9,379
k2-fsa/sherpa-onnx
k2-fsa/sherpa-onnx
13,017View on GitHub
Sherpa-ONNX is an ONNX-based speech processing toolkit that provides a local speech recognition engine, an on-device voice synthesis tool, and a speaker identification framework. It is designed as a cross-platform speech API that enables speech-to-text, text-to-speech, and speaker verification tasks to be executed locally on a device without requiring network access. The project is distinguished by its ability to perform zero-shot voice cloning and speaker diarization on-device. It supports a wide range of hardware accelerations, including GPU and various NPU architectures, and provides a Web
C++aarch64androidarm32
View on GitHub13,017

See all 30 alternatives to Silero Models

snakers4silero-models

Features

Open-source alternatives to Silero Models

modelscope/FunASR

elevenlabs/elevenlabs-python

livekit/agents

k2-fsa/sherpa-onnx

Star history

Open-source alternatives to Silero Models

modelscope/FunASR

elevenlabs/elevenlabs-python

livekit/agents

k2-fsa/sherpa-onnx