# wenet-e2e/wenet

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/wenet-e2e-wenet).**

5,035 stars · 1,174 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/wenet-e2e/wenet
- Homepage: https://wenet-e2e.github.io/wenet/
- awesome-repositories: https://awesome-repositories.com/repository/wenet-e2e-wenet.md

## Topics

`asr` `automatic-speech-recognition` `conformer` `e2e-models` `production-ready` `pytorch` `speech-recognition` `transformer` `whisper`

## Description

WeNet is an end-to-end automatic speech recognition (ASR) toolkit designed for both Chinese and English, built around transformer-based models. It supports streaming and non-streaming inference out of the box, and is structured to be production-ready, with model export and deployment paths for servers and mobile devices.

The toolkit distinguishes itself through a chunk-based streaming transformer architecture that processes audio in fixed-size segments for low latency while preserving context across chunks. It jointly trains models with both CTC and attention loss to combine alignment accuracy with contextual modeling. Decoding employs a two-pass strategy: an initial CTC decoder generates n-best hypotheses, which are then rescored with a full attention decoder. Weighted finite-state transducer (WFST) decoding integrates an external language model for higher accuracy, and the entire model can be exported to TorchScript for C++ inference without Python dependencies.
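The two-pass strategy described above can be illustrated with a minimal sketch. It is not WeNet's implementation: the `rescore` function, the `attention_score` callback, and the `ctc_weight` default are hypothetical names and values chosen for illustration, and the way the two scores are combined (attention score plus a weighted CTC score) is an assumption.

```python
from typing import Callable, List, Tuple

# (text, CTC log-probability) pairs produced by the first pass.
Hypothesis = Tuple[str, float]


def rescore(
    nbest: List[Hypothesis],
    attention_score: Callable[[str], float],
    ctc_weight: float = 0.5,  # hypothetical default, not taken from WeNet
) -> Tuple[str, float]:
    """Pick the hypothesis with the best combined second-pass score."""
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    best_text, best_score = "", float("-inf")
    for text, ctc_logp in nbest:
        # Assumed combination: attention log-prob plus weighted CTC log-prob.
        score = attention_score(text) + ctc_weight * ctc_logp
        if score > best_score:
            best_text, best_score = text, score
    return best_text, best_score


# Toy usage: a lookup table stands in for the attention decoder.
toy_attention = {"a b": -3.0, "a c": -0.5}
best, score = rescore([("a b", -1.0), ("a c", -1.2)], toy_attention.__getitem__)
```

In this toy case the first pass prefers "a b", but the stand-in attention scores push "a c" to the top after rescoring.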

Beyond the core recognition engine, WeNet provides a complete pipeline for data preparation, including distributed partitioning, feature normalization, and token dictionary construction. Model training supports multi-GPU setups, checkpoint resumption, and TensorBoard monitoring. Decoding capabilities extend to audio-transcript alignment, word-level timestamp extraction, and N-best generation both with and without a language model. Custom phrase biasing allows injecting prior knowledge to bias recognition toward specific words. Pretrained model snapshots are available for reproducing published results or immediate use.
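As a rough illustration of the feature normalization step, the sketch below applies global mean and variance normalization to a list of feature frames. Treating the normalization as global mean/variance normalization is an assumption, and `compute_stats` and `normalize` are hypothetical helper names, not part of WeNet.

```python
import math
from typing import List, Tuple

Frames = List[List[float]]


def compute_stats(frames: Frames) -> Tuple[List[float], List[float]]:
    """Return per-dimension mean and standard deviation over all frames."""
    if not frames:
        raise ValueError("frames must not be empty")
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    # Floor the variance so constant dimensions do not divide by zero.
    std = [math.sqrt(max(v, 1e-20)) for v in var]
    return mean, std


def normalize(frames: Frames, mean: List[float], std: List[float]) -> Frames:
    """Shift and scale every frame to zero mean and unit variance."""
    return [[(x - m) / s for x, m, s in zip(f, mean, std)] for f in frames]


features = [[1.0, 10.0], [3.0, 14.0]]
mean, std = compute_stats(features)
normalized = normalize(features, mean, std)
```

In practice the statistics would be computed once over the training set and reused at inference time, which is why the two steps are kept separate here.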

## Tags

### Artificial Intelligence & ML

- [Bilingual ASR Platforms](https://awesome-repositories.com/f/artificial-intelligence-ml/bilingual-asr-platforms.md) — Provides a complete end-to-end ASR platform for both Chinese and English languages with pretrained models.
- [End-to-End Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/end-to-end-pipelines.md) — Delivers an integrated end-to-end pipeline from data preparation through model training to deployment for ASR.
- [Two-Pass Decoding](https://awesome-repositories.com/f/artificial-intelligence-ml/decoder-architectures/two-pass-decoding.md) — Uses an initial CTC decoder to generate n-best hypotheses, then rescores them with a full attention decoder for higher accuracy.
- [Decoding Graph Builders](https://awesome-repositories.com/f/artificial-intelligence-ml/decoding-graph-builders.md) — The ASR toolkit builds a decoding graph by composing acoustic model units, a lexicon, and a language model into a single WFST graph. ([source](https://wenet-e2e.github.io/wenet/lm.html))
- [Speech Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-frameworks/speech-model-training.md) — Provides end-to-end training for speech recognition models with multi-GPU support and checkpoint resumption. ([source](https://wenet-e2e.github.io/wenet/tutorial_aishell.html))
- [Transformer ASR Training Workflows](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-frameworks/speech-model-training/transformer-asr-training-workflows.md) — Ships a full training pipeline for transformer-based ASR models with multi-GPU support and checkpoint resumption.
- [Streaming Recognition](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/automatic-speech-recognition/streaming-recognition.md) — Provides real-time speech transcription with configurable chunk size for low-latency processing.
- [Streaming ASR Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/model-integrations/asr/streaming-asr-engines.md) — Implements a real-time streaming ASR engine with configurable chunk size for low-latency transcription.
- [Production Inference Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/model-exporting/production-inference-exports.md) — Exports trained models to TorchScript or other serialized formats for deployment in C++ runtimes on servers and Android.
- [TorchScript Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/model-exporting/torchscript-exports.md) — Serializes the trained PyTorch model into TorchScript format for deployment in a standalone C++ runtime without Python. ([source](https://wenet-e2e.github.io/wenet/jit_in_wenet.html))
- [Attention Rescoring Decoders](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-modeling-toolkits/language-model-rescoring/attention-rescoring-decoders.md) — The ASR toolkit improves decoding accuracy by rescoring n-best hypotheses with an attention decoder to select the most accurate transcription. ([source](https://wenet-e2e.github.io/wenet/runtime.html))
- [Speech Transcription](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-transcription.md) — The ASR toolkit decodes audio using multiple strategies, supports streaming and non-streaming transcription, and evaluates word error rate. ([source](https://wenet-e2e.github.io/wenet/runtime.html))
- [Custom Phrase Biasing Methods](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/conversational-voice-interaction/voice-agents/voice-activity-detection/wake-word-detection/custom-phrase-biasing-methods.md) — Injects prior knowledge from a user-provided phrase list to bias recognition toward specific words or phrases.
- [Audio-Transcript Aligners](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/audio-transcript-aligners.md) — The ASR toolkit aligns an audio recording to a given text transcript, producing per-word timestamps and confidence scores. ([source](https://wenet-e2e.github.io/wenet/python_package.html))
- [Word-Level Timestamps](https://awesome-repositories.com/f/artificial-intelligence-ml/audio-transcription/word-level-timestamps.md) — The ASR toolkit extracts word-level timestamps from CTC spike outputs of the encoder for alignment and downstream processing. ([source](https://wenet-e2e.github.io/wenet/runtime.html))
- [x86 and Android Inference Targets](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-clients/on-device-inference/x86-and-android-inference-targets.md) — Supports running ASR inference on x86 servers and Android devices via a C++ runtime. ([source](https://wenet-e2e.github.io/wenet/runtime.html))
- [WFST Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-integrations/wfst-integrations.md) — Integrates external language models using weighted finite-state transducer graphs to improve recognition accuracy.
- [WFST Language Model Adapters](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-integrations/wfst-language-model-adapters.md) — The ASR toolkit integrates an external language model into decoding using weighted finite-state transducers to boost recognition accuracy. ([source](https://wenet-e2e.github.io/wenet/tutorial_librispeech.html))
- [Model Exporting](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/model-exporting.md) — Exports trained ASR models to serialized formats for production inference in other languages. ([source](https://wenet-e2e.github.io/wenet/tutorial_aishell.html))
- [C++ Inference Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/model-exporting/c-inference-exports.md) — Exports trained models to a format deployable in C++ runtimes without Python dependencies. ([source](https://wenet-e2e.github.io/wenet/tutorial_librispeech.html))
- [N-Best Hypothesis Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/n-best-hypothesis-generators.md) — The ASR toolkit generates N-best transcription hypotheses using CTC WFST search with a language model for improved accuracy. ([source](https://wenet-e2e.github.io/wenet/lm.html))
- [Production-Ready ASR Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/production-ready-asr-toolkits.md) — Provides a production-grade ASR toolkit with multi-GPU training, TorchScript export, and C++ inference on servers and mobile.
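
The word-level timestamps entry in the list above refers to CTC spikes. A minimal sketch of that idea follows, assuming the per-frame best token IDs are already available. The function name, `blank_id=0`, and the 40 ms frame shift are illustrative assumptions, and the output is per token; grouping tokens into words is left out.

```python
from typing import List, Tuple


def ctc_spike_timestamps(
    frame_tokens: List[int],
    blank_id: int = 0,         # assumed blank index
    frame_shift_ms: int = 40,  # hypothetical time step per encoder frame
) -> List[Tuple[int, int]]:
    """Return (token_id, start_ms) for each CTC spike.

    A spike is a non-blank frame whose token differs from the previous
    frame's token, following the usual CTC collapse rule.
    """
    spikes = []
    prev = blank_id
    for frame_idx, token in enumerate(frame_tokens):
        if token != blank_id and token != prev:
            spikes.append((token, frame_idx * frame_shift_ms))
        prev = token
    return spikes


# Frames: blank, 1, 1, blank, 2, 2, blank, 1
timestamps = ctc_spike_timestamps([0, 1, 1, 0, 2, 2, 0, 1])
```

A token that reappears after a blank, like the final `1` here, counts as a new spike, while consecutive repeats are merged into one.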

### Part of an Awesome List

- [Joint CTC and Attention Training](https://awesome-repositories.com/f/awesome-lists/ai/training-and-alignment/joint-ctc-and-attention-training.md) — Trains the model with both CTC and attention loss simultaneously to leverage complementary strengths of alignment and contextual modeling.
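
Joint training of this kind is commonly expressed as a weighted sum of the two losses. The sketch below assumes a linear interpolation with a single weight; `joint_loss` and the `ctc_weight` value of 0.3 are hypothetical and not taken from WeNet's configuration.

```python
def joint_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the CTC and attention losses with one weight in [0, 1]."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Toy per-batch loss values for illustration only.
loss = joint_loss(ctc_loss=2.0, attention_loss=1.0)
```

Setting the weight to 0 or 1 recovers attention-only or CTC-only training, which makes the single parameter convenient for ablations.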

### Data & Databases

- [Speech Decoding Transducers](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/indexing-architectures/finite-state-transducers/speech-decoding-transducers.md) — Integrates CTC probabilities, a lexicon, and an external language model into a single search graph for beam search decoding.

### Graphics & Multimedia

- [Chunked Streaming Transformers](https://awesome-repositories.com/f/graphics-multimedia/audio-music/audio-streaming-engines/audio-playback-engines/chunked-audio-streaming/chunked-streaming-transformers.md) — Processes audio in fixed-size chunks to enable low-latency streaming while maintaining context across chunks via chunk-level self-attention.
- [Transformer ASR Toolkits](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/audio-analysis-synthesis/automatic-speech-recognition-toolkits/transformer-asr-toolkits.md) — Provides a toolkit for building and deploying transformer-based end-to-end speech recognition models.
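
Chunk-level self-attention can be pictured as a boolean mask over frame pairs. The sketch below is one plausible construction rather than WeNet's code: it assumes each frame may attend to every frame in its own chunk and in all earlier chunks, and `chunk_attention_mask` is a hypothetical name.

```python
from typing import List


def chunk_attention_mask(num_frames: int, chunk_size: int) -> List[List[bool]]:
    """Build a mask where mask[i][j] is True if frame i may attend to frame j."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    mask = []
    for i in range(num_frames):
        # Frame i sees everything up to the end of its own chunk.
        visible_end = min((i // chunk_size + 1) * chunk_size, num_frames)
        mask.append([j < visible_end for j in range(num_frames)])
    return mask


# Five frames split into chunks of two: [0, 1], [2, 3], [4].
mask = chunk_attention_mask(num_frames=5, chunk_size=2)
```

Under this assumption, latency is bounded by the chunk size because no frame ever waits for audio beyond the end of its own chunk.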

### DevOps & Infrastructure

- [Streaming and Batch Serving](https://awesome-repositories.com/f/devops-infrastructure/model-serving/streaming-and-batch-serving.md) — Serves trained ASR models in both real-time streaming and batch processing modes for production use. ([source](https://wenet-e2e.github.io/wenet/_sources/index.rst.txt))
