ClearerVoice-Studio is a speech processing studio and framework designed for speech enhancement, audio super-resolution, and targeted voice extraction. It provides a suite of tools to remove background noise, increase the sampling rate of low-resolution recordings, and quantify audio clarity through objective quality evaluation metrics. The project features a target speaker extraction tool that isolates specific voices from mixed audio using acoustic, visual, or neural reference signals. It also includes capabilities for overlapping speech separation by capturing temporal patterns and long-ra
ESPnet is a comprehensive speech processing toolkit and PyTorch-based trainer designed for building end-to-end speech recognition, synthesis, and translation models. It provides a structured framework for developing automatic speech recognition systems using transducer and encoder-decoder architectures, alongside engines for text-to-speech synthesis and speech translation pipelines. The project distinguishes itself through a recipe-based workflow execution system that ensures experimental reproducibility by running standardized sequences of scripts for data preparation and model training. It
TensorFlow 2.x implementation of the stacked dual-signal transformation LSTM network (DTLN) for real-time noise suppression. This repository provides the code for training, inferring and serving the DTLN model in Python. It also provides pretrained models in SavedModel, TF-lite and ONNX format,…