ClearerVoice-Studio

ClearerVoice-Studio is a speech processing studio and framework designed for speech enhancement, audio super-resolution, and targeted voice extraction. It provides a suite of tools to remove background noise, increase the sampling rate of low-resolution recordings, and quantify audio clarity through objective quality evaluation metrics.

The project includes a target speaker extraction tool that isolates a specific voice from mixed audio using acoustic, visual, or neural (brain-activity) reference signals. It also separates overlapping speech by modeling temporal patterns and long-range dependencies in the audio waveform.

Beyond inference, the studio provides a training framework for fine-tuning enhancement and separation models on custom datasets. Its capabilities span source separation, speech noise reduction, and audio super-resolution, which reconstructs high-frequency content missing from low-sampling-rate recordings.
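To make the super-resolution goal concrete: a recording sampled at rate f_s can only represent frequencies up to f_s / 2 (the Nyquist limit). The sketch below is an illustration, not code from the project; the function names and the 16 kHz to 48 kHz rates are assumptions chosen for the example. It upsamples by plain linear interpolation, which raises the sample rate but cannot recover content above the original limit. That missing band is what a learned super-resolution model is meant to reconstruct.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


def upsample_linear(samples: list[float], factor: int) -> list[float]:
    """Naive upsampling: insert linearly interpolated points between samples.

    Hypothetical helper for illustration only. It yields more samples per
    second but cannot restore content above the original Nyquist limit.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if len(samples) < 2:
        return list(samples)
    out = []
    for left, right in zip(samples, samples[1:]):
        for step in range(factor):
            out.append(left + (right - left) * step / factor)
    out.append(samples[-1])  # keep the final original sample
    return out


low_rate, high_rate = 16_000, 48_000   # assumed example rates
factor = high_rate // low_rate         # 3 output samples per input interval
print(upsample_linear([0.0, 3.0, 0.0], factor))
# [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
print(nyquist_hz(low_rate))            # 8000.0
```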

Automation is supported through a command-line interface capable of bulk audio processing across multiple files and directories.
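Bulk processing of this kind usually comes down to expanding a mix of file and directory arguments into a flat list of audio files and applying one operation to each. The following is a minimal sketch of that pattern using only the Python standard library. It is not the project's command-line interface: `find_audio_files`, `process_all`, the `process` callback, and the set of file extensions are all hypothetical names and assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable

# Assumed set of audio formats; adjust to whatever the tool actually accepts.
AUDIO_SUFFIXES = {".wav", ".flac", ".mp3"}


def find_audio_files(roots: Iterable[Path]) -> list[Path]:
    """Expand a mix of file and directory paths into a list of audio files."""
    found: list[Path] = []
    for root in roots:
        if root.is_dir():
            # Walk the directory tree recursively, in a stable order.
            found.extend(
                p for p in sorted(root.rglob("*"))
                if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
            )
        elif root.is_file() and root.suffix.lower() in AUDIO_SUFFIXES:
            found.append(root)
    return found


def process_all(
    roots: Iterable[Path],
    out_dir: Path,
    process: Callable[[Path, Path], None],
) -> int:
    """Apply `process(src, dst)` to every discovered file; return the count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    files = find_audio_files(roots)
    for src in files:
        process(src, out_dir / src.name)
    return len(files)
```

Here `process` stands in for whichever enhancement, separation, or super-resolution step is being automated. Because outputs are named after `src.name`, two inputs with the same file name in different directories would overwrite each other; mirroring the input directory layout under `out_dir` avoids that.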

Features

Speech Processing Toolkits - A comprehensive toolkit for enhancing, isolating, and upsampling speech audio using deep learning.

Speech Enhancement Models - Provides a comprehensive framework for removing background noise and improving speech quality through denoising and super-resolution.

Speech Denoisers - Removes background noise and enhances audio quality to make speech recordings clearer.

Multimodal Speech Extraction - Implements speech extraction guided by acoustic data, facial movements, or brain activity.

Primary Speaker Isolation - Isolates a specific target voice from mixed audio signals using reference cues.

Reference-Based Isolation - Isolates specific voices from mixed signals using reference speech, facial movements, or brain activity signals.

Quality Thresholds - Measures speech processing effectiveness using signal-to-noise ratios and perceptual quality scores.

Speech Separation Models - Isolates overlapping voices by capturing long-range dependencies and temporal patterns within the audio waveform.

Target Sound Extraction - Isolates a specific target voice from mixed audio using acoustic, visual, or neural reference signals.

Audio Super-Resolution - Reconstructs high-fidelity audio from low-sampling-rate signals using deep learning to restore high-frequency content.

Audio Quality Evaluation Tools - Calculates signal-to-noise ratios and perceptual scores to quantify audio clarity and distortion.

Audio Signal Fidelity Metrics - Quantifies audio clarity and processing effectiveness using objective fidelity metrics and perceptual scores.

Noise Suppression Model Training - Provides training pipelines to produce custom models for noise suppression in speech recordings.

Model Fine-Tuning - Adapts pre-trained speech models to specific datasets to improve denoising and separation performance.

Speech Model Training - Provides specialized training infrastructure for speech enhancement, separation, and resolution models.

Super-Resolution Training - Provides scripts to train and fine-tune models that upscale low-resolution speech to high-fidelity audio.

Audio Clarity Metrics - Calculates objective metrics and perceptual scores to measure audio distortion and background noise.

Speech Quality Metrics - Quantifies speech processing effectiveness using signal-to-noise ratios and perceptual quality scores.
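Several of the features above mention signal-to-noise ratios. As a rough illustration of what such a metric computes, the sketch below measures SNR in decibels between a clean reference and a processed estimate, treating their sample-wise difference as noise. This is a generic textbook definition written in plain Python, not the project's evaluation code, and `snr_db` is a hypothetical name. Perceptual quality scores rely on dedicated perceptual models and are not sketched here.

```python
import math


def snr_db(reference: list[float], estimate: list[float]) -> float:
    """Signal-to-noise ratio in dB, treating (estimate - reference) as noise."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_power = sum(r * r for r in reference)
    if signal_power == 0:
        raise ValueError("reference signal is silent; SNR is undefined")
    noise_power = sum((e - r) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0:
        return math.inf  # estimate matches the reference exactly
    return 10 * math.log10(signal_power / noise_power)


clean = [10.0, -10.0]
processed = [11.0, -9.0]          # each sample is off by 1.0
print(snr_db(clean, processed))   # 20.0 (signal power 200, noise power 2)
```

A higher value means the processed signal is closer to the reference. Comparing the SNR of the noisy input with the SNR of the enhanced output, both measured against the same clean reference, gives a simple estimate of how much a denoiser improved the recording.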

modelscope/ClearerVoice-Studio
