# Audio Loudness Normalization Tools

> Search results for `tool to normalize loudness across an audio library` on awesome-repositories.com. 117 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/tool-to-normalize-loudness-across-an-audio-library

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/tool-to-normalize-loudness-across-an-audio-library).**

## Results

- [accord-net/framework](https://awesome-repositories.com/repository/accord-net-framework.md) (4,540 ⭐) — This project is a scientific computing framework for the .NET ecosystem, providing a comprehensive suite of libraries for numerical analysis, statistics, and mathematical optimization. It serves as a foundational toolkit for developing applications in machine learning, digital signal processing, and computer vision.

The framework provides specialized toolkits for training and deploying predictive models, including neural networks, support vector machines, and decision trees. It further distinguishes itself with deep integrations for real-time visual analysis, such as object tracking and facia
- [ggerganov/kbd-audio](https://awesome-repositories.com/repository/ggerganov-kbd-audio.md) (9,007 ⭐) — This project is an acoustic keystroke recognition system and audio side-channel analysis tool. It identifies typed text by analyzing audio recordings of keyboard sounds through waveform correlation and linguistic frequency analysis.

The system distinguishes itself by combining digital signal processing with a substitution cipher decryptor. It uses simulated annealing and n-gram probability maps to recover plaintext from encoded strings and identifies specific keys by comparing captured waveforms against trained sound profiles.

The toolkit covers a broad range of capabilities, including real-
- [bozbez/win-capture-audio](https://awesome-repositories.com/repository/bozbez-win-capture-audio.md) (4,004 ⭐) — win-capture-audio is an OBS Studio plugin and extension for Windows that captures independent application audio streams. It functions as a Windows application audio router to isolate specific sound sources for recording or streaming.

The project focuses on application audio isolation, allowing users to route audio from a single running program while ignoring other background sounds, notifications, or system audio. This enables the management of game stream audio by separating game sound from voice chat or music.
- [tyiannak/pyaudioanalysis](https://awesome-repositories.com/repository/tyiannak-pyaudioanalysis.md) (6,242 ⭐) — pyAudioAnalysis is a Python library and framework for audio signal processing and analysis. It provides tools for extracting mathematical representations of sound, such as spectrograms, and implements a system for training and evaluating machine learning models to classify audio segments based on acoustic patterns.

The project includes dedicated utilities for audio segmentation, which allow for the removal of silence and the detection of specific audio events to divide recordings into meaningful sections. It also provides data visualization capabilities that use dimensionality reduction to ma
- [deeuu/loudness](https://awesome-repositories.com/repository/deeuu-loudness.md) (40 ⭐) — Loudness is a C++ library with Python bindings for modelling perceived loudness. The library consists of processing modules which can be cascaded to form a loudness model.
- [fishaudio/fish-speech](https://awesome-repositories.com/repository/fishaudio-fish-speech.md) (24,928 ⭐) — This project is a generative speech synthesis engine that converts text into high-fidelity human speech. It utilizes a two-stage autoregressive transformer architecture that separates semantic token prediction from acoustic detail reconstruction to balance linguistic accuracy with audio quality. The system is designed to support multilingual output and conversational AI development, enabling the generation of context-aware speech that maintains flow across multiple dialogue turns.

The platform distinguishes itself through a production-ready inference server that employs continuous batching to
- [librosa/librosa](https://awesome-repositories.com/repository/librosa-librosa.md) (8,200 ⭐) — Librosa is a Python audio analysis library and digital signal processing framework. It functions as a feature extraction suite and music information retrieval tool designed to analyze the structural and sonic characteristics of audio signals.

The library provides specialized capabilities for music analysis, including dynamic tempo tracking to identify rhythmic pulses and spectral feature extraction to compute harmonic spectra, chroma variants, and onset points. It also serves as a time-series audio processor for synchronizing audio streams.

The system covers a broad range of audio processing
- [awesomedata/awesome-public-datasets](https://awesome-repositories.com/repository/awesomedata-awesome-public-datasets.md) (75,979 ⭐) — This project is a community-maintained, open-access directory of high-quality public datasets. It serves as a centralized reference point for researchers, developers, and data scientists to locate reliable information sources across a wide spectrum of industries and scientific fields. By providing a structured index, the repository facilitates the discovery of data necessary for exploratory analysis, machine learning model training, and the development of data-intensive applications.

The directory distinguishes itself through a lightweight, platform-agnostic approach to resource indexing that
- [slhck/ffmpeg-normalize](https://awesome-repositories.com/repository/slhck-ffmpeg-normalize.md) (1,509 ⭐) — This utility is a command-line tool designed to automate volume leveling across audio and video collections. By leveraging external media processing libraries, it adjusts files to a consistent target loudness level, ensuring uniform playback without the need for manual volume adjustments.

The tool distinguishes itself through a two-pass analysis workflow that measures loudness statistics before applying precise gain adjustments. It maintains the relative loudness relationships between tracks when processing collections, ensuring that the dynamic balance of a group of files remains intact. Use
- [k2-fsa/sherpa-onnx](https://awesome-repositories.com/repository/k2-fsa-sherpa-onnx.md) (13,017 ⭐) — Sherpa-ONNX is an ONNX-based speech processing toolkit that provides a local speech recognition engine, an on-device voice synthesis tool, and a speaker identification framework. It is designed as a cross-platform speech API that enables speech-to-text, text-to-speech, and speaker verification tasks to be executed locally on a device without requiring network access.

The project is distinguished by its ability to perform zero-shot voice cloning and speaker diarization on-device. It supports a wide range of hardware accelerations, including GPU and various NPU architectures, and provides a Web
- [go-audio/audio](https://awesome-repositories.com/repository/go-audio-audio.md) (238 ⭐) — Generic Go package designed to define a common interface to analyze and/or process audio data
- [expo/expo](https://awesome-repositories.com/repository/expo-expo.md) (50,111 ⭐) — Expo is a universal mobile framework designed to build native iOS and Android applications from a single codebase using web-standard technologies. It provides a comprehensive development environment that includes a unified runtime for testing, cloud-based infrastructure for compiling and signing native binaries, and automated tools for managing the entire mobile release lifecycle, including app store submission.

The framework distinguishes itself through a plugin-based native configuration engine that programmatically modifies project files, allowing developers to integrate native modules wit
- [evandrolg/ts-audio](https://awesome-repositories.com/repository/evandrolg-ts-audio.md) (339 ⭐) — 🎼 ts-audio is an agnostic library that makes it easy to work with AudioContext and create audio playlists in the browser
- [openai/whisper](https://awesome-repositories.com/repository/openai-whisper.md) (102,828 ⭐) — This project is a speech recognition and translation engine that utilizes a sequence-to-sequence transformer architecture to convert audio into text. It is built upon a weakly supervised learning framework, which leverages large-scale, unlabelled audio-transcript data to create generalized speech representations capable of performing simultaneous transcription, language identification, and translation.

The system distinguishes itself through a unified multi-task modeling approach that shares token sequences across different objectives, allowing it to handle diverse languages and vocabularies
- [agones-dev/agones](https://awesome-repositories.com/repository/agones-dev-agones.md) (6,888 ⭐) — Agones is a Kubernetes game server orchestrator designed for hosting, scaling, and managing dedicated multiplayer game servers. It extends the Kubernetes control plane using custom resource definitions to define game server and fleet objects, utilizing a dedicated fleet manager to maintain pools of warm server instances.

The system provides a game server SDK and language-specific client libraries that allow server processes to signal readiness, health, and shutdown states directly to the controller. It distinguishes itself through specialized scaling logic, including the use of WebAssembly mo
- [chancejs/chancejs](https://awesome-repositories.com/repository/chancejs-chancejs.md) (6,541 ⭐) — Chance is a JavaScript library for generating random data, designed to produce realistic test data for automated tests and prototypes. It uses a Mersenne Twister pseudo-random number generator that accepts an optional seed value, enabling reproducible sequences of random values across multiple runs.

The library provides a wide range of generators for common data types, including random integers, floats, booleans, characters, strings, and dates, all with configurable ranges and character pools. It can generate realistic geographic data like addresses, as well as financial data such as credit c
- [nwidart/laravel-normalizer](https://awesome-repositories.com/repository/nwidart-laravel-normalizer.md) (0 ⭐) — This package helps you normalize your data before saving it to the database. The goal is to have separate classes that handle data normalization and can therefore be tested independently.
- [smacke/subsync](https://awesome-repositories.com/repository/smacke-subsync.md) (7,747 ⭐) — Subsync is a subtitle synchronization tool that aligns subtitle timing to video audio tracks or other synchronized subtitle files. It functions as an audio-based aligner and timing validator to ensure dialogue and captions match during playback.

The system utilizes audio-text cross-correlation to match voice activity peaks in audio tracks against subtitle timestamps. It includes a remote media sync client that retrieves files from external servers using standard network protocols for local processing.

To ensure accuracy, the tool calculates confidence scores to block updates that fall below
- [android/ndk-samples](https://awesome-repositories.com/repository/android-ndk-samples.md) (10,513 ⭐) — The Android NDK samples provide a comprehensive collection of code examples demonstrating how to integrate C and C++ native code into Android applications. This repository serves as a practical guide for developers utilizing the Android Native Development Kit to implement performance-critical application components that require direct hardware access and low-level system interaction.

The project highlights the use of the Java Native Interface to bridge managed code with native modules, enabling cross-language function calls and efficient data exchange. It demonstrates how to manage native act
- [m-bain/whisperx](https://awesome-repositories.com/repository/m-bain-whisperx.md) (20,228 ⭐) — WhisperX is an automated speech recognition toolkit designed to convert spoken audio into text while maintaining precise synchronization with the original media. It functions as an integrated pipeline that combines transcription, phoneme-based alignment, and speaker diarization to produce structured, attributed transcripts.

The project distinguishes itself through its use of forced alignment, which matches existing text to audio signals at the phoneme level to generate accurate word-level timestamps. It also incorporates speaker diarization to identify and label unique voices within a recordi
- [sindresorhus/normalize-url](https://awesome-repositories.com/repository/sindresorhus-normalize-url.md) (877 ⭐) — Normalize a URL
- [audiokit/audiokit](https://awesome-repositories.com/repository/audiokit-audiokit.md) (11,381 ⭐) — AudioKit is an audio framework for iOS, macOS, and tvOS that provides tools for digital audio synthesis, signal processing, and audio analysis. It functions as a synthesis engine for generating audio waveforms and textures, a processing library for modifying tonal characteristics, and a toolkit for extracting frequency and amplitude data from sonic signals.

The framework utilizes a modular node architecture and graph-based signal routing to connect audio generators, processors, and outputs. It wraps low-level audio primitives in high-level classes to facilitate sound generation and modificati
- [sindresorhus/modern-normalize](https://awesome-repositories.com/repository/sindresorhus-modern-normalize.md) (7,348 ⭐) — modern-normalize is a CSS reset stylesheet and browser style normalizer. It provides a minimal set of global style overrides designed to remove inconsistent default user-agent styles and prevent browser-specific rendering quirks.

The project establishes a predictable frontend layout baseline by standardizing typography and layout properties. It specifically enforces a consistent layout model by applying border-box sizing across all HTML elements.

The stylesheet uses tag-based global targeting to apply normalization rules directly to HTML elements, removing the need for specific class names.
- [anomalyco/opentui](https://awesome-repositories.com/repository/anomalyco-opentui.md) (12,131 ⭐) — Opentui is a terminal user interface framework for building interactive command line applications. It provides a component-based system featuring a flexbox layout engine, a virtual node component tree, and a low-level 2D cell array renderer.

The project is distinguished by a sophisticated keyboard binding engine that maps complex multi-stroke sequences and chords to named commands using prioritized, reactive layers. It also implements a plugin architecture that allows external modules to inject custom UI components into designated layout slots and extend input logic at runtime.

Its capabilit
- [chainlit/chainlit](https://awesome-repositories.com/repository/chainlit-chainlit.md) (12,213 ⭐) — Chainlit is a Python framework designed for building and deploying interactive, stateful conversational AI interfaces. It provides a backend-driven platform that connects language models and agent frameworks to a web-based chat frontend, managing the complexities of session state, message history, and real-time communication.

The framework distinguishes itself by offering a component-based UI builder that allows developers to inject interactive widgets, rich media, and data visualizations directly into the chat stream. It supports the visualization of complex agent workflows, enabling users t
- [sindresorhus/loud-rejection](https://awesome-repositories.com/repository/sindresorhus-loud-rejection.md) (282 ⭐) — By default, promises fail silently if you don't attach a .catch() handler to them.
- [pyannote/pyannote-audio](https://awesome-repositories.com/repository/pyannote-pyannote-audio.md) (9,203 ⭐) — Pyannote.audio is a PyTorch toolkit for speaker diarization, speaker identification, and speech activity detection. Its primary purpose is to partition audio recordings into segments and assign each segment to a specific speaker identity to determine who spoke when.

The project includes a framework for classifying speaker identities and a pipeline for distinguishing human speech from background noise. It provides specialized tools for handling symmetric-overlap speech, where multiple speakers talk simultaneously, and employs learnable band-pass filters for raw waveform feature extraction.

Th
- [necolas/normalize.css](https://awesome-repositories.com/repository/necolas-normalize-css.md) (53,540 ⭐) — Normalize.css is a CSS reset library and browser style normalizer. It provides a collection of baseline styles designed to ensure a consistent visual appearance of HTML elements across all modern web browsers.

The project functions as a cross-browser consistency layer that corrects common default rendering bugs and eliminates inconsistencies in user-agent stylesheets. It establishes a uniform visual baseline by overriding default browser rendering behaviors and mitigating vendor-specific errors.

This stylesheet standardizes browser rendering to prevent unexpected layout bugs and ensures that
- [google/lyra](https://awesome-repositories.com/repository/google-lyra.md) (3,964 ⭐) — Lyra is a voice compression framework and low-bitrate speech codec designed to transmit high-quality audio over bandwidth-constrained networks. It utilizes an adaptive bitrate audio codec to balance audio quality and network bandwidth during active sessions.

The project employs generative audio compression, using neural networks to synthesize speech signals from minimal data and reconstruct missing audio details. This allows for high-quality voice audio reconstruction from highly compressed byte streams.

The system covers bandwidth-optimized voice over IP and real-time voice communication, f
- [forem/forem](https://awesome-repositories.com/repository/forem-forem.md) (22,726 ⭐) — Forem is an open-source platform designed for building and managing technical communities. It functions as a social publishing engine that enables members to share long-form content, participate in threaded discussions, and engage through social interactions. The platform provides tools for organizations to maintain branded profiles, host community hackathons, and facilitate collaborative learning through structured educational tracks.

Beyond its social features, Forem integrates advanced capabilities for AI agent workflow orchestration and codebase knowledge graphing. It allows developers to
- [apache/nuttx](https://awesome-repositories.com/repository/apache-nuttx.md) (3,912 ⭐) — NuttX is a POSIX-compliant real-time operating system designed for microcontrollers ranging from 8-bit to 64-bit architectures. It provides a deterministic execution environment with a real-time task scheduler and a POSIX embedded kernel to ensure portable code execution across diverse hardware targets.

The project distinguishes itself through a comprehensive hardware abstraction layer that provides standardized drivers for I2C, SPI, CAN, and USB across various semiconductor chipsets. It also features an embedded networking stack supporting TCP, UDP, IPv4, and IPv6, alongside industrial proto
- [sergeysova/styled-normalize](https://awesome-repositories.com/repository/sergeysova-styled-normalize.md) (442 ⭐) — normalize.css for styled-components
- [svc-develop-team/so-vits-svc](https://awesome-repositories.com/repository/svc-develop-team-so-vits-svc.md) (28,097 ⭐) — This project is a singing voice conversion tool based on VITS generative modeling. It transforms the identity of a singing voice to a target speaker while preserving the original melody, lyrics, and intonation.

The system distinguishes itself through hybrid voice synthesis, allowing for the blending of multiple speaker identities via linear model interpolation. It utilizes cluster-based feature retrieval to increase target voice similarity and employs a diffusion probabilistic model as a post-processor to remove electronic artifacts and improve vocal clarity.

The software covers a broad rang
- [jonschlinkert/normalize-pkg](https://awesome-repositories.com/repository/jonschlinkert-normalize-pkg.md) (18 ⭐) — Normalize values in package.json to improve compatibility, programmatic readability and usefulness with third party libs.
- [buriburisuri/speech-to-text-wavenet](https://awesome-repositories.com/repository/buriburisuri-speech-to-text-wavenet.md) (4,007 ⭐) — This project is a deep learning framework designed for end-to-end speech-to-text transcription. It utilizes the WaveNet neural network architecture to process spoken audio input and generate written text transcripts, leveraging connectionist temporal classification to map variable-length audio sequences to character-level outputs.

The system distinguishes itself through a comprehensive training pipeline that supports distributed execution across multiple graphics processing units. It includes specialized utilities for audio data augmentation and the transformation of raw audio files into opti
- [johnalbin/normalize-scss](https://awesome-repositories.com/repository/johnalbin-normalize-scss.md) (1,423 ⭐) — For use with the latest Sass: normalize-scss version 8.0.0, combining normalize.css v8.0.0 with…
- [rvc-project/retrieval-based-voice-conversion-webui](https://awesome-repositories.com/repository/rvc-project-retrieval-based-voice-conversion-webui.md) (36,025 ⭐) — This project is a comprehensive software suite for voice synthesis and model management, providing a framework for training custom acoustic models and performing voice conversion. It utilizes deep-learning-based acoustic modeling to map source audio characteristics to target voice identities, enabling the transformation of input audio into specific vocal profiles.

The system distinguishes itself through a feature-retrieval-based inference mechanism, which employs vector index files to perform nearest-neighbor searches on acoustic features for high-fidelity timbre matching. Users can manage th
- [facebook/react](https://awesome-repositories.com/repository/facebook-react.md) (245,669 ⭐) — React is a JavaScript library for building user interfaces based on a component-driven architecture and unidirectional data flow.
- [normalize/mz](https://awesome-repositories.com/repository/normalize-mz.md) (1,369 ⭐) — Modernize node.js to current ECMAScript specifications! node.js will not update their API to ES6+ for a while. This library is a wrapper for various aspects of node.js' API.
- [cockroachdb/cockroach](https://awesome-repositories.com/repository/cockroachdb-cockroach.md) (32,207 ⭐) — Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures.

The system distinguishes itself through
- [mdeff/fma](https://awesome-repositories.com/repository/mdeff-fma.md) (2,559 ⭐) — This project is a music information retrieval library and research dataset designed for audio feature extraction and music genre classification. It provides a framework for training and evaluating machine learning models that categorize audio tracks into hierarchical genre structures, supported by a collection of open-licensed MP3 tracks and pre-computed features.

The project includes a music metadata API client to fetch structured track, album, and artist information from external data sources. It utilizes these external integrations to map parent-child relationships between genres and organ
- [facebookresearch/fairseq](https://awesome-repositories.com/repository/facebookresearch-fairseq.md) (32,228 ⭐) — Fairseq is a PyTorch toolkit for sequence-to-sequence modeling, specializing in neural machine translation, automatic speech recognition, and large-scale language model training. It provides a framework for processing and aligning diverse data sources, including text, audio, and video, to support tasks such as speech-to-text conversion and multimodal sequence learning.

The project is distinguished by its distributed training capabilities, which utilize parameter sharding, mixed-precision training, and CPU offloading to handle models that exceed single-device memory. It also includes specializ
- [mapbox/geojson-normalize](https://awesome-repositories.com/repository/mapbox-geojson-normalize.md) (34 ⭐) — Normalize any GeoJSON object into a GeoJSON FeatureCollection.
- [eleutherai/gpt-neo](https://awesome-repositories.com/repository/eleutherai-gpt-neo.md) (8,275 ⭐) — GPT-Neo is an open-source distributed training framework designed for scaling GPT-2 and GPT-3-style language models across multiple devices using mesh-tensorflow for model parallelism. It provides the infrastructure to train transformer-based language models with billions of parameters across distributed computing environments, making large-scale language model research accessible outside of proprietary systems.

The framework supports training both autoregressive GPT-style models and masked language models like BERT or RoBERTa, with configurable masking strategies and token handling. It inclu
- [amitshekhariitbhu/go-backend-clean-architecture](https://awesome-repositories.com/repository/amitshekhariitbhu-go-backend-clean-architecture.md) (6,059 ⭐) — This is a Go backend template that structures a web service into domain, usecase, controller, and repository layers with strict dependency inversion. It provides a foundation for building maintainable and testable REST APIs by separating business logic from transport and data access concerns.

The project implements JWT-based authentication, issuing access and refresh tokens for user signup, login, and protected endpoint access. It uses the Gin HTTP framework to build a Docker-packaged REST API with public and private route groups, request validation, and middleware-based authentication. Depen
- [neonbjb/tortoise-tts](https://awesome-repositories.com/repository/neonbjb-tortoise-tts.md) (14,864 ⭐) — Tortoise-tts is a neural text-to-speech engine and voice cloning toolkit designed for high-quality audio generation. It functions as a zero-shot synthesis system, meaning it can generate speech for unseen speakers without requiring additional training or fine-tuning for each new voice.

The system specializes in replicating human vocal characteristics using small sets of reference audio clips. It allows for the extraction of voice latents to mimic specific speakers, the generation of random synthetic identities, and the blending of multiple voice profiles to create hybrid vocal identities.

Th
- [ergebnis/composer-normalize](https://awesome-repositories.com/repository/ergebnis-composer-normalize.md) (1,111 ⭐) — 🎵 Provides a composer plugin for normalizing composer.json.
- [audiojs/audio](https://awesome-repositories.com/repository/audiojs-audio.md) (295 ⭐) — Audio in JavaScript
- [fincept-corporation/finceptterminal](https://awesome-repositories.com/repository/fincept-corporation-finceptterminal.md) (26,900 ⭐) — FinceptTerminal is a quantitative finance platform and financial engineering library designed for asset valuation, risk management, and fixed-income analytics. It provides a comprehensive suite for algorithmic trading and investment strategy automation, integrating specialized language model agents and node-based workflows to automate market research and alpha generation.

The project distinguishes itself with a dedicated game theory analysis engine for calculating Nash equilibria and simulating strategic interactions in competitive markets. It also features a specialized credit risk modeling
- [nl8590687/asrt_speechrecognition](https://awesome-repositories.com/repository/nl8590687-asrt-speechrecognition.md) (8,375 ⭐) — This project is a Chinese automatic speech recognition framework and deep learning system designed to convert spoken Chinese audio into written text. It functions as a toolkit for training, evaluating, and deploying speech-to-text models, utilizing a specialized pinyin-to-text converter that transforms phonetic sequences into Chinese characters using a probability graph model.

The system is distinguished by its deployment flexibility, offering a dockerized recognition server that provides transcription capabilities as a remote API. It supports high-performance streaming through a gRPC speech-
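
The two-pass workflow described in the slhck/ffmpeg-normalize entry above (measure loudness first, then apply a gain adjustment) can be illustrated with a minimal sketch. It is not that tool's implementation: plain RMS level in dBFS stands in for whatever loudness measure a real tool uses, samples are assumed to be floats in the range -1.0 to 1.0, and every function name here is hypothetical.

```python
import math


def rms_dbfs(samples):
    """Pass 1: measure the RMS level of float samples in dBFS.

    RMS is an assumed stand-in for a perceptual loudness measure.
    """
    if not samples:
        raise ValueError("empty signal")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * math.log10(mean_square)


def gain_db_for(samples, target_dbfs):
    """Return the gain in dB that moves the measured level to the target."""
    level = rms_dbfs(samples)
    if math.isinf(level):
        return 0.0  # leave silence untouched
    return target_dbfs - level


def apply_gain(samples, gain_db):
    """Pass 2: scale the samples, clamping to the valid range."""
    factor = 10.0 ** (gain_db / 20.0)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def normalize(samples, target_dbfs=-20.0):
    """Run both passes: measure, then apply the computed gain."""
    return apply_gain(samples, gain_db_for(samples, target_dbfs))


if __name__ == "__main__":
    quiet = [0.01, -0.01] * 100
    loud = [0.5, -0.5] * 100
    for name, track in (("quiet", quiet), ("loud", loud)):
        before = rms_dbfs(track)
        after = rms_dbfs(normalize(track))
        print(f"{name}: {before:.2f} dBFS -> {after:.2f} dBFS")
```

Measuring before applying gain is what lets the adjustment be computed once for a whole file. Note that the clamp in `apply_gain` distorts any signal that would exceed full scale, so a large positive gain can leave the result below the target.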