30 open-source projects similar to librosa/librosa, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Librosa alternative.
pyAudioAnalysis is a Python library and framework for audio signal processing and analysis. It provides tools for extracting mathematical representations of sound, such as spectrograms, and implements a system for training and evaluating machine learning models to classify audio segments based on acoustic patterns. The project includes dedicated utilities for audio segmentation, which allow for the removal of silence and the detection of specific audio events to divide recordings into meaningful sections. It also provides data visualization capabilities that use dimensionality reduction to ma
Matchering is an audio mastering tool and Python library designed to match the frequency balance and loudness of a target track to a specific reference track. It functions as a reference-based mastering system that aligns a target signal's spectral envelope, RMS, and peak amplitude with those of a chosen reference file. The project utilizes a multi-stage processing pipeline featuring an FFT spectral matching engine to adjust frequency response. It ensures output quality through the use of a brickwall limiter to prevent signal clipping while preserving the original waveform shape. The tool pr
Pydub is a Python audio manipulation library and digital audio processor used for editing, slicing, and converting audio files and segments. It serves as a programmatic wrapper for FFmpeg to import and export a wide variety of audio formats. The library functions as an audio signal generator capable of creating synthetic waveforms, such as sine waves and white noise. It also provides tools for digital signal processing, including the application of filters, fades, crossfades, and gain adjustments to sound signals. Its broader capabilities cover programmatic audio editing through concatenatio
This project is a scientific computing framework for the .NET ecosystem, providing a comprehensive suite of libraries for numerical analysis, statistics, and mathematical optimization. It serves as a foundational toolkit for developing applications in machine learning, digital signal processing, and computer vision. The framework provides specialized toolkits for training and deploying predictive models, including neural networks, support vector machines, and decision trees. It further distinguishes itself with deep integrations for real-time visual analysis, such as object tracking and facia
DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets. The project distinguishes itself through a high-performance execution model that utilizes column-major storage, SIMD-aligned memory allocation, and a thread-pool for parallel computations. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f
Beets is a command-line music library manager that automates the organization, standardization, and maintenance of digital audio collections. It functions as a relational database-backed system that identifies audio content through acoustic fingerprinting and retrieves accurate metadata from online databases to ensure consistent tagging and directory structures. The project distinguishes itself through an event-driven pipeline architecture and a modular plugin system, which allow users to intercept and customize library processing workflows. This extensibility enables the integration of exter
MoviePy is a Python video editing library and automated video processor designed for programmatically cutting, concatenating, and manipulating video and audio files. It serves as a non-linear video editor and an interface for FFmpeg to handle the reading, writing, and conversion of diverse media formats and codecs. The library enables automated video composition through the layering of multiple video and audio streams using transparency and coordinate-based positioning. It supports dynamic content generation by inserting text overlays and performing custom video frame processing where raw fra
This project is a music information retrieval library and research dataset designed for audio feature extraction and music genre classification. It provides a framework for training and evaluating machine learning models that categorize audio tracks into hierarchical genre structures, supported by a collection of open-licensed MP3 tracks and pre-computed features. The project includes a music metadata API client to fetch structured track, album, and artist information from external data sources. It utilizes these external integrations to map parent-child relationships between genres and organ
ProjectM is a cross-platform music visualization library and pixel shader rendering engine. It functions as an audio signal analysis tool that extracts beat and frequency data from audio streams to drive real-time graphical changes. The engine is built for compatibility with the Milkdrop visualization standard, allowing it to parse and load external preset files to define visual styles. It supports the organization of these presets through playlist-driven management to automate transitions between different visual effects. The system can be integrated into external host applications as a sta
rnnoise is a real-time speech denoising library that uses a recurrent neural network to suppress background noise from live voice audio. It is implemented as a lightweight C library with a minimal API, designed for easy integration into audio applications that need low-latency noise reduction. The library employs a gated recurrent unit (GRU) architecture and frequency-domain feature extraction to capture temporal dependencies in speech, operating on short audio frames sequentially for streaming use. It also includes a training pipeline that allows users to train custom noise suppression model
AudioKit is an audio framework for iOS, macOS, and tvOS that provides tools for digital audio synthesis, signal processing, and audio analysis. It functions as a synthesis engine for generating audio waveforms and textures, a processing library for modifying tonal characteristics, and a toolkit for extracting frequency and amplitude data from sonic signals. The framework utilizes a modular node architecture and graph-based signal routing to connect audio generators, processors, and outputs. It wraps low-level audio primitives in high-level classes to facilitate sound generation and modificati
Dejavu is a Python audio fingerprinting library and recognition engine. It functions as a digital audio signature tool used to analyze sound waves and create unique identifiers for the purposes of audio search and retrieval. The project enables automatic music identification by matching live audio feeds or recorded clips against a database of fingerprints. It covers audio content matching and digital audio archiving to identify original source recordings from a stored collection. The system incorporates capabilities for generating audio fingerprints, identifying audio tracks, and recognizing
ArrayFire is a hardware-agnostic compute framework and JIT-compiled tensor engine designed for high-performance numerical computing. It serves as a GPU numerical computing library and parallel signal processing toolkit that abstracts hardware backends, allowing the same codebase to execute across various GPU architectures and CPUs. The project distinguishes itself through a JIT engine that uses expression compilation to fuse operations and minimize memory overhead. It employs a deferred execution graph to optimize computation chains and provides interoperability primitives to share data and e
This project is a deep learning toolkit designed for audio source separation and music information retrieval. It provides a framework for decomposing polyphonic audio signals into distinct components, such as vocals, drums, and bass, by processing raw waveforms through neural network architectures. The library enables users to train custom separation models or fine-tune existing ones to improve accuracy on specific audio datasets. It supports the entire model lifecycle, including the conversion of raw audio into structured, indexed formats to optimize data loading and training efficiency. Th
EZAudio is an audio library for Apple platforms that provides standardized interfaces for microphone capture, file playback, and hardware output. It functions as a low-latency audio processor and visualization framework designed to manipulate audio buffers and route signals with minimal delay. The project features a hardware-accelerated waveform renderer for drawing real-time audio amplitudes and rolling plots. It also includes a Fast Fourier Transform analyzer that converts time-domain audio samples into frequency-domain data for spectral analysis. The library covers a broad range of capabi
Ultimate Vocal Remover is a desktop application designed for AI-driven audio source separation. It utilizes deep learning models to isolate vocals, drums, and other individual instruments from mixed audio files, providing a utility for professional production and creative editing workflows. The software distinguishes itself by leveraging GPU-accelerated tensor computation to perform complex signal processing tasks, significantly reducing the time required for high-fidelity audio extraction. It incorporates a modular plugin architecture that integrates external utilities to support a wide rang
MIDI.js is a JavaScript library for playing MIDI files and triggering musical notes within web browsers. It functions as a web MIDI library and soundfont audio synthesizer, providing the core engines necessary to render instrument sounds for multiple simultaneous tracks. The project includes a specialized MIDI-to-visual sync engine that interpolates musical events into continuous animation loops to synchronize visuals with audio playback. It also provides a soundfont code generator to convert audio instrument files into base64 code for direct browser consumption. The library covers MIDI file
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
SciPy is a scientific computing library for Python that provides a comprehensive collection of mathematical algorithms and numerical tools for research and engineering. It functions as a high-performance numerical analysis framework, bridging high-level Python code with compiled C and Fortran routines to execute complex computations at hardware speeds. The library is built upon array-based data structures that utilize strided memory layouts to enable efficient data manipulation and slicing. By employing vectorized operation dispatch and linking to optimized hardware-specific linear algebra li
Magenta is a comprehensive toolkit for training, synthesizing, and performing music through neural models and hardware-integrated engines. It functions as a machine learning framework that enables the generation, manipulation, and real-time performance of audio, providing the structural foundations for musical intelligence through hierarchical sequence modeling and symbolic processing. The project distinguishes itself by enabling real-time, low-latency neural audio synthesis that can be integrated directly into professional digital audio workstations. It supports interactive musical jamming a
This project is a comprehensive software suite for voice synthesis and model management, providing a framework for training custom acoustic models and performing voice conversion. It utilizes deep-learning-based acoustic modeling to map source audio characteristics to target voice identities, enabling the transformation of input audio into specific vocal profiles. The system distinguishes itself through a feature-retrieval-based inference mechanism, which employs vector index files to perform nearest-neighbor searches on acoustic features for high-fidelity timbre matching. Users can manage th
A library for audio and music analysis and feature extraction.
KittenTTS is a neural text-to-speech engine and text-to-audio synthesis tool that converts written text into spoken audio using lightweight neural network models. It functions as both a speech synthesizer and an audio file generator, producing spoken audio for offline playback. The system includes a text normalization processor that expands numbers and abbreviations into full spoken words to improve the naturalness of the synthesized speech. It supports diverse voice options and provides the ability to adjust playback speed.
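As a rough illustration of what a text normalization step like the one described above might do, here is a minimal Python sketch. It is not KittenTTS's actual implementation: the function names `number_to_words` and `normalize`, the `ABBREVIATIONS` table, and the supported number range (0 to 999,999) are all assumptions made for this example.

```python
import re

# Hypothetical lookup tables for this sketch; a real TTS front end would
# cover far more cases (ordinals, dates, currency, locale rules).
_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]

# Assumed abbreviation table, chosen only for illustration.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999,999 as English words."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return _ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def _expand_number(match: re.Match) -> str:
    """Replace a run of digits with words; leave out-of-range values as-is."""
    value = int(match.group())
    return number_to_words(value) if value < 1_000_000 else match.group()


def normalize(text: str) -> str:
    """Expand known abbreviations, then spell out digit runs."""
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    return re.sub(r"\d+", _expand_number, text)


if __name__ == "__main__":
    print(normalize("Dr. Smith has 42 cats"))
```

Expanding abbreviations before numbers keeps the two passes independent; a production normalizer would likely need context-sensitive rules (for example, "St." as "Street" versus "Saint"), which this sketch does not attempt.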
Data manipulation and transformation for audio signal processing, powered by PyTorch
A high-performance, cross-platform video-processing Python framework power-packed with unique trailblazing features
Python library and CLI tool to interface with Google Translate's text-to-speech API
Python module for handling audio metadata