10 Repos
Tools for automating the separation and conversion of large music libraries.
Distinct from Batch Processing Utilities: covers batch processing of audio files specifically, rather than general-purpose batch jobs.
Explore 10 GitHub repositories matching data & databases · Audio Batch Utilities.
VoxCPM is a multilingual speech synthesis system and text-to-speech inference server. It functions as an AI voice cloning tool and a synthetic voice designer, capable of generating natural speech across global languages and regional dialects using a GPU-accelerated audio generator. The project features a speech model fine-tuning framework that supports both full parameter updates and low-rank adaptation for customizing voice characteristics. It enables high-fidelity voice cloning from reference audio, including cross-lingual voice transfer and acoustic environment mimicry, as well as the crea
Converts text files into separate audio files by treating each line as an individual synthesis task.
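The line-per-task idea can be sketched roughly as below. This is not VoxCPM's code: `plan_tasks` and the `line_NNN.wav` naming scheme are hypothetical, and the synthesis call itself is left out so that only the text-to-task mapping is shown.

```python
from pathlib import Path


def plan_tasks(text: str, out_dir: str, ext: str = ".wav") -> list[tuple[str, Path]]:
    """Pair every non-empty line with its own output path (hypothetical naming)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    width = max(3, len(str(len(lines))))  # zero-pad so files sort in line order
    return [
        (line, Path(out_dir) / f"line_{i:0{width}d}{ext}")
        for i, line in enumerate(lines, start=1)
    ]


if __name__ == "__main__":
    # Each (sentence, path) pair would be handed to a TTS engine as one task.
    for sentence, path in plan_tasks("Hello there.\n\nSecond line.", "out"):
        print(path, "<-", sentence)
```

Blank lines are skipped here so that they do not produce empty audio files; a real tool may choose differently.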
Scrapegraph-ai is a Python framework that uses large language models to automate the extraction of structured data from websites and documents. It functions as an AI-driven data extraction pipeline that converts unstructured web content into structured formats using natural language processing and graph-based logic. The project utilizes graph-based task orchestration to model scraping workflows as interconnected nodes. It features a pluggable model interface for connecting to cloud or local artificial intelligence providers and can generate executable Python code on the fly to handle site-spe
Transforms extracted website information into audio files for accessibility or alternative content consumption.
Ultimate Vocal Remover is a desktop application designed for AI-driven audio source separation. It utilizes deep learning models to isolate vocals, drums, and other individual instruments from mixed audio files, providing a utility for professional production and creative editing workflows. The software distinguishes itself by leveraging GPU-accelerated tensor computation to perform complex signal processing tasks, significantly reducing the time required for high-fidelity audio extraction. It incorporates a modular plugin architecture that integrates external utilities to support a wide rang
Automates the separation and conversion of large music libraries through sequential file queuing.
This project is a collection of implementation guides, recipes, and developer resources for building applications with Llama models. It serves as a comprehensive kit for developing autonomous agents, establishing retrieval-augmented generation systems, and executing model fine-tuning. The resource provides specific patterns for multimodal workflows that process text, images, and audio. It includes specialized guidance on adapting pre-trained model weights for targeted tasks and implementing tool-calling orchestration to connect models with external APIs and functions. The codebase covers a b
Transforms PDF content into multi-speaker scripts and audio files using a sequence of specialized models.
EmotiVoice is an emotional text-to-speech engine and bilingual speech synthesizer designed to generate synthetic audio in English and Chinese. It utilizes a deep learning architecture to produce high-fidelity speech with controllable emotional states and timbres. The project includes a voice cloning framework for replicating specific speaker identities by training custom acoustic models on personal audio datasets. It employs a jointly-trained acoustic-vocoder pipeline and style-embedding-based synthesis to manage expression and reduce audio artifacts. The system covers a broad range of speec
Enables the creation of multiple synthetic speech files by processing text lists through a scripting interface.
ReadYou is a self-hosted reading application and RSS feed aggregator that centralizes content from multiple web sources. It functions as a full-text RSS reader, extracting the complete body text from web pages to provide a distraction-free reading experience. The application includes specialized accessibility and speed tools, such as a bionic reading mode that uses pattern-based text highlighting to guide the eye and a text-to-speech system for audio content consumption. The project covers comprehensive subscription management through OPML import and export, feed categorization, and keyword-
Converts extracted web article text into synthetic speech for hands-free audio consumption.
🎤 A Microsoft speech synthesis tool, built with Electron, Vue, ElementPlus, and Vite.
Reads multiple UTF-8 text files sequentially, converts each to MP3, and handles long-text slicing automatically.
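Long-text slicing of this kind usually means cutting the input into chunks that fit a synthesis request limit. The sketch below is one possible approach, not this tool's implementation: `slice_text`, the 200-character default, and the sentence-boundary heuristic are all assumptions.

```python
import re


def slice_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after sentence-ending punctuation (Latin and CJK forms).
    sentences = [s for s in re.split(r"(?<=[.!?。!?])\s*", text) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            if current and len(current) + 1 + len(piece) > max_chars:
                chunks.append(current)
                current = piece
            else:
                # Joining with a space is a simplification (not ideal for CJK text).
                current = f"{current} {piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in slice_text("One. Two. Three.", max_chars=9):
        print(repr(chunk))
```

Each chunk would then be synthesized separately and the resulting audio concatenated in order.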
Abogen is a text-to-speech audiobook generator that transforms digital documents and subtitle files into audiobooks. It utilizes language models to perform text normalization, rewriting contractions and punctuation to produce more natural speech synthesis. The system features a voice profile mixer that blends multiple voice models using adjustable weight ratios to create personalized synthetic voices. It also includes an automated export system that sends completed audio files and metadata to a remote Audiobookshelf server via a web API. The project manages the end-to-end audiobook productio
Transforms digital documents and subtitle files into high-quality audiobooks with synchronized subtitle tracks.
ncmdump is a converter for proprietary audio cache files and a binary stream decryption utility. It decrypts raw audio data from specialized music cache files and transforms them into standard audio formats. The project functions both as a standalone tool and a cross-language conversion library. It exposes its internal decryption and conversion logic as a dynamic library, allowing the capabilities to be embedded into external applications written in different programming languages. The utility provides batch media processing through recursive directory traversal, enabling the identification and conversi
Automates the conversion and separation of large proprietary music libraries.
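Recursive directory traversal for batch conversion can be sketched as follows. This is an illustration only, not ncmdump's code: `find_cache_files` is a hypothetical helper, the `.ncm` suffix is an assumption about how the cache files are named, and the decryption step is omitted.

```python
from pathlib import Path


def find_cache_files(root: str, suffix: str = ".ncm") -> list[Path]:
    """Recursively collect files with the given suffix, sorted for a stable order."""
    return sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


if __name__ == "__main__":
    # Each path found would be passed to the converter in turn.
    for path in find_cache_files("."):
        print(path)
```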
Matchering is an audio mastering tool and Python library designed to match the frequency balance and loudness of a target track to a specific reference track. It functions as a reference-based mastering system that aligns a target signal's spectral envelope, RMS, and peak amplitude with those of a chosen reference file. The project utilizes a multi-stage processing pipeline featuring an FFT spectral matching engine to adjust frequency response. It ensures output quality through the use of a brickwall limiter to prevent signal clipping while preserving the original waveform shape. The tool pr
Processes multiple audio files through a mastering pipeline using a command-line interface.
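Of the stages described above, RMS matching is the simplest to illustrate. The sketch below is a simplified stand-in, not Matchering's implementation: `rms` and `match_rms` are hypothetical helpers operating on plain lists of float samples, and spectral matching and limiting are left out entirely.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a sample sequence (0.0 for empty input)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def match_rms(target: list[float], reference: list[float]) -> list[float]:
    """Scale the target so its RMS level equals the reference's RMS level."""
    target_rms = rms(target)
    if target_rms == 0.0:
        return list(target)  # silence cannot be scaled to a non-zero level
    gain = rms(reference) / target_rms
    return [s * gain for s in target]


if __name__ == "__main__":
    quiet = [0.1, -0.1, 0.1, -0.1]
    loud = [0.5, -0.5, 0.5, -0.5]
    print(rms(match_rms(quiet, loud)))
```

A plain gain change like this can push peaks past full scale, which is why a real mastering chain follows it with a limiter.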