30 open-source projects similar to boy1dr/spleetergui, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best SpleeterGui alternative.
Demucs is a deep learning stem splitter and AI music de-mixing tool that isolates vocals and instruments from a single audio file. It is a PyTorch audio source separation tool that splits mixed tracks into individual stems such as drums, bass, and vocals. The system is a hybrid spectrogram and waveform separator: it processes audio in both the frequency and time domains to achieve high-fidelity source separation. The tool provides capabilities for audio source separation, including a cappella track extraction
Vocal-separate is an audio processing tool designed to isolate vocal and instrumental tracks from audio and video files. It functions as a local artificial intelligence engine that performs source separation directly on the user's machine, ensuring data privacy by eliminating the need for external server connectivity. The system provides a browser-based control interface for managing media uploads and monitoring processing tasks. To handle intensive signal decomposition, it utilizes hardware-accelerated tensor processing, which offloads complex mathematical calculations to dedicated graphics
Ultimate Vocal Remover is a desktop application designed for AI-driven audio source separation. It utilizes deep learning models to isolate vocals, drums, and other individual instruments from mixed audio files, providing a utility for professional production and creative editing workflows. The software distinguishes itself by leveraging GPU-accelerated tensor computation to perform complex signal processing tasks, significantly reducing the time required for high-fidelity audio extraction. It incorporates a modular plugin architecture that integrates external utilities to support a wide rang
This project is a comprehensive technical reference and programming cheatsheet for the Python language. It serves as a curated catalog of language features, syntax patterns, and standard library functions designed to help developers identify and apply correct coding patterns. The documentation covers a broad range of functional areas, including language fundamentals such as object-oriented structuring, functional logic, and list comprehensions. It also provides guidance on utilizing the standard library for data analysis, file management, networking, and concurrent execution. The reference e
Vocal Remover is a deep learning application designed for audio source separation. It functions as a command-line utility that decomposes complex audio signals into individual components, specifically isolating vocals and instrumental tracks from mixed recordings. The software utilizes a symmetric encoder-decoder neural network architecture to process audio spectrograms. By applying learned magnitude masks to the original signal phase, the system reconstructs output audio while maintaining temporal coherence. It supports both the execution of pre-trained models for track extraction and the tr
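The masking step described in the Vocal Remover entry (a learned magnitude mask applied while keeping the phase of the original signal) can be illustrated with a minimal sketch. It assumes the spectrogram bins are already available as complex numbers and that the mask values are given; `apply_magnitude_mask` and the sample values are hypothetical and are not taken from the project.

```python
import cmath

def apply_magnitude_mask(mixture_bins, mask):
    """Scale each bin's magnitude by the mask and reuse the mixture's phase."""
    separated = []
    for bin_value, mask_value in zip(mixture_bins, mask):
        magnitude, phase = cmath.polar(bin_value)
        # The mask changes only the magnitude; the original phase is kept.
        separated.append(cmath.rect(mask_value * magnitude, phase))
    return separated

# Hypothetical data: three complex spectrogram bins and a mask in [0, 1].
mixture = [complex(3, 4), complex(0, -2), complex(1, 0)]
vocal_mask = [0.5, 1.0, 0.0]
vocals = apply_magnitude_mask(mixture, vocal_mask)
```

In a real model the mask would be predicted by the network for every time-frequency bin, and an inverse transform would turn the masked bins back into audio.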
Linly-Dubbing is an automated video dubbing pipeline designed for multilingual video localization. It converts spoken content in videos into another language by coordinating speech-to-text transcription, text translation, and text-to-speech synthesis. The system distinguishes itself through AI-driven lip synchronization and animation, which aligns facial expressions and mouth movements to the synthesized voiceover. It also utilizes audio source separation to isolate vocals from background music and noise, allowing for clean voice replacement while preserving original background audio. The br
Spleeter is an AI audio source separation library and deep learning toolkit designed to split mixed music files into individual audio stems, such as vocals and drums. It provides a suite of pretrained models for isolating different instruments and voices from a recording. The toolkit includes capabilities for training and evaluating custom audio separation models using labeled datasets and configuration files. It also features utilities for measuring model performance by comparing separation outputs against reference datasets. The system manages audio processing through spectral representati
ACE Step 1.5 is a local text-to-music generation and audio editing system that runs on consumer hardware. It transforms plain-language descriptions into full-length songs with lyrics, and can edit existing audio through cover generation, vocal removal, track separation, and selective repainting. The system supports multilingual prompts and lyrics in over 50 languages, and provides precise control over musical structure including duration, BPM, key, and time signature. The project distinguishes itself through a dual-stream diffusion architecture that processes separate latent streams for vocal
ace-step-ui is an AI music production workspace and interface for generating, editing, and organizing synthetic audio tracks and vocals. It provides a technical control panel for managing prompts, seeds, and style parameters to produce high-quality audio. The project includes a digital audio workstation interface for trimming and fading files, alongside an audio stem separation tool that splits mixed tracks into individual components such as drums, bass, and vocals. It also features a music video creator for generating visual content and procedural album art to accompany generated music. The
MahApps.Metro is a WPF UI framework and control library designed for building modern desktop applications using Windows Presentation Foundation. It serves as a XAML styling toolkit and desktop UI kit that provides a collection of stylized controls and window templates to replace the default appearance of standard Windows desktop components. The framework enables the development of professional desktop interfaces through the use of pre-designed layout patterns, vector icons, and custom window framing. It provides project templates to bootstrap the UI development process and facilitate rapid ap
GuiLite is a header-only C++ graphical user interface library and cross-platform framework. It provides a minimal implementation for rendering user interfaces and visual widgets across diverse environments, ranging from resource-constrained microcontrollers and embedded hardware to full desktop operating systems. The library functions as an embedded graphics system for composing visual layouts and rendering multi-language text using UTF-8 encoding and compatible font engines. It supports the integration of multimedia content, including the display of 3D graphics and video playback. The frame
PySimpleGUI is a Python framework used to build graphical user interfaces. It functions as an adapter-based wrapper that maps multiple GUI toolkit APIs into a single unified interface, allowing for backend-agnostic development across tkinter, Qt, or wxPython. The project uses a layout-based UI definition system where visual hierarchies are defined via nested lists rather than coordinate-based placement. It employs an event-driven polling loop and string-based event mapping to associate interface elements with specific user actions. The framework supports cross-framework GUI design and intera
auto-py-to-exe is a Python to EXE converter and standalone executable packager that provides a web-based graphical interface for PyInstaller. It transforms Python scripts into single binary files that run without requiring a local Python installation or external dependencies. The project functions as a JSON-based build automator, allowing users to save, load, and export complex packaging configurations via JSON files to ensure consistent and reproducible builds across different projects. The tool covers script-to-executable conversion, including the ability to bundle static assets and icons
AudioGPT is an LLM-driven audio framework and processing suite that uses large language models to orchestrate neural audio pipelines. It functions as a multimodal audio generator and processing system, integrating a collection of pretrained models to handle speech synthesis, sound generation, and audio manipulation. The system is distinguished by its ability to generate audio from diverse inputs, including text and images, and its capacity to produce synchronized talking head videos. It also operates as a neural speech translator, converting spoken language between different tongues while pre
DearPyGui is a GPU-accelerated, immediate-mode graphical user interface framework for Python. It provides a high-performance toolkit for building interactive desktop applications by leveraging native hardware-accelerated rendering backends across multiple operating systems. By utilizing an immediate-mode execution model, the library offers direct control over the rendering loop and element state, enabling the creation of responsive, dynamic interfaces. The framework distinguishes itself through its ability to handle complex, high-frequency visual updates, making it suitable for real-time data
Nuklear is a portable, header-only immediate mode graphical user interface library written in C. It is designed to function as a lightweight framework for creating interfaces that render directly to hardware, making it suitable for integration into custom graphics engines, embedded systems, and resource-constrained environments. The library operates by generating abstract draw commands that are converted into vertex buffers, allowing for hardware-accelerated rendering through standard graphics APIs. By utilizing an immediate mode approach, the interface state is defined and updated within the
Phoenix is a cross-platform Python framework designed for building native desktop graphical user interfaces. It functions as a language binding generator and build automation system, enabling developers to create applications that utilize the underlying operating system's native controls and visual styles. The project provides a mechanism for mapping native C++ graphical toolkit components to Python, allowing for the development of desktop
WPF is a .NET desktop UI framework and application framework designed for creating graphical user interfaces specifically for the Windows operating system. It functions as a XAML-based UI toolkit that uses an XML-based language to define interfaces and separate design from application logic. The framework includes a vector graphics rendering engine that produces resolution-independent visuals. This system allows graphics to scale without loss of quality on high-density monitors. The project covers broad capability areas including modern UI styling, rich media content hosting, and desktop int
IOPaint is an AI image editor and Stable Diffusion inpainting tool providing a web interface for removing objects and replacing image content. It utilizes latent diffusion image processing to synthesize high-resolution replacements for erased sections of an image. The project features a specialized AI background remover for isolating subjects and an AI image upscaler that employs super-resolution models for general photos and anime artwork. The software covers a broad range of capabilities including image segmentation for object isolation, face restoration for improving facial details, and t
ToaruOS is an independent operating system built from the ground up without external dependencies. It features a custom x86-64 kernel that supports symmetric multiprocessing and paging, paired with a graphical windowing system and a dedicated bytecode interpreter for application logic. The system distinguishes itself by integrating an embedded Python environment for system-level development and a custom graphical interface that handles its own window composition and text rendering. It includes a compatibility layer for third-party application support and a system package manager for handling
This project is a collection of deep learning tools for image classification and audio tagging, providing a repository of pre-trained model weights and architectures. It serves as a Keras model zoo that enables the immediate use of established neural networks for inference and transfer learning. The library includes a music tagging framework that classifies audio recordings using convolutional recurrent neural networks and mel-spectrograms. For visual data, it provides implementations of architectures such as ResNet, VGG, and Xception, alongside a repository of weights trained on large datase
This project is a cross-platform media downloader and graphical user interface wrapper for the youtube-dl command-line tool. It provides a desktop application for fetching audio and video from websites through a visual interface. The application is built with the wxPython toolkit and maps visual form inputs to command-line arguments for the underlying youtube-dl backend.
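The mapping from form inputs to command-line arguments can be sketched roughly as follows. The `build_command` function and the form field names are hypothetical and are not taken from the project; the flags `--extract-audio`, `--audio-format`, and `-o` are standard youtube-dl options.

```python
def build_command(form):
    """Translate a dict of form values into a youtube-dl argument list."""
    command = ["youtube-dl"]
    if form.get("audio_only"):
        # Extract the audio track and convert it to the chosen format.
        command += ["--extract-audio", "--audio-format", form.get("audio_format", "mp3")]
    if form.get("output_template"):
        command += ["-o", form["output_template"]]
    command.append(form["url"])
    return command

# Hypothetical form state, as a GUI might collect it from its widgets.
example = build_command({
    "url": "https://example.com/video",
    "audio_only": True,
    "output_template": "%(title)s.%(ext)s",
})
```

Building the command as a list rather than a single string lets it be passed straight to `subprocess.run` without shell quoting issues.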
OrbStack is a native macOS application that replaces Docker Desktop, providing an all-in-one environment for running Docker containers, full Linux virtual machines, and local Kubernetes clusters. It runs Linux VMs directly on the macOS hypervisor framework for near-native performance, uses VirtioFS for fast bidirectional file sharing between macOS and Linux, and leverages Rosetta for near-native x86 emulation on Apple Silicon. The system assigns predictable local domain names to containers and VMs with automatic HTTPS certificate generation, forwards ports via event-driven updates, and stores
Duilib is a Windows UI library and custom rendering engine designed for building high-performance graphical user interfaces for PC client software. It functions as a DirectUI framework that allows for the creation of bespoke layouts and tailored interface elements without relying on standard operating system controls. The framework implements a direct user interface approach to render custom graphical elements. This allows for the design of non-standard visual layouts for desktop applications, including those that require high-performance rendering for resource-heavy software. The system cov
BackgroundMusic is a system-level macOS audio management utility that provides an application volume mixer with independent gain and level adjustments for every running application. It functions as a system audio router and pause controller to manage how sound is handled across the operating system. The project features a virtual audio driver that routes internal system sound to recording software and input devices. It also includes automatic music ducking, which monitors system audio activity to pause music playback when other applications produce sound and resume it once that audio ends. Th
This project provides a comprehensive framework for building, training, and managing autonomous agents. It enables the construction of systems that utilize language models to plan, manage memory, and execute multi-step tasks through iterative reasoning loops and tool-based actions. The framework distinguishes itself by offering specialized capabilities for interacting with graphical user interfaces and legacy software, allowing agents to perceive visual elements and perform actions like a human user. It supports complex, cross-application workflows through graph-based orchestration and provid
Gooey is a framework that transforms command-line programs into graphical applications by automatically generating user interfaces from existing argument definitions. By applying a decorator to a script, the tool maps standard command-line arguments to specialized graphical widgets, allowing users to interact with terminal-based utilities through forms, file pickers, and date selectors. The project distinguishes itself by providing a comprehensive suite of customization and lifecycle management tools that extend beyond simple interface generation. It includes capabilities for input validation
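The idea of deriving widgets from existing argument definitions can be sketched with the standard `argparse` module alone. This is not Gooey's API: `describe_widgets` and the widget names are hypothetical, and the sketch reads the parser's private `_actions` list purely for brevity.

```python
import argparse

def describe_widgets(parser):
    """Choose a widget kind for each argument defined on an argparse parser."""
    widgets = {}
    for action in parser._actions:  # private attribute, used only for illustration
        if action.dest == "help":
            continue
        if action.nargs == 0:
            # Flags such as store_true take no value, so a checkbox fits.
            widgets[action.dest] = "checkbox"
        elif action.choices is not None:
            widgets[action.dest] = "dropdown"
        else:
            widgets[action.dest] = "text_field"
    return widgets

# Hypothetical command-line interface for a stem-splitting script.
parser = argparse.ArgumentParser()
parser.add_argument("input_file")
parser.add_argument("--stems", choices=["2", "4", "5"], default="2")
parser.add_argument("--overwrite", action="store_true")
widgets = describe_widgets(parser)
```

A real generator would also need to render the widgets and feed the collected values back to the script as arguments.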
Librosa is a Python audio analysis library and digital signal processing framework. It functions as a feature extraction suite and music information retrieval tool designed to analyze the structural and sonic characteristics of audio signals. The library provides specialized capabilities for music analysis, including dynamic tempo tracking to identify rhythmic pulses and spectral feature extraction to compute harmonic spectra, chroma variants, and onset points. It also serves as a time-series audio processor for synchronizing audio streams. The system covers a broad range of audio processing
Sherpa-ONNX is an ONNX-based speech processing toolkit that provides a local speech recognition engine, an on-device voice synthesis tool, and a speaker identification framework. It is designed as a cross-platform speech API that enables speech-to-text, text-to-speech, and speaker verification tasks to be executed locally on a device without requiring network access. The project is distinguished by its ability to perform zero-shot voice cloning and speaker diarization on-device. It supports a wide range of hardware accelerations, including GPU and various NPU architectures, and provides a Web
This project is a comprehensive software suite for voice synthesis and model management, providing a framework for training custom acoustic models and performing voice conversion. It utilizes deep-learning-based acoustic modeling to map source audio characteristics to target voice identities, enabling the transformation of input audio into specific vocal profiles. The system distinguishes itself through a feature-retrieval-based inference mechanism, which employs vector index files to perform nearest-neighbor searches on acoustic features for high-fidelity timbre matching. Users can manage th
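The nearest-neighbor lookup on acoustic features can be illustrated with a brute-force sketch. The entry above says the project uses vector index files for this search; the linear scan, the `nearest_features` function, and the toy two-dimensional vectors below are stand-in assumptions for illustration only.

```python
import math

def nearest_features(query, index, k=1):
    """Return the names of the k stored feature vectors closest to the query."""
    ranked = sorted(index, key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical index of (name, feature vector) pairs.
index = [
    ("frame_a", (0.0, 0.0)),
    ("frame_b", (1.0, 1.0)),
    ("frame_c", (5.0, 5.0)),
]
closest = nearest_features((0.9, 1.2), index, k=2)
```

A linear scan costs time proportional to the index size for every query, which is why dedicated vector indexes are used at scale.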