Which open-source GitHub repositories match “Video and audio tools”?

ffmpeg/ffmpeg is the closest match: FFmpeg is a cross-platform multimedia framework designed for the recording, conversion, and streaming of audio and video content. It provides both a command-line utility for direct media manipulation and a collection of low-level libraries for integration into custom applications.

Video and audio tools

Explore open-source software for media processing, audio editing, video transcoding, and multimedia streaming applications.

FFmpeg/FFmpeg
Stars: 61,176
FFmpeg is a cross-platform multimedia framework designed for the recording, conversion, and streaming of audio and video content. It functions as a comprehensive toolkit that provides both a command-line utility for direct media manipulation and a collection of low-level libraries for integration into custom applications. At its core, the project utilizes a packet-based stream engine and a format-agnostic abstraction layer to handle diverse media standards, containers, and network protocols. The framework distinguishes itself through a modular, graph-based filter execution model that allows for complex, non-linear transformations of audio and video frames. It supports high-performance processing by offloading intensive encoding and decoding tasks to dedicated hardware and utilizing threaded parallel processing to maximize throughput across multiple processor cores. This architecture enables users to construct intricate pipelines for tasks ranging from simple format conversion to advanced real-time media filtering and analysis. Beyond core transcoding, the project covers a broad functional surface including live streaming, hardware device capture, and secure network transport. It provides extensive capabilities for metadata management, subtitle processing, and stream synchronization, alongside diagnostic tools for inspecting media integrity and performance. The system is highly extensible, allowing for the dynamic integration of external codecs and third-party libraries to support specialized media requirements.
C | Multimedia Format Converters | Multimedia Processing Suites | Audio and Video

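As a rough illustration of the filter model described for FFmpeg, the sketch below assembles an argument list for the `ffmpeg` command-line tool with a simple linear filter chain. The helper name `build_filter_command` and the file names are hypothetical; `-i`, `-vf`, and the `scale` and `fps` filters are standard FFmpeg options as far as I know, but they should be checked against the installed version.

```python
from typing import List


def build_filter_command(src: str, dst: str, width: int, height: int, fps: int) -> List[str]:
    """Return an ffmpeg argument list that scales and re-times a video.

    Hypothetical helper for illustration; it only builds the command
    and does not execute it.
    """
    # Filters in a simple chain are separated by commas and applied left to right.
    filter_chain = f"scale={width}:{height},fps={fps}"
    return ["ffmpeg", "-i", src, "-vf", filter_chain, dst]


if __name__ == "__main__":
    # Placeholder file names; replace with real paths before running ffmpeg.
    print(" ".join(build_filter_command("input.mp4", "output.mp4", 1280, 720, 30)))
```

The list form can be passed to `subprocess.run` without shell quoting concerns.
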
deepfakes/faceswap
Stars: 55,289
Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames. The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated processing and multi-stage image post-processing. It includes specialized tools for manual alignment verification, allowing users to refine detected facial data through a graphical interface to ensure high-quality results. The system also features robust batch-oriented data processing, which partitions media into standardized chunks to optimize memory usage and throughput during intensive neural network operations. Beyond its core synthesis capabilities, the framework covers a broad range of computer vision tasks including facial landmark detection, pose estimation, and mask generation. It integrates sophisticated model management utilities, such as automated loss calculation, gradient clipping, and snapshot recovery, to ensure stable training sessions. The system also provides extensive diagnostic tools for hardware performance monitoring and environment validation, ensuring compatibility across various compute accelerators. The software is managed through a centralized command-line and graphical toolkit that supports persistent configuration and session state management. It is designed to run on diverse hardware configurations by dynamically querying available compute resources and routing tensor operations to the optimal processor.
Python | Automated Face Swapping | Face Swapping Engines | Automated

deniscerri/ytdlnis
Stars: 7,742
ytdlnis is a mobile application that serves as a graphical client for the yt-dlp engine on Android. It functions as a media downloader and manager, providing a user interface to retrieve video and audio from websites. The project distinguishes itself by integrating directly with the Android system share menu and intents to trigger background downloads from external apps. It includes a dedicated authentication cookie manager to import and sync browser session data, enabling the retrieval of private, age-restricted, or premium content. The application covers broad capability areas including automated media downloading via playlist monitoring and batch URL processing, as well as video post-processing for trimming segments and removing sponsored content. It further provides metadata management for embedding subtitles and chapters, along with a built-in terminal for executing custom command-line arguments. Users can manage download queues, configure network usage restrictions, and utilize incognito session modes to prevent activity from appearing in the download history.
Kotlin | Android Media Downloaders | Media Downloaders | Android Applications

mifi/lossless-cut
Stars: 41,364
LosslessCut is a desktop application designed for the precise editing of video and audio files without re-encoding the underlying media streams. By performing stream copying and container remuxing, the software allows users to cut, merge, and rearrange media segments while maintaining the original bit-perfect quality of the source content. The application distinguishes itself by utilizing a stream-copying data pipeline that transfers raw media packets directly from source to destination, significantly reducing processing time compared to traditional transcoding workflows. It also functions as a media container remuxing tool, enabling users to repackage streams into different file formats or structures without altering the data itself. Beyond basic trimming, the tool provides capabilities for high-resolution frame extraction and comprehensive metadata management. Users can capture still images from specific timestamps or scene transitions and import or export timing data and chapter markers to synchronize editing projects with external professional tools. The application is distributed as a cross-platform desktop shell that provides direct access to local file systems for media processing.
TypeScript | Video Editing | Multimedia Processing | Transcoding Engines

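The stream-copy idea described for LosslessCut can be approximated with the `ffmpeg` command-line tool. The sketch below is not LosslessCut's code: `build_copy_cut` is a hypothetical helper that only builds an argument list, on the assumption that `-ss` and `-t` placed before `-i` select the segment and `-c copy` copies packets without re-encoding.

```python
from typing import List


def build_copy_cut(src: str, dst: str, start_s: float, duration_s: float) -> List[str]:
    """Return an ffmpeg argument list that cuts a segment without re-encoding.

    Hypothetical helper for illustration; it does not run ffmpeg.
    """
    if start_s < 0 or duration_s <= 0:
        raise ValueError("start must be >= 0 and duration must be > 0")
    return [
        "ffmpeg",
        "-ss", f"{start_s:.3f}",    # seek position in the input, in seconds
        "-t", f"{duration_s:.3f}",  # how much of the input to read, in seconds
        "-i", src,
        "-c", "copy",               # copy all streams as-is instead of transcoding
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(build_copy_cut("input.mkv", "clip.mkv", 10.0, 5.5)))
```

Because no frames are decoded, a copied segment generally has to begin at a keyframe, so the actual cut point may differ slightly from the requested start.
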
pipecat-ai/pipecat
Stars: 12,846
Pipecat is a framework and software development kit for building real-time multimodal AI agents and speech-to-speech systems. It utilizes a frame-based data pipeline to route audio, video, and text through a modular sequence of processors, enabling the orchestration of low-latency conversational AI. The project is distinguished by its ability to coordinate complex multimodal services, including speech-to-text, language models, and text-to-speech, within a single pipeline. It features semantic voice activity detection for natural turn-taking, state-machine conversation flows for dialogue management, and WebRTC-based streaming for bidirectional media connectivity. The framework covers a broad surface of capabilities, including AI integration with various foundation models, asynchronous tool execution for external function calls, and telephony integration with providers such as Twilio and Genesys Cloud. It also includes tools for distributed session management, long-term agent memory, and cloud deployment orchestration for scaling agent instances. The project provides command-line utilities for project scaffolding, deployment auditing, and technical documentation indexing.
Python | Data Flow Orchestrators | Multimodal AI Orchestrators | Multimodal Service Orchestration

goldfire/howler.js
Stars: 25,190
Howler.js is a JavaScript library that provides a unified interface for managing audio playback across web browsers. It functions as a cross-browser audio engine, abstracting complex browser audio APIs into a consistent developer experience while ensuring reliable performance through automatic fallback mechanisms. The library distinguishes itself by offering specialized tools for spatial audio and efficient asset management. It includes a spatial audio framework that maps three-dimensional vectors to audio nodes for immersive sound positioning, alongside an audio sprite manager that allows developers to group multiple clips into a single file to reduce network requests. These features are supported by global state management, which tracks active sound instances to maintain consistent volume control and prevent resource leaks. The project covers a broad range of audio capabilities, including precise playback control, synchronization for interactive media, and support for legacy browser environments via standard media element fallbacks. It is designed to handle the requirements of web-based games and interactive applications that demand responsive, multi-source audio environments.
JavaScript | Web Audio Libraries | Audio Processing | Web Game Audio Engines

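The audio sprite idea mentioned for Howler.js, where several clips are packed into one file and addressed by offset, can be sketched as a small layout calculation. This is not Howler.js code: `build_sprite_map` is a hypothetical helper, and the `[offset_ms, duration_ms]` shape is an assumption based on how sprite definitions are commonly written.

```python
from typing import Dict, List


def build_sprite_map(durations_ms: Dict[str, int], gap_ms: int = 0) -> Dict[str, List[int]]:
    """Lay clips end to end and return name -> [offset_ms, duration_ms].

    Hypothetical helper: clips are placed in insertion order, separated
    by gap_ms of silence so that playback of one clip does not bleed
    into the next.
    """
    sprite: Dict[str, List[int]] = {}
    offset = 0
    for name, duration in durations_ms.items():
        sprite[name] = [offset, duration]
        offset += duration + gap_ms
    return sprite


if __name__ == "__main__":
    print(build_sprite_map({"jump": 500, "coin": 250}, gap_ms=100))
```

The resulting map describes where each clip would sit in a single combined audio file, which is what lets one network request serve many sounds.
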
Rigellute/spotify-tui
Stars: 19,019
This project is a terminal-based music controller that provides a text-based interface for managing audio streaming, library navigation, and playback device selection. It functions as a client for remote music services, allowing users to browse catalogs, control playback states, and manage their streaming accounts directly from the command line. The application distinguishes itself through a highly customizable interface and automation capabilities. Users can modify the visual layout, adjust themes, and define custom keyboard shortcuts to create a personalized control workflow. Beyond interactive use, the system supports non-interactive command-line execution, enabling users to trigger playback, search for content, and query their library through shell scripts or terminal commands. The software integrates a broad range of media management tools, including support for searching catalogs, organizing favorite content, and switching between available audio output hardware. It also features real-time audio visualization, rendering pitch information and track analysis data directly within the terminal environment. The application is configured via user-defined settings and authenticates with remote services using secure token exchange protocols.
Rust | Streaming Clients | API Integrations | Music Streaming Interfaces

katspaugh/wavesurfer.js
Stars: 10,114
wavesurfer.js is a WebAudio playback library and interactive waveform visualizer that renders audio data onto an HTML5 canvas. It enables users to see and navigate sound files through a visual representation of audio peaks, allowing for direct seeking and playback control within a web browser. The project is distinguished by its flexible rendering model, which can use precomputed peak data to display waveforms without downloading or decoding the full audio file. It utilizes a plugin-based extension model to integrate advanced tools such as spectrograms, interactive audio timelines, and real-time audio recorders for capturing microphone input. Its broader capabilities cover audio playback management including rate adjustment and region-based looping, as well as digital signal processing via Web Audio API integration for effects and spatial panning. The library also provides tools for web-based audio editing, such as drawing volume automation curves and marking interactive audio regions. The library supports integration with frontend frameworks to bind waveform rendering and audio controls to component lifecycles.
TypeScript | Audio Visualization Tools | Audio Waveform Renderers | Audio Engine Hybrids

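The precomputed peak data mentioned for wavesurfer.js can be pictured as a heavily downsampled copy of the audio. The sketch below is not wavesurfer.js code, and the exact peak format the library expects is not given here: `compute_peaks` is a hypothetical helper that reduces a list of samples to one absolute maximum per bucket.

```python
import math
from typing import List, Sequence


def compute_peaks(samples: Sequence[float], buckets: int) -> List[float]:
    """Reduce samples to at most `buckets` peak values.

    Hypothetical helper: each output value is the largest absolute
    sample in its chunk, which is enough to draw a waveform outline.
    """
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    if not samples:
        return []
    size = math.ceil(len(samples) / buckets)
    return [
        max(abs(s) for s in samples[i:i + size])
        for i in range(0, len(samples), size)
    ]


if __name__ == "__main__":
    print(compute_peaks([0.1, -0.5, 0.3, 0.9, -0.2, 0.0], 3))
```

A few thousand such values are typically far smaller than the audio file itself, which is why a waveform can be drawn before the audio is downloaded or decoded.
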
soimort/you-get
Stars: 56,839
This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It functions by parsing page metadata and utilizing modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media for offline use. The tool distinguishes itself through its ability to handle authenticated content, allowing users to inject browser-stored session cookies to access restricted or private media. It also supports real-time media streaming by piping remote content directly into external playback software, bypassing the need for local disk storage. For complex media tasks, the utility orchestrates external command-line tools to manage file merging, format conversion, and stream playback. Beyond basic acquisition, the software provides comprehensive management features, including automated directory organization for batch processing and the ability to resume interrupted downloads using temporary state files. It also integrates network proxy configurations to route traffic through external servers, facilitating access to content subject to regional restrictions or firewall limitations. Users can further automate workflows by programmatically extracting resource metadata or submitting search queries directly through the terminal.
Python | Media Downloaders | Media Extractors | Media Content Archivers

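EvilCult/moviecatcher
Stars: 823
Software for searching, streaming, and offline downloading of movies and American TV series. It integrates popular resource sites and uses Baidu Cloud for offline downloading and online playback.
Python | Audio and Video | Audio Video Tools
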
sampotts/plyr
Stars: 29,862
This project is a customizable media player designed to provide a consistent interface for video and audio content across all modern web browsers and mobile devices. It functions as a unified abstraction layer, standardizing playback behavior and control interfaces for both native media elements and third-party streaming service embeds through a predictable, declarative API. The library distinguishes itself by wrapping native media elements with custom HTML structures, ensuring a uniform look and feel regardless of the underlying browser implementation. Developers can manage playback state, monitor events, and configure settings through a centralized interface, while also utilizing advanced navigation tools like visual seek previews and keyboard shortcuts to enhance the user experience for long-form content. The platform supports a wide range of functional requirements, including accessible media consumption through integrated captioning and screen reader support, as well as extensive visual customization via CSS variables. It handles the complexities of cross-browser compatibility and media lifecycle management, allowing for the integration of custom logic and analytics throughout the playback session.
JavaScript | Cross-Browser Media Players | HTML5 Media Players | Playback Controllers

gethopp/hopp
Stars: 595
The best OSS remote pair programming app.
Rust | Audio and Video | Communication Tools

fanmingming/live
Stars: 27,661
This project is an IPTV playlist manager and live stream aggregator designed to organize and maintain custom television channel listings. It functions as a centralized repository for verified broadcast links, providing the tools necessary to consolidate disparate media sources into unified, standardized playlist files compatible with third-party streaming applications. The system distinguishes itself by utilizing client-side stream resolution, where the playback device handles the final network request to the media source, thereby reducing bandwidth demands on the hosting infrastructure. It also integrates remote XML metadata to provide dynamic electronic program guide information, ensuring that scheduling data remains synchronized with the curated channel lists. The platform supports the creation and validation of custom configurations through a web-based interface that relies on static asset delivery. By leveraging standardized text-based playlist formats, the tool enables users to curate personalized media experiences across various regional and international networks without the need for complex backend database management.
JavaScript | Playlist Generators | Playlist Formats | Playlist Managers

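The text-based playlist format referred to for fanmingming/live is presumably extended M3U, although the listing does not name it. Under that assumption, the sketch below renders a minimal playlist; `render_m3u` and the example URL are hypothetical, and real IPTV playlists often carry extra attributes that are omitted here.

```python
from typing import Iterable, Tuple


def render_m3u(channels: Iterable[Tuple[str, str]]) -> str:
    """Render (name, url) pairs as a minimal extended M3U playlist.

    Hypothetical helper for illustration only.
    """
    lines = ["#EXTM3U"]  # header line that marks the extended format
    for name, url in channels:
        # A duration of -1 is the usual way to mark a live stream of unknown length.
        lines.append(f"#EXTINF:-1,{name}")
        lines.append(url)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_m3u([("News", "http://example.com/news.m3u8")]), end="")
```

Because the playlist only lists URLs, the player fetches each stream itself, which matches the client-side resolution described for the project.
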
gptguy/silentkeys
Stars: 91
Real-time, privacy-first, low-latency push-to-talk using Parakeet, running fully on device with Tauri and ORT.
Rust | Audio and Video

mpv-player/mpv
Stars: 35,618
This project is a high-performance, command-line media player designed for efficient audio and video playback. It utilizes a modular decoding core to handle a wide range of multimedia formats while offloading frame processing to platform-specific hardware-accelerated rendering pipelines to minimize CPU overhead. Beyond its standalone utility, the software functions as an embeddable multimedia engine, providing a native library interface that allows external applications to integrate its advanced decoding and rendering capabilities directly into their own interfaces. The player is distinguished by its extensive automation and control capabilities, which allow it to function as a programmable backend for complex media environments. Users can manage playback behavior through a declarative configuration system that supports conditional profiles and custom input mappings. Furthermore, the software provides a bidirectional inter-process communication interface using JSON-formatted commands, enabling external programs to monitor status and control playback in real time. Extensibility is a core design principle, supported by a scripting automation engine and a native plugin architecture. These features allow developers to observe internal state, modify properties, and load compiled modules to extend functionality without disrupting the main playback loop. The system also includes comprehensive tools for managing audio output drivers, monitoring performance through real-time statistics overlays, and customizing user interactions via on-screen controllers and context menus.
C | Media Players | Audio Playback | Command Line Interfaces

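The JSON-based control interface described for mpv can be sketched as two small helpers for encoding a command and reading a reply. The names `encode_command` and `parse_reply` are hypothetical. The message shapes (`{"command": [...]}` going in, `{"error": "success", "data": ...}` coming back, one JSON object per line) follow my understanding of mpv's IPC protocol, which is enabled with `--input-ipc-server`, and should be verified against the mpv manual.

```python
import json
from typing import Any


def encode_command(*args: Any) -> bytes:
    """Encode one command as a newline-terminated JSON message."""
    message = {"command": list(args)}
    return (json.dumps(message) + "\n").encode("utf-8")


def parse_reply(line: bytes) -> Any:
    """Return the 'data' field of a reply, or raise if an error was reported."""
    reply = json.loads(line.decode("utf-8"))
    if reply.get("error") != "success":
        raise RuntimeError(f"command failed: {reply.get('error')}")
    return reply.get("data")


if __name__ == "__main__":
    # These bytes would be written to, and read from, the IPC socket.
    print(encode_command("set_property", "pause", True))
    print(parse_reply(b'{"data": 42.5, "error": "success"}\n'))
```

A real client would also need to skip asynchronous event messages that arrive on the same connection; that handling is left out of this sketch.
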
CapSoftware/Cap
Stars: 17,026
Cap is a self-hosted screen recording and video collaboration platform designed for teams to replace synchronous meetings with asynchronous video updates. It provides a comprehensive suite for capturing high-resolution desktop activity, including system audio, microphone input, and camera overlays, which are then processed through an integrated post-production workflow. The platform distinguishes itself by offering full data sovereignty through containerized deployment and object storage abstractions, allowing users to host their media assets on private infrastructure or S3-compatible buckets. Beyond simple recording, it features keyframe-based video compositing, automated AI-powered transcription, and visual branding tools that enable creators to polish and annotate their content before sharing. The system facilitates team engagement through a centralized workspace where viewers can provide feedback via timestamped comments, reactions, and playback analytics. It also includes programmatic interfaces for embedding videos into external applications, managing media assets, and automating distribution workflows. The project is distributed as a containerized application, enabling deployment on private servers to maintain complete control over data storage and access permissions.
TypeScript | Screen Capture Utilities | Collaboration Tools | Self-Hosted Applications

remotion-dev/remotion
Stars: 50,931
Remotion is a programmatic video framework that enables the creation of video content using component-based logic and standard web technologies. By leveraging a declarative animation engine, it allows developers to structure visual content as a hierarchy of reusable components, ensuring that animations and state updates remain consistent through deterministic frame execution. The framework distinguishes itself by utilizing a headless browser renderer that captures visual output frame-by-frame to generate high-quality video files. This architecture supports a cloud-native media pipeline, allowing for scalable, parallelized rendering on serverless infrastructure. Developers can interact with their compositions in real time through a browser-based studio environment, which provides tools for debugging, parameter manipulation, and visual testing before final production. Beyond its core rendering capabilities, the project includes a comprehensive suite of tools for managing media assets, including audio, captions, and vector animations. It supports complex visual effects through physics-based motion primitives, property interpolation, and integration with various graphics libraries. The system is designed for automated, high-volume production workflows, offering command-line interfaces and server-side APIs to handle the entire lifecycle of media generation and deployment.
TypeScript | Cross-Platform Media Frameworks | Programmatic Video Frameworks | Animation Engines

gaeljacquin/yt-dlp-gui
Stars: 66
Cross-platform audio/video downloader
TypeScript | Audio and Video

bilibili/ijkplayer
Stars: 33,165
Ijkplayer is a cross-platform media playback engine designed to provide consistent audio and video rendering across mobile devices. Built upon established open-source multimedia frameworks, it functions as a unified engine that leverages hardware-accelerated decoding to process diverse media formats. The project distinguishes itself by providing a comprehensive toolchain for compiling and configuring low-level media source code into native binary libraries. This allows developers to integrate high-performance playback directly into mobile applications, utilizing a pluggable output architecture that supports custom rendering and audio modules tailored to specific operating system requirements. The library includes a native bridge that exposes core media processing logic to higher-level application environments. It manages the complex build orchestration required to support multiple CPU architectures, providing the necessary scripts and configuration files to generate and link binary frameworks for mobile deployment.
C | Media Players | Cross-Platform Media Frameworks | Media Decoders

display-design-studio/vue-player
Stars: 11
Vue Video Player: a lightweight, customizable, and easy-to-implement Vue video player.
Vue | Audio and Video