What are the best open-source GitHub repositories for Generative media and diffusion?

wanshuiyin/Auto-claude-code-research-in-sleep is the closest match. This project is a machine learning research automation system designed to manage the full research lifecycle, from idea discovery to final paper submission. It uses markdown-based skill templates to execute autonomous research tasks and manage iterative loops of deep review and experimentation. The system distinguishes itself through integrated capabilities for academic communication and integrity auditing. It can automat…

Generative media and diffusion

Explore open-source frameworks and models for generating synthetic images, audio, video, and creative media content.

wanshuiyin/Auto-claude-code-research-in-sleep
12,182
This project is a machine learning research automation system designed to manage the full research lifecycle, from idea discovery to final paper submission. It utilizes markdown-based skill templates to execute autonomous research tasks and manage iterative loops of deep review and experimentation. The system distinguishes itself through integrated capabilities for academic communication and integrity auditing. It can automate the generation of LaTeX papers, conference slide decks, and evidence-grounded peer review rebuttals. To ensure rigor, it employs cross-model review routing and adversar
Python | Full-Lifecycle Research Pipelines | Academic Integrity Audits | Academic Paper Generators

CompVis/stable-diffusion
73,125
Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution visual content by performing iterative denoising within a compressed latent space. By mapping natural language embeddings into pixel outputs through conditioned probabilistic processes, the framework enables the generation of images from text prompts and the transformation of existing visual inputs based on semantic instructions. The architecture utilizes a modular execution environment that decouples model loading, scheduler logic, and inference components to support diverse hardware configurations. I
Jupyter Notebook | Cross-Attention Mechanisms | Denoising Schedulers | Generative Image Engines

alibaba/MNN
14,242
MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself through a robust model optimization toolkit that supports quantization, compression, and structural graph manipulation to minimize memory footprint and maximize execution speed. It features a modular architecture that abstracts hardware-specific backends, allowing models to run efficiently across diverse
C++ | AI Runtimes | Computational Graphs | Deep Learning

suno-ai/bark
39,159
Bark is a generative audio engine and machine learning inference library designed to convert written text into high-fidelity speech and sound effects. It functions as a text-to-audio transformer, utilizing multi-stage neural network architectures to map semantic input tokens into detailed audio codebooks for synthesis. The system distinguishes itself through a hierarchical transformer stacking approach that separates semantic understanding from acoustic realization. By employing autoregressive token prediction and vector quantized codebook mapping, the engine bridges linguistic and sonic doma
Jupyter Notebook | Generative Audio Engines | Speech Synthesis Models | Text-to-Audio Synthesis

TheLastBen/fast-stable-diffusion
7,889
This project is a cloud-based AI deployment system and latent diffusion model trainer. It provides a framework for launching image generation interfaces and training pipelines on remote GPU infrastructure, specifically serving as a text-to-image model fine-tuner. The system features a specialized training interface for fine-tuning Stable Diffusion models on custom image datasets. It allows for the creation of personalized visual outputs by training models on specific subjects or artistic styles using a small set of reference images. The software covers generative AI deployment, custom style
Python | AI Infrastructure Deployments | AI Model Dashboards | Custom Diffusion Model Training

AUTOMATIC1111/stable-diffusion-webui
163,743
Stable Diffusion Web UI is a browser-based interface designed for managing text-to-image generation tasks. It provides a centralized dashboard for controlling generative processes, including native support for multi-stage model architectures to facilitate high-quality image refinement. The platform distinguishes itself through granular control over the generation process, offering tools for precise parameter management and advanced prompt engineering. Users can customize generation styles and capabilities by integrating external model-extension formats, such as textual inversions, low-rank ad
Python | Generative AI Dashboards | Generative Media Models | Generation Parameter Management

THUDM/CogVideo
12,792
CogVideo is a generative video framework that uses diffusion models and transformer-based architectures to synthesize high-resolution video clips. It functions as both a text-to-video and image-to-video generator, converting textual descriptions or static images into temporal visual sequences. The system integrates large language model capabilities to expand short user prompts into detailed descriptions for better visual alignment. It supports the animation of static images through latent seeding and provides the ability to extend the length of existing video sequences. The project includes
Python | Image-to-Video Generation | Text-to-Video Generators | Diffusion Models

remotion-dev/remotion
50,931
Remotion is a programmatic video framework that enables the creation of video content using component-based logic and standard web technologies. By leveraging a declarative animation engine, it allows developers to structure visual content as a hierarchy of reusable components, ensuring that animations and state updates remain consistent through deterministic frame execution. The framework distinguishes itself by utilizing a headless browser renderer that captures visual output frame-by-frame to generate high-quality video files. This architecture supports a cloud-native media pipeline, allow
TypeScript | Cross-Platform Media Frameworks | Programmatic Video Frameworks | Animation Engines

modelscope/DiffSynth-Studio
12,585
DiffSynth-Studio is a comprehensive platform for the lifecycle management of generative diffusion models, providing a unified environment for inference, fine-tuning, and training. It utilizes a modular pipeline architecture and a standardized abstraction layer to support consistent workflows across diverse model configurations for image and video generation. The platform distinguishes itself through a memory-optimized inference engine that dynamically manages resources to facilitate high-resolution generation on constrained hardware. It also integrates specialized training capabilities, inclu
Python | Custom Diffusion Model Training | Diffusion Models | Diffusion Pipelines

lllyasviel/ControlNet
33,942
ControlNet is a framework for structural image generation that extends pre-trained diffusion models with neural network architectures designed for precise spatial control. By injecting structural guidance directly into the latent-space denoising process, the system enables users to enforce geometric or semantic constraints on generated outputs while maintaining style consistency. The framework distinguishes itself through a weight-locked copying mechanism that preserves the integrity of the original model while introducing new control signals. It supports multi-condition synthesis, allowing f
Python | Diffusion Conditioning Architectures | Generative Model Training Tools | Structural Guidance

black-forest-labs/flux
25,637
Flux is a diffusion model inference engine designed for text-to-image generation and image-to-image manipulation. It provides a system for executing open-weight models to transform natural language descriptions into visual imagery or to modify existing images. The project distinguishes itself through a flow-matching framework for image generation and a structural image controller. This controller allows for guided synthesis by using depth maps and Canny edge detection to constrain the geometry and composition of the output. The toolkit covers a broad range of image editing capabilities, incl
Python | Diffusion Models | Text-to-Image Generators | Adapter-Based Conditioning

mxgmn/WaveFunctionCollapse
24,697
WaveFunctionCollapse is a procedural generation engine that creates complex, non-repeating patterns by treating spatial arrangement as a constraint satisfaction problem. It functions as a stochastic solver that derives output structures from a single input example, ensuring that every element placed within a grid satisfies specific adjacency requirements relative to its neighbors. The system distinguishes itself by using an entropy-driven approach to grid collapse, where it iteratively selects the cell with the fewest remaining possibilities to trigger a cascade of logical updates. By decompo
C# | Constraint-Based Synthesis Engines | Procedural Content Generation | Entropy-Based Solvers
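The selection-and-propagation loop described for WaveFunctionCollapse can be sketched in a few lines of Python. This is a minimal illustration, not the repository's C# code: the three tiles and the `ALLOWED` adjacency table are made up by hand (the real project derives its rules from an input example), and the number of remaining options stands in for a weighted entropy measure.

```python
import random

# Hypothetical tile set: ALLOWED[a] lists the tiles that may sit next to a.
# The relation is symmetric and invented purely for this sketch.
ALLOWED = {
    "sea": {"sea", "coast"},
    "coast": {"sea", "coast", "land"},
    "land": {"coast", "land"},
}


def neighbors(x, y, w, h):
    """Yield the 4-connected neighbors of (x, y) inside a w-by-h grid."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h:
            yield nx, ny


def propagate(grid, start, w, h):
    """Remove options from neighbors that the changed cell no longer supports."""
    stack = [start]
    while stack:
        x, y = stack.pop()
        # Every tile a neighbor may still take, given what remains possible here.
        supported = set().union(*(ALLOWED[t] for t in grid[y][x]))
        for nx, ny in neighbors(x, y, w, h):
            narrowed = grid[ny][nx] & supported
            if not narrowed:
                raise ValueError("contradiction: a cell has no options left")
            if narrowed != grid[ny][nx]:
                grid[ny][nx] = narrowed
                stack.append((nx, ny))  # the cascade continues from this cell


def collapse(w, h, seed=0):
    """Fill a w-by-h grid so that every adjacent pair satisfies ALLOWED."""
    rng = random.Random(seed)
    grid = [[set(ALLOWED) for _ in range(w)] for _ in range(h)]
    while True:
        undecided = [(len(grid[y][x]), x, y)
                     for y in range(h) for x in range(w)
                     if len(grid[y][x]) > 1]
        if not undecided:
            break
        # Pick a cell with the fewest remaining options (ties broken randomly).
        fewest = min(count for count, _, _ in undecided)
        _, x, y = rng.choice([c for c in undecided if c[0] == fewest])
        grid[y][x] = {rng.choice(sorted(grid[y][x]))}
        propagate(grid, (x, y), w, h)
    return [[next(iter(cell)) for cell in row] for row in grid]
```

Calling `collapse(6, 4, seed=1)` returns a 4-row grid of tile names in which no "sea" cell touches a "land" cell directly.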

labmlai/annotated_deep_learning_paper_implementations
66,981
This project is a collection of deep learning research papers translated into annotated code. It serves as a resource for reproducing academic research, providing implementations of transformers, diffusion models, and reinforcement learning architectures. The library distinguishes itself by using a side-by-side annotation format that combines executable Python code with descriptive markdown notes. This approach provides a structured way to explain the logic of neural network papers alongside their PyTorch-based implementations. The codebase covers several major capability areas, including ge
Python | Annotated Code Implementations | Paper Implementations | Agent Training Patterns

3b1b/manim
87,664
Manim is a Python-based computational geometry framework designed for programmatic video production. It functions as a mathematical animation engine, allowing users to generate high-fidelity visual content by scripting scene definitions rather than using traditional timeline-based editing software. The library is built to translate code-based instructions into precise, frame-accurate animations, making it a tool for explaining complex mathematical functions, geometric proofs, and abstract theories. The engine distinguishes itself through a declarative scene graph that organizes visual element
Python | Declarative Scene Graphs | Frameworks | Keyframe Animations

Stability-AI/generative-models
27,189
This is a framework for training and sampling diffusion models to generate high-fidelity images, video, and 4D assets. It provides a modular environment for managing generative AI training pipelines, including the handling of datasets, noise sampling, and loss weighting to stabilize the creation of synthetic content. The project features a modular model configuration system that uses YAML-based assembly to define network submodules and conditioners. It also includes a dedicated toolset for AI image watermarking, allowing for the embedding and detection of invisible markers to verify the origi
Python | Diffusion Models | Latent Diffusion Models | Diffusion Process Conditioners

hacksider/Deep-Live-Cam
93,878
Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to ensure complete data privacy without external network dependencies. By utilizing a high-performance processing pipeline, the application enables live face swapping and interactive video modifications during active streaming sessions or on pre-recorded media. The system distinguishes itself through a hardware-abstraction execution layer that dynamically routes co
Python | Cinematic Video Enhancements | High-Performance AI Inference | Live Performance Execution

keras-team/keras
64,094
Keras is a high-level deep learning framework designed for constructing and training neural networks through the composition of modular, functional layers. It serves as a comprehensive modeling toolkit that provides standardized procedures for defining, evaluating, and deploying complex architectures. By utilizing a directed acyclic graph approach, the framework allows users to build intricate models with multiple inputs, outputs, and shared layers, ensuring consistent numerical execution through functional state management. The project distinguishes itself as a multi-backend machine learning
Python | Frameworks | Model Definition | Architectures

CorentinJ/Real-Time-Voice-Cloning
59,918
This project is a neural text-to-speech engine and voice cloning toolkit designed to generate synthetic speech that mimics the vocal characteristics of a target speaker. It functions as a real-time audio synthesizer, utilizing a deep learning pipeline to convert written text into high-fidelity speech output with minimal latency. The system employs a transfer learning framework that leverages pre-trained speaker verification models to adapt synthesis to new, unseen vocal identities. By using an encoder-based speaker embedding process, the toolkit maps variable-length audio samples into a laten
Python | Neural Text-to-Speech Engines | Neural Vocoders | Real-Time Voice Cloning

invoke-ai/InvokeAI
27,500
InvokeAI is a self-hosted, professional-grade platform designed for managing generative models and performing complex image synthesis. It provides a local application environment that allows users to execute diffusion models directly on their own hardware, ensuring data privacy and complete ownership of all generated assets. The platform distinguishes itself through a node-based workflow system that enables the construction of reproducible and automated image generation pipelines. By chaining modular functional units into directed acyclic graphs, users can automate intricate production tasks
TypeScript | Stable Diffusion Web Interfaces | Generative AI | Self-Hosted AI Platforms

harry0703/MoneyPrinterTurbo
88,651
MoneyPrinterTurbo is an automated video generation tool that synthesizes scripts, voiceovers, subtitles, and background music into finished video files. It functions as a command-line engine that orchestrates the entire content creation pipeline, handling the assembly of media assets through automated processing. The project distinguishes itself by providing a browser-based interface for managing generation parameters and monitoring batch production tasks. It utilizes a modular pipeline that chains together distinct services for script generation and voice synthesis, while relying on a multim
Python | AI Video Generators | Automated Video Generators | Automated Video Synthesis

zhaochenyang20/Awesome-ML-SYS-Tutorial
5,371
This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across distributed GPU clusters. The repository distinguishes itself by offering deep-dive tutorials and implementation strategies for complex system challenges. It emphasizes high-performance architectural primitives, such as collective communication orchestration, distributed tensor sharding, and static gr
Python | Awesome List | Distributed Training | Distributed Training Frameworks

airbnb/lottie-web
31,926
This project is a cross-platform animation engine and vector animation player designed to render complex motion graphics within web browsers. It functions as a declarative motion framework, allowing developers to decouple visual design from application logic by using structured data files to define sophisticated animations. The library distinguishes itself by offering multiple rendering paths, including native support for vector graphics through the browser document object model and raster-based drawing via canvas elements. It utilizes a dedicated property interpolation engine to calculate ke
JavaScript | Animation Engines | Vector Animation Libraries | Declarative Motion Frameworks

huggingface/diffusers
33,872
Diffusers is a PyTorch-based library and generative AI framework used to build, train, and deploy diffusion pipelines for producing multi-modal media. It provides a suite of tools for generating images, video, and audio from natural language descriptions, as well as specialized systems for text-to-image generation. The project differentiates itself through a modular architecture that separates noise schedulers, pretrained model blocks, and pipeline compositions. This structure allows for the construction of custom generation workflows and the ability to swap individual components of the diffu
Python | Diffusion Pipelines | Text-to-Image Generators | Custom Diffusion Model Training

xinntao/Real-ESRGAN
35,798
Real-ESRGAN is a deep learning restoration pipeline designed to enhance low-resolution media and improve the visual quality of damaged photographs. It functions as a generative image upscaler that reconstructs high-resolution details from source inputs by utilizing neural networks trained to fill in missing information and remove noise. The project distinguishes itself as a blind super-resolution tool, meaning it improves image sharpness and fidelity without requiring prior knowledge of the specific degradation applied to the source. It employs high-order degradation modeling to address compl
Python | Generative Upscalers | Image Enhancement Tools | Blind Restoration Models

city96/ComfyUI-GGUF
3,291
ComfyUI-GGUF is a memory optimizer and model loader for ComfyUI that enables the execution of large transformer-based generative models using quantized weights. It provides a system for loading GGUF formatted weights within a node-based diffusion interface to reduce GPU memory consumption. The project includes a quantization tool for converting standard model checkpoints into compressed binary formats and a tensor fixer to restore missing keys and correct architectures in binary model files. These utilities ensure that compressed models remain functional during inference on hardware with limi
Python | ComfyUI Custom Node Suites | Diffusion Model Memory Optimizers | Diffusion Models

ManimCommunity/manim
39,029
Manim is a scriptable, code-driven framework designed for generating precise technical visualizations and mathematical animations. By using a high-level programming interface, it allows users to define geometric shapes, motion paths, and animation logic that are compiled into high-quality video assets. The system functions as a specialized engine for creating reproducible, data-driven representations of complex mathematical concepts and geometric transformations. The framework distinguishes itself through an interpolation-based engine that calculates intermediate states between keyframes to e
Python | Animation Engines | Mathematical Animation | Programmatic Video Frameworks
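The idea of computing intermediate states between keyframes can be illustrated with a small, library-free Python sketch. It does not use Manim's API: `lerp`, `smooth`, and `sample_frames` are hypothetical names, and the smoothstep easing curve is an assumption chosen only to show how a rate function reshapes timing.

```python
def lerp(a, b, t):
    """Linear interpolation between scalars a and b for t in [0, 1]."""
    return a + (b - a) * t


def smooth(t):
    """Smoothstep easing: slow start, slow end (an assumed rate function)."""
    return t * t * (3 - 2 * t)


def sample_frames(start, end, n_frames, rate_func=smooth):
    """Return n_frames points moving from start to end, endpoints included."""
    if n_frames < 2:
        raise ValueError("need at least two frames to span both keyframes")
    frames = []
    for i in range(n_frames):
        alpha = rate_func(i / (n_frames - 1))  # eased progress in [0, 1]
        frames.append(tuple(lerp(a, b, alpha) for a, b in zip(start, end)))
    return frames
```

For example, `sample_frames((0.0, 0.0), (4.0, 2.0), 5)` produces five positions that begin at the first keyframe, pass through the midpoint `(2.0, 1.0)` on the third frame, and end exactly on the second keyframe.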

ai-dynamo/dynamo
6,112
Dynamo is a distributed inference orchestration platform designed for large language models. It functions as a system to coordinate prefill and decode phases across GPU nodes, utilizing a multi-backend runtime adapter to connect engines like vLLM and TensorRT-LLM through a unified block-oriented memory interface. An OpenAI-compatible API server provides the frontend for integration with existing tools and clients. The project is distinguished by its disaggregated serving architecture, which separates prompt processing and token generation onto independent GPU pools to optimize throughput and
Rust | Disaggregated Inference Orchestration | Prefill-Decode Disaggregation | Activation and KV Cache Offloaders

FFmpeg/FFmpeg
61,176
FFmpeg is a cross-platform multimedia framework designed for the recording, conversion, and streaming of audio and video content. It functions as a comprehensive toolkit that provides both a command-line utility for direct media manipulation and a collection of low-level libraries for integration into custom applications. At its core, the project utilizes a packet-based stream engine and a format-agnostic abstraction layer to handle diverse media standards, containers, and network protocols. The framework distinguishes itself through a modular, graph-based filter execution model that allows f
C | Multimedia Format Converters | Multimedia Processing Suites | Audio and Video

huggingface/smollm
3,624
SmolLM is a project dedicated to the development of small language models. It focuses on training and fine-tuning compact models that maintain high performance while utilizing fewer parameters. The project emphasizes efficient AI inference and on-device text generation, aiming to enable the deployment of lightweight models on edge devices with limited memory and processing power. It utilizes synthetic data generation to produce artificial datasets that improve the reasoning and training of these AI systems. The system supports a variety of optimization and training capabilities, including we
Python | Edge AI Model Deployment | Causal Language Modeling | Dataset Sharing

s0md3v/roop
3,527
This application is a deep learning tool designed for automated face swapping in images and videos. It utilizes generative adversarial networks to map facial features from a source image onto a target subject, maintaining the original head pose, lighting, and skin texture of the target media. The software functions as a computer vision pipeline that deconstructs video files into individual frames for sequential processing. It employs pre-trained models for landmark detection and high-dimensional feature extraction to align faces precisely. To accelerate these complex tensor operations, the en
Python | Face Swapping Applications | Generative Identity Models | Inference Engines

apple/ml-stable-diffusion
17,901
This project is a framework for running Stable Diffusion image generation models on Apple Silicon using Core ML hardware acceleration. It provides a local generative AI pipeline for producing images from text prompts using Swift and Python without relying on external cloud APIs. The system includes a model converter to transform deep learning checkpoints into Core ML formats and a model optimizer to quantize weights and activations. It features a ControlNet integration layer to guide image generation using external signals such as edge and depth maps. Capabilities cover text-to-image generat
Python | Image Generation | Text-to-Image Generators | Apple Hardware Acceleration

RVC-Boss/GPT-SoVITS
58,724
GPT-SoVITS is a text-to-speech synthesis engine and voice cloning toolkit designed for generating natural-sounding human speech. It functions as a neural audio processing pipeline that maps input text to high-fidelity audio waveforms, utilizing conditional variational autoencoders and flow-based decoders to ensure expressive output. The platform distinguishes itself through its ability to perform few-shot voice cloning and cross-lingual speech generation, allowing users to maintain a specific speaker's vocal identity and emotional delivery across multiple languages. By employing cross-modal l
Python | Acoustic Models | Cross-Lingual Speech Generators | Synthetic Speech Generation

sgl-project/sglang
29,079
SGLang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language. The platform is built to support production-scale deployments, offering an OpenAI-compatible API that allows for integration with existing application ecosystems. The system distinguishes itself through a disaggregated architecture that separates compute-intensive pr
Python | Chat Completion Services | Disaggregated Inference | High-Throughput Model Serving

Comfy-Org/ComfyUI
117,227
ComfyUI is a node-based generative AI orchestration engine designed for constructing, testing, and executing complex image and video synthesis pipelines. By utilizing a directed acyclic graph execution model, the platform allows users to build reproducible workflows through modular, interconnected processing blocks without requiring manual code implementation. It serves as both a local environment for high-performance model inference and a production-ready server for deploying generative capabilities. The platform distinguishes itself through its focus on workflow portability and extensibilit
Python | Node-Based Generative Pipelines | Directed Acyclic Graph Execution Engines | Generative AI Orchestration Engines

alembics/disco-diffusion
7,407
This project is a diffusion-based AI art generator and animation framework used to create digital images and motion graphics from text prompts. It functions as a system for producing stylized videos and AI art through iterative diffusion sampling and neural network models. The framework distinguishes itself through specialized tools for 3D depth animation, using depth-map transformations to create spatial movement. It also includes neural style transfer capabilities to apply specific artistic looks, such as watercolor or pixel art, and utilizes optical flow frame blending to reduce flickering
Jupyter Notebook | Diffusion Sampling Methods | Video and Animation | Animation & Motion Graphics

fanmingming/live
27,661
This project is an IPTV playlist manager and live stream aggregator designed to organize and maintain custom television channel listings. It functions as a centralized repository for verified broadcast links, providing the tools necessary to consolidate disparate media sources into unified, standardized playlist files compatible with third-party streaming applications. The system distinguishes itself by utilizing client-side stream resolution, where the playback device handles the final network request to the media source, thereby reducing bandwidth demands on the hosting infrastructure. It a
JavaScript | Playlist Generators | Playlist Formats | Playlist Managers

fetchai/innovation-lab-examples
1,028
This project provides a comprehensive framework for building, deploying, and orchestrating autonomous agents within a decentralized network. It serves as a collection of patterns and examples for developing intelligent software entities capable of performing complex tasks, making decisions, and interacting with other agents to achieve shared goals. The framework distinguishes itself through its focus on multi-agent orchestration and decentralized communication. It enables the coordination of specialized agent teams that collaborate on workflows through structured messaging protocols, allowing
Python | Autonomous Agents | Agent Communication Protocols | Agent-to-Agent Communication

mifi/lossless-cut
41,364
LosslessCut is a desktop application designed for the precise editing of video and audio files without re-encoding the underlying media streams. By performing stream copying and container remuxing, the software allows users to cut, merge, and rearrange media segments while maintaining the original bit-perfect quality of the source content. The application distinguishes itself by utilizing a stream-copying data pipeline that transfers raw media packets directly from source to destination, significantly reducing processing time compared to traditional transcoding workflows. It also functions as
TypeScript | Video Editing | Multimedia Processing | Transcoding Engines

NielsRogge/Transformers-Tutorials
11,641
This is a collection of tutorials and practical demonstrations for implementing machine learning tasks using the HuggingFace Transformers library. It serves as a guide for applying transformer architectures across computer vision, natural language processing, and audio analysis. The repository provides implementation examples for multimodal model deployment, including the combination of text, image, and audio inputs. It includes resources for optimizing pre-trained models through fine-tuning on custom datasets and provides examples for preparing PyTorch datasets by converting raw files into t
Jupyter Notebook · Transformer Implementations · Transformer Tutorials · Computer Vision
pixijs/pixijs
47,416 · View on GitHub
PixiJS is a high-performance 2D rendering engine for building interactive visual content and browser-based games. It provides a hardware-accelerated graphics library that uses WebGL and WebGPU backends to render complex scenes, with a hierarchical scene graph managing object transformations and display order. Its architecture decouples rendering logic from hardware APIs, allowing for consistent performance across diverse browser environments. It features an asynchronous asset pipeline that handles loading, c…
TypeScript · Game Engines · Rendering · Scene Graph Frameworks
NVlabs/stylegan
14,412 · View on GitHub
StyleGAN is a TensorFlow-based generative adversarial network framework for synthesizing high-resolution images. It uses a style-based generator architecture to create realistic images from latent vectors. The system incorporates style mixing and stochastic noise injection to control visual attributes and fine-grained details. It uses adaptive instance normalization and progressive resolution upsampling to manage image quality and variety across different resolutions. The framework covers the full lifecycl…
Python · Generative Adversarial Image Synthesis · Adversarial Training Procedures · Generative Adversarial Architectures
TeamNewPipe/NewPipe
38,701 · View on GitHub
NewPipe is a privacy-focused media client that aggregates content from multiple streaming platforms into a single interface. Using a specialized parsing engine, the application extracts structured metadata directly from raw web content, so users can browse and play media without individual service accounts or proprietary tracking. Its playback engine is decoupled from the user interface, which enables persistent background audio and floating window playback. To ensure consistent access, the s…
Java · Media Players · Content Aggregators · Content Extraction Engines
Eventual-Inc/Daft
5,225 · View on GitHub
Daft is a distributed dataframe library and multimodal data processor designed to handle large-scale structured and unstructured data. It functions as a vectorized execution engine that processes tables alongside images, audio, and video, using a unified schema to manage diverse data types. The project combines distributed data engineering with large-scale AI inference. It provides an AI data pipeline for batch-optimizing model prompts and generating high-dimensional text embeddings, while using zero-copy memory sharing to execute custom Python functions witho…
Rust · Distributed Dataframes · Multimodal Processing · Batch Inference Pipelines
goldfire/howler.js
25,190 · View on GitHub
Howler.js is a JavaScript library that provides a unified interface for managing audio playback across web browsers. It functions as a cross-browser audio engine, abstracting browser audio APIs into a consistent developer experience, with automatic fallback mechanisms for reliability. The library also offers tools for spatial audio and asset management. It includes a spatial audio framework that maps three-dimensional vectors to audio nodes for sound positioning, alongside an audio sprite manager that allows de…
JavaScript · Web Audio Libraries · Audio Processing · Web Game Audio Engines
lazyprogrammer/machine_learning_examples
8,823 · View on GitHub
This project is a collection of practical code examples and implementation libraries for machine learning. It provides reference material for building supervised, unsupervised, and reinforcement learning algorithms. The repository spans multiple domains, with implementation suites for financial AI, Bayesian statistical modeling, and deep learning architectures. It includes a framework for training agents using policy gradients and actor-critic models, as well as practical guides for fine-tuning transformers and utilizing larg…
Python · Deep Learning Models · Machine Learning Implementations · Actor-Critic Architectures
iperov/DeepFaceLive
30,536 · View on GitHub
DeepFaceLive is a desktop application for real-time facial replacement and animation within live video streams. Using deep learning models, the software performs high-speed identity mapping and facial feature analysis to transform video content as it is captured. The engine relies on GPU-accelerated inference to run these image manipulation tasks at interactive frame rates. Its modular video processing pipeline chains specialized tasks to maintain high throughput and low latency. It features a virtual camera streaming…
Python · Facial Manipulation Models · Hardware-Accelerated Inference · Real-Time Face Swapping
resemble-ai/chatterbox
22,751 · View on GitHub
Chatterbox is a machine learning platform for multilingual speech synthesis and real-time audio generation. It functions as an engine that converts text into natural-sounding speech, capable of replicating specific human vocal characteristics and emotional expressions from short audio samples. The platform offers control over the synthesis process, allowing for the manipulation of emotional intensity and the injection of non-verbal vocalizations such as laughter or coughing. It is engineered for low-latency performance, utilizing an optimi…
Python · Speech Synthesis · Text-to-Speech · Voice Agents
mrdoob/three.js
113,086 · View on GitHub
This project is a high-level 3D graphics engine for rendering complex, hardware-accelerated environments within web browsers. It provides an abstraction layer that manages scene graphs, cameras, and lighting, mapping high-level scene definitions onto low-level graphics APIs. By decoupling these definitions from specific hardware targets, the engine ensures consistent performance across diverse browsers and devices. The framework includes a unified math library for high-frequency spatial calculations and a physically bas…
JavaScript · 3D Rendering Engines · Abstraction-Layer Rendering Backends · Browser-Based 3D Visualizations

Generative media and diffusion

wanshuiyin/Auto-claude-code-research-in-sleep

CompVis/stable-diffusion

alibaba/MNN

suno-ai/bark

TheLastBen/fast-stable-diffusion

AUTOMATIC1111/stable-diffusion-webui

THUDM/CogVideo

remotion-dev/remotion

modelscope/DiffSynth-Studio

lllyasviel/ControlNet

black-forest-labs/flux

mxgmn/WaveFunctionCollapse

labmlai/annotated_deep_learning_paper_implementations

3b1b/manim

Stability-AI/generative-models

hacksider/Deep-Live-Cam

keras-team/keras

CorentinJ/Real-Time-Voice-Cloning

invoke-ai/InvokeAI

harry0703/MoneyPrinterTurbo

zhaochenyang20/Awesome-ML-SYS-Tutorial

airbnb/lottie-web

huggingface/diffusers

xinntao/Real-ESRGAN

city96/ComfyUI-GGUF

ManimCommunity/manim

ai-dynamo/dynamo

FFmpeg/FFmpeg

huggingface/smollm

s0md3v/roop

apple/ml-stable-diffusion

RVC-Boss/GPT-SoVITS

sgl-project/sglang

Comfy-Org/ComfyUI

alembics/disco-diffusion

fanmingming/live

fetchai/innovation-lab-examples

mifi/lossless-cut

NielsRogge/Transformers-Tutorials

pixijs/pixijs

NVlabs/stylegan

TeamNewPipe/NewPipe

Eventual-Inc/Daft

goldfire/howler.js

lazyprogrammer/machine_learning_examples

iperov/DeepFaceLive

resemble-ai/chatterbox

mrdoob/three.js