8 repositories
Processes video inputs as discrete sequential frames for real-time manipulation.
Explore 8 awesome GitHub repositories matching data & databases · Frame-Based.
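As a rough illustration of what "frame-based" means here, the sketch below treats a video as an iterable of frames and applies a transform to each one as it arrives. The `Frame` type, `process_stream`, and `invert` are hypothetical names chosen for this example and are not taken from any listed repository.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical frame type: a grayscale image as rows of 0-255 pixel values.
Frame = List[List[int]]


def process_stream(
    frames: Iterable[Frame], transform: Callable[[Frame], Frame]
) -> Iterator[Frame]:
    """Yield each transformed frame in order, one at a time.

    Because this is a generator, a frame is processed only when the
    consumer asks for it, which is what makes live use possible.
    """
    for frame in frames:
        yield transform(frame)


def invert(frame: Frame) -> Frame:
    """Example per-frame transform: invert every pixel."""
    return [[255 - pixel for pixel in row] for row in frame]
```

A caller would wrap a capture source in an iterable and consume the result lazily, for example `for out in process_stream(source, invert): ...`.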
Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to ensure complete data privacy without external network dependencies. By utilizing a high-performance processing pipeline, the application enables live face swapping and interactive video modifications during active streaming sessions or on pre-recorded media. The system distinguishes itself through a hardware-abstraction execution layer that dynamically routes co
Processes video input as discrete sequential frames for real-time manipulation.
Facefusion is a modular framework designed for automated image and video manipulation, specializing in tasks such as face swapping, enhancement, and restoration. It functions as a computer vision processing pipeline that chains independent machine learning modules to perform complex transformations, including facial animation, age modification, and lip synchronization. The system is built to handle both real-time interactive feeds and large-scale batch processing tasks. The platform distinguishes itself through a highly extensible architecture that supports custom processing modules and inter
Processes video inputs as discrete sequential frames for real-time manipulation.
DeOldify is a deep learning system and a set of pre-trained computer vision models designed to apply realistic colors to grayscale photographs and video footage. It functions as a neural media restoration tool that uses trained networks to estimate original hues for black-and-white media and remove glitches and artifacts from aged images and film. The project employs a NoGAN colorization technique that removes the GAN discriminator during training to prevent artifacts and avoid over-saturation of pixels. For cinematic sequences, it applies temporal frame consistency to maintain color stabilit
Applies temporal stability constraints to prevent color flickering across consecutive video frames.
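One generic way to damp frame-to-frame flicker is to blend each frame with the previous smoothed frame (an exponential moving average). The sketch below shows that idea only. It is an assumption for illustration, not DeOldify's actual method, and `smooth_frames` and `alpha` are hypothetical names.

```python
from typing import List, Optional, Sequence


def smooth_frames(
    frames: Sequence[Sequence[float]], alpha: float = 0.5
) -> List[List[float]]:
    """Blend each frame with the previous smoothed frame.

    Each frame is a flat sequence of color values. `alpha` is the weight
    given to the new frame; lower values mean stronger smoothing.
    """
    smoothed: List[List[float]] = []
    previous: Optional[List[float]] = None
    for frame in frames:
        if previous is None:
            current = list(frame)  # the first frame has nothing to blend with
        else:
            current = [
                alpha * new + (1 - alpha) * old
                for new, old in zip(frame, previous)
            ]
        smoothed.append(current)
        previous = current
    return smoothed
```

The trade-off is typical of temporal filters: stronger smoothing suppresses flicker but makes colors lag behind real scene changes.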
CodeFormer is a deep learning framework designed for the restoration and enhancement of facial images and video sequences. It functions as a comprehensive processing engine capable of reconstructing high-quality facial features from degraded, blurry, or damaged inputs, while also providing tools for image upscaling and generative inpainting to fill missing or corrupted regions. The system distinguishes itself by utilizing a codebook-based quantization approach that maps input patches to high-quality facial representations, supported by transformer-based global modeling to ensure structural co
Applies frame-to-frame constraints to minimize flickering and maintain stability across video sequences.
FramePack is a neural video synthesis engine and generation framework designed to produce long, temporally consistent video sequences. It functions as a diffusion model optimizer, providing a suite of techniques to manage the computational demands of high-parameter video models while maintaining visual stability during extended generation tasks. The system distinguishes itself through a hierarchical approach to frame prediction, which plans distant anchor frames before filling in intermediate content to prevent cumulative temporal drift. By utilizing constant-length context compression and to
Ensures temporal stability in video generation through anchor frame planning.
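The "anchors first, then in-betweens" ordering mentioned above can be pictured with a small scheduling sketch. It only computes the order in which frame indices would be generated. The function name, the fixed `stride`, and the rule of always anchoring the last frame are assumptions made for illustration, not FramePack's actual algorithm.

```python
from typing import List


def plan_generation_order(num_frames: int, stride: int) -> List[int]:
    """Return frame indices with anchors first, then intermediate frames.

    Anchors are every `stride`-th frame plus the final frame, so each
    intermediate frame lies between two already-planned anchors.
    """
    if num_frames <= 0:
        return []
    if stride <= 0:
        raise ValueError("stride must be positive")
    anchors = list(range(0, num_frames, stride))
    if anchors[-1] != num_frames - 1:
        anchors.append(num_frames - 1)  # hypothetical rule: always anchor the end
    anchor_set = set(anchors)
    in_between = [i for i in range(num_frames) if i not in anchor_set]
    return anchors + in_between
```

Because every intermediate frame is bounded by anchors on both sides, errors cannot accumulate across the whole sequence the way they can in strictly left-to-right generation.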
Pipecat is a framework and software development kit for building real-time multimodal AI agents and speech-to-speech systems. It utilizes a frame-based data pipeline to route audio, video, and text through a modular sequence of processors, enabling the orchestration of low-latency conversational AI. The project is distinguished by its ability to coordinate complex multimodal services, including speech-to-text, language models, and text-to-speech, within a single pipeline. It features semantic voice activity detection for natural turn-taking, state-machine conversation flows for dialogue manag
Injects and routes discrete data frames into the pipeline from the beginning or end for processing.
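A frame-based data pipeline of the kind described above can be sketched as a list of processors that each receive a frame and return a (possibly different) frame. This is a minimal sketch under stated assumptions and does not use Pipecat's real API. `Frame`, `run_pipeline`, `transcribe`, and `respond` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    """Hypothetical unit of data moving through the pipeline."""
    kind: str      # e.g. "audio" or "text"
    payload: str


# A processor takes a frame and returns a frame, or None to drop it.
Processor = Callable[[Frame], Optional[Frame]]


def run_pipeline(frame: Frame, processors: List[Processor]) -> Optional[Frame]:
    """Push one frame through the processors in order."""
    current: Optional[Frame] = frame
    for processor in processors:
        current = processor(current)
        if current is None:  # a processor consumed the frame
            return None
    return current


def transcribe(frame: Frame) -> Optional[Frame]:
    """Stand-in for speech-to-text: turns audio frames into text frames."""
    if frame.kind == "audio":
        return Frame("text", f"transcript({frame.payload})")
    return frame


def respond(frame: Frame) -> Optional[Frame]:
    """Stand-in for a language model: replies to text frames."""
    if frame.kind == "text":
        return Frame("text", f"reply to {frame.payload}")
    return frame
```

For example, `run_pipeline(Frame("audio", "hello.wav"), [transcribe, respond])` passes the audio frame through both stages and returns a text frame.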
OpenALPR is a computer vision platform designed to identify vehicle license plates and attributes from live video streams or static images. It functions as an intelligent access control and analytics system, enabling the automation of security monitoring, parking facility management, and operational workflows through real-time vehicle detection. The platform distinguishes itself by supporting international license plate formats and regional configuration mapping, allowing for deployment across diverse geographic standards. It integrates directly with existing network camera infrastructure, pe
Analyzes continuous video feeds by sampling frames to perform real-time detection and tracking of moving vehicles.
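Sampling frames from a continuous feed usually means running the detector on only every Nth frame to keep up with real time. The sketch below shows that selection step alone. `sample_frames` and `every_n` are hypothetical names, and nothing here reflects OpenALPR's actual interface.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_frames(frames: Iterable[T], every_n: int) -> Iterator[T]:
    """Yield every `every_n`-th frame, starting with the first one."""
    if every_n <= 0:
        raise ValueError("every_n must be positive")
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            yield frame
```

A larger `every_n` lowers compute cost but raises the chance that a fast-moving vehicle passes between two sampled frames.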
Hallo is an audio-driven talking head generator and portrait animation framework. It synchronizes a static portrait image with an audio file to produce realistic talking head videos by mapping audio spectral features to facial expressions and lip movements. The system utilizes a diffusion video synthesis model that employs iterative denoising and latent representations to generate temporally consistent video frames. It incorporates identity-preserving feature extraction and latent space motion modeling to maintain visual consistency and control facial poses. The toolkit provides capabilities
Enforces frame-to-frame smoothness using learned priors to prevent flickering and visual artifacts.