4 repositories
Temporal transformations applied to sequences of images to prepare video data for training.
Distinct from Image Data Preprocessing: Focuses on temporal operations like mirroring and reversal for video, rather than static image preprocessing.
LivePortrait is a deep learning framework for portrait animation that transfers facial expressions from a driving video to a static image. It functions as an AI motion retargeting tool, mapping movements between different identities while preserving the unique features of the source portrait. The system includes specialized capabilities for cross-species portrait animation, adapting human-centric models to non-human subjects and animals. It also features a motion template generator that converts driving videos into portable files to accelerate inference and protect the identity of the origina
Applies temporal and spatial preprocessing to video sequences to prepare them for motion extraction.
mmagic is a multimodal training pipeline and framework for generative AI, focusing on visual synthesis and restoration. It provides the infrastructure to build and train models for tasks such as text-to-image and text-to-video generation, 3D-aware content synthesis, and high-fidelity image translation using diffusion models and generative adversarial networks. The project distinguishes itself through specialized capabilities for generative model personalization, including techniques for fine-tuning subjects and styles. It also supports advanced visual manipulations such as latent space interp
Performs temporal mirroring and frame reversal to prepare video sequences for generative model training.
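As a rough illustration of the two operations named above, temporal reversal flips the frame order, and temporal mirroring extends a sequence with its own reversal so that it plays forward and then backward. The sketch below is not taken from mmagic; the function names `reverse_frames` and `mirror_frames` and the `repeat_last` flag are hypothetical, and frames are treated as opaque list items.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")  # any per-frame object: an array, a file path, an index


def reverse_frames(frames: Sequence[Frame]) -> List[Frame]:
    """Return the frames in reverse temporal order (hypothetical helper)."""
    return list(reversed(frames))


def mirror_frames(frames: Sequence[Frame], repeat_last: bool = False) -> List[Frame]:
    """Append the reversed sequence to the original (hypothetical helper).

    With repeat_last=False the turning-point frame appears once:
        [a, b, c] -> [a, b, c, b, a]
    With repeat_last=True it appears twice:
        [a, b, c] -> [a, b, c, c, b, a]
    """
    forward = list(frames)
    # forward[-2::-1] walks backward starting from the second-to-last frame,
    # so the last frame is not duplicated at the turning point.
    backward = forward[::-1] if repeat_last else forward[-2::-1]
    return forward + backward
```

Whether the turning-point frame should be repeated is a design choice that depends on the model being trained, so it is left as a parameter here.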
LatentSync is an audio-driven video generator and latent diffusion lip-sync model designed to synchronize a speaker's lip movements in a video with a target audio track. It provides a lip-sync training framework for developing synchronization networks on custom video and audio datasets. The system uses a video preprocessing pipeline to clean, segment, and align face data. It includes a visual sync evaluation tool that computes confidence scores to measure the accuracy of audio-visual alignment in generated videos. The project covers capabilities for developing a custom synchronization network, managing training configuration for hardware memory and precision, and evaluating synthetic video.
Cleans and segments video files by aligning faces and filtering for quality before training synchronization models.
This project is a PyTorch implementation of 3D residual networks designed for video action recognition. It provides a spatiotemporal architecture that analyzes both spatial frames and temporal motion to classify human activities within video clips. The system includes a distributed model training framework to accelerate learning across multiple compute nodes. It supports the deployment and fine-tuning of pre-trained model weights, allowing the adaptation of existing networks to specific new datasets. The codebase covers the full pipeline for spatiotemporal learning, including video dataset p
Provides video sequence preprocessing utilities to transform raw video into training-ready image frames.
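One common step when turning decoded frames into training clips is choosing which frame indices form a fixed-length clip. The sketch below shows one possible approach, assuming frames are already extracted and numbered from 0; it is not the repository's own code, and `sample_clip_indices` with its parameters is a hypothetical name. Videos shorter than the clip are handled by looping back to the start, which is only one of several reasonable policies.

```python
from typing import List


def sample_clip_indices(
    num_frames: int, clip_len: int, stride: int = 1, start: int = 0
) -> List[int]:
    """Pick clip_len frame indices spaced by stride (hypothetical helper).

    Indices wrap around with modulo, so a video shorter than the requested
    clip is looped instead of raising an error.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    return [(start + i * stride) % num_frames for i in range(clip_len)]
```

For example, `sample_clip_indices(10, 4, stride=2)` selects every second frame starting at frame 0, while `sample_clip_indices(3, 5)` loops a three-frame video to fill a five-frame clip.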