4 repositories
Temporal transformations applied to sequences of images to prepare video data for training.
Distinct from Image Data Preprocessing: Focuses on temporal operations like mirroring and reversal for video, rather than static image preprocessing.
LivePortrait is a deep learning framework for portrait animation that transfers facial expressions from a driving video to a static image. It functions as an AI motion retargeting tool, mapping movements between different identities while preserving the unique features of the source portrait. The system includes specialized capabilities for cross-species portrait animation, adapting human-centric models to non-human subjects and animals. It also features a motion template generator that converts driving videos into portable files to accelerate inference and protect the identity of the origina
Applies temporal and spatial preprocessing to video sequences to prepare them for motion extraction.
mmagic is a multimodal training pipeline and framework for generative AI, focusing on visual synthesis and restoration. It provides the infrastructure to build and train models for tasks such as text-to-image and text-to-video generation, 3D-aware content synthesis, and high-fidelity image translation using diffusion models and generative adversarial networks. The project distinguishes itself through specialized capabilities for generative model personalization, including techniques for fine-tuning subjects and styles. It also supports advanced visual manipulations such as latent space interp
Performs temporal mirroring and frame reversal to prepare video sequences for generative model training.
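As an illustration of the two temporal operations named above, here is a minimal sketch in plain Python. It assumes a clip is simply an ordered sequence of frames, and that "mirroring" means appending the reversed clip to the original. The function names `reverse_sequence` and `mirror_sequence` are hypothetical and are not taken from any of the listed repositories.

```python
from typing import List, Sequence, TypeVar

# A frame can be any object (an array, a file path, ...); the helpers
# below only reorder frames and never inspect their contents.
Frame = TypeVar("Frame")


def reverse_sequence(frames: Sequence[Frame]) -> List[Frame]:
    """Return the frames in reverse temporal order."""
    return list(reversed(frames))


def mirror_sequence(frames: Sequence[Frame]) -> List[Frame]:
    """Append the reversed clip to the original (assumed meaning of mirroring).

    The result is twice as long and plays forward, then backward.
    """
    return list(frames) + list(reversed(frames))


# Placeholder strings stand in for real image frames.
clip = ["f0", "f1", "f2"]
reversed_clip = reverse_sequence(clip)
mirrored_clip = mirror_sequence(clip)
```

Both helpers return new lists and leave the input clip unchanged, so the same source clip can be reused for several augmented variants.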
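LatentSync is an audio-driven video generator and latent diffusion lip-sync model designed to synchronize a speaker's lip movements in a video with a target audio track. It provides a lip-sync training framework for developing synchronization networks on custom video and audio datasets. The system uses a video preprocessing pipeline to clean, segment, and align face data. It includes a visual synchronization evaluation tool that computes confidence scores to measure how accurately audio and visuals are aligned in generated videos. The project covers custom synchronization network development, training configuration management for hardware memory and resolution, and evaluation of synthesized videos.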
Cleans and segments video files by aligning faces and filtering for quality before training synchronization models.
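This project is a PyTorch implementation of 3D Residual Networks for video action recognition. It provides a spatiotemporal architecture that classifies human activities in video clips by analyzing spatial frames and temporal motion. The system includes a distributed model training framework to accelerate learning across multiple compute nodes. It supports deploying and fine-tuning pretrained model weights, allowing existing networks to be adapted to specific new datasets. The codebase covers the full spatiotemporal learning workflow, including video dataset preprocessing tools for converting raw files into image sequences, action inference, and metrics for computing recognition accuracy.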
Provides video sequence preprocessing utilities to transform raw video into training-ready image frames.