30 open-source projects similar to nvidia/vid2vid, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Vid2vid alternative.
pix2pixHD is a conditional generative adversarial network designed to transform semantic label maps into high-resolution photorealistic images. It functions as a high-resolution image synthesizer and an image-to-image translation model capable of producing synthetic images at 2048x1024 resolution. The system includes a semantic image editor that allows for the modification of high-resolution visuals by updating the underlying semantic label maps. This enables interactive image editing and the generation of photorealistic images based on source images or discrete label maps. The framework pro
StarGAN is a PyTorch image-to-image translation framework designed to synthesize visual styles and attributes across multiple domains. It implements a generative adversarial network that serves as a deep learning image translator for modifying specific visual characteristics within an image dataset. The framework uses a single unified model to handle translations between multiple image domains rather than requiring separate pairs of models. It is a research implementation that learns mappings between different image attributes without the need for paired training data. The project covers the
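The "single unified model" works by conditioning one generator on the target domain. Below is a minimal NumPy sketch of that conditioning. It assumes the scheme described in the StarGAN paper, where the target-domain label vector is spatially replicated and concatenated to the input image as extra channels. `tile_domain_label` is a hypothetical helper for illustration and is not part of the project's code.

```python
import numpy as np


def tile_domain_label(images: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Append a per-image target-domain vector to an NCHW batch as constant channels.

    images: (N, C, H, W) input batch.
    labels: (N, D) one-hot or multi-hot target-domain vectors.
    Returns an (N, C + D, H, W) array.
    """
    n, _, h, w = images.shape
    if labels.shape[0] != n:
        raise ValueError("images and labels must have the same batch size")
    # Spatially replicate each label value across the whole H x W grid.
    label_maps = np.broadcast_to(labels[:, :, None, None], (n, labels.shape[1], h, w))
    return np.concatenate([images, label_maps.astype(images.dtype)], axis=1)


# Two 4x4 RGB images; image 0 targets domain 0, image 1 targets attributes 1 and 2.
batch = np.zeros((2, 3, 4, 4), dtype=np.float32)
target = np.array([[1, 0, 0], [0, 1, 1]], dtype=np.float32)
conditioned = tile_domain_label(batch, target)
print(conditioned.shape)  # (2, 6, 4, 4)
```

Because the generator always receives the desired domain as input, one set of weights can cover every source/target pair. The alternative would be a separate model for each pair.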
PyTorch Implementation of Fully Convolutional Networks. (Training code to reproduce the original result is available.)
This project is a PyTorch object detection framework that implements the Faster R-CNN architecture. It serves as a vision model for predicting precise bounding boxes around multiple objects within images and live video feeds. The system is optimized for multi-GPU training to reduce the time required for model convergence. It utilizes a GPU-accelerated design to handle the training and inference of complex detection networks. The framework covers the full object detection lifecycle, including custom network training and inference for static images and real-time video streams. It includes capa
Darkflow is an object detection framework and computer vision pipeline that provides a programmatic interface for performing real-time image analysis and object identification. It functions as a tool for loading weights, fine-tuning models, and executing inference on both static images and video feeds. The project serves as a converter that translates Darknet configurations and weights into TensorFlow graphs to enable retraining and deployment. It includes a model exporter that saves trained graphs into portable protobuf files for use on mobile and native devices. The system covers capabilit
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
PyTorch implementation of "Efficient Neural Architecture Search via Parameter Sharing"
MediaPipe is a cross-platform machine learning framework designed for building and deploying pipelines that process live and streaming media. It provides a system for connecting processing components into custom machine learning chains to analyze real-time audio and video streams. The framework includes a suite of pre-trained models for tasks such as hand, face, and pose tracking, along with tools for retraining and customizing these models with specific datasets. It also features a dedicated benchmarker for measuring the execution speed and accuracy of machine learning models directly within
This project is a deep learning framework designed for training and deploying image-to-image translation models. It serves as a research platform for experimenting with neural network architectures that transform visual content between distinct stylistic domains, supporting both paired and unpaired training data. The framework distinguishes itself through its support for cycle-consistency constraints, which allow for image translation between domains without requiring corresponding paired examples. It provides a structured pipeline that utilizes adversarial loss optimization, where generator
We have a reimplementation of the UNIT method that is more performant. It is available at Imaginaire.
CVPR 2025 Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
This repo is used to research convolutional networks primarily for computer vision tasks. For this purpose, the repo contains (re)implementations of various classification, segmentation, detection, and pose estimation models and scripts for training/evaluating/converting.
TensorFlow implementation of Unsupervised Cross-Domain Image Generation.
StarGAN v2 - Official PyTorch Implementation (CVPR 2020)
OpenPose is a real-time pose estimation engine designed to detect and track human body, face, hand, and foot landmarks. It functions as a multi-person motion tracker, identifying the spatial coordinates of multiple individuals simultaneously within video streams or static images. Beyond two-dimensional detection, the software acts as a three-dimensional kinematics processor, reconstructing spatial movement data from single or multiple synchronized camera perspectives. The system distinguishes itself through a bottom-up approach that utilizes part-affinity fields to associate body parts across
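The part-affinity-field association step can be illustrated with a small sketch. This minimal NumPy version assumes the scoring described in the OpenPose paper: sample points on the segment between two candidate keypoints, then average how well the field's vectors align with the segment's direction. A high score suggests the two keypoints belong to the same person's limb. The function name, the sample count, and the nearest-pixel lookup are illustrative choices. None of them comes from OpenPose's own API.

```python
import numpy as np


def paf_association_score(paf: np.ndarray, p1, p2, num_samples: int = 10) -> float:
    """Approximate the line integral of one limb's part-affinity field from p1 to p2.

    paf: (H, W, 2) array with an (x, y) direction vector at every pixel.
    p1, p2: (x, y) pixel coordinates of two candidate keypoints.
    Returns the mean alignment between the field and the p1 -> p2 direction.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    limb = p2 - p1
    length = np.linalg.norm(limb)
    if length == 0.0:
        return 0.0
    direction = limb / length
    h, w, _ = paf.shape
    alignments = []
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = p1 + t * limb
        # Nearest-pixel lookup, clamped to the field's bounds.
        col = min(max(int(round(x)), 0), w - 1)
        row = min(max(int(round(y)), 0), h - 1)
        alignments.append(float(np.dot(paf[row, col], direction)))
    return float(np.mean(alignments))


# A toy field in which every vector points along +x.
field = np.zeros((8, 8, 2))
field[..., 0] = 1.0
along = paf_association_score(field, (1, 4), (6, 4))   # limb follows the field
across = paf_association_score(field, (4, 1), (4, 6))  # limb is perpendicular
print(along, across)  # 1.0 0.0
```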
This project is an unsupervised image restoration tool that uses a convolutional neural network as a structural prior to reconstruct images from noisy or incomplete data. It functions as a neural network image prior, utilizing the inherent biases of the network architecture to restore pixels without the need for a pre-trained dataset or external learning. The system performs zero-shot image restoration by treating the network architecture itself as a regularization term. It uses a randomly initialized encoder-decoder structure and iterative gradient descent to minimize pixel-wise loss, recove
Keras-GAN is a collection of generative adversarial network implementations built with Keras for synthetic data generation and image manipulation. It provides frameworks for image-to-image translation, image inpainting, and neural image super-resolution. The library includes tools for learning disentangled latent space representations to control specific attributes of synthetic outputs. It also features capabilities for image domain translation using paired or unpaired data and the ability to fill corrupted or missing image parts by analyzing surrounding visual context. The project covers ge
⚠️ Regrettably, I cannot perform maintenance due to the loss of the materials. I'm archiving this repository for reference.
CycleGAN is a generative adversarial network framework designed for unpaired image-to-image translation. It enables the conversion of images between two distinct visual domains using datasets that do not require direct one-to-one matching examples. The project implements a deep learning style transfer tool capable of artistic style transfer, object transfiguration, and domain-to-domain conversion. It uses a dual-generator architecture and cycle-consistency loss to ensure that images translated to a target domain and back recover their original state. The framework covers core machine learnin
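Cycle-consistency can be sketched in a few lines of NumPy. This is a minimal illustration and not CycleGAN's training code. The two "generators" are toy affine functions standing in for networks. The L1 form of the loss follows the CycleGAN paper. The weight lam = 10 is the value used there, but treat it here as an adjustable assumption. When F exactly inverts G the loss is essentially zero, and a mismatched F is penalized.

```python
import numpy as np


def cycle_consistency_loss(x, y, G, F, lam=10.0):
    """lam * (mean |F(G(x)) - x| + mean |G(F(y)) - y|) for generators G: X->Y and F: Y->X."""
    forward_cycle = np.mean(np.abs(F(G(x)) - x))   # x -> G(x) -> F(G(x)) should return to x
    backward_cycle = np.mean(np.abs(G(F(y)) - y))  # y -> F(y) -> G(F(y)) should return to y
    return lam * (forward_cycle + backward_cycle)


def G(a):  # toy stand-in for the X -> Y generator
    return 2.0 * a + 1.0


def F_inverse(b):  # exactly inverts G
    return (b - 1.0) / 2.0


def F_identity(b):  # does not invert G
    return b


rng = np.random.default_rng(0)
x_batch = rng.normal(size=(4, 3))
y_batch = rng.normal(size=(4, 3))
loss_consistent = cycle_consistency_loss(x_batch, y_batch, G, F_inverse)
loss_inconsistent = cycle_consistency_loss(x_batch, y_batch, G, F_identity)
```

In full training, this term is added to the two adversarial losses. The adversarial losses alone would let a generator map inputs to any realistic-looking output; the cycle term ties each translation back to its source image.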
Image Deblurring using Generative Adversarial Networks
CVPR25 official implementation of "MobileMamba: Lightweight Multi-Receptive Visual Mamba Network"
PyTorch implementation of CoordConv, introduced in the paper "An intriguing failing of convolutional neural networks and the CoordConv solution"
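CoordConv's core idea is to let a convolution see where each pixel is by appending coordinate channels to its input. Below is a minimal NumPy sketch of that "add coordinates" step. It assumes NCHW layout and coordinates scaled to [-1, 1], and it omits the paper's optional radius channel. `add_coords` is a hypothetical name, and the repository's PyTorch module may differ in its details.

```python
import numpy as np


def add_coords(x: np.ndarray) -> np.ndarray:
    """Append row and column coordinate channels in [-1, 1] to an NCHW batch."""
    n, _, h, w = x.shape
    rows = np.linspace(-1.0, 1.0, h, dtype=x.dtype)  # varies down the image
    cols = np.linspace(-1.0, 1.0, w, dtype=x.dtype)  # varies across the image
    row_map = np.broadcast_to(rows[:, None], (h, w))
    col_map = np.broadcast_to(cols[None, :], (h, w))
    coords = np.broadcast_to(np.stack([row_map, col_map])[None], (n, 2, h, w))
    # A regular convolution applied to this output can condition on pixel position.
    return np.concatenate([x, coords], axis=1)


features = np.zeros((1, 3, 5, 4), dtype=np.float32)
with_coords = add_coords(features)
print(with_coords.shape)  # (1, 5, 5, 4)
```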
This project is a PyTorch implementation of 3D residual networks designed for video action recognition. It provides a spatiotemporal architecture that analyzes both spatial frames and temporal motion to classify human activities within video clips. The system includes a distributed model training framework to accelerate learning across multiple compute nodes. It supports the deployment and fine-tuning of pre-trained model weights, allowing the adaptation of existing networks to specific new datasets. The codebase covers the full pipeline for spatiotemporal learning, including video dataset p
pix2pix is a framework for image-to-image translation using conditional generative adversarial networks. It functions as a supervised trainer and visual domain mapper designed to learn a mapping between input and output images for style and domain transfer. The system utilizes a U-Net encoder-decoder architecture combined with a PatchGAN local discriminator to enforce high-frequency local consistency. It employs L1 loss regularization to ensure generated outputs remain structurally close to the ground truth. The project covers a broad range of computer vision capabilities, including semantic
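The generator objective described above can be sketched in NumPy under a few stated assumptions:

- The discriminator outputs one logit per image patch (the PatchGAN idea).
- The adversarial term is the non-saturating -log D form, averaged over all patches.
- lam = 100 follows the pix2pix paper's default weighting of the L1 term.

`generator_objective` is a hypothetical helper and not a function from the project.

```python
import numpy as np


def generator_objective(patch_logits, fake, target, lam=100.0):
    """Non-saturating adversarial loss over all patch logits plus lam * L1 to the target."""
    # -log(sigmoid(z)) == log(1 + exp(-z)); logaddexp(0, -z) computes it stably.
    adversarial = np.mean(np.logaddexp(0.0, -patch_logits))
    reconstruction = np.mean(np.abs(fake - target))
    return adversarial + lam * reconstruction


# A 4x4 grid of patch logits for one image: 0.0 means the discriminator is undecided.
logits = np.zeros((1, 1, 4, 4))
target = np.zeros((1, 3, 8, 8))
exact = generator_objective(logits, target.copy(), target)
off_by_small = generator_objective(logits, target + 0.01, target)
```

With an undecided discriminator the adversarial term is log 2. A uniform 0.01 pixel error then adds 100 × 0.01 = 1.0, which shows how heavily the L1 term pulls outputs toward the ground truth.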
This repository provides structured code examples and project templates designed for classroom instruction in machine learning and neural networks. It offers reference implementations of deep learning models for both computer vision and natural language processing tasks, built using PyTorch as the core framework. The codebase is organized as a modular project template with separate directories for data handling, model definitions, and training scripts, promoting reusability and clarity. It includes predefined pipelines for image classification and text processing, along with a command-line in
This project is a computer vision benchmark and image classification dataset used to measure and compare the accuracy of machine learning models. It provides a standardized collection of labeled fashion product images and training data formatted to be compatible with the MNIST dataset structure. The dataset consists of fixed-dimension grayscale images and label-based category mappings, stored in a binary format. It includes pre-split training and testing sets and a static distribution to ensure consistent cross-model benchmarking. The repository supports image classification benchmarking and
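The dataset mirrors MNIST's file structure, so its binary files use the IDX format documented for the original MNIST files. The header starts with a magic number (two zero bytes, a type code where 0x08 means unsigned byte, and the number of dimensions), followed by one big-endian 32-bit size per dimension and then the raw values. Below is a minimal stdlib parser, tested on synthetic bytes rather than the real files. The distributed files are gzip-compressed, so in practice the input would be something like `gzip.open(path, "rb").read()`; that loading step and the `parse_idx` name are assumptions for illustration.

```python
import struct


def parse_idx(data: bytes):
    """Parse an uncompressed IDX buffer whose element type is unsigned byte.

    Returns (dims, payload): a tuple of dimension sizes and the raw element bytes.
    """
    # Magic number: two zero bytes, a type code (0x08 = unsigned byte), and the rank.
    zero, type_code, ndim = struct.unpack_from(">HBB", data, 0)
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # Each dimension size is a big-endian 32-bit unsigned integer.
    dims = struct.unpack_from(">" + "I" * ndim, data, 4)
    count = 1
    for size in dims:
        count *= size
    start = 4 + 4 * ndim
    payload = data[start:start + count]
    if len(payload) != count:
        raise ValueError("IDX payload is shorter than its header declares")
    return dims, payload


# Synthetic stand-ins: two 2x2 "images" and their two labels.
images_blob = struct.pack(">HBB", 0, 0x08, 3) + struct.pack(">III", 2, 2, 2) + bytes(range(8))
labels_blob = struct.pack(">HBB", 0, 0x08, 1) + struct.pack(">I", 2) + bytes([3, 7])
image_dims, pixels = parse_idx(images_blob)
label_dims, labels = parse_idx(labels_blob)
print(image_dims, list(labels))  # (2, 2, 2) [3, 7]
```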
CVPR'18 ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans