30 open-source projects similar to aigc-apps/sd-webui-easyphoto, ranked by how many features they share with it. Compare stars, activity, and what each one does to find the best Sd Webui EasyPhoto alternative.
OutfitAnyone is a diffusion-based virtual try-on system and AI person-garment integration tool. It functions as an image-to-image clothing transfer model designed to visualize how specific clothing items look on any person regardless of their pose. The system adapts garment textures and shapes to a person's body and pose to produce photorealistic results. It specifically focuses on adjusting clothing deformation based on body shape to maintain high fidelity and detail consistency during the fitting process. The project covers AI fashion visualization and virtual garment fitting, providing ca
This project is a generative adversarial network designed for image animation and motion transfer. It functions as a computer vision framework that synthesizes video sequences by applying motion patterns extracted from a driving video onto a static source image. The model distinguishes itself by using a keypoint-based representation to decouple object appearance from temporal movement. By tracking structural deformations through learned latent coordinates, it performs motion retargeting and synthetic media production without requiring manual annotations or object-specific training data. The
DeepFaceLab is a deep learning software suite designed for face swapping and the creation of deepfake videos. It functions as a neural network image compositor that replaces human faces or entire heads in video files to produce synthetic media. The tool provides capabilities for digital facial manipulation, including the ability to modify the perceived age of people in video sequences. It uses automated pattern recognition to blend source faces onto target frames to create seamless visual composites. The system covers a broad technical surface including landmark-based face alignment, autoenc
IDM-VTON is an AI virtual try-on framework and fashion synthesis tool designed to generate realistic images of people wearing specific garments. It operates as a diffusion-based image generator that blends garment textures with human poses to create synthetic fashion imagery. The system implements virtual fitting room capabilities through a generative model that combines person and clothing inputs. It includes a web-based interface to run interactive visual demonstrations and synthesize try-on images in real-time. The framework covers the broader domain of AI fashion visualization, enabling
LivePortrait is a deep learning framework for portrait animation that transfers facial expressions from a driving video to a static image. It functions as an AI motion retargeting tool, mapping movements between different identities while preserving the unique features of the source portrait. The system includes specialized capabilities for cross-species portrait animation, adapting human-centric models to non-human subjects and animals. It also features a motion template generator that converts driving videos into portable files to accelerate inference and protect the identity of the origina
Magic Animate is a diffusion model video generator designed for human image animation. It transforms a static human photo into a temporally consistent video by mapping movements from a reference motion clip, acting as a tool to create realistic animations from a single image. The system ensures visual stability and minimizes flicker through temporal attention injection and motion-controlled noise scheduling. To accelerate the generation of high-resolution video, it includes a distributed GPU inference engine that splits model workloads across multiple graphics cards. The project covers a com
AniPortrait is an AI video synthesis pipeline designed to generate photorealistic speaking portraits and facial animations. It functions as a talking head generator and audio-driven animator that synchronizes lip movements, expressions, and head poses to speech or reference video sources. The system includes a facial expression transfer tool for reenacting movements from a source video onto a static reference image. It utilizes a latent diffusion model with reference-based image conditioning to maintain visual identity and consistency across generated frames. The pipeline covers audio-to-exp
Facechain is a generative AI toolchain and portrait generator designed to create personalized synthetic identities and consistent digital portraits. It provides a pipeline for training and refining diffusion models to produce subject-driven image synthesis from reference photos. The project focuses on digital twin generation, enabling the creation of a personalized model from a single image to maintain identity consistency across various poses and artistic styles. It utilizes identity fusion and similarity sorting to balance facial accuracy with stylized visual effects. The toolkit covers a
Hallo is an audio-driven talking head generator and portrait animation framework. It synchronizes a static portrait image with an audio file to produce realistic talking head videos by mapping audio spectral features to facial expressions and lip movements. The system utilizes a diffusion video synthesis model that employs iterative denoising and latent representations to generate temporally consistent video frames. It incorporates identity-preserving feature extraction and latent space motion modeling to maintain visual consistency and control facial poses. The toolkit provides capabilities
EMO is an AI portrait animator and audio-to-video diffusion model designed to generate expressive talking head videos. It transforms a single static portrait image and an audio track into a synchronized video of a person speaking. The system focuses on digital human synthesis, producing high-fidelity facial movements and emotional cues. It synchronizes lip movements and facial gestures to match spoken voice recordings to create realistic portrait animations. The framework utilizes a diffusion process and a cross-modal alignment mechanism to ensure timing between audio signals and visual land
ComfyUI is a modular generative AI workflow orchestrator and node-based GUI for designing and executing complex diffusion model pipelines. It functions as both a visual interface for building generative logic graphs and a programmable backend API that exposes diffusion model operations for external integration. The system distinguishes itself through a graph-based execution model that supports differential workflow execution, re-running only modified nodes to reduce computation. It features dynamic model offloading to manage memory between system RAM and GPU VRAM and utilizes metadata-embedde
Rope is a graphical user interface for swapping faces in images and videos. It functions as a deepfake video editor and image face swapper that utilizes pre-trained deep learning models to replace identities in visual media. The tool includes specialized capabilities for AI video post-production, such as occlusion-aware blending to handle foreground objects and mouth-parsing refinement to align facial expressions. It also serves as an AI face restoration tool, using saliency-based restoration to recover clarity and sharpness in swapped facial regions. The software provides a pipeline for vis
This project is a Dreambooth implementation designed to personalize Stable Diffusion models. It serves as an AI image personalization tool and model tuner that enables the creation of unique subject identifiers to generate consistent, personalized images. The system focuses on subject-driven image synthesis by fine-tuning pre-trained diffusion models on small, custom datasets. This allows the model to recognize specific people, objects, or artistic styles and place those learned subjects into diverse contexts via text-to-image conditioning. The implementation includes a diffusion model optim
OOTDiffusion is an AI virtual try-on system designed for controllable image synthesis. It generates images of people wearing specific clothing items by superimposing garments onto human figures for both half-body and full-body compositions. The project facilitates digital fashion prototyping and virtual clothing fitting by creating garment-to-person overlays. It aims to maintain the original identity of the wearer and the specific details of the clothing during the synthesis process. The system utilizes a latent diffusion model and conditioning-based image generation to control the output. I
PhotoMaker is a diffusion-based identity generator designed for person-specific image synthesis. It creates high-fidelity photos and avatars of specific individuals using stacked embeddings, which allows for the generation of consistent human identities without the need for custom model training or fine-tuning. The system utilizes zero-shot identity synthesis and identity adapters to maintain recognizable facial features across various visual contexts. It supports artistic style transfer by combining identity information with specialized model weights and integrates external control framework
ComfyUI_IPAdapter_plus is a node-based extension for ComfyUI that implements IPAdapter models to guide image generation using reference images. It functions as an image prompting tool and a Stable Diffusion image adapter, allowing reference files to serve as visual prompts for controlling style, composition, and subject identity. The project provides specialized capabilities for maintaining facial identity and high-fidelity features across generated portraits. It enables the transfer of visual characteristics and artistic styles from reference images, as well as the extraction of spatial layo
Dot is a deep learning face swap tool used to replace faces in live video streams, recorded media, and static images. It functions as a deepfake media processor and real-time video manipulator that applies facial transformations through neural network mapping. The system includes a virtual camera video injector that routes processed output into a system-level virtual device to simulate a physical hardware webcam. This allows generated video to be used within third-party video conferencing software. The tool supports real-time source switching via keyboard inputs to toggle between different s
sd-scripts is a suite of utilities designed for fine-tuning generative models, preprocessing datasets, and converting model weights. It provides a collection of scripts for executing Stable Diffusion training through methods such as DreamBooth, textual inversion, and full fine-tuning, alongside a framework for creating and managing Low-Rank Adaptation weights. The project features specialized capabilities for model weight conversion between different architectures and precision formats. It includes tools for merging adaptation weights into base models, extracting weights from trained models,
InstantID is a diffusion-based identity preservation framework designed for zero-shot image generation. It allows for the synthesis of images featuring a specific person's facial identity using a single reference photo without requiring additional model training or fine-tuning. The project distinguishes itself through the use of consistency model distillation to accelerate inference, reducing the number of steps needed to produce high-quality results. It combines identity-preserving feature extraction with multi-modal prompt integration to merge visual embeddings from a reference image with t
Stable-Video-Infinity is a video synthesis tool based on Stable Video Diffusion designed for creating long-form animations and consistent visual content. It serves as an AI video extension framework and a conditioned animation synthesizer capable of producing video sequences of arbitrary length. The project enables infinite video extension by bypassing standard model duration constraints through an error-recycling loop. It supports conditioned animation synthesis using external inputs such as image streams, audio files, or skeletal motion data to guide the generation process. The framework i
EasyVtuber is 2D avatar animation software that transforms a single static image into a real-time animated character. It functions as a face tracking animation tool and live streaming avatar driver, mapping facial movements from webcams or iOS devices to drive virtual expressions and head motion. The project distinguishes itself through a neural animation pipeline that includes AI video upscaling and frame interpolation to increase visual smoothness and resolution. It utilizes a transparent video streaming system via Spout2, allowing rendered frames with alpha channels to be sent directly to
HunyuanVideo-1.5 is a video generation foundation model and text-to-video diffusion framework. It utilizes a latent video diffusion model and a spatio-temporal transformer architecture to generate high-definition video sequences from text descriptions and images. The project enables cinematic camera control for directing pans and tilts and provides image-to-video animation capabilities. It supports visual style adaptation through low-rank adaptation tuning and uses a language model for prompt refinement to improve visual alignment. The model covers high-resolution video upscaling via a super
TurboDiffusion is a video diffusion inference engine and generator designed to create high-resolution videos from text prompts and images. It provides a runtime environment for executing optimized diffusion model checkpoints with a focus on reducing latency and GPU memory usage. The project features a specialized training framework for aligning sparse-linear attention models with pretrained full-attention models. This system includes capabilities for sparse attention parameter merging and sparse-linear model alignment to reduce computational costs during inference while maintaining output qua
Unofficial implementation of PhotoMaker for ComfyUI
LivePortrait is a computer vision framework designed for portrait animation and generative video synthesis. It functions as a deep learning system that transfers facial expressions and head movements from a driving video source onto a static image or an existing portrait video, effectively decoupling the subject's identity from the dynamic motion patterns. The framework utilizes keypoint-based motion retargeting and implicit 3D latent representations to map movements across different subjects, including both human and animal portraits. By employing canonical motion normalization and feature-s
PaddleGAN is a generative AI framework and deep learning computer vision library built on the PaddlePaddle framework. It serves as a toolkit for image and video synthesis, providing a collection of generative adversarial network implementations for creating synthetic visual content. The library focuses on advanced synthesis capabilities, including the generation of talking heads through lip motion synchronization and the creation of synthetic videos via motion transfer from driving sequences. It provides tools for domain-to-domain translation, allowing for image style transfer and the transfo
This project provides methodologies and guides for structured prompt engineering, generative workflows, and specialized image generation strategies. It serves as a framework for optimizing inputs to large language models across coding, writing, and analysis tasks, as well as a library of techniques for controlling diffusion models. The project distinguishes itself through an AI-driven software design framework that converts business requirements into technical architectures and code using domain-driven prompting. It also implements generative AI workflow patterns that use sequential prompt pi
AnimateDiff is a latent diffusion video generator and text-to-video diffusion framework. It converts existing text-to-image diffusion models into animation generators by applying specialized motion modules, allowing for the creation of video sequences without modifying the original base model. The project provides an image-to-video animation framework that uses sparse RGB images, sketches, or structural keyframe constraints to guide generation. It further distinguishes itself with a motion adapter system that injects cinematic camera movements, such as zooming, panning, and tilting, into anim
imaginAIry is a system for generating and refining images and videos using diffusion models. It operates as a web-based server that triggers generation requests through standard API calls, allowing for the creation of visuals and video sequences from text prompts or existing files. The project provides a suite for AI image editing and upscaling, enabling the modification of visuals through natural language instructions and super-resolution tools to increase detail and image size. The system includes capabilities for structural image control using depth maps, edge maps, and body poses to main