30 open-source projects similar to openai/guided-diffusion, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Guided Diffusion alternative.
This project is a deep learning framework for AI image super-resolution and facial synthesis. It provides a diffusion model image upscaler and a generative facial image synthesizer capable of transforming low-resolution images into high-resolution outputs using pretrained model weights. The system utilizes iterative diffusion refinement and low-resolution guided sampling to restore fine details and sharpness. It supports both unconditional image generation, where images are created from scratch, and guided resolution enhancement for high-fidelity facial reconstruction. The repository include
This is a PyTorch implementation of a text-to-image model designed for synthesizing high-fidelity images from natural language descriptions. It utilizes a diffusion image generator to transform latent embeddings into visual data through an iterative denoising process. The system employs a two-stage latent mapping process, using a CLIP-based latent prior to map text embeddings to image embeddings before decoding them into pixels. It features a cascading diffusion decoder that produces high-resolution imagery by passing low-resolution outputs through a sequence of models at increasing scales.
Implementation of a Denoising Diffusion Probabilistic Model in PyTorch
StableCascade is a generative AI system and latent diffusion framework designed for text-to-image synthesis and image-to-image transformations. It utilizes a multi-stage cascade architecture that encodes and decodes images via a latent space to produce high-fidelity visual imagery. The system includes a cascade diffusion pipeline for controlling image structure through inpainting, outpainting, and super-resolution. It also provides a toolkit for image-to-image generation and the creation of image variations using embeddings. The framework supports model optimization through low-rank adaptati
This project is a diffusion model training framework and image synthesis pipeline. It provides the tools to train generative models that learn image data distributions through an iterative denoising process, along with automated evaluation scripts that measure the quality and accuracy of generated samples.
This project is a comprehensive instructional resource and course for building neural networks using PyTorch. It covers the fundamental building blocks of deep learning, including tensor manipulation, automatic differentiation, and the construction of modular neural network components. The repository serves as a technical guide for several specialized domains. It provides implementation details for computer vision tasks such as image classification, object detection, and semantic segmentation, as well as natural language processing workflows involving transformers, recurrent networks, and gen
Latent Diffusion is a framework for high-resolution image synthesis that performs the denoising process within a compressed latent space. It uses variational autoencoders to encode images into a lower-dimensional representation, reducing the computational cost of noise prediction compared to operating on raw pixels. The project enables text-to-image generation by integrating natural language descriptions through cross-attention conditioning. It also supports image inpainting and restoration, filling masked or missing image areas with generated content, and example-based synthesis using retrie
DiT is a latent diffusion model and transformer-based generative AI framework implemented in PyTorch. It functions as a class-conditional image generator that replaces traditional convolutional backbones with a transformer architecture to synthesize high-fidelity images. The project utilizes patch-based latent processing and latent space compression to operate on low-dimensional image representations. It incorporates class-conditional guidance and adjustable guidance scales to control the visual content of generated images during the sampling process. The framework covers distributed model t
mmagic is a multimodal training pipeline and framework for generative AI, focusing on visual synthesis and restoration. It provides the infrastructure to build and train models for tasks such as text-to-image and text-to-video generation, 3D-aware content synthesis, and high-fidelity image translation using diffusion models and generative adversarial networks. The project distinguishes itself through specialized capabilities for generative model personalization, including techniques for fine-tuning subjects and styles. It also supports advanced visual manipulations such as latent space interp
This is a PyTorch-based implementation of diffusion models for synthesizing photorealistic images and video. It provides a framework for text-to-image and text-to-video generation, as well as unconditional image synthesis. The system utilizes a cascading diffusion pipeline to produce high-resolution imagery by passing low-resolution outputs through a sequence of super-resolution models. It also includes capabilities for image inpainting, allowing the reconstruction of masked or missing regions of visual media guided by surrounding context and text prompts. The project includes tools for diff
IC-Light is a diffusion-based image editor and generative tool designed for controlling the illumination of foreground subjects. It functions as an image relighting system that uses latent diffusion models to modify lighting effects on isolated subjects. The project provides two primary methods for lighting control: text-based relighting, which uses descriptive prompts and lighting directions, and background-based relighting, which conditions the foreground lighting to match the visual properties of a provided background image. Beyond illumination, the system includes a surface normal estima
Facechain is a generative AI toolchain and portrait generator designed to create personalized synthetic identities and consistent digital portraits. It provides a pipeline for training and refining diffusion models to produce subject-driven image synthesis from reference photos. The project focuses on digital twin generation, enabling the creation of a personalized model from a single image to maintain identity consistency across various poses and artistic styles. It utilizes identity fusion and similarity sorting to balance facial accuracy with stylized visual effects. The toolkit covers a
HunyuanImage-3.0 is a diffusion-based text-to-image tool and large language model image generator designed for creating high-fidelity, photorealistic visual content. It functions as an image-to-image synthesis framework and a multimodal visual reasoning engine. The system includes a prompt refinement system that automatically rewrites sparse user inputs into detailed descriptions to improve output precision. It also employs a reasoning chain architecture to analyze image inputs and prompts, decomposing complex editing tasks into structured sub-tasks. The project covers a range of synthesis c
Lama Cleaner is an AI-powered image editing application focused on inpainting, object removal, and generative filling. It provides a suite of tools for erasing unwanted elements from photos and filling the resulting gaps using generative artificial intelligence. The project includes specialized capabilities for image outpainting to extend borders, background removal through object segmentation, and face restoration to fix visual defects. It also features an image upscaler to increase resolution and clarity via super-resolution AI, as well as a Stable Diffusion-based editor for replacing speci
This is a collection of Jupyter notebooks that serve as educational guides for training, fine-tuning, and deploying machine learning models within the Hugging Face ecosystem. The notebooks cover the full lifecycle of model development, from loading and configuring pre-trained transformers to packaging trained models for real-time inference via scalable endpoints. The notebooks demonstrate a range of capabilities including diffusion model training and fine-tuning for image generation and editing, transformer model adaptation for natural language processing tasks, and parameter-efficient fine-t
AnyText is a visual text synthesis framework and latent diffusion text model designed to generate and edit text within images. It functions as a multilingual diffusion text generator that blends glyph and stroke data into latent image features to ensure precise character placement and rendering. The system enables the modification or replacement of existing characters and words inside images while preserving the surrounding visual context. It supports the creation of stylized text effects through the use of a weight-merging pipeline that combines specialized model weights and adaptation layer
This project is an educational course and collection of training materials focused on generative diffusion models. It provides a curriculum and practical guides for training, fine-tuning, and deploying models capable of synthesizing images, audio, and video. The material covers specific implementation strategies including noise-based synthesis, iterative refinement, and latent space compression. It provides instruction on guiding generative outputs through conditional synthesis and prompt adherence optimization, as well as techniques for image inpainting and text-based editing. The project i
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
This project is a cloud-based AI deployment system and latent diffusion model trainer. It provides a framework for launching image generation interfaces and training pipelines on remote GPU infrastructure, specifically serving as a text-to-image model fine-tuner. The system features a specialized training interface for fine-tuning Stable Diffusion models on custom image datasets. It allows for the creation of personalized visual outputs by training models on specific subjects or artistic styles using a small set of reference images. The software covers generative AI deployment, custom style
Tiny Universe is an educational monorepo that delivers multiple independent implementations of core AI subsystems as self-contained Jupyter notebooks. It provides from-scratch constructions of foundational architectures including a complete Transformer model built from the original paper specification, a denoising diffusion probabilistic model for image generation, and a ReAct-style autonomous agent framework that equips an LLM with tools for planning and multi-step task execution. The project distinguishes itself by covering the full lifecycle of modern AI systems through hands-on implementa
RoomGPT is a generative AI image processor designed to transform photographs of existing rooms into redesigned interior spaces. It functions as an AI interior design generator and room visualizer that applies new styles and layouts to uploaded images using machine learning models. The system utilizes diffusion-based image transformation and prompt-template engineering to modify visual environments and generate home decor visualizations. These capabilities allow for the creation of diverse interior design variations based on specific style prompts. The infrastructure includes client-side imag
SUPIR is an AI image upscaler and restoration system designed to remove artifacts and restore quality to real-world photographs. It functions as a diffusion-based image enhancer and restoration tool that uses model scaling to produce high-resolution results with photorealistic details. The system balances visual aesthetics with input fidelity, allowing for a trade-off between strict adherence to the original image and the overall visual appeal of the output. It leverages large-scale model inference to improve image clarity and maintain realistic details during the upscaling proces
lora-scripts is a fine-tuning toolkit designed for adapting base diffusion models to specific styles or subjects. It provides a specialized set of scripts and tools for executing low-rank adaptation and Dreambooth training jobs. The project features a web-based graphical interface that manages the training workflow, allowing users to configure and execute jobs without manual script editing. This interface maps user inputs to hyperparameters and provides a real-time dashboard for monitoring training metrics and loss curves to track model convergence. The system includes a dataset tagging mana
This project is a research-oriented PyTorch framework designed for the implementation and training of generative video diffusion models. It provides a modular toolkit that extends standard image-based diffusion techniques into three dimensions, enabling the synthesis of coherent video sequences through iterative denoising processes. The framework distinguishes itself by utilizing factored space-time attention, which decomposes high-dimensional video data into separate spatial and temporal layers to maintain motion consistency while managing computational complexity. It supports multi-modal tr
Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution visual content by performing iterative denoising within a compressed latent space. By mapping natural language embeddings into pixel outputs through conditioned probabilistic processes, the framework enables the generation of images from text prompts and the transformation of existing visual inputs based on semantic instructions. The architecture utilizes a modular execution environment that decouples model loading, scheduler logic, and inference components to support diverse hardware configurations. I
PyMC is a Bayesian probabilistic programming framework used for building probabilistic models and performing Bayesian inference. It provides a probabilistic graphical model library for specifying random variables, priors, and likelihood functions, supported by an MCMC sampling engine and variational inference tools to estimate posterior distributions. The framework features a GPU-accelerated inference backend that compiles models into machine code to increase execution speed. It utilizes a backend-agnostic tensor execution model and just-in-time graph compilation to optimize the computation o
Accelerate is a PyTorch distributed training library that abstracts the boilerplate required to run models across multiple GPUs, TPUs, and CPUs. It functions as a deep learning model scaler and distributed hardware orchestrator, allowing the same training script to run on different hardware backends without modifying the core logic. The project provides a distributed training command line interface for configuring compute environments and launching jobs across single or multi-node clusters. It includes a mixed precision training framework to implement FP16 and BF16 precision, reducing memory
Videocrafter is a latent diffusion model designed for AI video synthesis. It functions as both a text-to-video and image-to-video generation system, synthesizing high-quality video sequences from descriptive text prompts or static image inputs. The model utilizes a diffusion-based neural network to transform inputs into animated content, ensuring visual consistency and temporal coherence throughout the generated sequences. This allows for the creation of custom video clips and the animation of static images into fluid motion.
This project is an educational collection of computational notebooks and tutorials focused on Bayesian machine learning and probabilistic programming. It provides a framework for building predictive models that represent uncertainty by defining probability distributions over parameters rather than relying on single point estimates. The repository serves as a library of statistical methods for estimating parameter distributions, performing regression, and quantifying confidence levels in predictive systems. It covers a range of techniques including Gaussian process regression, Markov chain Mon
Sygil-webui is a web interface for Stable Diffusion latent diffusion models, providing a creative suite for text-to-image and text-to-video synthesis. It functions as an image generation tool and a latent diffusion image editor, allowing users to create visuals and video sequences from textual descriptions. The project includes a dedicated model training interface for creating custom textual inversion embeddings, which introduces specific new concepts or styles into the diffusion models. It also features specialized tools for generative image editing, including mask-based inpainting, image-to