Latent Diffusion

Features

Latent Diffusion Models - Provides a latent diffusion model framework that performs the denoising process within a compressed latent space.
Diffusion Model Training - Includes workflows for training latent diffusion models on large datasets of diverse visual examples.
Latent Space Encoders - Implements latent space encoders to compress high-resolution images for efficient diffusion processing.
Text-to-Image Generators - Generates high-resolution images from natural language descriptions using latent diffusion pipelines.
Cross-Attention Conditioning - Uses cross-attention conditioning to integrate text and image embeddings into the denoising network.
Variational Autoencoders - Uses variational autoencoders to compress high-resolution images into a lower-dimensional latent space.
Iterative Denoising Pipelines - Implements iterative denoising pipelines that remove Gaussian noise step by step, starting from a random noise sample.
Latent Noise Prediction - Performs iterative noise prediction within a compressed representation to improve computational efficiency.
Classifier-Free Guidance - Implements classifier-free guidance to balance image fidelity and diversity during the denoising process.
Image Inpainting - Provides image inpainting capabilities to restore missing areas by combining known pixels with generated content.
Example-Based Synthesis - Creates new images by combining natural language prompts with similar visual examples retrieved from a database.
Image Generation - Supports unconditional image generation by sampling directly from the trained model.
High-Resolution Synthesis - Synthesizes high-fidelity, high-resolution images from trained generative models.
Image Restoration and Generation - Provides image restoration and inpainting to fill masked areas and reconstruct the complete image.
Area Filling and Clearing - Fills missing image areas and gaps with generated visual content.
Latent Inpainting Masks - Supports mask-based latent inpainting to reconstruct missing image regions within the compressed space.
Generation - Listed in the “Generation” section of the Awesome Diffusion Models list.
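
Two of the features listed above, classifier-free guidance and mask-based inpainting, come down to short arithmetic rules. The following is a minimal sketch in plain Python, not code taken from the Latent Diffusion repository: the function names, the list-of-floats representation of a latent, and the guidance convention (unconditional prediction plus a scaled difference) are assumptions chosen for illustration.

```python
from typing import List

# A latent or noise prediction is modeled as a flat list of floats.
# This is an illustrative simplification; real models use tensors.
Vector = List[float]


def classifier_free_guidance(eps_uncond: Vector, eps_cond: Vector, scale: float) -> Vector:
    """Combine unconditional and conditional noise predictions.

    Assumed convention: eps = eps_uncond + scale * (eps_cond - eps_uncond).
    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 extrapolates past the conditional prediction.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def blend_with_mask(known: Vector, generated: Vector, mask: Vector) -> Vector:
    """Keep known values where mask is 0 and use generated values where mask is 1.

    In mask-based inpainting, a blend of this kind is typically applied at
    each denoising step so that only the masked region is synthesized.
    """
    return [m * g + (1.0 - m) * k for k, g, m in zip(known, generated, mask)]


if __name__ == "__main__":
    guided = classifier_free_guidance([0.0, 1.0], [1.0, 3.0], scale=2.0)
    print(guided)

    filled = blend_with_mask(known=[1.0, 2.0], generated=[9.0, 8.0], mask=[0.0, 1.0])
    print(filled)
```

Under this convention, a larger guidance scale pushes the prediction further toward the prompt-conditioned direction, which is the fidelity-versus-diversity trade-off the feature list mentions.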

Open-source alternatives to Latent Diffusion

Similar open-source projects, ranked by how many features they share with Latent Diffusion.

lucidrains/DALLE2-pytorch
11,310 stars
This is a PyTorch implementation of a text-to-image model designed for synthesizing high-fidelity images from natural language descriptions. It utilizes a diffusion image generator to transform latent embeddings into visual data through an iterative denoising process. The system employs a two-stage latent mapping process, using a CLIP-based latent prior to map text embeddings to image embeddings before decoding them into pixels. It features a cascading diffusion decoder that produces high-resolution imagery by passing low-resolution outputs through a sequence of models at increasing scales.
Python · artificial-intelligence · deep-learning · text-to-image

Stability-AI/StableCascade
6,548 stars
StableCascade is a generative AI system and latent diffusion framework designed for text-to-image synthesis and image-to-image transformations. It utilizes a multi-stage cascade architecture that encodes and decodes images via a latent space to produce high-fidelity visual imagery. The system includes a cascade diffusion pipeline for controlling image structure through inpainting, outpainting, and super-resolution. It also provides a toolkit for image-to-image generation and the creation of image variations using embeddings. The framework supports model optimization through low-rank adaptati
Jupyter Notebook

facebookresearch/DiT
8,642 stars
DiT is a latent diffusion model and transformer-based generative AI framework implemented in PyTorch. It functions as a class-conditional image generator that replaces traditional convolutional backbones with a transformer architecture to synthesize high-fidelity images. The project utilizes patch-based latent processing and latent space compression to operate on low-dimensional image representations. It incorporates class-conditional guidance and adjustable guidance scales to control the visual content of generated images during the sampling process. The framework covers distributed model t
Python

lucidrains/imagen-pytorch
8,415 stars
This is a PyTorch-based implementation of diffusion models for synthesizing photorealistic images and video. It provides a framework for text-to-image and text-to-video generation, as well as unconditional image synthesis. The system utilizes a cascading diffusion pipeline to produce high-resolution imagery by passing low-resolution outputs through a sequence of super-resolution models. It also includes capabilities for image inpainting, allowing the reconstruction of masked or missing regions of visual media guided by surrounding context and text prompts. The project includes tools for diff
Python · artificial-intelligence · deep-learning · imagination-machine

CompVis/latent-diffusion