Generative Models

This is a framework for training and sampling diffusion models to generate high-fidelity images, video, and 4D assets. It provides a modular environment for managing generative AI training pipelines, including the handling of datasets, noise sampling, and loss weighting to stabilize the creation of synthetic content.
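The split between noise sampling and loss weighting described above can be sketched as two independent functions. This is a minimal illustration, not the project's code: the log-normal sampler, the EDM-style weight, the default values, and all function names are assumptions chosen for the example.

```python
import math
import random


def sample_sigma(rng, p_mean=-1.2, p_std=1.2):
    """Draw a noise level from a log-normal distribution (assumed defaults)."""
    return math.exp(rng.gauss(p_mean, p_std))


def edm_weight(sigma, sigma_data=0.5):
    """EDM-style weight: (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def weighted_loss(prediction, target, sigma, weighting=edm_weight):
    """Mean squared error scaled by a weight that depends only on sigma."""
    mse = sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)
    return weighting(sigma) * mse


# The sampler and the weighting are separate callables, so either one
# can be swapped without touching the other.
rng = random.Random(0)
sigma = sample_sigma(rng)
loss = weighted_loss([0.1, 0.4], [0.0, 0.5], sigma)
```

Because `weighted_loss` only receives `sigma` and a weighting callable, a different sampling distribution can be tried without changing the loss code.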

The project features a modular model configuration system that uses YAML-based assembly to define network submodules and conditioners. It also includes a dedicated toolset for AI image watermarking, allowing for the embedding and detection of invisible markers to verify the origin of generated media.
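A rough sketch of how configuration-driven assembly of this kind can work: a config entry names a class by its import path and lists constructor arguments, and a small helper builds the object. The key names `target` and `params`, the helper name, and the use of a standard-library class as a stand-in for a network submodule are assumptions for illustration. A plain dict stands in for the parsed YAML.

```python
import importlib


def instantiate_from_config(config):
    """Import the class named by config["target"] and call it with config["params"]."""
    module_name, class_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**config.get("params", {}))


# Dict form of a hypothetical YAML fragment:
#   target: fractions.Fraction
#   params:
#     numerator: 3
#     denominator: 4
config = {
    "target": "fractions.Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = instantiate_from_config(config)
```

Swapping a submodule then means editing the `target` string in the config rather than the code that calls the helper.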

The system supports text-to-image generation and novel-view video synthesis, transforming single input videos into consistent 4D assets. Capabilities cover latent diffusion sampling using customizable numerical solvers, as well as conditioning mechanisms that use external embedders to steer the generative process.
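To make the solver idea concrete, here is a minimal sketch of a discretization paired with an Euler solver, working on a single scalar instead of an image latent. The Karras-style noise schedule, its default values, the function names, and the toy denoiser are assumptions for illustration and are not taken from the project.

```python
def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Return n decreasing noise levels (n >= 2) followed by a final 0.0."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    ramp = [i / (n - 1) for i in range(n)]
    return [(hi + t * (lo - hi)) ** rho for t in ramp] + [0.0]


def euler_sample(denoiser, x, sigmas):
    """Integrate dx/dsigma = (x - denoiser(x, sigma)) / sigma with Euler steps."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoiser(x, sigma)) / sigma
        x = x + (sigma_next - sigma) * d
    return x


TARGET = 0.25


def toy_denoiser(x, sigma):
    """Stand-in for a trained network: always predicts the same clean value."""
    return TARGET


sigmas = karras_sigmas(10)
sample = euler_sample(toy_denoiser, x=120.0, sigmas=sigmas)
```

With this toy denoiser, each step shrinks the distance to `TARGET` by the ratio of consecutive noise levels, so the final step to a noise level of zero lands on the target. A real sampler would replace `toy_denoiser` with a network and could swap the discretization or the step rule independently.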

Features

Diffusion Models - Trains and samples diffusion models that generate high-fidelity images, video, and 4D assets.

Latent Diffusion Models - Implements high-fidelity content generation by iteratively removing noise within compressed latent spaces.

Input Standardization Conditioners - Standardizes input types like vectors and sequences through a single conditioner to guide model generation.

Text-to-Image Generators - Produces synthetic high-resolution images based on natural language text prompts or existing image inputs.

Media Synthesis from Text - Generates high-fidelity images and video from textual or latent descriptions using diffusion architectures.

Generative Model Training Tools - Provides a modular environment for configuring networks and pipelines to train diffusion models.

External Embedder Conditioning - Steers the generative process using external embedders that process text and class labels.

Training Datasets - Provides systems for processing large-scale image and label datasets for generative model training.

Latent Conditioning Mechanisms - Implements mechanisms for injecting spatial or semantic guidance into the latent space of diffusion models.

Model Training Pipelines - Covers the training workflow end to end: sourcing datasets, configuring networks, and training diffusion models.

Modular Model Assembly - Assembles network submodules and conditioners from YAML definitions.

Diffusion Process Conditioners - Implements a standardized conditioner to process diverse input types into vectors for steering the generative process.

Noise Level Sampling Strategies - Provides a system for managing noise level sampling and loss weighting to stabilize generative training.

Noise-Aware Loss Weighting - Provides modular loss weighting that decouples noise level sampling from loss calculation to stabilize training.

Training Pipelines - Offers a modular environment for managing datasets, noise sampling, and loss weighting in training pipelines.

Generative Training Data Pipelines - Ships a map-style data pipeline to efficiently feed large-scale image and label pairs into the training loop.

Numerical Diffusion Solvers - Provides customizable mathematical solvers to transform noise into images via discretized time-steps.

Model Architecture Configurations - Implements a YAML-based assembly process to define network submodules and conditioners without manual boilerplate code.

Diffusion Sampling Methods - Provides customizable numerical solvers and discretization methods to generate final outputs from the diffusion model.

Novel View Synthesis Engines - Transforms single input videos into consistent 4D assets by generating novel camera views and frames.

Model Component Assembly - Uses YAML-based assembly to define and instantiate model submodules without modifying the core source code.

AI Content Watermarking - Implements a dedicated system for embedding and detecting invisible markers in AI-generated imagery.

AI Watermark Detection Tools - Includes a dedicated utility for detecting invisible markers embedded in images to verify model origin.

Invisible Watermarks - Integrates subtle, invisible markers into generated pixel data to verify the origin of synthetic content.

Synthetic Media Provenance Verification - Includes a toolset to identify embedded markers in generated images to verify the model of origin.

Diffusion Model Research - High-resolution image synthesis using latent diffusion.

Generative Media Tools - Generative models for image and video.

Generation - Listed in the “Generation” section of the Awesome Diffusion Models list.

Stability-AI/generative-models
