VAR

Coarse-to-Fine Generation - Generates images by raising resolution step by step, with each scale prediction adding finer detail to the previous one.

Autoregressive Visual Token Predictors - Implements a generative model that predicts images across multiple scales using visual tokens.

Scale-Based Tokenization - Represents images as a hierarchy of discrete tokens corresponding to different resolution levels for autoregressive processing.
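To make the token hierarchy concrete, the sketch below counts how many tokens each resolution level contributes and where each level sits once all levels are flattened into one sequence. The schedule `PATCH_SIDES` and the helper names are assumptions chosen for illustration; they are not read from the repository.

```python
# Hypothetical scale schedule: side length of the square token map at each scale.
# These values are an illustrative assumption, not taken from the repository.
PATCH_SIDES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)


def tokens_per_scale(sides):
    """Number of discrete tokens in each square token map."""
    return [s * s for s in sides]


def scale_ranges(sides):
    """(start, end) index range of each scale in the flattened token sequence."""
    ranges = []
    start = 0
    for n in tokens_per_scale(sides):
        ranges.append((start, start + n))
        start += n
    return ranges


counts = tokens_per_scale(PATCH_SIDES)
ranges = scale_ranges(PATCH_SIDES)
total = sum(counts)

for side, (lo, hi) in zip(PATCH_SIDES, ranges):
    print(f"{side:>2}x{side:<2} tokens [{lo}, {hi})")
print("total tokens:", total)
```

Under this assumed schedule the coarsest level is a single token and the finest is a 16x16 map, so most of the sequence length comes from the last few scales.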

Classifier-Free Guidance - Mixes conditional and unconditional predictions at sampling time to trade sample diversity for image quality.
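A minimal sketch of the guidance step, assuming the common formulation in which guided logits are extrapolated from the unconditional prediction toward the conditional one. The function names and the `guidance_scale` parameter are hypothetical, and the repository may parameterize the scale differently.

```python
import math


def apply_cfg(cond_logits, uncond_logits, guidance_scale):
    """Return uncond + scale * (cond - uncond) for each logit.

    scale = 0 gives the unconditional logits, scale = 1 the conditional
    logits, and scale > 1 pushes further toward the condition.
    """
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy two-token vocabulary: the condition favors token 0.
cond = [2.0, 0.0]
uncond = [1.0, 0.0]

p_cond = softmax(apply_cfg(cond, uncond, 1.0))
p_guided = softmax(apply_cfg(cond, uncond, 3.0))
print(p_cond, p_guided)
```

Raising the scale sharpens the distribution around tokens the condition favors, which tends to improve fidelity at the cost of diversity.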

Generative Image Models - Provides a comprehensive framework for training and sampling image generation models using a coarse-to-fine approach.

Autoregressive Image Generation - Trains and uses models that predict image tokens in a sequence to create new visual content.

LLM-Based Generators - An image generation architecture that applies large language model scaling laws and autoregressive sampling to visual data.

Next-Scale Prediction - Predicts the next, higher-resolution scale of tokens at each step rather than the next single token in raster-scan order.
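One way to express next-scale prediction in a transformer is a block-wise causal attention mask: every token may attend to all tokens of its own scale and of coarser scales, but never to finer ones. The sketch below builds such a mask for a small, hypothetical schedule; the helper names are illustrative and not taken from the repository.

```python
def scale_ids(sides):
    """Scale index of every token in the flattened multi-scale sequence."""
    ids = []
    for k, side in enumerate(sides):
        ids.extend([k] * (side * side))
    return ids


def blockwise_causal_mask(sides):
    """mask[q][k] is True when query token q may attend to key token k.

    Attention is allowed within a scale and toward coarser scales only,
    unlike a raster-scan mask, which is causal per individual token.
    """
    ids = scale_ids(sides)
    return [[q_scale >= k_scale for k_scale in ids] for q_scale in ids]


# Hypothetical three-scale schedule: 1x1, 2x2, 3x3 token maps (14 tokens).
mask = blockwise_causal_mask((1, 2, 3))
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because all tokens of one scale share the same visibility, they can be predicted in parallel within a single step.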

Generative Model Training Tools - Provides a training system for autoregressive image generation models with automated state management.

Large Scale Training - Manages the training process for generative models on massive image datasets with checkpointing and recovery.

Visual Scaling Laws - Studies how increasing model size and data affects the quality of generated images using a scale-based approach.

Scale-Based Image Samplers - Produces high-resolution images by iteratively refining coarse predictions into fine-grained details.

FoundationVision/VAR
