# foundationvision/var

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/foundationvision-var).**

8,702 stars · 570 forks · Jupyter Notebook · MIT

## Links

- GitHub: https://github.com/FoundationVision/VAR
- awesome-repositories: https://awesome-repositories.com/repository/foundationvision-var.md

## Topics

`auto-regressive-model` `autoregressive-models` `diffusion-models` `generative-ai` `generative-model` `gpt` `gpt-2` `image-generation` `large-language-models` `neurips` `transformers` `vision-transformer`

## Description

VAR is a visual autoregressive model and image generation framework that applies large language model scaling laws to visual data. It generates images coarse-to-fine through next-scale prediction rather than traditional raster-scan next-token prediction.

The system uses scale-based tokenization to represent each image as a hierarchy of discrete tokens, one token map per resolution level. It generates high-resolution content by iteratively predicting the next resolution level, refining coarse predictions into fine-grained details.
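The coarse-to-fine loop described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the side lengths in `SIDE_LENGTHS`, the `predict_scale` callback, and the `dummy_predictor` are all hypothetical stand-ins chosen for illustration. The sketch only shows the control flow, in which each step emits a whole token map conditioned on every coarser map already produced.

```python
from typing import Callable, List, Sequence, Tuple

TokenMap = List[List[int]]

# Hypothetical side lengths of the token map at each scale (an assumption).
SIDE_LENGTHS = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)


def scale_schedule(side_lengths: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (side, tokens_at_this_scale, cumulative_tokens) for each scale."""
    schedule = []
    total = 0
    for side in side_lengths:
        count = side * side
        total += count
        schedule.append((side, count, total))
    return schedule


def generate_coarse_to_fine(
    side_lengths: Sequence[int],
    predict_scale: Callable[[List[TokenMap], int], TokenMap],
) -> List[TokenMap]:
    """Autoregress over scales: each step sees all coarser token maps."""
    maps: List[TokenMap] = []
    for side in side_lengths:
        maps.append(predict_scale(maps, side))
    return maps


def dummy_predictor(previous_maps: List[TokenMap], side: int) -> TokenMap:
    """Placeholder for a real model: fills the map with the scale index."""
    level = len(previous_maps)
    return [[level] * side for _ in range(side)]


if __name__ == "__main__":
    for side, count, total in scale_schedule(SIDE_LENGTHS):
        print(f"{side:>2} x {side:<2} -> {count:>3} tokens (cumulative {total})")
    maps = generate_coarse_to_fine(SIDE_LENGTHS, dummy_predictor)
    print("prediction steps:", len(maps))
```

Under these assumed side lengths the loop runs once per scale, so the number of sequential prediction steps equals the number of scales rather than the number of tokens in the finest map.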

The project covers autoregressive image generation, visual content sampling, and research on visual scaling laws. It uses classifier-free guidance to trade off sample quality against diversity during generation.
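As a rough illustration of how classifier-free guidance shifts that balance, the sketch below applies the commonly used formulation, `unconditional + scale * (conditional - unconditional)`, to token logits. The function names and the example logits are assumptions made for this sketch; the repository's exact scaling and any per-step schedule may differ.

```python
import math
from typing import List


def apply_cfg(
    cond_logits: List[float], uncond_logits: List[float], scale: float
) -> List[float]:
    """Push logits away from the unconditional prediction.

    scale = 0 returns the unconditional logits, scale = 1 the conditional
    ones, and scale > 1 exaggerates the difference between the two.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    cond = [2.0, 0.0]    # hypothetical class-conditional logits
    uncond = [1.0, 0.0]  # hypothetical unconditional logits
    for scale in (0.0, 1.0, 3.0):
        probs = softmax(apply_cfg(cond, uncond, scale))
        print(scale, [round(p, 3) for p in probs])
```

A larger scale concentrates probability on the tokens the condition favours, which tends to raise fidelity to the condition at the cost of sample diversity.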

The training infrastructure includes automated state management and checkpoint-based resumption to maintain progress during large-scale training runs.
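Checkpoint-based resumption generally follows the pattern sketched below: persist the training state after each step, and on startup reload it if a checkpoint exists. This is a generic standard-library sketch, not the repository's implementation. The JSON file format, the `step` and `loss_sum` fields, and the `stop_after` parameter used to simulate an interruption are all hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write the state to a temp file, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # a crash never leaves a half-written checkpoint


def load_checkpoint(path: str) -> Optional[dict]:
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)


def train(path: str, total_steps: int, stop_after: Optional[int] = None) -> dict:
    """Run (or resume) a toy training loop, checkpointing after every step."""
    state = load_checkpoint(path) or {"step": 0, "loss_sum": 0.0}
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break  # simulate an interruption
        state["step"] += 1
        state["loss_sum"] += 1.0 / state["step"]  # stand-in for a real update
        save_checkpoint(path, state)
        steps_this_run += 1
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "ckpt.json")
        print(train(ckpt, total_steps=5, stop_after=3))  # interrupted run
        print(train(ckpt, total_steps=5))                # resumed run
```

Writing to a temporary file and renaming it keeps the previous checkpoint intact if the process dies mid-write, which is the property that makes automatic resumption safe.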

## Tags

### Artificial Intelligence & ML

- [Coarse-to-Fine Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-image-models/autoregressive-image-generation/coarse-to-fine-generation.md) — Generates images by iteratively increasing resolution through a sequence of increasingly detailed scale predictions.
- [Autoregressive Visual Token Predictors](https://awesome-repositories.com/f/artificial-intelligence-ml/autoregressive-models/visual-token-generation/autoregressive-visual-token-predictors.md) — Implements a generative model that predicts images across multiple scales using visual tokens.
- [Scale-Based Tokenization](https://awesome-repositories.com/f/artificial-intelligence-ml/autoregressive-models/visual-token-generation/autoregressive-visual-token-predictors/scale-based-tokenization.md) — Represents images as a hierarchy of discrete tokens corresponding to different resolution levels for autoregressive processing.
- [Classifier-Free Guidance](https://awesome-repositories.com/f/artificial-intelligence-ml/classifier-free-guidance.md) — Uses classifier-free guidance to balance image sample quality and diversity during the generation process.
- [Generative Image Models](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-image-models.md) — Provides a comprehensive framework for training and sampling image generation models using a coarse-to-fine approach.
- [Autoregressive Image Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-image-models/autoregressive-image-generation.md) — Trains and uses models that predict image tokens in a sequence to create new visual content.
- [LLM-Based Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-image-models/autoregressive-image-generation/llm-based-generators.md) — An image generation architecture that applies large language model scaling laws and autoregressive sampling to visual data.
- [Next-Scale Prediction](https://awesome-repositories.com/f/artificial-intelligence-ml/text-generation-strategies/token-prediction/next-scale-prediction.md) — Uses a coarse-to-fine resolution approach that predicts the next scale instead of standard raster-scan token prediction. ([source](https://github.com/foundationvision/var#readme))
- [Generative Model Training Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-model-training-tools.md) — Provides a training system for autoregressive image generation models with automated state management. ([source](https://github.com/foundationvision/var#readme))
- [Large Scale Training](https://awesome-repositories.com/f/artificial-intelligence-ml/large-scale-training.md) — Manages the training process for generative models on massive image datasets with checkpointing and recovery.
- [Visual Scaling Laws](https://awesome-repositories.com/f/artificial-intelligence-ml/model-predictions/scaling-law-predictors/visual-scaling-laws.md) — Studies how increasing model size and data affects the quality of generated images using a scale-based approach.

### Graphics & Multimedia

- [Scale-Based Image Samplers](https://awesome-repositories.com/f/graphics-multimedia/scale-based-image-samplers.md) — Produces high-resolution images by iteratively refining coarse predictions into fine-grained details.
