# compvis/taming-transformers

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/compvis-taming-transformers).**

6,510 stars · 1,222 forks · Jupyter Notebook · MIT

## Links

- GitHub: https://github.com/CompVis/taming-transformers
- Homepage: https://arxiv.org/abs/2012.09841
- awesome-repositories: https://awesome-repositories.com/repository/compvis-taming-transformers.md

## Description

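Taming Transformers is a generative system for high-resolution image synthesis that combines a VQGAN (a vector-quantized autoencoder trained with perceptual and adversarial losses) with an autoregressive transformer. The VQGAN encoder compresses an image into a grid of codebook indices in a discrete latent space. The transformer models these indices as a sequence, and the convolutional decoder maps them back to pixels. In this hybrid design, convolutions capture local texture and the transformer handles global composition, which enables high-fidelity synthesis at high resolutions.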
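
The quantization step can be illustrated in plain Python. This is a minimal sketch under simplifying assumptions (a tiny hand-written codebook, 2-dimensional feature vectors, nearest neighbour by squared Euclidean distance). The repository itself works on PyTorch tensors produced by a trained convolutional encoder. The helper names below (`nearest_code`, `quantize_grid`, `grid_to_sequence`, `dequantize`) are hypothetical.

```python
from typing import List

Vector = List[float]


def nearest_code(z: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry closest to z (squared Euclidean distance).

    Ties resolve to the lowest index because min() keeps the first minimum.
    """
    return min(
        range(len(codebook)),
        key=lambda k: sum((zi - ei) ** 2 for zi, ei in zip(z, codebook[k])),
    )


def quantize_grid(feature_grid: List[List[Vector]], codebook: List[Vector]) -> List[List[int]]:
    """Replace every feature vector in an h x w grid with its nearest codebook index."""
    return [[nearest_code(z, codebook) for z in row] for row in feature_grid]


def grid_to_sequence(index_grid: List[List[int]]) -> List[int]:
    """Flatten the index grid row by row (raster order) into the token sequence a transformer would model."""
    return [idx for row in index_grid for idx in row]


def dequantize(indices: List[int], codebook: List[Vector]) -> List[Vector]:
    """Look indices back up in the codebook; a decoder would map these vectors to pixels."""
    return [codebook[i] for i in indices]


# Hypothetical toy codebook with three 2-D entries and a 2 x 2 grid of encoder features.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
features = [
    [[0.1, -0.2], [0.9, 1.2]],
    [[-0.8, 0.7], [0.6, 0.5]],
]

grid = quantize_grid(features, codebook)
tokens = grid_to_sequence(grid)
print(grid)    # [[0, 1], [2, 1]]
print(tokens)  # [0, 1, 2, 1]
```

In the real model the grid is much larger (a 256×256 image with a downsampling factor of 16 gives a 16×16 grid), and the codebook vectors are learned jointly with the encoder and decoder.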

The project supports layout-based scene synthesis, in which objects are placed in the generated image according to given bounding-box coordinates. It also supports image inpainting, filling missing regions of an image with content that is consistent with the surrounding pixels and with the structural patterns learned during training.
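
One way to turn a layout into conditioning tokens for an autoregressive model is sketched below. This is a hypothetical encoding, not necessarily the repository's format. It assumes normalized `(x0, y0, x1, y1)` boxes, a 16×16 coordinate grid, and a `(class, top-left, bottom-right)` token triple per object. The real conditioning may differ in grid size, object ordering, and padding. The names `coord_token` and `encode_layout` are illustrative.

```python
from typing import List, Tuple

# (class_id, (x0, y0, x1, y1)) with box corners normalized to [0, 1]; hypothetical input format.
LayoutObject = Tuple[int, Tuple[float, float, float, float]]


def coord_token(x: float, y: float, grid_size: int) -> int:
    """Snap a normalized (x, y) point onto a grid_size x grid_size grid and return its raster index."""
    col = min(int(x * grid_size), grid_size - 1)  # clamp x == 1.0 into the last column
    row = min(int(y * grid_size), grid_size - 1)  # clamp y == 1.0 into the last row
    return row * grid_size + col


def encode_layout(objects: List[LayoutObject], num_classes: int, grid_size: int = 16) -> List[int]:
    """Encode each object as (class, top-left, bottom-right) tokens.

    Coordinate tokens are shifted by num_classes so they never collide with class tokens.
    """
    tokens: List[int] = []
    for class_id, (x0, y0, x1, y1) in objects:
        if not 0 <= class_id < num_classes:
            raise ValueError(f"class_id {class_id} out of range")
        tokens.append(class_id)
        tokens.append(num_classes + coord_token(x0, y0, grid_size))
        tokens.append(num_classes + coord_token(x1, y1, grid_size))
    return tokens


layout = [
    (3, (0.10, 0.20, 0.50, 0.60)),  # an object in the upper-left part of the image
    (7, (0.55, 0.00, 1.00, 0.40)),  # an object touching the top and right edges
]
print(encode_layout(layout, num_classes=10))  # [3, 59, 162, 7, 18, 121]
```

The resulting sequence would act as a conditioning prefix: the transformer generates image tokens after it, so every image token can attend to the full layout.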

The repository also provides tools for analyzing compression quality by reconstructing images from their latent codes. It supports training on custom image datasets so that the learned codebook tokens fit the target data.
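
The size of the discrete representation, and how faithfully it reconstructs an image, can be estimated with simple arithmetic. This is a minimal sketch with illustrative settings (a 256×256 RGB image, downsampling factor 16, 1024 codebook entries). It counts only fixed-length index bits and ignores entropy coding and the cost of storing the codebook and decoder. PSNR is one common reconstruction metric chosen for this sketch and is not necessarily what the repository reports. The function names are hypothetical.

```python
import math
from typing import Sequence


def latent_bits(height: int, width: int, factor: int, codebook_size: int) -> int:
    """Bits needed to store the index grid with fixed-length codes (no entropy coding)."""
    tokens = (height // factor) * (width // factor)
    bits_per_token = math.ceil(math.log2(codebook_size))
    return tokens * bits_per_token


def raw_bits(height: int, width: int, channels: int = 3, bit_depth: int = 8) -> int:
    """Bits in the uncompressed image."""
    return height * width * channels * bit_depth


def psnr(original: Sequence[float], reconstruction: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images of equal length."""
    if len(original) != len(reconstruction):
        raise ValueError("images must have the same number of values")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Illustrative settings: 256 x 256 RGB image, downsampling factor 16, 1024 codebook entries.
compressed = latent_bits(256, 256, factor=16, codebook_size=1024)
uncompressed = raw_bits(256, 256)
print(compressed)                 # 2560  (16 * 16 tokens * 10 bits)
print(uncompressed)               # 1572864
print(uncompressed / compressed)  # 614.4
```

The ratio shows why a discrete latent grid makes transformer modelling tractable. The transformer sees 256 tokens instead of 196,608 pixel values, and the trade-off is whatever detail the reconstruction loses.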

## Tags

### Artificial Intelligence & ML

- [High-Resolution Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/image-super-resolution-models/high-resolution-synthesis.md) — Synthesizes high-fidelity, high-resolution visuals using a combination of discrete latent spaces and learned visual patterns.
- [Autoregressive Visual Token Predictors](https://awesome-repositories.com/f/artificial-intelligence-ml/autoregressive-models/visual-token-generation/autoregressive-visual-token-predictors.md) — Implements an autoregressive transformer that sequentially predicts discrete visual tokens to generate high-resolution images.
- [Discretized Visual Representations](https://awesome-repositories.com/f/artificial-intelligence-ml/discretized-visual-representations.md) — Employs a discrete codebook to quantize continuous feature vectors into a grid of learnable tokens.
- [Image Synthesis Models](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-capabilities/image-synthesis-models.md) — Combines an autoregressive transformer with a convolutional generator to synthesize high-resolution visual content.
- [Latent-to-Pixel Decoding](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-models/latent-space-generative-models/latent-space-projections/image-to-latent-projections/latent-to-pixel-decoding.md) — Uses a convolutional decoder to map discrete latent tokens back into pixel space for high-fidelity image reconstruction.
- [Latent Space Encoders](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-models/latent-space-generative-models/latent-space-projections/latent-space-encoders.md) — Uses a VQGAN encoder to compress high-resolution images into a discrete grid of codebook tokens.
- [Layout-Guided Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/bounding-box-regression/bounding-box-representations/bounding-box-generation/layout-guided-synthesis.md) — Allows for precise control of object placement in generated scenes via bounding box coordinates.
- [Image Inpainting](https://awesome-repositories.com/f/artificial-intelligence-ml/deep-learning-architectures/image-inpainting.md) — Provides capabilities to fill missing or empty image sections by analyzing surrounding pixels and learned patterns.
- [Hybrid Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/model-architectures/hybrid-architectures.md) — Combines convolutional networks for local texture and transformers for global composition in a hybrid architecture.
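
The generation side of the hybrid architecture can be sketched as a sampling loop. This is a minimal illustration, not the repository's sampler. `toy_logits` is a hypothetical stand-in for the trained transformer, top-k sampling is assumed as the decoding strategy, and the 4×4 grid is arbitrary. Tokens are produced one at a time in raster order, each conditioned on the prefix (for example, layout or class tokens) and on all previously sampled tokens. The finished grid would then be passed to the convolutional decoder.

```python
import math
import random
from typing import Callable, List, Sequence

LogitsFn = Callable[[Sequence[int]], List[float]]


def top_k_sample(logits: Sequence[float], k: int, rng: random.Random) -> int:
    """Sample a token index from the k highest-scoring logits (softmax over those k only)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]  # subtract max for numerical stability
    return rng.choices(top, weights=weights, k=1)[0]


def sample_tokens(next_logits: LogitsFn, prefix: Sequence[int], num_tokens: int, k: int, seed: int = 0) -> List[int]:
    """Generate num_tokens tokens one at a time, each conditioned on everything before it."""
    rng = random.Random(seed)
    seq = list(prefix)
    for _ in range(num_tokens):
        seq.append(top_k_sample(next_logits(seq), k, rng))
    return seq[len(prefix):]  # drop the conditioning prefix, keep only image tokens


def toy_logits(seq: Sequence[int], vocab_size: int = 8) -> List[float]:
    """Hypothetical stand-in for a trained transformer: strongly prefers (last token + 1) mod vocab_size."""
    last = seq[-1] if seq else 0
    return [5.0 if t == (last + 1) % vocab_size else 0.0 for t in range(vocab_size)]


h, w = 4, 4  # hypothetical latent grid size
tokens = sample_tokens(toy_logits, prefix=[3], num_tokens=h * w, k=3, seed=0)
grid = [tokens[r * w:(r + 1) * w] for r in range(h)]  # raster order back to an h x w grid
print(grid)

# With k=1 the loop reduces to greedy decoding and becomes deterministic.
print(sample_tokens(toy_logits, prefix=[3], num_tokens=4, k=1))  # [4, 5, 6, 7]
```

Larger `k` trades fidelity to the model's top prediction for diversity. Because each step attends to the whole prefix, the transformer can enforce global composition that a purely convolutional generator would struggle to capture.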

### Part of an Awesome List

- [Layout-Based Scene Synthesis](https://awesome-repositories.com/f/awesome-lists/ai/image-generation-and-synthesis/layout-based-scene-synthesis.md) — Produces complex images by placing specific objects according to defined bounding box coordinates. ([source](https://cdn.jsdelivr.net/gh/compvis/taming-transformers@master/README.md))

### Graphics & Multimedia

- [Grid-Based Image Layouts](https://awesome-repositories.com/f/graphics-multimedia/grid-based-image-layouts.md) — Generates complex scenes by placing specific objects according to defined bounding box coordinates.