# nvlabs/sana

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/nvlabs-sana).**

8,310 stars · 647 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/NVlabs/Sana
- Homepage: https://nvlabs.github.io/Sana/docs/
- awesome-repositories: https://awesome-repositories.com/repository/nvlabs-sana.md

## Description

Sana is a framework for high-resolution image and video synthesis built on a linear diffusion transformer. It provides a toolkit for training, fine-tuning, and running inference with text-to-image and text-to-video models, as well as a video generative world model that simulates physical environments with precise spatial control.

The project is distinguished by its linear-complexity layers, which it uses to handle high resolutions, and by its support for long-form, minute-length video generation in real time. It implements a two-stage inference paradigm that separates structural generation from visual texture refinement, and it uses block-based caching to maintain temporal consistency across extended sequences.
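
As a rough illustration of what "linear complexity" means here, the sketch below computes a causal linear attention in plain Python. It is not taken from the Sana codebase: the function names, the ReLU feature map, and the causal (left-to-right) form are assumptions chosen to keep the example small. The point it shows is that each token is folded into a fixed-size running state, so cost grows linearly with sequence length and memory does not grow at all.

```python
def relu(x):
    """ReLU feature map applied element-wise (an illustrative choice)."""
    return x if x > 0.0 else 0.0


def causal_linear_attention(queries, keys, values, eps=1e-6):
    """Causal linear attention over lists of per-token vectors.

    Cost is O(n * d_k * d_v) for n tokens, and the running state has a
    fixed size (d_k x d_v plus d_k) no matter how long the sequence is.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    kv_state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    k_sum = [0.0] * d_k                           # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_q = [relu(x) for x in q]
        phi_k = [relu(x) for x in k]
        # Fold the current token into the fixed-size state.
        for i in range(d_k):
            k_sum[i] += phi_k[i]
            for j in range(d_v):
                kv_state[i][j] += phi_k[i] * v[j]
        # Read out: phi(q) . state, normalised by phi(q) . k_sum.
        denom = sum(phi_q[i] * k_sum[i] for i in range(d_k)) + eps
        outputs.append([
            sum(phi_q[i] * kv_state[i][j] for i in range(d_k)) / denom
            for j in range(d_v)
        ])
    return outputs


# Two identical query/key tokens: the second output averages both values.
tokens = [[1.0, 0.0], [1.0, 0.0]]
result = causal_linear_attention(tokens, tokens, [[2.0, 4.0], [4.0, 8.0]])
```

Standard softmax attention would instead compare every token with every earlier token, which is quadratic in sequence length.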

The framework covers a broad range of capabilities, including supervised fine-tuning, reinforcement learning via reward model integration, and image model personalization. It supports advanced video controls such as camera trajectory adherence, image-to-video synthesis, and streaming video editing.

Memory and compute costs are managed through model weight quantization, VRAM reduction techniques such as offloading components to the CPU, and sharded data parallelism for large-scale training.
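
To make the idea of weight quantization concrete, here is a minimal sketch of symmetric per-tensor 4-bit quantization in plain Python. It is a generic textbook scheme, not the quantization method Sana ships; the function names `quantize_int4` and `dequantize` are hypothetical and exist only for this example.

```python
def quantize_int4(weights):
    """Map floats to signed 4-bit codes in [-8, 7] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # largest weight maps to 7
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.0, 0.14]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
```

Each weight is stored in 4 bits plus one shared scale, and the reconstruction error per weight is at most half the scale. Production schemes typically use per-group scales and other refinements to preserve output quality.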

## Tags

### Artificial Intelligence & ML

- [Text-to-Image Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-pipelines/text-to-image-generators.md) — Synthesizes high-resolution images from text prompts using a linear diffusion transformer to balance quality and efficiency. ([source](https://github.com/nvlabs/sana#readme))
- [Diffusion Transformers](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-architectures/diffusion-transformers.md) — Utilizes a linear diffusion transformer with linear complexity layers to handle high-resolution image and video synthesis.
- [Chunk-Causal Training](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-masking/causal/chunk-causal-training.md) — Trains video models by processing sequences in overlapping segments to maintain temporal consistency across long durations.
- [Constant-Memory Video Caching](https://awesome-repositories.com/f/artificial-intelligence-ml/constant-memory-video-caching.md) — Employs a fixed-size recurrent state to generate arbitrarily long video sequences without increasing memory usage. ([source](https://nvlabs.github.io/Sana/docs/longsana/))
- [Custom Diffusion Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-diffusion-model-training.md) — Implements training workflows for high-resolution image synthesis using a linear diffusion transformer. ([source](https://nvlabs.github.io/Sana/docs/sol_rl/))
- [Latent Space Generative Models](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-models/latent-space-generative-models.md) — Operates on compressed latent representations using a variational autoencoder to reduce compute overhead during synthesis.
- [Latent-to-Pixel Decoding](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-models/latent-space-generative-models/latent-space-projections/image-to-latent-projections/latent-to-pixel-decoding.md) — Converts compressed latent representations back into viewable pixel-based images using a variational autoencoder. ([source](https://nvlabs.github.io/Sana/docs/ComfyUI/SANA-1.5_FlowEuler.json))
- [Prompt Encoding Tensors](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-pipelines/text-to-image-generators/prompt-encoding-tensors.md) — Transforms natural language descriptions into conditioning tensors that guide the image generation process. ([source](https://nvlabs.github.io/Sana/docs/ComfyUI/SANA-1.5_FlowEuler.json))
- [Text-to-Video Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-pipelines/text-to-video-generators.md) — Synthesizes high-quality video content from natural language descriptions across various artistic styles. ([source](https://nvlabs.github.io/Sana/Video/))
- [Text-to-Image Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/generative-ai/text-to-image-synthesis.md) — Synthesizes high-resolution images from text prompts through a command-line interface, SDK, or API. ([source](https://nvlabs.github.io/Sana/docs/sglang/))
- [Generative Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-model-fine-tuning.md) — Provides toolkits for adjusting pre-trained generative models using custom datasets through full parameter updates or low-rank adaptation. ([source](https://nvlabs.github.io/Sana/docs/sana_cosmos_rl/))
- [Long-Video Training Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/long-context-training-optimizations/long-video-training-optimizations.md) — Executes multi-stage training using ODE initialization and self-forcing to learn long-term temporal dependencies for minute-long videos. ([source](https://nvlabs.github.io/Sana/docs/longsana/))
- [Two-Stage Texture Refinement](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/computer-vision-segmentation-models/object-detection-models/multi-stage-inference-pipelines/two-stage-texture-refinement.md) — Implements a two-stage inference paradigm that separates structural generation from visual texture refinement.
- [Low-Rank Adaptation](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/low-rank-adaptation.md) — Modifies model behavior for specific styles or subjects using low-rank adaptation matrices instead of full weights.
- [Model Training Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-pipelines.md) — Provides a toolkit for supervised fine-tuning, LoRA adaptation, and reinforcement learning of diffusion models.
- [World](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-pipelines/world.md) — Implements chunk-causal training for the first stage of the video world model using distributed parallelism. ([source](https://nvlabs.github.io/Sana/docs/sana_wm/))
- [LoRA Adapter Loaders](https://awesome-repositories.com/f/artificial-intelligence-ml/model-weight-management/lora-adapter-loaders.md) — Integrates low-rank adaptation weights during inference to modify the style or content of generated images. ([source](https://nvlabs.github.io/Sana/docs/sglang/))
- [Video Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/video-generation.md) — Synthesizes high-definition, minute-length video content from text or image prompts. ([source](https://github.com/nvlabs/sana#readme))
- [Image-to-Video Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/video-generation/image-to-video-generation.md) — Produces high-quality video sequences using a reference image and text prompt for guidance. ([source](https://nvlabs.github.io/Sana/Video/))
- [Long-form Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/video-generation/video-clip-generators/long-form-generation.md) — Produces high-resolution, minute-long video sequences using memory-efficient block linear attention. ([source](https://nvlabs.github.io/Sana/docs/))
- [Reward Modeling](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/reward-modeling.md) — Integrates external scoring metrics and reward models to guide the training process and improve generative output. ([source](https://nvlabs.github.io/Sana/docs/sol_rl/))
- [Diffusion Reinforcement Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/diffusion-reinforcement-learning.md) — Optimizes generative output quality by combining supervised fine-tuning with reinforcement learning and asynchronous reward services. ([source](https://nvlabs.github.io/Sana/docs/))
- [Distributed Training Sharding](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-sharding.md) — Uses sharded data parallelism to distribute model parameters and gradients across multiple processors for massive model training. ([source](https://nvlabs.github.io/Sana/docs/sana/))
- [Personalized Image Synthesis](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-image-models/personalized-image-synthesis.md) — Implements adaptation training to specialize text-to-image models on specific subjects using a small set of reference images. ([source](https://nvlabs.github.io/Sana/docs/sana_lora_dreambooth/))
- [Inference Rollout Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/pipelines-and-orchestration/training-orchestration-systems/training-methodologies/reinforcement-learning-integrations/inference-rollout-optimizations.md) — Improves generation efficiency through decoupled two-stage rollouts and brute-force scaling. ([source](https://nvlabs.github.io/Sana/docs/sol_rl/))
- [Fully Sharded Data Parallelism](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/distributed-and-scaling-strategies/distributed-learning/fully-sharded-data-parallelism.md) — Distributes model parameters and gradients across multiple processors to enable training of massive generative networks.
- [VRAM Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/precision-quantization/vram-quantization.md) — Lowers memory usage by applying fp8 or fp4 precision to linear layers within transformer blocks. ([source](https://nvlabs.github.io/Sana/docs/sana_wm/))
- [Quantized Inference Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes.md) — Executes models using int4 precision to minimize memory overhead and increase processing speed during inference. ([source](https://nvlabs.github.io/Sana/docs/model_zoo/))
- [Weight Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization.md) — Compresses large model checkpoints into 4-bit representations to reduce memory requirements while maintaining output quality. ([source](https://nvlabs.github.io/Sana/docs/4bit_sana/))
- [Reinforcement Learning Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-optimizers.md) — Implements reinforcement learning algorithms to refine the visual quality of generated images and videos based on reward signals. ([source](https://nvlabs.github.io/Sana/docs/sana_cosmos_rl/))
- [Reinforcement Learning Reward Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-reward-systems.md) — Implements mechanisms for quantifying and assigning rewards to guide model optimization via reinforcement learning.
- [Structural Image Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/structural-image-generation.md) — Applies fine-grained spatial and structural constraints to the image synthesis process using a specialized transformer module. ([source](https://nvlabs.github.io/Sana/docs/sana_controlnet/))
- [Training Convergence Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/training-convergence-optimization.md) — Speeds up training convergence through the use of low-precision rollout selection and high-precision optimization. ([source](https://github.com/nvlabs/sana#readme))
- [Multi-Stage Refinement](https://awesome-repositories.com/f/artificial-intelligence-ml/video-generation/multi-stage-refinement.md) — Implements a two-stage inference paradigm to improve visual quality and resolution of generated videos. ([source](https://nvlabs.github.io/Sana/Video/))
- [Streaming Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/video-generation/streaming-generation.md) — Produces video chunks incrementally and writes them to files in real-time for immediate playback. ([source](https://nvlabs.github.io/Sana/docs/sana_wm/))
- [VRAM Offloading](https://awesome-repositories.com/f/artificial-intelligence-ml/vram-offloading.md) — Enables high-resolution generation on limited hardware by offloading encoders and transformer components to the CPU. ([source](https://nvlabs.github.io/Sana/docs/sglang/))
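
The low-rank adaptation entries in the list above rest on one small piece of arithmetic: the adapted weight is the frozen base weight plus a scaled product of two thin matrices. The sketch below shows that merge in plain Python. The names `merge_lora`, `lora_down`, `lora_up`, and `alpha` are illustrative assumptions and do not correspond to identifiers in the Sana codebase.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(base_weight, lora_down, lora_up, alpha):
    """Return base_weight + (alpha / rank) * lora_up @ lora_down.

    base_weight: out_features x in_features
    lora_down:   rank x in_features
    lora_up:     out_features x rank
    """
    rank = len(lora_down)
    scale = alpha / rank
    delta = matmul(lora_up, lora_down)  # out_features x in_features
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base_weight, delta)
    ]


base = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0, 2.0]]    # rank 1
up = [[1.0], [0.0]]
merged = merge_lora(base, down, up, alpha=2.0)
```

Only `lora_down` and `lora_up` are trained, so the number of trainable parameters scales with the rank rather than with the full weight matrix.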

### Part of an Awesome List

- [Foundational World Models](https://awesome-repositories.com/f/awesome-lists/ai/foundational-world-models.md) — Simulates consistent physical environments using a generative world model with precise spatial control. ([source](https://github.com/nvlabs/sana#readme))
- [KV Cache Management](https://awesome-repositories.com/f/awesome-lists/ai/kv-cache-management.md) — Utilizes block-based KV caching to efficiently generate minute-long video sequences by storing computed keys and values.
- [Inference Acceleration](https://awesome-repositories.com/f/awesome-lists/ai/inference-acceleration.md) — Reduces the number of sampling steps to accelerate the generation of high-resolution images. ([source](https://github.com/nvlabs/sana#readme))
- [Hybrid Generative Models](https://awesome-repositories.com/f/awesome-lists/ai/hybrid-generative-models.md) — Performs one-step diffusion synthesis using continuous-time consistency distillation.
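
The KV cache entry above describes storing computed keys and values per block so that earlier blocks are not recomputed. A minimal sketch of that idea follows. The class name `BlockKVCache` and the bounded window (`max_blocks`, with the oldest block evicted first) are assumptions made for illustration; the source does not state how Sana sizes or evicts its cache.

```python
from collections import deque


class BlockKVCache:
    """Keeps keys and values for the most recent blocks of a sequence."""

    def __init__(self, max_blocks):
        # A deque with maxlen drops the oldest block once the cache is full.
        self.blocks = deque(maxlen=max_blocks)

    def append(self, keys, values):
        """Store the keys and values computed for one finished block."""
        self.blocks.append((list(keys), list(values)))

    def context(self):
        """Return all cached keys and values, oldest block first."""
        keys = [k for block_keys, _ in self.blocks for k in block_keys]
        values = [v for _, block_values in self.blocks for v in block_values]
        return keys, values


cache = BlockKVCache(max_blocks=2)
cache.append(["k0", "k1"], ["v0", "v1"])
cache.append(["k2", "k3"], ["v2", "v3"])
cache.append(["k4", "k5"], ["v4", "v5"])  # evicts the first block
cached_keys, cached_values = cache.context()
```

When a new block is generated, its queries attend over `cached_keys` and `cached_values` plus the block's own tokens, which keeps per-block cost bounded as the video grows.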

### Graphics & Multimedia

- [Generative Camera Controls](https://awesome-repositories.com/f/graphics-multimedia/generative-camera-controls.md) — Enables precise per-frame camera trajectory control using domain-specific action strings and matrices. ([source](https://nvlabs.github.io/Sana/docs/sana_wm/))
- [Generative Video Editing](https://awesome-repositories.com/f/graphics-multimedia/ai-video-editing-automation/generative-video-editing.md) — Edits high-resolution video sequences spanning hundreds of frames using streaming inference and token caching. ([source](https://nvlabs.github.io/Sana/docs/sana_streaming/))
- [Cinematic Video Enhancements](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/video-transformation-enhancement/cinematic-video-enhancements.md) — Injects high-frequency details into structural priors using a distilled refiner to enhance visual fidelity. ([source](https://nvlabs.github.io/Sana/Video/bet-small-win-big/blog.html))
- [AI Style Transfers](https://awesome-repositories.com/f/graphics-multimedia/video-production/video-editing/ai-style-transfers.md) — Provides bidirectional processing for precise local modifications and style transfers on short video clips. ([source](https://nvlabs.github.io/Sana/docs/sana_streaming/))
- [Real-Time Streaming Edits](https://awesome-repositories.com/f/graphics-multimedia/video-production/video-editing/real-time-streaming-edits.md) — Implements real-time streaming video-to-video editing on minute-scale footage with temporal consistency. ([source](https://github.com/nvlabs/sana#readme))
- [Video Upscaling Pipelines](https://awesome-repositories.com/f/graphics-multimedia/video-upscaling-pipelines.md) — Combines base generation with a spatial upsampler and refiner to enhance video resolution and quality. ([source](https://nvlabs.github.io/Sana/docs/sana_video/))

### Data & Databases

- [Training Memory Optimizers](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers.md) — Lowers training memory usage by offloading unused components to the CPU and using 8-bit optimizers. ([source](https://nvlabs.github.io/Sana/docs/sana_lora_dreambooth/))
