# thu-ml/turbodiffusion

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/thu-ml-turbodiffusion).**

3,339 stars · 232 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/thu-ml/TurboDiffusion
- Homepage: https://arxiv.org/pdf/2512.16093
- awesome-repositories: https://awesome-repositories.com/repository/thu-ml-turbodiffusion.md

## Topics

`ai-infra` `consistency-model` `diffusion-models` `distillation` `inference-acceleration` `mlsystem` `rcm` `sageattention` `sparse-linear-attention` `video-generation`

## Description

TurboDiffusion is a video diffusion inference engine that generates high-resolution videos from text prompts and images. It provides a runtime for executing optimized diffusion model checkpoints, with a focus on reducing latency and GPU memory usage.

The project includes a training framework for aligning sparse-linear attention models with pretrained full-attention models. The framework supports sparse-linear model alignment and merging of sparse-attention parameters into existing checkpoints, which reduces computational cost during inference while maintaining output quality.
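
The alignment idea can be illustrated with a toy example. The sketch below is not TurboDiffusion's code: it uses plain top-k sparse attention as a stand-in for the project's sparse-linear attention, and a mean squared error between the sparse "student" output and the dense "teacher" output as an assumed alignment objective. All function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_scores(query, keys):
    """Scaled dot-product score of one query against every key."""
    scale = 1.0 / math.sqrt(len(query))
    return [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

def full_attention(query, keys, values):
    """Dense attention: every key contributes to the output."""
    weights = softmax(scaled_scores(query, keys))
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def topk_sparse_attention(query, keys, values, k):
    """Illustrative sparsity rule (an assumption): keep only the k best keys."""
    scores = scaled_scores(query, keys)
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in keep])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]

def alignment_loss(student_out, teacher_out):
    """Mean squared error between sparse and dense outputs (assumed objective)."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(teacher_out)

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

teacher = full_attention(query, keys, values)
student = topk_sparse_attention(query, keys, values, k=2)
loss = alignment_loss(student, teacher)
print(f"alignment loss with k=2: {loss:.4f}")
```

In an actual training setup, a loss of this kind would be minimized over the sparse model's parameters so that its outputs track the pretrained full-attention model; here it only measures the gap that sparsity introduces.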

The engine implements several performance optimizations: weight quantization for consumer-grade hardware, timestep distillation to reduce the number of inference steps, and sparse-attention approximations. It also includes an interactive inference server that keeps the model loaded between requests, enabling stateful, multi-turn video generation from a terminal without repeated model reloads.
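
To make the weight quantization idea concrete, here is a minimal sketch of symmetric per-row int8 quantization. This is a generic textbook scheme chosen for illustration; the source does not state which format or granularity TurboDiffusion uses, and the function names are hypothetical.

```python
def quantize_row_int8(row):
    """Symmetric int8 quantization of one weight row: returns (codes, scale)."""
    max_abs = max(abs(x) for x in row)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero row.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in row]
    return codes, scale

def dequantize_row(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_row_int8(weights)
restored = dequantize_row(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as a one-byte integer code plus one shared float scale per row, which is where the memory saving comes from; the reconstruction error per weight is bounded by about half the scale.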

## Tags

### Artificial Intelligence & ML

- [Text-to-Video Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-pipelines/text-to-video-generators.md) — Provides an optimized engine for synthesizing high-resolution videos directly from natural language text prompts. ([source](https://github.com/thu-ml/TurboDiffusion/blob/main/README.md))
- [Video Diffusion Models](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-models/latent-diffusion-models/video-diffusion-models.md) — A high-resolution video generator that uses optimized diffusion model checkpoints for text-to-video and image-to-video synthesis.
- [Step-Distilled Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-models/diffusion-models/inference-acceleration/step-distilled-accelerators.md) — Reduces generation time and the number of inference steps through attention acceleration and timestep distillation. ([source](https://github.com/thu-ml/TurboDiffusion#readme))
- [Inference Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/diffusion-visual-models/generative-ai-models/diffusion-models/inference-optimizations.md) — Reduces inference time and GPU memory usage through quantization, timestep distillation, and attention acceleration.
- [GPU Memory Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-memory-optimizers.md) — Uses weight quantization to optimize VRAM usage, enabling execution on consumer-grade GPU hardware.
- [Quantized Model Implementations](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization/quantized-model-implementations.md) — Provides a memory-optimized runtime using low-precision weight formats to run large video models on consumer GPUs.
- [Sparse Attention Modules](https://awesome-repositories.com/f/artificial-intelligence-ml/sparse-attention-modules.md) — Replaces dense attention mechanisms with sparse linear approximations to lower the computational cost of video frames.
- [Stable Diffusion Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/stable-diffusion-inference-engines.md) — A high-performance runtime for executing video diffusion model weights with attention acceleration and timestep distillation.
- [Step Distillation Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/step-based-schedulers/step-execution-engines/execution-step-controllers/denoising-step-controllers/step-distillation-accelerators.md) — Accelerates video generation by distilling the diffusion sampling process into a reduced number of iterations.
- [Sparse Attention Alignment](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms/linear-complexity-attention/linear-attention-training-frameworks/sparse-attention-alignment.md) — Trains sparse-attention models to mimic pretrained full-attention models to maintain output quality while reducing computation.
- [Sparse Attention Alignment Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms/linear-complexity-attention/linear-attention-training-frameworks/sparse-attention-alignment-toolkits.md) — Provides a toolkit for aligning sparse-linear attention models with pretrained full-attention models to reduce computational costs.
- [Sparse Linear Alignment](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms/linear-complexity-attention/linear-attention-training-frameworks/sparse-linear-alignment.md) — Aligns a sparse-attention model's predictions with a full-attention pretrained model to mitigate distribution shift. ([source](https://github.com/thu-ml/TurboDiffusion#readme))
- [Interactive Video Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/interactive-video-inference.md) — Enables multi-turn video sequence generation through a terminal interface to eliminate model reload times.
- [Weight Merging Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/instruction-tuned-language-models/weight-space-merging-techniques/weight-merging-utilities.md) — Implements utilities to merge sparse-attention weight updates directly into model checkpoints to reduce latency.
- [Model Inference Servers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/engines-runtimes-servers/model-inference-servers.md) — Provides a server implementation for hosting video generation models to enable network-accessible inference. ([source](https://github.com/thu-ml/TurboDiffusion/blob/main/README.md))
- [Sparse Attention Parameter Merging](https://awesome-repositories.com/f/artificial-intelligence-ml/sparse-attention-parameter-merging.md) — Combines parameter updates from sparse-attention training into existing checkpoints to enable sparse attention inference. ([source](https://github.com/thu-ml/TurboDiffusion#readme))
- [Image-to-Video Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/video-generation/image-to-video-generation.md) — Transforms static images into video sequences by combining a starting frame with a descriptive text prompt. ([source](https://github.com/thu-ml/TurboDiffusion/blob/main/README.md))

### Data & Databases

- [Persistent Inference States](https://awesome-repositories.com/f/data-databases/storage-engines/key-value/inference-state-caching/inference-state-management/persistent-inference-states.md) — Maintains persistent model instances in memory to enable multi-turn video generation without reloading between requests.
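
The persistent-instance pattern described above can be sketched as follows. This is an illustrative outline only: `PersistentSession` and `fake_load_model` are hypothetical names, and the loader is a stand-in for an expensive checkpoint load, not TurboDiffusion's actual server code.

```python
class PersistentSession:
    """Loads a model once and reuses it for every later request."""

    def __init__(self, load_model):
        self._load_model = load_model
        self._model = None
        self.load_count = 0  # tracks how many times the loader actually ran

    def generate(self, prompt):
        # Load lazily on the first request, then keep the instance in memory.
        if self._model is None:
            self._model = self._load_model()
            self.load_count += 1
        return self._model(prompt)

def fake_load_model():
    """Hypothetical stand-in for loading a large video model checkpoint."""
    return lambda prompt: f"<video for: {prompt}>"

session = PersistentSession(fake_load_model)
first = session.generate("a cat surfing")
second = session.generate("the same cat, at sunset")
print(first, second, session.load_count)
```

Because the model object outlives each request, only the first turn pays the load cost; later turns in the same session go straight to generation.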
