TurboDiffusion

Features

Text-to-Video Generators - Provides an optimized engine for synthesizing high-resolution videos directly from natural language text prompts.

Video Diffusion Models - A high-resolution video generator that uses optimized diffusion model checkpoints for text-to-video and image-to-video synthesis.

Step-Distilled Accelerators - Reduces generation time and the number of inference steps through attention acceleration and timestep distillation.

Inference Optimizations - Reduces inference time and GPU memory usage through quantization, timestep distillation, and attention acceleration.

GPU Memory Optimizers - Uses weight quantization to optimize VRAM usage, enabling execution on consumer-grade GPU hardware.

Quantized Model Implementations - Provides a memory-optimized runtime using low-precision weight formats to run large video models on consumer GPUs.

Sparse Attention Modules - Replaces dense attention mechanisms with sparse linear approximations to lower the computational cost of attention across video frames.

Stable Diffusion Inference Engines - A high-performance runtime for executing video diffusion model weights with attention acceleration and timestep distillation.

Step Distillation Accelerators - Accelerates video generation by distilling the diffusion sampling process into a reduced number of iterations.

Sparse Attention Alignment - Trains sparse-attention models to mimic pretrained full-attention models to maintain output quality while reducing computation.

Sparse Attention Alignment Toolkits - Provides a toolkit for aligning sparse-linear attention models with pretrained full-attention models to reduce computational costs.

Sparse Linear Alignment - Aligns a sparse-attention model's predictions with a full-attention pretrained model to mitigate distribution shift.

Interactive Video Inference - Enables multi-turn video sequence generation through a terminal interface, keeping the model loaded between prompts to eliminate reload times.

Weight Merging Utilities - Implements utilities to merge sparse-attention weight updates directly into model checkpoints to reduce latency.

Model Inference Servers - Provides a server implementation for hosting video generation models to enable network-accessible inference.

Sparse Attention Parameter Merging - Combines parameter updates from sparse-attention training into existing checkpoints to enable sparse attention inference.

Image-to-Video Generation - Transforms static images into video sequences by combining a starting frame with a descriptive text prompt.

Persistent Inference States - Maintains persistent model instances in memory to enable multi-turn video generation without reloading between requests.
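
The persistent-state idea in the last entry can be sketched as a session object that loads the model once and reuses it for every prompt. This is a minimal illustration only: `VideoSession`, `fake_loader`, and the string output are hypothetical stand-ins, not TurboDiffusion's actual API.

```python
class VideoSession:
    """Keeps one model instance alive across prompts (hypothetical API)."""

    def __init__(self, loader):
        self._loader = loader   # callable that performs the expensive load
        self._model = None      # populated lazily on first use
        self.load_count = 0     # how many times the loader actually ran

    def _ensure_loaded(self):
        # Load only if no model instance is held yet.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def generate(self, prompt):
        # Every call reuses the same in-memory model.
        model = self._ensure_loaded()
        return model(prompt)


def fake_loader():
    # Stand-in for loading a large checkpoint onto the GPU.
    return lambda prompt: f"<video for: {prompt}>"


session = VideoSession(fake_loader)
outputs = [session.generate(p) for p in ["a cat surfing", "the same cat at sunset"]]
```

Here `session.load_count` stays at 1 after both prompts, which is the property a multi-turn interface relies on.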

TurboDiffusion is a video diffusion inference engine and generator designed to create high-resolution videos from text prompts and images. It provides a runtime environment for executing optimized diffusion model checkpoints with a focus on reducing latency and GPU memory usage.

The project features a specialized training framework for aligning sparse-linear attention models with pretrained full-attention models. It includes sparse-attention parameter merging and sparse-linear model alignment, which reduce computational cost during inference while maintaining output quality.
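
As a rough illustration of what alignment measures, the toy sketch below compares full softmax attention with a top-k sparse variant for a single query and scores the gap with a mean squared error. Top-k selection and the MSE objective are assumptions chosen for simplicity; they are not the project's sparse-linear attention or its training loss, and all function names are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, values, top_k=None):
    """Single-query attention; with top_k set, only the top_k best-scoring keys are used."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    idx = list(range(len(keys)))
    if top_k is not None:
        idx = sorted(idx, key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in idx])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]


def alignment_loss(student_out, teacher_out):
    # Mean squared error between sparse (student) and full (teacher) outputs.
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(teacher_out)


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

full = attention(q, keys, values)             # teacher: attends to all keys
sparse = attention(q, keys, values, top_k=1)  # student: attends to one key
loss = alignment_loss(sparse, full)
```

In this toy setup the loss is zero when `top_k` covers every key and grows as keys are dropped. Alignment training aims to shrink that gap while keeping the sparse model's lower cost.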

The engine implements several performance optimization strategies, including weight quantization for consumer-grade hardware, timestep distillation to reduce the number of inference steps, and sparse-attention approximations. It also supports an interactive inference server that keeps the model loaded in memory, enabling stateful, multi-turn video generation through a terminal interface without reloading the model between requests.
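
To make the memory argument behind weight quantization concrete, here is a minimal sketch of symmetric per-tensor int8 quantization. The int8 format, the per-tensor scale, and the function names are assumptions for illustration; the text above does not specify which low-precision format the project uses.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    # Recover approximate float weights from the stored integers.
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Largest round-trip error; bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of two (fp16) or four (fp32), at the cost of a rounding error of at most `scale / 2` per weight.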
