ComfyUI Nunchaku

ComfyUI-nunchaku pairs a 4-bit diffusion inference engine with a set of ComfyUI nodes for running quantized, low-precision diffusion models in visual workflows. Its backend reduces memory overhead and increases generation speed for transformer-based diffusion models.
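The 4-bit format is where the memory savings come from: each weight takes half a byte instead of two (fp16) or four (fp32). The sketch below is only a rough illustration of generic symmetric per-group int4 quantization. This page does not describe the engine's actual weight layout or quantization method, and `quantize_int4`, `dequantize_int4`, and `GROUP_SIZE` are hypothetical names. Each group of weights shares one floating-point scale. Values are rounded to integers in [-8, 7], and two 4-bit codes are packed into every byte.

```python
from typing import List, Tuple

GROUP_SIZE = 4  # hypothetical; real kernels typically use larger groups


def quantize_int4(weights: List[float], group_size: int = GROUP_SIZE) -> Tuple[bytes, List[float]]:
    """Symmetric per-group int4 quantization; two 4-bit codes are packed per byte."""
    if len(weights) % group_size or group_size % 2:
        raise ValueError("length must be a multiple of an even group size")
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # largest magnitude maps to +/-7
        scales.append(scale)
        for w in group:
            q = max(-8, min(7, round(w / scale)))  # clamp to the signed 4-bit range
            codes.append(q & 0xF)  # keep the low nibble (two's complement)
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return packed, scales


def dequantize_int4(packed: bytes, scales: List[float], group_size: int = GROUP_SIZE) -> List[float]:
    """Unpack nibbles (low nibble first), sign-extend, and multiply by the group scale."""
    out: List[float] = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            q = nibble - 16 if nibble >= 8 else nibble
            out.append(q * scales[len(out) // group_size])
    return out


weights = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.6]
packed, scales = quantize_int4(weights)
restored = dequantize_int4(packed, scales)
print(len(packed), "bytes,", len(scales), "scales")
```

Eight weights pack into four bytes plus two scales. The rounding error per weight is at most half of its group's scale. A real engine would typically dequantize inside the matrix-multiply kernel rather than materializing a float list like this.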

The project includes specialized tools for identity-preserving generation and an image-to-image guidance toolkit that uses depth maps and reference images. It also provides a multimodal visual question answering node and a utility that merges multi-file quantized models into a single unified file.
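Merging a multi-file quantized model into one file amounts to combining several tensor dictionaries into a single state dict and keeping the configuration alongside it. The sketch below uses plain Python lists as stand-ins for tensors, and its shard names are hypothetical. With real files, the same dictionaries could be loaded and written with the `safetensors` library (`load_file` and `save_file`; the `metadata` argument of `save_file` takes string-to-string pairs). This page does not give the utility's own file layout or function names, so `merge_shards` is illustrative.

```python
from typing import Dict, List, Mapping, Tuple

Tensor = List[float]  # stand-in for a real tensor type


def merge_shards(
    shards: Mapping[str, Mapping[str, Tensor]],
    config: Mapping[str, str],
) -> Tuple[Dict[str, Tensor], Dict[str, str]]:
    """Combine several tensor dicts into one state dict and build string metadata."""
    merged: Dict[str, Tensor] = {}
    for shard_name, tensors in shards.items():
        for key, value in tensors.items():
            if key in merged:  # refuse silent overwrites across shards
                raise ValueError(f"duplicate tensor {key!r} in shard {shard_name!r}")
            merged[key] = value
    metadata = dict(config)  # file metadata is kept as str -> str pairs
    metadata["source_shards"] = ",".join(sorted(shards))
    return merged, metadata


# Hypothetical shard names for a multi-file model directory.
shards = {
    "transformer_blocks": {"blocks.0.qweight": [1.0, 2.0]},
    "unquantized_layers": {"embed.weight": [0.5]},
}
state_dict, metadata = merge_shards(shards, {"model_class": "ExampleModel"})
print(sorted(state_dict), metadata)
```

Raising on duplicate keys is a deliberate choice in this sketch. Without it, a tensor present in two shards would be silently replaced by whichever shard is read last.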

The engine covers a broad range of image generation and editing capabilities, including text-to-image generation, structural image control, text-based inpainting, and face restoration. It implements several performance optimizations, such as device-aware memory offloading, fused projection kernels, and joint image-text attention.
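Multimodal diffusion transformers commonly formulate joint image-text attention as follows. The text and image token sequences are concatenated, and a single scaled dot-product attention runs over the combined sequence, so every token attends to both modalities in one operation. The pure-Python sketch below shows that idea on tiny matrices. It assumes Q, K, and V have already been projected, and it says nothing about how the engine's fused kernels implement the operation. `attention` and `joint_attention` are illustrative names.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
QKV = Tuple[Matrix, Matrix, Matrix]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out: Matrix = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def joint_attention(txt: QKV, img: QKV) -> Tuple[Matrix, Matrix]:
    """Attend over the concatenated text + image sequence in one call, then split."""
    (tq, tk, tv), (iq, ik, iv) = txt, img
    joint = attention(tq + iq, tk + ik, tv + iv)
    n_txt = len(tq)
    return joint[:n_txt], joint[n_txt:]


txt = ([[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 2.0]])
img = ([[0.0, 1.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 1.0]], [[3.0, 4.0], [5.0, 6.0]])
txt_out, img_out = joint_attention(txt, img)
print(txt_out, img_out)
```

The output is split back at the text/image boundary. Each stream therefore keeps its own sequence length while having attended over both.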

The project also includes a hardware-aware installer that detects the host system and deploys compatible pre-compiled binaries.
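A hardware-aware installer essentially matches the host's operating system, Python version, and GPU architecture against a table of pre-compiled builds. The sketch below uses a hypothetical build table and hypothetical wheel names. This page does not list the project's real naming scheme or supported GPUs. On a machine with PyTorch and CUDA, the GPU's compute capability could be read with `torch.cuda.get_device_capability()`. Here it is passed in explicitly so the example runs without a GPU.

```python
import platform
import sys
from typing import Dict, FrozenSet, Optional, Tuple

Capability = Tuple[int, int]  # (major, minor) CUDA compute capability

# Hypothetical build table: wheel name -> (OS, Python tag, supported GPU capabilities).
BUILDS: Dict[str, Tuple[str, str, FrozenSet[Capability]]] = {
    "engine-cp311-linux-ampere.whl": ("Linux", "cp311", frozenset({(8, 0), (8, 6)})),
    "engine-cp311-linux-ada.whl": ("Linux", "cp311", frozenset({(8, 9)})),
    "engine-cp312-windows-ada.whl": ("Windows", "cp312", frozenset({(8, 9)})),
}


def python_tag() -> str:
    """CPython tag of the running interpreter, e.g. 'cp311'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def select_build(
    os_name: str,
    py_tag: str,
    capability: Capability,
    builds: Dict[str, Tuple[str, str, FrozenSet[Capability]]] = BUILDS,
) -> Optional[str]:
    """Return the first compatible pre-compiled build, or None if nothing matches."""
    for name, (build_os, build_py, caps) in sorted(builds.items()):
        if build_os == os_name and build_py == py_tag and capability in caps:
            return name
    return None


# The capability is passed explicitly so this runs without a GPU.
print(select_build(platform.system(), python_tag(), (8, 9)))
```

The lookup requires an exact match on the supported capability set rather than assuming forward compatibility, because a binary compiled for one GPU architecture does not necessarily run on another.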

Features

Text-to-Image Generators - Produces visual assets from written descriptions using a memory-efficient, 4-bit quantized inference engine.

Stable Diffusion Inference Engines - Provides a high-performance 4-bit inference engine specifically for executing transformer-based diffusion models.

Identity Adapters - Integrates specialized identity adapters into diffusion models to maintain consistent person-specific features.

Image-to-Image Diffusion Toolkits - Ships a toolkit for controlling image generation using depth maps, reference images, and identity weights.

Node-Based Generative Pipelines - Integrates low-precision model execution into visual graph-based workflows via ComfyUI nodes.

Identity-Driven Image Generation - Extracts identity embeddings from a reference image to keep person-specific features consistent during generation.

Joint Attention Mechanisms - Computes joint attention over the image and text streams to accelerate inference.

LoRA Adapter Loaders - Integrates LoRA weights with adjustable strengths to modify model output characteristics.

Quantized Inference Accelerators - Performs efficient image-based inference using low-precision neural networks within a workflow.

Quantized Text-to-Image Generation - Provides a memory-efficient 4-bit inference engine for generating images from text prompts.

Quantized Inference Runtimes - Executes low-precision neural networks to reduce memory usage and increase image generation speed.

ComfyUI Custom Node Suites - Implements a suite of ComfyUI custom nodes for running quantized diffusion models in visual workflows.

Low-Bit Inference Engines - Provides an execution engine optimized for running models with 4-bit packed weights to reduce memory overhead.

Joint Image-Text Attention - Accelerates inference by computing combined attention for image and text streams within a single operation.

Image-to-Image Translation - Converts an existing image into a new visual output using a reference source and text prompt.

Transformer Projection Kernels - Accelerates inference by fusing projections and rotary position embeddings into single optimized kernels within transformer layers.

Hardware Device Management - Automatically manages the movement of model weights between CPU and GPU hardware.

Composition-Controlled Generators - Directs the image generation process based on spatial or structural input to ensure precise composition.

Image Editing - Provides tools for modifying existing visual content using generative AI instructions.

Text-Guided Inpainting - Fills or modifies specific image regions using text descriptions and low-precision models.

Quantized Editing - Modifies existing images using compressed neural networks and acceleration techniques to reduce memory usage.

Reference-Conditioned Generation - Creates new images from a reference source utilizing a vision-language model.

Style Transfer - Generates new images based on an input reference while maintaining structural consistency via depth detection.

Inference Optimization Kernels - Implements fused projection and rotary-embedding kernels to accelerate transformer inference.

Weight Merging Utilities - Merges multiple quantized weight files into a single state to blend styles or preserve identities.

Local Inference Engines - Manages the installation and removal of the optimized 4-bit inference engine.

Weight Offloading - Implements dynamic movement of model weights between CPU and GPU memory to support high-resolution generation.

Model Weight Utilities - Combines multi-file quantized model directories into single unified files for streamlined distribution.

Multimodal Model Runners - Implements a visual question answering node that processes images and text using quantized multimodal models.

Model Merging - Combines multiple pre-trained diffusion model weights to blend styles or capabilities.

Quantized Model Consolidation - Combines multiple quantized model files into a single unified file for easier distribution.

Weight Quantization - Compresses and combines model weights into lower-precision formats to reduce memory footprint.

Quantized Model Implementations - Implements specialized low-precision transformer model versions to reduce memory usage within visual workflows.

Visual Reference Prompting - Influences generated output by injecting visual reference data into the model pipeline via specialized adapters.

Visual Question Answering - Provides a visual question answering implementation that processes images and text using quantized multimodal models.

Model Memory Managers - Controls the allocation and offloading of model weights between system memory and GPU to optimize VRAM.
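Weight offloading, as listed above, moves model weights between CPU and GPU memory so that only part of the model occupies VRAM at a time. The following is a simplified, simulated sketch under assumed behavior: a least-recently-used policy keeps at most `gpu_slots` transformer blocks resident and records each transfer. The `BlockOffloader` class and its policy are hypothetical. This page does not describe the engine's actual offloading strategy. In PyTorch, the simulated moves would correspond to calls such as `module.to("cuda")` and `module.to("cpu")`.

```python
from collections import OrderedDict
from typing import Dict, List


class BlockOffloader:
    """Keep at most `gpu_slots` blocks resident on the (simulated) GPU, evicting LRU."""

    def __init__(self, block_names: List[str], gpu_slots: int) -> None:
        if gpu_slots < 1:
            raise ValueError("need at least one GPU slot")
        self.gpu_slots = gpu_slots
        self.location: Dict[str, str] = {name: "cpu" for name in block_names}
        self.resident: "OrderedDict[str, None]" = OrderedDict()  # oldest first
        self.transfers: List[str] = []  # log of simulated copies

    def fetch(self, name: str) -> None:
        """Make sure `name` is on the GPU, evicting the least recently used block."""
        if name in self.resident:
            self.resident.move_to_end(name)  # already resident: just refresh recency
            return
        if len(self.resident) >= self.gpu_slots:
            evicted, _ = self.resident.popitem(last=False)
            self.location[evicted] = "cpu"
            self.transfers.append(f"{evicted}->cpu")
        self.location[name] = "gpu"
        self.resident[name] = None
        self.transfers.append(f"{name}->gpu")

    def run(self, order: List[str]) -> None:
        for name in order:
            self.fetch(name)
            # The block's forward pass would run here while its weights are on the GPU.


offloader = BlockOffloader(["b0", "b1", "b2", "b3"], gpu_slots=2)
offloader.run(["b0", "b1", "b2", "b3"])
print(offloader.transfers)
```

With two slots and four blocks run in order, each new block evicts the oldest resident one. Peak VRAM therefore holds only two blocks' weights, at the cost of extra CPU-GPU transfers.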

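The LoRA Adapter Loaders entry mentions adjustable strengths. The usual formula is W' = W + sum_i s_i * (B_i @ A_i): each adapter contributes a low-rank update scaled by its strength s_i. The sketch below applies that formula to small full-precision matrices. It assumes any alpha/rank scaling is already folded into the strength, and it does not describe how the loader combines LoRA weights with the 4-bit quantized model. `apply_loras` and `matmul` are illustrative names.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_loras(weight: Matrix, loras: Sequence[Tuple[Matrix, Matrix, float]]) -> Matrix:
    """Return W + sum(strength * (B @ A)); W is out x in, B is out x r, A is r x in."""
    merged = [row[:] for row in weight]  # copy so the base weight is left untouched
    for up_b, down_a, strength in loras:
        delta = matmul(up_b, down_a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += strength * value
    return merged


W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]  # rank-1 "up" matrix (2 x 1)
A = [[0.0, 2.0]]    # rank-1 "down" matrix (1 x 2)
print(apply_loras(W, [(B, A, 0.5)]))
```

A strength of 0 leaves the base weight unchanged. Several adapters stack additively, so two copies at strength 0.25 give the same result as one at 0.5.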
nunchaku-ai/ComfyUI-nunchaku