ControlNet v1.1 Nightly | Awesome Repository

This project is a neural network extension for Stable Diffusion that provides spatial control and geometric consistency for text-to-image generation. It functions as an image structure controller and conditioning tool, enabling the use of external inputs to guide the layout and geometry of generated imagery.

Its preprocessors turn input images into structural guides: depth maps, normal maps, human pose landmarks, Canny edges, anime lineart, and straight architectural lines. It also supports semantic segmentation, which defines object placement via colored masks, and can turn hand-drawn scribbles into detailed images.
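To make the idea of a preprocessor concrete, here is a minimal sketch of turning an image into an edge-style control map. It is not the project's code and it is not Canny: it uses a plain central-difference gradient with a single threshold as a simplified stand-in, and the names `edge_map` and `threshold` are hypothetical.

```python
def edge_map(gray, threshold=50):
    """Return a binary edge map (0 or 255) from a 2D list of grayscale values.

    Simplified stand-in for an edge preprocessor: central-difference
    gradient magnitude plus one threshold (not the full Canny algorithm).
    """
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels are left at 0 because they lack a full neighborhood.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # horizontal change
            gy = gray[y + 1][x] - gray[y - 1][x]  # vertical change
            magnitude = (gx * gx + gy * gy) ** 0.5
            if magnitude >= threshold:
                edges[y][x] = 255
    return edges


# Toy 6x6 image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
control = edge_map(image)
for row in control:
    print(row)
```

The resulting map marks only the vertical boundary between the two halves, which is the kind of structure-only image a control model would receive instead of the original photo.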

Beyond basic conditioning, the project covers image editing and upscaling through tiled detail refinement and inpainting. It provides tools for custom diffusion model training, including dataset annotation and content shuffle preprocessing. GPU memory optimizations such as sliced attention reduce memory use during sampling.
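The memory saving behind sliced attention can be sketched in a few lines. This is an illustrative pure-Python version under assumed names (`attention`, `sliced_attention`, `slice_size`), not the project's implementation: queries are processed in chunks, so only a `slice_size` by number-of-keys score matrix exists at any one time, while the output matches unsliced attention.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention that builds the full score matrix."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    # Score matrix: one row per query, one column per key.
    scores = [[scale * sum(a * b for a, b in zip(q, k)) for k in keys]
              for q in queries]
    weights = [softmax(row) for row in scores]
    dim = len(values[0])
    return [[sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
            for row in weights]


def sliced_attention(queries, keys, values, slice_size):
    """Same result as attention(), but one slice of queries at a time."""
    out = []
    for start in range(0, len(queries), slice_size):
        chunk = queries[start:start + slice_size]
        # Only a len(chunk) x len(keys) score matrix is alive here.
        out.extend(attention(chunk, keys, values))
    return out


q = [[0.1 * i, 0.2 * i] for i in range(5)]
k = [[0.3 * i, -0.1 * i] for i in range(4)]
v = [[float(i), float(i * i)] for i in range(4)]

full = attention(q, k, v)
sliced = sliced_attention(q, k, v, slice_size=2)
```

Because each query's softmax depends only on its own row of scores, slicing along the query axis changes peak memory but not the result; the cost is more, smaller passes.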

Features

Spatial Conditioning Controllers - Injects structured image data like depth and edge maps as conditioning signals to guide the denoising process.
Diffusion Structural Control - Provides a framework for guiding diffusion model output using spatial constraints like depth maps and semantic segmentation.
Visual Landmark Extractors - Identifies human body and face landmarks from photographs to control character posture in generated images.
