14 repositories
Fine-tuning generative models for specific creative or structural tasks.
Distinguishing note: Focuses on the domain of custom diffusion model fine-tuning.
Explore 14 GitHub repositories matching Artificial Intelligence & ML · Custom Diffusion Model Training.
ControlNet is a framework for structural image generation that extends pre-trained diffusion models with neural network architectures designed for precise spatial control. By injecting structural guidance directly into the latent-space denoising process, the system enables users to enforce geometric or semantic constraints on generated outputs while maintaining style consistency. The framework distinguishes itself through a weight-locked copying mechanism that preserves the integrity of the original model while introducing new control signals. It supports multi-condition synthesis, allowing f
Fine-tunes generative models on specialized datasets to learn unique visual patterns.
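The weight-locked copying mechanism mentioned in the ControlNet entry can be illustrated with a toy sketch. It assumes, following the published ControlNet design, that a trainable copy of a block is joined to the frozen original through a zero-initialized projection, so the combined output initially matches the pre-trained model. The names `Block`, `ControlledBlock`, and `zero_proj` are hypothetical, and scalars stand in for real network layers.

```python
import copy


class Block:
    """Stand-in for a network block: y = w * x + b (scalars for brevity)."""

    def __init__(self, w, b):
        self.w = w
        self.b = b

    def __call__(self, x):
        return self.w * x + self.b


class ControlledBlock:
    """Frozen original plus a trainable copy joined by a zero-initialized gate."""

    def __init__(self, pretrained):
        self.locked = pretrained                     # weights are never updated
        self.trainable = copy.deepcopy(pretrained)   # this copy would be trained
        self.zero_proj = 0.0                         # starts at zero: no influence yet

    def __call__(self, x, condition):
        # The condition (e.g. an edge or pose signal) only reaches the copy.
        control = self.trainable(x + condition)
        return self.locked(x) + self.zero_proj * control


pretrained = Block(w=2.0, b=1.0)
controlled = ControlledBlock(pretrained)

# Before any training the control branch contributes nothing.
out_before = controlled(3.0, condition=0.5)

# Simulate a training update: only the copy and the gate change.
controlled.zero_proj = 0.1
controlled.trainable.w = 2.5
out_after = controlled(3.0, condition=0.5)
```

Because the gate starts at zero, fine-tuning begins from the unmodified behaviour of the pre-trained model, and the locked weights stay intact however the copy is updated.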
Diffusers is a PyTorch-based library and generative AI framework used to build, train, and deploy diffusion pipelines for producing multi-modal media. It provides a suite of tools for generating images, video, and audio from natural language descriptions, as well as specialized systems for text-to-image generation. The project differentiates itself through a modular architecture that separates noise schedulers, pretrained model blocks, and pipeline compositions. This structure allows for the construction of custom generation workflows and the ability to swap individual components of the diffu
Provides a toolbox for developing and fine-tuning custom diffusion models for specific styles or tasks.
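The separation of schedulers, model blocks, and pipelines described in the Diffusers entry can be pictured with a small sketch. This is not the Diffusers API: `ToyPipeline`, `LinearScheduler`, `QuadraticScheduler`, and the toy `model` function are hypothetical names used only to show how one component can be swapped while the others stay untouched.

```python
class LinearScheduler:
    """Hypothetical scheduler: noise scale decreases linearly."""

    def sigmas(self, num_steps):
        return [1 - i / num_steps for i in range(num_steps)]


class QuadraticScheduler:
    """Hypothetical scheduler: noise scale decreases quadratically."""

    def sigmas(self, num_steps):
        return [(1 - i / num_steps) ** 2 for i in range(num_steps)]


class ToyPipeline:
    """Composes a model and a scheduler without knowing their internals."""

    def __init__(self, model, scheduler):
        self.model = model
        self.scheduler = scheduler

    def __call__(self, sample, num_steps=4):
        for sigma in self.scheduler.sigmas(num_steps):
            # Remove a scheduler-weighted share of the predicted noise.
            sample = sample - sigma * self.model(sample)
        return sample


def model(sample):
    """Toy 'noise predictor' standing in for a pretrained network."""
    return 0.5 * sample


pipe = ToyPipeline(model, LinearScheduler())
out_linear = pipe(1.0)

# Swap only the scheduler; the model block is reused as-is.
pipe.scheduler = QuadraticScheduler()
out_quadratic = pipe(1.0)
```

The same model produces different samples under the two schedulers, which is the kind of component-level experimentation a modular layout makes possible.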
This project is a comprehensive framework and toolkit for developing, optimizing, and deploying transformer-based models across multimodal, document intelligence, and natural language processing tasks. It provides a unified neural architecture that processes text, vision, audio, and document layout data through a shared set of weights, enabling researchers and developers to build foundational models that align cross-modal representations. The platform distinguishes itself through advanced training and inference strategies designed for large-scale deep learning. It incorporates specialized mec
Trains two-stage diffusion models on large-scale image-text datasets annotated with character-level segmentation masks and optical character recognition data.
This project provides a cloud-based notebook configuration for deploying a Stable Diffusion web interface. It functions as a specialized environment for image generation, incorporating a model trainer for fine-tuning weights and creating training datasets. The system emphasizes infrastructure persistence by saving software installations and model files to cloud storage, avoiding repetitive setups between sessions. It uses a tunnel-based interface to expose the web dashboard to a public URL for remote interaction. The project covers end-to-end AI workflows, including dataset preparation and t
Provides a toolkit for fine-tuning generative diffusion models on specialized datasets.
DiffSynth-Studio is a comprehensive platform for the lifecycle management of generative diffusion models, providing a unified environment for inference, fine-tuning, and training. It utilizes a modular pipeline architecture and a standardized abstraction layer to support consistent workflows across diverse model configurations for image and video generation. The platform distinguishes itself through a memory-optimized inference engine that dynamically manages resources to facilitate high-resolution generation on constrained hardware. It also integrates specialized training capabilities, inclu
Enables the development of specialized generative models through training on custom datasets for precise artistic control.
kohya_ss is a graphical user interface and workbench for fine-tuning diffusion models, specifically designed for Stable Diffusion. It provides a suite of tools for training generative AI models, including specialized interfaces for creating Low-Rank Adaptation weights and training ControlNet spatial control networks. The project distinguishes itself through integrated VRAM usage optimization and hardware acceleration, featuring specific support for Intel GPUs via XPU-accelerated libraries. It implements parameter-efficient training methods and memory-saving techniques like gradient checkpoint
Ships a full suite of tools for configuring optimizers and datasets to train diffusion-based neural networks.
Z-Image is an AI image editing engine and generation framework designed for photorealistic synthesis and the refinement of diffusion models. It functions as a multilingual text-to-image renderer and a system for training custom foundation models to generate and edit visuals using natural language instructions. The project distinguishes itself through a reasoning-based prompt enhancer that expands simple descriptions into detailed visual instructions using a structured reasoning chain. It also features specialized capabilities for rendering high-quality Chinese and English typography within ge
Enables the training and specialization of custom diffusion models to improve specific image generation and editing capabilities.
ai-toolkit is a diffusion model training toolkit designed for fine-tuning image and video generation models. It functions as a containerized model trainer and GPU training job manager, providing the infrastructure to orchestrate dependencies and manage training processes on remote GPU hardware. The system utilizes low-rank adaptation techniques, including LoRA and LoKr weight optimization, to reduce the hardware requirements for model training. It distinguishes itself through a web-based training controller that allows for the monitoring and modification of hyperparameters, secured by token-b
Offers a comprehensive toolkit for fine-tuning image and video diffusion models using custom datasets.
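To see why low-rank adaptation reduces hardware requirements, here is a minimal sketch of the standard LoRA formulation, in which a frozen weight matrix W is adjusted by a scaled low-rank product B·A. It is not taken from ai-toolkit; the helper names, the tiny matrices, and the 768×768 layer size are illustrative assumptions, and LoKr is not covered.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * B @ A, leaving W itself untouched."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # frozen 2x3 weight
A = [[0.5, -0.5, 1.0]]                   # rank-1 factor, 1x3
B_init = [[0.0], [0.0]]                  # B starts at zero, so the update is zero

unchanged = lora_effective_weight(W, A, B_init, alpha=1.0, rank=1)

B_trained = [[1.0], [2.0]]               # pretend training moved B away from zero
adapted = lora_effective_weight(W, A, B_trained, alpha=1.0, rank=1)

# Parameter budget for a hypothetical 768x768 layer at rank 8.
full_count = 768 * 768
lora_count = trainable_params(768, 768, rank=8)
```

In this example the low-rank factors hold 12,288 values against 589,824 in the full matrix, so the optimizer tracks 48 times fewer parameters for that layer.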
Sana is a framework for high-resolution image and video synthesis based on a linear diffusion transformer. It provides a toolkit for the training, fine-tuning, and execution of text-to-image and text-to-video models, as well as a video generative world model capable of simulating physical environments with precise spatial control. The project is distinguished by its use of linear complexity layers to handle high resolutions and its support for long-form, minute-length video generation in real time. It implements a two-stage inference paradigm that separates structural generation from visual t
Implements training workflows for high-resolution image synthesis using a linear diffusion transformer.
Sygil-webui is a web interface for Stable Diffusion latent diffusion models, providing a creative suite for text-to-image and text-to-video synthesis. It functions as an image generation tool and a latent diffusion image editor, allowing users to create visuals and video sequences from textual descriptions. The project includes a dedicated model training interface for creating custom textual inversion embeddings, which introduces specific new concepts or styles into the diffusion models. It also features specialized tools for generative image editing, including mask-based inpainting, image-to
Trains textual inversion embeddings from photos to teach the model specific new concepts or styles.
Stable Diffusion Web UI is a browser-based interface for generating, editing, and upscaling images and videos using latent diffusion models. It functions as a text-to-image generator, an AI image editor, and a tool for increasing image resolution and clarity. The system includes capabilities for custom model training, specifically allowing the creation of textual inversion embeddings to teach a model new concepts and visual styles from user photos. It also provides tools for AI video production, generating short clips from text prompts. The software covers image-to-image transformation, imag
Allows users to train custom textual inversion embeddings to teach the model new concepts and styles.
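The idea behind textual inversion embeddings can be sketched as follows: every existing embedding stays frozen, and only one new token vector is optimized. In this toy version a squared-error loss against a fixed `target` vector stands in for the diffusion denoising loss used in practice; the token name `<my-concept>`, the two-dimensional vectors, and the learning rate are all hypothetical.

```python
# Existing vocabulary embeddings: frozen during textual inversion.
frozen_embeddings = {"a": [0.1, 0.3], "photo": [0.7, -0.2], "of": [0.0, 0.5]}
snapshot = {token: list(vec) for token, vec in frozen_embeddings.items()}

new_token = "<my-concept>"
new_vector = [0.0, 0.0]      # the only trainable parameters
target = [0.4, -0.6]         # stand-in for what the user's photos require


def loss_and_grad(vec, goal):
    """Squared-error stand-in loss and its gradient with respect to vec."""
    diffs = [v - g for v, g in zip(vec, goal)]
    return sum(d * d for d in diffs), [2 * d for d in diffs]


learning_rate = 0.1
initial_loss, _ = loss_and_grad(new_vector, target)

for _ in range(100):
    _, grad = loss_and_grad(new_vector, target)
    # Gradient step on the new token only; nothing else is touched.
    new_vector = [v - learning_rate * g for v, g in zip(new_vector, grad)]

final_loss, _ = loss_and_grad(new_vector, target)

# The learned vector is stored alongside the unchanged vocabulary.
learned = {new_token: new_vector}
```

The result is a small file containing one vector per new token, which is why such embeddings are cheap to train and share compared with a fully fine-tuned model.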
This project is a cloud-based AI deployment system and latent diffusion model trainer. It provides a framework for launching image generation interfaces and training pipelines on remote GPU infrastructure, specifically serving as a text-to-image model fine-tuner. The system features a specialized training interface for fine-tuning Stable Diffusion models on custom image datasets. It allows for the creation of personalized visual outputs by training models on specific subjects or artistic styles using a small set of reference images. The software covers generative AI deployment, custom style
Provides a framework for fine-tuning generative diffusion models on specialized datasets to match specific subjects.
This project is a Dreambooth implementation designed to personalize Stable Diffusion models. It serves as an AI image personalization tool and model tuner that enables the creation of unique subject identifiers to generate consistent, personalized images. The system focuses on subject-driven image synthesis by fine-tuning pre-trained diffusion models on small, custom datasets. This allows the model to recognize specific people, objects, or artistic styles and place those learned subjects into diverse contexts via text-to-image conditioning. The implementation includes a diffusion model optim
Provides custom diffusion model training capabilities to optimize models for specific creative subjects or structural tasks.
This project is a neural network extension for Stable Diffusion that provides spatial control and geometric consistency for text-to-image generation. It functions as an image structure controller and conditioning tool, enabling the use of external inputs to guide the layout and geometry of generated imagery. The framework is distinguished by its ability to transform input images into structural guides through various preprocessors. These include the extraction of depth maps, normal maps, and human pose landmarks, as well as the detection of Canny edges, anime lineart, and straight architectur
Supports the fine-tuning of generative models for specific structural tasks through dataset preprocessing and annotation.