30 open-source projects similar to thelastben/fast-stable-diffusion, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Fast Stable Diffusion alternative.
This project provides a cloud-based notebook configuration for deploying a Stable Diffusion web interface. It functions as a specialized environment for image generation, incorporating a model trainer for fine-tuning weights and creating training datasets. The system emphasizes infrastructure persistence by saving software installations and model files to cloud storage, avoiding repetitive setups between sessions. It uses a tunnel-based interface to expose the web dashboard to a public URL for remote interaction. The project covers end-to-end AI workflows, including dataset preparation and t
kohya_ss is a graphical user interface and workbench for fine-tuning diffusion models, specifically designed for Stable Diffusion. It provides a suite of tools for training generative AI models, including specialized interfaces for creating Low-Rank Adaptation weights and training ControlNet spatial control networks. The project distinguishes itself through integrated VRAM usage optimization and hardware acceleration, featuring specific support for Intel GPUs via XPU-accelerated libraries. It implements parameter-efficient training methods and memory-saving techniques like gradient checkpoint
Stable Diffusion Web UI is a browser-based interface for generating, editing, and upscaling images and videos using latent diffusion models. It functions as a text-to-image generator, an AI image editor, and a tool for increasing image resolution and clarity. The system includes capabilities for custom model training, specifically allowing the creation of textual inversion embeddings to teach a model new concepts and visual styles from user photos. It also provides tools for AI video production, generating short clips from text prompts. The software covers image-to-image transformation, imag
Diffusers is a PyTorch-based library and generative AI framework used to build, train, and deploy diffusion pipelines for producing multi-modal media. It provides a suite of tools for generating images, video, and audio from natural language descriptions, as well as specialized systems for text-to-image generation. The project differentiates itself through a modular architecture that separates noise schedulers, pretrained model blocks, and pipeline compositions. This structure allows for the construction of custom generation workflows and the ability to swap individual components of the diffu
sd-scripts is a suite of utilities designed for fine-tuning generative models, preprocessing datasets, and converting model weights. It provides a collection of scripts for executing Stable Diffusion training through methods such as DreamBooth, textual inversion, and full fine-tuning, alongside a framework for creating and managing Low-Rank Adaptation weights. The project features specialized capabilities for model weight conversion between different architectures and precision formats. It includes tools for merging adaptation weights into base models, extracting weights from trained models,
Sglang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language. The platform is built to support production-scale deployments, offering an OpenAI-compatible API that allows for integration with existing application ecosystems. The system distinguishes itself through a disaggregated architecture that separates compute-intensive pr
ai-toolkit is a diffusion model training toolkit designed for fine-tuning image and video generation models. It functions as a containerized model trainer and GPU training job manager, providing the infrastructure to orchestrate dependencies and manage training processes on remote GPU hardware. The system utilizes low-rank adaptation techniques, including LoRA and LoKr weight optimization, to reduce the hardware requirements for model training. It distinguishes itself through a web-based training controller that allows for the monitoring and modification of hyperparameters, secured by token-b
DiffSynth-Studio is a comprehensive platform for the lifecycle management of generative diffusion models, providing a unified environment for inference, fine-tuning, and training. It utilizes a modular pipeline architecture and a standardized abstraction layer to support consistent workflows across diverse model configurations for image and video generation. The platform distinguishes itself through a memory-optimized inference engine that dynamically manages resources to facilitate high-resolution generation on constrained hardware. It also integrates specialized training capabilities, inclu
Sana is a framework for high-resolution image and video synthesis based on a linear diffusion transformer. It provides a toolkit for the training, fine-tuning, and execution of text-to-image and text-to-video models, as well as a video generative world model capable of simulating physical environments with precise spatial control. The project is distinguished by its use of linear complexity layers to handle high resolutions and its support for long-form, minute-length video generation in real time. It implements a two-stage inference paradigm that separates structural generation from visual t
Sygil-webui is a web interface for Stable Diffusion latent diffusion models, providing a creative suite for text-to-image and text-to-video synthesis. It functions as an image generation tool and a latent diffusion image editor, allowing users to create visuals and video sequences from textual descriptions. The project includes a dedicated model training interface for creating custom textual inversion embeddings, which introduces specific new concepts or styles into the diffusion models. It also features specialized tools for generative image editing, including mask-based inpainting, image-to
This project is a framework for running Stable Diffusion image generation models on Apple Silicon using Core ML hardware acceleration. It provides a local generative AI pipeline for producing images from text prompts using Swift and Python without relying on external cloud APIs. The system includes a model converter to transform deep learning checkpoints into Core ML formats and a model optimizer to quantize weights and activations. It features a ControlNet integration layer to guide image generation using external signals such as edge and depth maps. Capabilities cover text-to-image generat
InvokeAI is a self-hosted, professional-grade platform designed for managing generative models and performing complex image synthesis. It provides a local application environment that allows users to execute diffusion models directly on their own hardware, ensuring data privacy and complete ownership of all generated assets. The platform distinguishes itself through a node-based workflow system that enables the construction of reproducible and automated image generation pipelines. By chaining modular functional units into directed acyclic graphs, users can automate intricate production tasks
Z-Image is an AI image editing engine and generation framework designed for photorealistic synthesis and the refinement of diffusion models. It functions as a multilingual text-to-image renderer and a system for training custom foundation models to generate and edit visuals using natural language instructions. The project distinguishes itself through a reasoning-based prompt enhancer that expands simple descriptions into detailed visual instructions using a structured reasoning chain. It also features specialized capabilities for rendering high-quality Chinese and English typography within ge
StableCascade is a generative AI system and latent diffusion framework designed for text-to-image synthesis and image-to-image transformations. It utilizes a multi-stage cascade architecture that encodes and decodes images via a latent space to produce high-fidelity visual imagery. The system includes a cascade diffusion pipeline for controlling image structure through inpainting, outpainting, and super-resolution. It also provides a toolkit for image-to-image generation and the creation of image variations using embeddings. The framework supports model optimization through low-rank adaptati
This project is a collection of deep learning research papers translated into annotated code. It serves as a resource for reproducing academic research, providing implementations of transformers, diffusion models, and reinforcement learning architectures. The library distinguishes itself by using a side-by-side annotation format that combines executable Python code with descriptive markdown notes. This approach provides a structured way to explain the logic of neural network papers alongside their PyTorch-based implementations. The codebase covers several major capability areas, including ge
This project provides a Chinese large language model based on the LLaMA architecture. It is an instruction-tuned model optimized for natural language processing and multi-turn conversations in Chinese. The system includes a framework for parameter-efficient fine-tuning using low-rank adaptation and quantization to reduce memory requirements. It also implements retrieval augmented generation for local document question answering and supports long-context processing for sequences up to 64K tokens. The project covers a broad set of capabilities including supervised instruction tuning, reinforce
Latent Diffusion is a framework for high-resolution image synthesis that performs the denoising process within a compressed latent space. It uses variational autoencoders to encode images into a lower-dimensional representation, reducing the computational cost of noise prediction compared to operating on raw pixels. The project enables text-to-image generation by integrating natural language descriptions through cross-attention conditioning. It also supports image inpainting and restoration, filling masked or missing image areas with generated content, and example-based synthesis using retrie
ControlNet is a framework for structural image generation that extends pre-trained diffusion models with neural network architectures designed for precise spatial control. By injecting structural guidance directly into the latent-space denoising process, the system enables users to enforce geometric or semantic constraints on generated outputs while maintaining style consistency. The framework distinguishes itself through a weight-locked copying mechanism that preserves the integrity of the original model while introducing new control signals. It supports multi-condition synthesis, allowing f
HunyuanVideo-1.5 is a video generation foundation model and text-to-video diffusion framework. It utilizes a latent video diffusion model and a spatio-temporal transformer architecture to generate high-definition video sequences from text descriptions and images. The project enables cinematic camera control for directing pans and tilts and provides image-to-video animation capabilities. It supports visual style adaptation through low-rank adaptation tuning and uses a language model for prompt refinement to improve visual alignment. The model covers high-resolution video upscaling via a super
This project is a toolkit for fine-tuning and managing text-to-image diffusion models. It focuses on low-rank adaptation to create small, portable weight files that customize model styles and behaviors without modifying the entire base model. The project provides specialized utilities for model distillation using singular value decomposition to extract adapters from fully trained models, as well as tools for blending and merging multiple adapters through weight interpolation. It includes capabilities for subject inversion and pivotal tuning to increase the visual fidelity of specific identiti
SmolLM is a project dedicated to the development of small language models. It focuses on training and fine-tuning compact models that maintain high performance while utilizing fewer parameters. The project emphasizes efficient AI inference and on-device text generation, aiming to enable the deployment of lightweight models on edge devices with limited memory and processing power. It utilizes synthetic data generation to produce artificial datasets that improve the reasoning and training of these AI systems. The system supports a variety of optimization and training capabilities, including we
ComfyUI-GGUF is a memory optimizer and model loader for ComfyUI that enables the execution of large transformer-based generative models using quantized weights. It provides a system for loading GGUF formatted weights within a node-based diffusion interface to reduce GPU memory consumption. The project includes a quantization tool for converting standard model checkpoints into compressed binary formats and a tensor fixer to restore missing keys and correct architectures in binary model files. These utilities ensure that compressed models remain functional during inference on hardware with limi
This project is a neural network extension for Stable Diffusion that provides spatial control and geometric consistency for text-to-image generation. It functions as an image structure controller and conditioning tool, enabling the use of external inputs to guide the layout and geometry of generated imagery. The framework is distinguished by its ability to transform input images into structural guides through various preprocessors. These include the extraction of depth maps, normal maps, and human pose landmarks, as well as the detection of Canny edges, anime lineart, and straight architectur
SD.Next is an all-in-one web interface and multi-backend inference engine for generating, editing, and processing images and videos using diffusion models. It functions as a comprehensive tool for diffusion model management and an automated image processing pipeline for bulk operations. The project is distinguished by its hardware-backend abstraction layer, which provides automatic detection and acceleration for NVIDIA CUDA, AMD ROCm, Intel OpenVINO, and DirectML. It features a headless generative API and a programmatic command interface, allowing users to trigger tasks via REST API or CLI wi
Flux is a diffusion model inference engine designed for text-to-image generation and image-to-image manipulation. It provides a system for executing open-weight models to transform natural language descriptions into visual imagery or to modify existing images. The project distinguishes itself through a flow-matching framework for image generation and a structural image controller. This controller allows for guided synthesis by using depth maps and Canny edge detection to constrain the geometry and composition of the output. The toolkit covers a broad range of image editing capabilities, incl
MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself through a robust model optimization toolkit that supports quantization, compression, and structural graph manipulation to minimize memory footprint and maximize execution speed. It features a modular architecture that abstracts hardware-specific backends, allowing models to run efficiently across diverse
StyleGAN3 is a PyTorch implementation of a generative adversarial network designed for high-fidelity image synthesis. It functions as an image synthesis model and a deep learning research tool used to train and deploy networks that generate realistic synthetic imagery from custom datasets. The project is specifically an alias-free generative model, utilizing an architecture that eliminates jagged artifacts to produce smooth translational and rotational image sequences. This enables the creation of alias-free videos and the generation of high-resolution photos without visual distortions. The
Stable Diffusion Web UI is a browser-based interface designed for managing text-to-image generation tasks. It provides a centralized dashboard for controlling generative processes, including native support for multi-stage model architectures to facilitate high-quality image refinement. The platform distinguishes itself through granular control over the generation process, offering tools for precise parameter management and advanced prompt engineering. Users can customize generation styles and capabilities by integrating external model-extension formats, such as textual inversions, low-rank ad
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex