30 open-source projects similar to automatic1111/stable-diffusion-webui, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Stable Diffusion WebUI alternative.
ComfyUI is a modular generative AI workflow orchestrator and node-based GUI for designing and executing complex diffusion model pipelines. It functions as both a visual interface for building generative logic graphs and a programmable backend API that exposes diffusion model operations for external integration. The system distinguishes itself through a graph-based execution model that supports differential workflow execution, re-running only modified nodes to reduce computation. It features dynamic model offloading to manage memory between system RAM and GPU VRAM and utilizes metadata-embedde
InvokeAI is a self-hosted, professional-grade platform designed for managing generative models and performing complex image synthesis. It provides a local application environment that allows users to execute diffusion models directly on their own hardware, ensuring data privacy and complete ownership of all generated assets. The platform distinguishes itself through a node-based workflow system that enables the construction of reproducible and automated image generation pipelines. By chaining modular functional units into directed acyclic graphs, users can automate intricate production tasks
Semantic Kernel is an artificial intelligence orchestration framework designed to integrate large language models with existing codebases. It functions as an agentic workflow engine, providing a standardized interface that connects generative models to traditional application logic, data sources, and external tools to automate complex, multi-step business tasks. The platform distinguishes itself through a modular plugin architecture and a planner-based reasoning engine that decomposes high-level goals into executable sequences of functions. By utilizing a connector-based abstraction layer, it
This project is a speech recognition and translation engine that utilizes a sequence-to-sequence transformer architecture to convert audio into text. It is built upon a weakly supervised learning framework, which leverages large-scale, weakly labelled audio-transcript data to create generalized speech representations capable of performing simultaneous transcription, language identification, and translation. The system distinguishes itself through a unified multi-task modeling approach that shares token sequences across different objectives, allowing it to handle diverse languages and vocabularies
TensorRT-LLM is a platform and toolkit designed for compiling, optimizing, and serving transformer-based models on accelerated hardware. It functions as a framework that transforms machine learning models into efficient execution graphs, providing an engine to refine these models for specific hardware to maximize throughput and minimize latency during text generation. The project distinguishes itself through advanced execution strategies that manage the entire inference pipeline. It utilizes kernel-level fusion and static graph execution to optimize mathematical operations and computational f
Whisper.cpp is a high-performance, local-first speech recognition engine designed to run large-scale machine learning models on consumer hardware. It functions as a portable library that converts audio into text, supporting both static file transcription and real-time stream processing. By utilizing a lightweight inference engine and weight quantization, the project minimizes memory and compute overhead, allowing for efficient execution without reliance on external cloud APIs or internet connectivity. The project distinguishes itself through a hardware-agnostic compute abstraction that offloa
LocalAI is a local generative AI platform and inference engine designed to host large language, vision, and audio models on private hardware. It functions as an API-compatible gateway that mimics proprietary service endpoints, allowing existing third-party software to integrate with a self-hosted backend. The platform distinguishes itself as a distributed AI model orchestrator, capable of scaling inference across machine clusters using VRAM-aware routing and hardware coordination. It provides a unified interface for diverse open-source backends and supports self-hosted RAG infrastructure thro
Audiocraft is a deep learning audio library and machine learning framework designed for training, fine-tuning, and evaluating generative models for music and sound effects. It functions as a text-to-music generative model and a neural audio codec, providing the tools necessary to compress audio signals into discrete representations and synthesize high-fidelity waveforms from textual descriptions. The framework is distinguished by its ability to combine multiple conditioning signals, allowing for the generation of audio based on text prompts, melodic excerpts, or style-based audio clips. It al
This project is a comprehensive platform for hosting and interacting with large language models directly on local hardware. It provides a web-based graphical interface that allows users to manage model loading, configure generation parameters, and execute text or chat interactions entirely offline. By running models locally, the software ensures complete data privacy and eliminates reliance on external cloud services for generative tasks. Beyond basic inference, the platform functions as a versatile workbench for generative AI development. It includes an integrated pipeline for fine-tuning mo
Flux is a diffusion model inference engine designed for text-to-image generation and image-to-image manipulation. It provides a system for executing open-weight models to transform natural language descriptions into visual imagery or to modify existing images. The project distinguishes itself through a flow-matching framework for image generation and a structural image controller. This controller allows for guided synthesis by using depth maps and Canny edge detection to constrain the geometry and composition of the output. The toolkit covers a broad range of image editing capabilities, incl
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a comprehensive ecosystem for managing the entire model lifecycle, including discovery, downloading, and configuration of local weights. What distinguishes the platform is its integrated retrieval-augmented generation engine, which allows users to index local documents into semantic vect
This project provides a cloud-based notebook configuration for deploying a Stable Diffusion web interface. It functions as a specialized environment for image generation, incorporating a model trainer for fine-tuning weights and creating training datasets. The system emphasizes infrastructure persistence by saving software installations and model files to cloud storage, avoiding repetitive setups between sessions. It uses a tunnel-based interface to expose the web dashboard to a public URL for remote interaction. The project covers end-to-end AI workflows, including dataset preparation and t
This project is a cloud-based AI deployment system and latent diffusion model trainer. It provides a framework for launching image generation interfaces and training pipelines on remote GPU infrastructure, specifically serving as a text-to-image model fine-tuner. The system features a specialized training interface for fine-tuning Stable Diffusion models on custom image datasets. It allows for the creation of personalized visual outputs by training models on specific subjects or artistic styles using a small set of reference images. The software covers generative AI deployment, custom style
Uptime Kuma is a self-hosted monitoring platform designed to track the availability and performance of network services and websites. It functions as a centralized dashboard that executes asynchronous health checks on a scheduled interval, providing real-time visibility into infrastructure health and service uptime. The platform distinguishes itself through a dedicated notification engine that dispatches alerts across multiple third-party messaging services, alongside a public status page generator that allows users to communicate service health and historical metrics via custom domains. Its
Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames. The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated process
Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution visual content by performing iterative denoising within a compressed latent space. By mapping natural language embeddings into pixel outputs through conditioned probabilistic processes, the framework enables the generation of images from text prompts and the transformation of existing visual inputs based on semantic instructions. The architecture utilizes a modular execution environment that decouples model loading, scheduler logic, and inference components to support diverse hardware configurations. I
Prompt Optimizer is a framework designed for the iterative refinement and testing of text-based instructions for large language models. It functions as an automated evaluation pipeline that systematically adjusts prompt structure, constraints, and clarity to improve the accuracy and consistency of model outputs. The system distinguishes itself through a model-agnostic interface that standardizes communication across different artificial intelligence providers. It incorporates a versioned asset management system to track prompt history, enabling developers to maintain consistency and perform r
GPT Researcher is an autonomous agent framework designed to automate the process of gathering, synthesizing, and documenting information from diverse web and local sources. It functions as a research-oriented execution environment that orchestrates specialized agents to perform complex, multi-branch research tasks, transforming raw data into structured, factual, and cited reports. The project distinguishes itself through a graph-based orchestration layer that manages state transitions and information flow between specialized agents. It employs recursive tree-search execution to explore comple
tiny-llm is a large language model inference engine and transformer model implementation. It serves as a quantized model runtime and paged key-value cache manager, providing a specialized inference stack optimized for Apple Silicon. The system distinguishes itself through high-throughput execution techniques, including continuous batching and paged attention. It utilizes a paged memory system to eliminate fragmentation during token generation and employs on-the-fly dequantization of compressed weights to reduce the memory footprint during matrix multiplication. The project covers a broad ran
This project is an LLM translation plugin and extension for the Bob translation tool. It connects the application to external large language models via API to provide real-time text translation, automated grammar correction, and content polishing. The plugin functions as an AI prompt engineering tool, allowing for the definition of custom system instructions and templates to control the style, tone, and reasoning depth of the output. It operates as a streamed API client that processes incremental text responses to display translation results in real time. Its capabilities cover translation b
L1B3RT4S is an adversarial machine learning toolkit designed for red teaming and evaluating the robustness of large language models. It provides a research framework for investigating how safety alignment mechanisms and content moderation systems respond to sophisticated input strategies. The project focuses on identifying vulnerabilities in model guardrails by employing techniques such as adversarial narrative framing, dynamic context injection, and latent space steering. It utilizes multi-agent prompt decomposition and recursive text transformation to analyze how structural changes to input
ComfyUI_IPAdapter_plus is a node-based extension for ComfyUI that implements IPAdapter models to guide image generation using reference images. It functions as an image prompting tool and a Stable Diffusion image adapter, allowing reference files to serve as visual prompts for controlling style, composition, and subject identity. The project provides specialized capabilities for maintaining facial identity and high-fidelity features across generated portraits. It enables the transfer of visual characteristics and artistic styles from reference images, as well as the extraction of spatial layo
Adetailer is a Stable Diffusion inpainting extension and automated detail enhancer that identifies specific image regions to improve quality through targeted inpainting. It functions as an AI image masking tool that uses detection models to create precise masks for automated image editing. The system distinguishes itself by integrating structural guides, such as depth and pose, to constrain the inpainting process and maintain anatomical consistency. It also supports object-specific prompt assignment, allowing unique text instructions to be mapped to multiple detected objects within a single i
This project is a Stable Diffusion WebUI extension that provides a graphical interface for personalized portrait generation and AI photo editing. It allows users to train custom identity models from a small set of uploaded images to create consistent digital versions of specific people. The extension includes a virtual try-on system that replaces clothing in images by aligning reference garments with template bodies. It also features tools for face swapping in both static images and videos, as well as a portrait animator that transforms static images into dynamic videos using reference-guided
A free and open-source inpainting and image-upscaling tool that runs entirely in the browser, powered by WebGPU and WebAssembly (wasm).
Modly is a local AI 3D model generator that converts two-dimensional images into three-dimensional meshes. It is a privacy-focused tool that processes data directly on the host graphics card using GPU-accelerated inference. The system serves as an extensible AI model framework, allowing the integration of external model extensions and runtime files from remote repositories. It utilizes a manifest-driven plugin architecture to add new generation methods by loading metadata and files from external version control systems. The toolset includes a command-line interface for triggering generation
Open-Sora is a video generation framework designed to produce cinematic sequences from text prompts and images. It functions as a generative system that transforms written descriptions or reference images into video content featuring realistic textures and lighting. The project includes a dedicated prompt engineering tool that uses large language models to expand simple user inputs into detailed descriptions. It also features a motion controller for adjusting movement intensity in generated sequences and evaluating motion levels in existing video files. The framework incorporates text-to-vid
This project serves as a comprehensive reference tool for prompt engineering within generative image models. It provides a structured guide for exploring artistic styles, technical parameters, and keyword combinations to assist in achieving specific aesthetic outcomes and consistent visual themes. The resource distinguishes itself by enabling direct comparisons between different model versions, allowing users to observe how specific keywords and settings influence output quality over time. By organizing visual examples and technical data into a hierarchical taxonomy, it facilitates the iterat
This project is a neural network extension for Stable Diffusion that provides spatial control and geometric consistency for text-to-image generation. It functions as an image structure controller and conditioning tool, enabling the use of external inputs to guide the layout and geometry of generated imagery. The framework is distinguished by its ability to transform input images into structural guides through various preprocessors. These include the extraction of depth maps, normal maps, and human pose landmarks, as well as the detection of Canny edges, anime lineart, and straight architectur
Cog is a machine learning packaging tool and containerized model wrapper that bundles models and their dependencies into standardized Docker containers. It functions as an environment manager and inference server, ensuring consistent model execution across different hardware systems by resolving GPU drivers, system libraries, and Python dependencies. The project distinguishes itself by automatically generating RESTful HTTP servers and OpenAPI schemas based on defined model input and output types. It manages large model weights as external fixtures to optimize image size and utilizes a slot-ba