30 open-source projects similar to easydiffusion/easydiffusion, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Easy Diffusion alternative.
DiffusionBee is a Stable Diffusion desktop client for macOS that functions as an AI image generator and editor. It generates images locally from text prompts and manages diffusion models without requiring external cloud services or technical setup. The application includes a local diffusion model manager for importing and switching between custom-trained model files to achieve specific artistic styles. It also features a system for tracking generation history and uploading assets to a public gallery. The software covers several image synthesis and manipulation work
Qwen-Image is a text-to-image model and large language model image generation framework. It functions as an AI image editing suite and a personalized image trainer, capable of producing high-fidelity visuals and accurate typography from natural language descriptions. The system is distinguished by its precision text rendering engine, which integrates multi-script calligraphy and layout-coherent alphabetic text into images. It provides specialized capabilities for subject identity preservation and consistent subject generation across different poses and viewpoints, alongside a training pipelin
Flux is a diffusion model inference engine designed for text-to-image generation and image-to-image manipulation. It provides a system for executing open-weight models to transform natural language descriptions into visual imagery or to modify existing images. The project distinguishes itself through a flow-matching framework for image generation and a structural image controller. This controller allows for guided synthesis by using depth maps and Canny edge detection to constrain the geometry and composition of the output. The toolkit covers a broad range of image editing capabilities, incl
Stable Diffusion Web UI is a browser-based interface for generating, editing, and upscaling images and videos using latent diffusion models. It functions as a text-to-image generator, an AI image editor, and a tool for increasing image resolution and clarity. The system includes capabilities for custom model training, specifically allowing the creation of textual inversion embeddings to teach a model new concepts and visual styles from user photos. It also provides tools for AI video production, generating short clips from text prompts. The software covers image-to-image transformation, imag
ComfyUI is a modular generative AI workflow orchestrator and node-based GUI for designing and executing complex diffusion model pipelines. It functions as both a visual interface for building generative logic graphs and a programmable backend API that exposes diffusion model operations for external integration. The system distinguishes itself through a graph-based execution model that supports differential workflow execution, re-running only modified nodes to reduce computation. It features dynamic model offloading to manage memory between system RAM and GPU VRAM and utilizes metadata-embedde
mmagic is a multimodal training pipeline and framework for generative AI, focusing on visual synthesis and restoration. It provides the infrastructure to build and train models for tasks such as text-to-image and text-to-video generation, 3D-aware content synthesis, and high-fidelity image translation using diffusion models and generative adversarial networks. The project distinguishes itself through specialized capabilities for generative model personalization, including techniques for fine-tuning subjects and styles. It also supports advanced visual manipulations such as latent space interp
Sygil-webui is a web interface for Stable Diffusion latent diffusion models, providing a creative suite for text-to-image and text-to-video synthesis. It functions as an image generation tool and a latent diffusion image editor, allowing users to create visuals and video sequences from textual descriptions. The project includes a dedicated model training interface for creating custom textual inversion embeddings, which introduces specific new concepts or styles into the diffusion models. It also features specialized tools for generative image editing, including mask-based inpainting, image-to
This project is a containerized deployment for running Stable Diffusion web interfaces. It provides a portable runtime for generative AI that manages dependencies and hardware acceleration to enable text-to-image generation and image-to-image transformations via a browser-based interface. The system uses hardware-specific image tags to support both GPU-accelerated synthesis and CPU-only execution. It ensures environment isolation across different operating systems while utilizing bind-mount data persistence to keep heavy model weights and generated outputs on the host machine. The deployment
This project is a plugin for Krita that integrates Stable Diffusion image generation and editing tools directly into the painting interface. It functions as a remote diffusion backend client, bridging the digital canvas to local or remote servers to handle the computation required for AI image generation. The system distinguishes itself through a real-time painting interface that translates brushstrokes into generated imagery as the artist works. It acts as a structural orchestrator, using sketches, depth maps, and poses to maintain precise composition, and provides a generative inpainting to
OmniGen is a unified image generation model and diffusion framework that processes text, images, and vision tasks through a single system. It functions as a multimodal diffusion framework that treats diverse vision operations as unified image synthesis problems using shared model weights, removing the need for external adapter modules. The system supports subject-driven image generation to preserve the identity of objects from reference photos and allows for multi-reference image synthesis. It also operates as an instruction-based image editor, modifying visual content through natural languag
This project is a plugin for Photoshop that integrates Stable Diffusion backends, allowing users to generate and edit AI images directly within the graphic design workspace. It serves as an interface bridge between the image editor and remote GPU workers to perform generative tasks without requiring local hardware power. The plugin specifically provides connection layers for Automatic1111 and ComfyUI backends. This enables the execution of text-to-image generation, inpainting, and outpainting operations on the design canvas by communicating with these external engines via an API. The system
ComfyUI-nunchaku is a 4-bit diffusion inference engine and a set of nodes for running low-precision quantized diffusion models within ComfyUI visual workflows. It provides a backend that reduces memory overhead and increases generation speed for transformer models. The project includes specialized tools for identity-preserving generation and an image-to-image guidance toolkit that uses depth maps and reference images. It also features a multimodal visual question answering implementation and a utility for merging multiple quantized model files into single unified files. The engine covers a b
stable-diffusion.cpp is a high-performance C++ inference engine designed for generating images and video from text prompts using Stable Diffusion models. It functions as a latent diffusion model runtime and a lightweight machine learning framework that enables local diffusion model execution on consumer hardware. The project distinguishes itself as a CPU-based image generator capable of running without a dedicated GPU. It employs a specialized C++ tensor backend and cross-backend hardware abstraction to dispatch compute tasks across different processor instruction sets and graphics APIs. The
StableCascade is a generative AI system and latent diffusion framework designed for text-to-image synthesis and image-to-image transformations. It utilizes a multi-stage cascade architecture that encodes and decodes images via a latent space to produce high-fidelity visual imagery. The system includes a cascade diffusion pipeline for controlling image structure through inpainting, outpainting, and super-resolution. It also provides a toolkit for image-to-image generation and the creation of image variations using embeddings. The framework supports model optimization through low-rank adaptati
CoAI is an enterprise-grade, self-hostable AI gateway platform that unifies access to over 200 AI models from more than 35 providers through a single OpenAI-compatible API endpoint. It functions as a multi-tenant gateway, routing requests across providers with load balancing, automatic failover, and priority-based routing, while exposing standard OpenAI API endpoints for chat, image generation, model listing, and billing to enable seamless integration with existing tools and clients. The platform distinguishes itself through a comprehensive set of operational capabilities built around the gat
Learn_Prompting is an educational project focused on prompt engineering, providing the principles and techniques required to craft effective inputs and improve the quality of generative AI outputs. The project covers advanced prompting strategies to enhance reasoning, reliability, and output quality. This includes techniques for task decomposition, chain-of-thought reasoning, and the use of few-shot and zero-shot guidance. It also addresses model security through the study of prompt hacking, vulnerability analysis, and privacy auditing to prevent sensitive data leaks. The scope extends to th
TaskMatrix is a visual language model orchestration framework and modular visual pipeline designed to coordinate disparate foundation models. It functions as a multi-model workflow coordinator that sequences visual and textual models through logic paths to handle image processing tasks without requiring additional training. The system integrates large language models with visual foundation models to enable the exchange of image data during interactive chat sessions. It utilizes template-based orchestration to chain specialized models together for complex visual tasks. The framework supports
The Gemini Cookbook is a comprehensive collection of implementation patterns, code samples, and development guides designed for building applications with Google Gemini models. It serves as a central resource for developers to integrate multimodal generative artificial intelligence into their software, providing the necessary frameworks to manage model interactions, stateful workflows, and structured data extraction. The repository distinguishes itself by offering specialized toolkits for autonomous agent orchestration, enabling the construction of agents that can execute code, browse the web
This project is a Go library that provides a programmatic interface for interacting with generative AI services. It serves as a comprehensive software development kit for integrating large language models into applications, enabling developers to perform tasks such as text and chat completion, image generation, and audio transcription. The library distinguishes itself through a unified infrastructure designed for robust network communication and service management. It features structured request mapping and error normalization to ensure type-safe interactions and simplified debugging. Further
Vercel is a cloud platform for building, deploying, and scaling web applications. It provides a unified infrastructure that automates the build process by detecting project frameworks and distributing static and dynamic content through a global content delivery network. The platform executes application logic using serverless functions that scale automatically based on real-time traffic demand. The platform distinguishes itself through a centralized AI gateway that proxies requests to multiple model providers, enabling standardized authentication, observability, and cost tracking. It supports
KoboldCPP is a local large language model inference engine and GGUF model runner designed to execute quantized models on personal hardware. It functions as a multimodal AI server and API gateway, providing OpenAI-compatible endpoints that allow third-party clients to interact with locally hosted models. The project distinguishes itself as an AI storytelling backend, featuring dedicated tools for long-form narrative management through persistent memory, world lore tracking, and character state management. It further extends its capabilities as a multimodal server capable of processing text, im
This project is a framework for running Stable Diffusion image generation models on Apple Silicon using Core ML hardware acceleration. It provides a local generative AI pipeline for producing images from text prompts using Swift and Python without relying on external cloud APIs. The system includes a model converter to transform deep learning checkpoints into Core ML formats and a model optimizer to quantize weights and activations. It features a ControlNet integration layer to guide image generation using external signals such as edge and depth maps. Capabilities cover text-to-image generat
Kolors is a generative model implementation for synthesizing photorealistic images from natural language descriptions and visual references. It utilizes a latent diffusion model framework to produce high-fidelity imagery, operating within a compressed latent space to improve generation efficiency and quality. The system functions as a multilingual image generator, interpreting text prompts in multiple languages to produce semantically accurate visual outputs. It includes a custom model training pipeline that uses low-rank adaptation to teach the model specific subjects or artistic styles from
Sana is a framework for high-resolution image and video synthesis based on a linear diffusion transformer. It provides a toolkit for the training, fine-tuning, and execution of text-to-image and text-to-video models, as well as a video generative world model capable of simulating physical environments with precise spatial control. The project is distinguished by its use of linear complexity layers to handle high resolutions and its support for long-form, minute-length video generation in real time. It implements a two-stage inference paradigm that separates structural generation from visual t
dalle-mini is a text-to-image model and generative AI system designed to transform natural language descriptions into synthetic images. It functions as an image generation training toolkit and a generative model capable of creating visual representations from text prompts. The project provides a containerized deployment for consistent execution across different computing environments. It includes the necessary scripts and configuration files to train custom generative models from datasets. The system utilizes an autoregressive transformer architecture that treats visual data as discrete toke
This repository is a collection of node-based pipeline configurations, examples, and templates for generating AI media. It provides a workflow library and a curated gallery of blueprints designed for creating images, videos, and 3D assets using diffusion models. The project specifically offers a set of pre-configured node graphs for implementing advanced image generation and refinement techniques, with a focus on Stable Diffusion workflows. These examples demonstrate how to interconnect processing nodes to define complex generative logic without writing code. The available templates cover a
Diffusers is a PyTorch-based library and generative AI framework used to build, train, and deploy diffusion pipelines for producing multi-modal media. It provides a suite of tools for generating images, video, and audio from natural language descriptions, as well as specialized systems for text-to-image generation. The project differentiates itself through a modular architecture that separates noise schedulers, pretrained model blocks, and pipeline compositions. This structure allows for the construction of custom generation workflows and the ability to swap individual components of the diffu
IF is a text-to-image diffusion system that translates natural language descriptions into visual imagery. The project provides a generative pipeline for creating images, an inpainting tool for modifying specific image sections, and a super-resolution upscaler to increase pixel density and clarity. The system includes a concept fine-tuning framework that allows for the teaching of new visual concepts by updating a small set of parameters. It also supports image style transfer to apply the aesthetic characteristics of a reference image to a new output.
Deep-daze is a steerable neural image generator and text-to-image synthesis tool. It functions as an image-to-image interpretation engine and an image generator that transforms text prompts and image seeds into visual representations. The system supports long-form text visualization by bypassing standard token limits to process extended narratives or poems. It also provides image-guided prompting, allowing the network to be initialized with a starting image before applying text steering. The framework employs neural network optimization and iterative gradient descent to refine image quality.
Dream Textures is a Stable Diffusion integration for Blender that provides tools for text-to-image generation, depth projection, and node-based processing within a 3D environment. It functions as an AI texture generator capable of producing image textures and concept art from text prompts and scene renders. The system features a depth-to-image projection tool that maps generated imagery onto 3D models using depth data for spatial alignment. It also includes a node-based AI image processor for creating procedural visual effects and a dedicated toolset for AI-assisted inpainting and outpainting