30 open-source projects similar to lllyasviel/omost, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Omost alternative.
ComfyUI is a modular generative AI workflow orchestrator and node-based GUI for designing and executing complex diffusion model pipelines. It functions as both a visual interface for building generative logic graphs and a programmable backend API that exposes diffusion model operations for external integration. The system distinguishes itself through a graph-based execution model that supports differential workflow execution: between runs, it re-executes only the parts of the graph that changed, which avoids redundant computation. It features dynamic model offloading to move models between system RAM and GPU VRAM and utilizes metadata-embedde…
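The differential execution idea can be illustrated with a small caching executor. This is a minimal sketch under stated assumptions, not ComfyUI's implementation: the `Node` and `CachingExecutor` names, the signature scheme (a node's own parameters plus its parents' signatures), and the hand-written topological order are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical graph node: output = fn(*params, *upstream_outputs)."""
    fn: Callable[..., Any]
    params: Tuple[Any, ...] = ()
    inputs: List[str] = field(default_factory=list)


class CachingExecutor:
    """Re-runs a node only if its params or any upstream node changed."""

    def __init__(self) -> None:
        # node name -> (signature, cached output)
        self.cache: Dict[str, Tuple[Any, Any]] = {}
        self.executed: List[str] = []

    def run(self, nodes: Dict[str, Node], order: List[str]) -> Dict[str, Any]:
        # `order` must be a topological order of the graph.
        self.executed = []
        outputs: Dict[str, Any] = {}
        sigs: Dict[str, Any] = {}
        for name in order:
            node = nodes[name]
            # The signature folds in the node's params and its parents' signatures,
            # so a change anywhere upstream invalidates everything downstream of it.
            sig = (node.params, tuple(sigs[i] for i in node.inputs))
            sigs[name] = sig
            hit = self.cache.get(name)
            if hit is not None and hit[0] == sig:
                outputs[name] = hit[1]  # unchanged: reuse the cached result
                continue
            outputs[name] = node.fn(*node.params, *(outputs[i] for i in node.inputs))
            self.cache[name] = (sig, outputs[name])
            self.executed.append(name)
        return outputs


# Toy pipeline standing in for load -> sample -> decode.
nodes = {
    "load": Node(lambda seed: seed * 10, params=(1,)),
    "sample": Node(lambda steps, x: x + steps, params=(20,), inputs=["load"]),
    "decode": Node(lambda x: f"img:{x}", inputs=["sample"]),
}
order = ["load", "sample", "decode"]
executor = CachingExecutor()

executor.run(nodes, order)
first_executed = list(executor.executed)   # all three nodes run

nodes["sample"].params = (30,)             # edit one node's parameter
result = executor.run(nodes, order)
second_executed = list(executor.executed)  # only "sample" and "decode" re-run
```

Keying the cache on upstream signatures rather than on output values keeps the check cheap when outputs are large tensors; a real system would also have to deal with nondeterministic nodes, which this sketch ignores.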
This project is a plugin for Krita that integrates Stable Diffusion image generation and editing tools directly into the painting interface. It functions as a diffusion backend client, bridging the digital canvas to the local or remote servers that handle the computation required for AI image generation. The system distinguishes itself through a real-time painting interface that translates brushstrokes into generated imagery as the artist works. It acts as a structural orchestrator, using sketches, depth maps, and poses to maintain precise composition, and provides a generative inpainting to…
Flux is a diffusion model inference engine designed for text-to-image generation and image-to-image manipulation. It provides a system for executing open-weight models to transform natural language descriptions into visual imagery or to modify existing images. The project distinguishes itself through a flow-matching framework for image generation and a structural image controller. This controller allows for guided synthesis by using depth maps and Canny edge detection to constrain the geometry and composition of the output. The toolkit covers a broad range of image editing capabilities, incl…
Stable Diffusion Web UI is a browser-based interface for generating, editing, and upscaling images and videos using latent diffusion models. It functions as a text-to-image generator, an AI image editor, and a tool for increasing image resolution and clarity. The system includes capabilities for custom model training, specifically allowing the creation of textual inversion embeddings to teach a model new concepts and visual styles from user photos. It also provides tools for AI video production, generating short clips from text prompts. The software covers image-to-image transformation, imag…
IP-Adapter is a framework for conditioning pretrained text-to-image diffusion models on image prompts that act as visual guides. It serves as a text-to-image model extension that lets a text-conditioned diffusion model accept and process images as primary prompt inputs. The system implements identity preservation to maintain consistent facial features across multiple outputs using a reference photo. It also enables style transfer workflows to produce image variations that preserve the artistic characteristics of a source image. Capabilities cover multi-modal prompting, including the…
ComfyUI_IPAdapter_plus is a node-based extension for ComfyUI that implements IPAdapter models to guide image generation using reference images. It functions as an image prompting tool and a Stable Diffusion image adapter, allowing reference images to serve as visual prompts for controlling style, composition, and subject identity. The project provides specialized capabilities for maintaining facial identity and preserving fine features across generated portraits. It enables the transfer of visual characteristics and artistic styles from reference images, as well as the extraction of spatial layo…
This project is an extension for Stable Diffusion that provides an image-to-image control framework. It serves as a multi-control constraint manager and structural data preprocessor, allowing users to guide the layout and composition of generated images through spatial maps and structural constraints. The system enables multi-constraint image generation by combining several different control inputs to enforce multiple stylistic or spatial rules within a single generation pass. It provides tools for visual image referencing and precise geometric or anatomical templating to ensure generated ima…
Jaaz is a self-hosted AI design suite and multimodal workspace for generating and editing images and videos. It functions as a design workspace where users can produce visual content and assets through a combination of local and cloud-based AI models. The project features a hybrid model orchestrator that routes requests between local model runners and remote APIs to balance data privacy with processing performance. It provides a collaborative infinite canvas for organizing storyboards and assets, and includes an image prompt optimizer to translate rough ideas into detailed generati…
PhotoMaker is a diffusion-based identity generator designed for person-specific image synthesis. It creates high-fidelity photos and avatars of specific individuals using stacked embeddings, which lets it generate consistent human identities without custom model training or fine-tuning. The system utilizes zero-shot identity synthesis and identity adapters to maintain recognizable facial features across various visual contexts. It supports artistic style transfer by combining identity information with specialized model weights and integrates external control framework…
This project is a neural network extension for Stable Diffusion that provides spatial control and geometric consistency for text-to-image generation. It functions as an image structure controller and conditioning tool, enabling the use of external inputs to guide the layout and geometry of generated imagery. The framework is distinguished by its ability to transform input images into structural guides through various preprocessors. These include the extraction of depth maps, normal maps, and human pose landmarks, as well as the detection of Canny edges, anime lineart, and straight architectur…
imaginAIry is a system for generating and refining images and videos using diffusion models. It can run as a web server that accepts generation requests through standard API calls, allowing for the creation of visuals and video sequences from text prompts or existing files. The project provides a suite for AI image editing and upscaling, enabling the modification of visuals through natural language instructions and super-resolution tools to increase detail and image size. The system includes capabilities for structural image control using depth maps, edge maps, and body poses to main…
Qwen-Image is a text-to-image model and an LLM-based image generation framework. It functions as an AI image editing suite and a personalized image trainer, capable of producing high-fidelity visuals and accurate typography from natural language descriptions. The system is distinguished by its precision text rendering engine, which integrates multi-script calligraphy and layout-coherent alphabetic text into images. It provides specialized capabilities for subject identity preservation and consistent subject generation across different poses and viewpoints, alongside a training pipelin…
This project is a comprehensive guide and framework for large language model prompt engineering. It collects techniques for optimizing model responses, including structured system prompts, context management, and reusable implementation patterns. The project focuses on several specialized domains, including building autonomous agents with reasoning loops and implementing retrieval augmented generation to inject semantic context into prompts. It also provides methods for enforcing structured outputs in serialization formats like JSON or YAML for…
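One common way to enforce structured output is to validate the model's reply and re-prompt with the error when validation fails. The sketch below uses only Python's standard `json` module; the `SCHEMA` dictionary, the `validate_reply` helper, and the idea of returning an error string for a follow-up prompt are illustrative assumptions, not code from this guide.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical schema: required key -> accepted Python type(s) after json.loads.
SCHEMA: Dict[str, Any] = {"title": str, "tags": list, "score": (int, float)}


def validate_reply(raw: str, schema: Dict[str, Any] = SCHEMA) -> Tuple[bool, Any]:
    """Return (True, parsed_object) or (False, an error message to send back to the model)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value must be a JSON object"
    for key, expected in schema.items():
        if key not in data:
            return False, f"missing key: {key}"
        if not isinstance(data[key], expected):
            return False, f"wrong type for key: {key}"
    return True, data


ok, parsed = validate_reply('{"title": "Sunset", "tags": ["sky"], "score": 0.9}')
bad, error = validate_reply('{"title": "Sunset"}')  # error can be fed back in a retry prompt
```

Because `bool` is a subclass of `int` in Python, this check would accept `true` as a `score`; a stricter validator would use a dedicated JSON Schema library.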
Z-Image is an AI image editing engine and generation framework designed for photorealistic synthesis and the refinement of diffusion models. It functions as a multilingual text-to-image renderer and a system for training custom foundation models to generate and edit visuals using natural language instructions. The project distinguishes itself through a reasoning-based prompt enhancer that expands simple descriptions into detailed visual instructions using a structured reasoning chain. It also features specialized capabilities for rendering high-quality Chinese and English typography within ge…
This project is a generative art engine designed to create large collections of unique images by layering assets with assigned rarity weights and blending modes. It functions as an art generator that produces unique image sets and corresponding JSON metadata files for use in blockchain-based digital collections. The engine features a trait rarity manager that controls the frequency of specific visual attributes through filename-based weighting. It also includes a pixel art converter that transforms generated image collections into pixelated versions using configurable downsampling ratios. Th…
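Filename-based rarity weighting can be sketched as a weighted random choice over a layer's files. The `<name>#<weight>.png` naming convention, the `parse_weight` and `pick_trait` helpers, and the default weight of 1 are assumptions made for this example; the engine's actual delimiter and defaults may differ.

```python
import random
from collections import Counter
from pathlib import PurePath
from typing import List, Tuple

DELIMITER = "#"  # assumed convention: "<trait name>#<weight>.<ext>"


def parse_weight(filename: str, default: int = 1) -> Tuple[str, int]:
    """Split 'gold#25.png' into ('gold', 25); files without a weight get `default`."""
    stem = PurePath(filename).stem
    name, sep, weight = stem.rpartition(DELIMITER)
    if not sep or not weight.isdigit():
        return stem, default
    return name, int(weight)


def pick_trait(filenames: List[str], rng: random.Random) -> str:
    """Choose one trait from a layer with probability proportional to its weight."""
    parsed = [parse_weight(f) for f in filenames]
    names = [name for name, _ in parsed]
    weights = [w for _, w in parsed]
    return rng.choices(names, weights=weights, k=1)[0]


background_layer = ["blue#70.png", "gold#25.png", "rainbow#5.png"]
rng = random.Random(42)  # seeded so the same draws can be reproduced
counts = Counter(pick_trait(background_layer, rng) for _ in range(10_000))
```

Weights here are relative, so they need not sum to 100; `random.choices` normalizes them internally.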
This project is a plugin for Photoshop that integrates Stable Diffusion backends, allowing users to generate and edit AI images directly within the graphic design workspace. It serves as an interface bridge between the image editor and remote GPU workers, so generative tasks can run without a powerful local GPU. The plugin provides connection layers for the Automatic1111 and ComfyUI backends. This enables text-to-image generation, inpainting, and outpainting on the design canvas by communicating with these external engines via an API. The system…
This project provides methodologies and guides for structured prompt engineering, generative workflows, and specialized image generation strategies. It serves as a framework for optimizing inputs to large language models across coding, writing, and analysis tasks, as well as a library of techniques for controlling diffusion models. The project distinguishes itself through an AI-driven software design framework that converts business requirements into technical architectures and code using domain-driven prompting. It also implements generative AI workflow patterns that use sequential prompt pi…
This project is an AI image upscaling and high-resolution generation tool. It uses tiled diffusion to create ultra-large images by processing them in smaller, overlapping regions, which avoids out-of-memory crashes on limited hardware. The system manages spatial composition through regional prompting, which routes specific text prompts to designated areas of an image. It maintains visual stability and global coherence during upscaling using noise inversion and structural guidance. Additional capabilities include tiled detail upscaling and memory optimization for the variational autoencoder…
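The overlapping-tile idea can be illustrated on a plain 2D grid. This sketch is a simplification, not the extension's algorithm: it splits the input into overlapping tiles, applies a per-tile function (a stand-in for a diffusion or upscaling step), and averages overlapping pixels uniformly, whereas tiled diffusion methods typically fuse tiles in latent space at each denoising step and may use weighted masks. `tile_starts`, `tile_boxes`, and `process_tiled` are hypothetical names.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis; the last tile is pushed flush with the edge."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if tile >= length:
        return [0]
    starts = list(range(0, length - tile, tile - overlap))
    starts.append(length - tile)
    return starts


def tile_boxes(height: int, width: int, tile: int, overlap: int) -> List[Box]:
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, overlap)
        for left in tile_starts(width, tile, overlap)
    ]


def process_tiled(image: Grid, fn: Callable[[Grid], Grid], tile: int, overlap: int) -> Grid:
    """Apply `fn` tile by tile and average the results where tiles overlap."""
    h, w = len(image), len(image[0])
    acc = [[0.0] * w for _ in range(h)]
    hits = [[0] * w for _ in range(h)]
    for top, left, bottom, right in tile_boxes(h, w, tile, overlap):
        patch = [row[left:right] for row in image[top:bottom]]
        for i, out_row in enumerate(fn(patch)):
            for j, value in enumerate(out_row):
                acc[top + i][left + j] += value
                hits[top + i][left + j] += 1
    return [[acc[i][j] / hits[i][j] for j in range(w)] for i in range(h)]


image = [[float(10 * r + c) for c in range(10)] for r in range(10)]
doubled = process_tiled(image, lambda p: [[2 * v for v in row] for row in p], tile=4, overlap=2)
```

Only one tile is processed at a time, so peak working memory scales with the tile size rather than the full image, at the cost of recomputing the overlapping pixels.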
IOPaint is an AI image editor and Stable Diffusion inpainting tool providing a web interface for removing objects and replacing image content. It utilizes latent diffusion image processing to synthesize high-resolution replacements for erased sections of an image. The project features a specialized AI background remover for isolating subjects and an AI image upscaler that employs super-resolution models for general photos and anime artwork. The software covers a broad range of capabilities including image segmentation for object isolation, face restoration for improving facial details, and t…
Lama Cleaner is an AI-powered image editing application focused on inpainting, object removal, and generative filling. It provides a suite of tools for erasing unwanted elements from photos and filling the resulting gaps using generative artificial intelligence. The project includes specialized capabilities for image outpainting to extend borders, background removal through object segmentation, and face restoration to fix visual defects. It also features an image upscaler to increase resolution and clarity via super-resolution AI, as well as a Stable Diffusion-based editor for replacing speci…
Arize Phoenix is an LLM observability platform and evaluation framework designed to capture execution traces and monitor large language model applications. It serves as a prompt management system for versioning and testing templates, and as a self-hosted AI operations infrastructure for managing telemetry and experiments. The platform differentiates itself through a specialized embedding visualization tool used to detect data drift and optimize vector search. It provides a comprehensive evaluation suite that utilizes judge-based evaluators and ground-truth datasets to score model outputs, and…
This project is a diffusion-based AI art generator and animation framework used to create digital images and motion graphics from text prompts. It functions as a system for producing stylized videos and AI art through iterative diffusion sampling and neural network models. The framework distinguishes itself through specialized tools for 3D depth animation, using depth-map transformations to create spatial movement. It also includes neural style transfer capabilities to apply specific artistic looks, such as watercolor or pixel art, and utilizes optical flow frame blending to reduce flickering…
Visual-ChatGPT is a visual orchestration framework and multimodal AI pipeline designed to coordinate large language models with visual foundation models. It functions as an integration layer that enables the exchange of text and images between different AI models to automate image analysis and editing tasks without requiring additional model training. The system differentiates itself through model-chain orchestration and prompt-based task dispatching, allowing natural language instructions to trigger specific vision models or tools. It utilizes coordinate-based region mapping and iterative ma…
DiffusionBee is a Stable Diffusion desktop client for macOS that functions as an AI image generator and editor. It allows for the local generation of images from text prompts and the management of diffusion models without requiring external cloud services or technical setup. The application includes a local diffusion model manager for importing and switching between custom trained model files to achieve specific artistic styles. It also features a system for tracking generation history and uploading assets to a public gallery. The software covers several image synthesis and manipulation work…
mmagic is a multimodal training pipeline and framework for generative AI, focusing on visual synthesis and restoration. It provides the infrastructure to build and train models for tasks such as text-to-image and text-to-video generation, 3D-aware content synthesis, and high-fidelity image translation using diffusion models and generative adversarial networks. The project distinguishes itself through specialized capabilities for generative model personalization, including techniques for fine-tuning subjects and styles. It also supports advanced visual manipulations such as latent space interp…
This project is a comprehensive guide and framework for designing, optimizing, and securing inputs to improve the accuracy and reasoning of large language model outputs. It provides core methodologies for implementing logical reasoning steps, example-based learning, and reusable template systems. The framework distinguishes itself through a focus on security guardrails and ethical auditing, implementing primitives to prevent adversarial prompt injection attacks and identify biases. It also emphasizes structured generation, using persona assignment and negative constraints to control the tone,…
This project is a containerized deployment for running Stable Diffusion web interfaces. It provides a portable runtime for generative AI that manages dependencies and hardware acceleration to enable text-to-image generation and image-to-image transformations via a browser-based interface. The system uses hardware-specific image tags to support both GPU-accelerated synthesis and CPU-only execution. It keeps the environment isolated and consistent across operating systems, and uses bind mounts to keep large model weights and generated outputs on the host machine. The deployment…
Taming Transformers is a generative system for high-resolution image synthesis that combines a vector-quantized GAN image encoder with an autoregressive transformer. It utilizes a discrete latent space to represent images as codebook tokens, enabling the production of high-fidelity visuals through a hybrid architecture. The project provides specialized capabilities for layout-based scene synthesis, allowing for the creation of complex images by placing objects according to defined bounding box coordinates. It also includes tools for image inpainting to fill missing sections of an image by ana…
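Representing an image as codebook tokens comes down to a nearest-neighbor lookup: each encoder feature vector is replaced by the index of its closest codebook entry. The pure-Python sketch below shows only that quantization step, with a tiny hand-written 2D codebook as an assumption; in Taming Transformers the codebook is learned, the features come from a convolutional encoder, and the resulting token grid is what the autoregressive transformer models. `quantize` and `dequantize` are illustrative names, not the repository's API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(vec, codebook[k]))
        for vec in features
    ]


def dequantize(tokens: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Look tokens back up; a decoder would turn these vectors into pixels."""
    return [list(codebook[t]) for t in tokens]


codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
features = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.7, 0.9)]
tokens = quantize(features, codebook)  # discrete sequence a transformer could model
reconstructed = dequantize(tokens, codebook)
```

The reconstruction is lossy by design: every feature snaps to a codebook entry, which is what makes the image representable as a short sequence of integers.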
Facechain is a generative AI toolchain and portrait generator designed to create personalized synthetic identities and consistent digital portraits. It provides a pipeline for training and refining diffusion models to produce subject-driven image synthesis from reference photos. The project focuses on digital twin generation, enabling the creation of a personalized model from a single image to maintain identity consistency across various poses and artistic styles. It utilizes identity fusion and similarity sorting to balance facial accuracy with stylized visual effects. The toolkit covers a…