30 open-source projects similar to xuebinqin/u-2-net, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best U-2-Net alternative.
Rembg is a machine learning-based toolkit designed for automated image background removal and subject segmentation. It functions as a versatile engine that identifies and extracts subjects from images, supporting diverse input methods including individual files, directory-based batch processing, and live binary data streams. The project distinguishes itself through its flexible integration options, offering a command-line interface for local automation, a library for programmatic access, and an HTTP service for remote requests. It utilizes deep learning architectures to classify pixels and ge
SegFormer is a semantic segmentation framework and transformer-based model designed for pixel-level image classification. It assigns class labels to pixels using a hierarchical transformer encoder, which processes multi-scale features through a pyramid of blocks, and an all-MLP (multi-layer perceptron) decoder, which aggregates these features without complex attention mechanisms. It incorporates overlap patch embedding to preserve local continuity and sequential self-attention reduction to ma
This is an image segmentation framework and masking toolkit for constructing binary and multi-class neural network architectures. It serves as a deep learning encoder wrapper that integrates pre-trained convolutional neural network architectures into semantic segmentation models. The library enables the use of pre-trained backbones to isolate complex patterns and leverages transfer learning to accelerate training. It provides a collection of overlap-based loss functions and precision metrics specifically designed to evaluate and refine the accuracy of image masks. The toolkit covers the full
RobustVideoMatting is a deep learning video matting tool and PyTorch library designed to remove backgrounds from videos and extract human subjects. It utilizes a temporal video segmentation model to ensure consistent matting and reduce flickering across video frames. The project includes a cross-platform model exporter that converts trained neural networks into various runtime formats. This allows for model deployment across multiple environments, including web and mobile applications. The framework provides capabilities for temporal video background removal and AI video post-production with
This project is a PyTorch-based computer vision library and deep learning image processing framework. It provides a collection of neural network architectures designed for visual analysis tasks, specifically focusing on image classification, object detection, and semantic segmentation. The toolset implements diverse methodologies for visual recognition, including anchor-free object detection, regional proposal networks, and heatmap-based keypoint estimation. It utilizes both convolutional neural networks for spatial feature extraction and transformer-based self-attention mechanisms to compute
BiRefNet is a PyTorch image segmentation framework designed for high-precision binary mask generation. It functions as a bilateral image segmentation model used to isolate foreground objects from complex backgrounds, as well as a specialized tool for camouflaged object detection and industrial defect detection. The project is designed for export to the ONNX format, which facilitates cross-platform deployment and inference. It supports custom model fine-tuning on user-provided image and mask datasets to adapt the model for specialized professional use cases. The system covers high-resolution
nnU-Net is a PyTorch-based deep learning framework for the supervised semantic segmentation of 2D and 3D biomedical images. It functions as an automated medical imaging pipeline that generates predicted masks and labels from clinical images. The system distinguishes itself by using dataset-driven auto-configuration to automatically select the optimal network architecture, preprocessing steps, and training hyperparameters based on the specific properties of the input medical dataset. The framework covers a broad range of capabilities including medical dataset preparation, intensity normalizat
This project is a plugin for OBS Studio that uses neural networks to isolate subjects from backgrounds in real-time video streams. It functions as an AI video segmentation tool that predicts portrait masks to create virtual green-screen effects without the need for physical hardware. The software includes a real-time depth estimation filter that identifies scene depth to produce a blurred background while keeping the foreground subject in focus. It also provides low-light video enhancement to improve visibility and visual quality for portrait video captured in poorly lit environments. The pl
Pytorch-UNet is a deep learning implementation designed for semantic image segmentation. It provides a framework for training convolutional neural networks to perform pixel-wise classification, transforming input images into detailed prediction masks. The project utilizes a symmetric encoder-decoder architecture that employs skip-connection feature fusion to recover fine-grained boundary details. It includes support for mixed-precision training to reduce memory usage and accelerate processing speeds. The framework covers the end-to-end segmentation pipeline, from model training using custom
This project is a deep learning research toolkit and generative model library providing implementations of Variational Autoencoders using the PyTorch framework. It serves as a framework for training and evaluating autoencoder architectures to learn latent representations for data reconstruction and the generation of synthetic data samples. The toolkit focuses on unsupervised feature learning and generative model training, featuring a system for mapping external configuration files to model hyperparameters to ensure reproducible experimental runs. It includes mechanisms for tracking training p
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
This software is a computer vision utility designed for automated subject isolation and background removal. It provides a graphical desktop interface that allows users to extract foreground subjects from static images, video files, and live webcam streams without requiring command-line interaction. The application leverages deep learning models to generate high-fidelity alpha masks, enabling the creation of transparent backgrounds or the application of custom replacements. By utilizing hardware-accelerated tensor processing, the system performs real-time segmentation on live camera feeds and
BackgroundMattingV2 is a deep learning background matting tool and real-time image segmentation framework. It provides a system for isolating foreground subjects from high-resolution images and video feeds in real time. The project includes a deep learning model trainer for optimizing matting models through base convergence and end-to-end refinement. It also functions as a cross-runtime model exporter, converting trained neural networks into interchangeable formats for deployment across different software environments and hardware runtimes. The framework supports streaming processed webcam f
This project is a PyTorch implementation of a research architecture designed for high-resolution representation learning. It serves as a computer vision framework focused on precise keypoint detection, human pose estimation, and semantic image segmentation. The implementation provides specialized tools for identifying anatomical landmarks on the human body and predicting facial keypoint coordinates to analyze orientation and alignment. It utilizes a system of multi-resolution parallel streams and repeated multi-scale fusion to maintain high-resolution representations throughout the network.
YOLOv9 is a real-time computer vision framework and deep learning model designed for image classification, object detection, and instance segmentation. It functions as both a vision model and a trainer, allowing for the optimization of neural network weights on custom datasets using single or multiple GPUs. The framework utilizes programmable gradient information to perform high-speed identification and location of multiple objects within images and video streams. It extends beyond bounding box detection to provide instance segmentation and panoptic segmentation, which labels every pixel in a
InternVL is a vision-language model framework that fuses a visual encoder with a large language model to translate image features into textual tokens for reasoning. It provides a system for multimodal inference and dialogue, enabling the processing of images and text to answer questions or generate descriptions. The project is distinguished by its high-resolution image processing, which uses dynamic tiling to maintain detail for images up to 4K resolution, and its chain-of-thought visual reasoning for solving complex mathematical and spatial problems. It also supports temporal frame sampling
ar-cutpaste is an augmented reality asset extraction tool and prototype designed to isolate objects from a live camera feed and transfer them into image editing software. It functions as a mobile-to-desktop bridge that uses machine learning to remove backgrounds from live images, creating digital cutouts for use in image composition. The system establishes a local server connection to transmit image data and spatial coordinates from a mobile device to a design application. This bridge uses a remote socket mechanism and a secure password to inject captured assets directly into a desktop worksp
HivisionIDPhotos is an AI-powered identification photo generator designed to automate the creation of standardized portraits. It utilizes machine learning to handle alignment, cropping, and background removal, transforming regular images into official identification photographs. The system features a background removal tool that uses offline inference to isolate subjects and a portrait enhancement tool that applies beauty filters to improve facial appearance and skin quality. To prepare photos for physical use, it includes a print layout generator that arranges processed images into standard
This is a PyTorch deep learning implementation for training transformer-based language models. It functions as a distributed GPU trainer and framework designed to optimize text prediction models for increased speed and sample efficiency. The project is distinguished by its use of the Newton-Schulz weight optimizer. This method applies an iterative process to maintain semi-orthogonal parameter updates and weight matrices, which improves sample efficiency and reduces memory overhead during the training process. The framework covers broad capabilities in distributed GPU computing, including dat
IOPaint is an AI image editor and Stable Diffusion inpainting tool providing a web interface for removing objects and replacing image content. It utilizes latent diffusion image processing to synthesize high-resolution replacements for erased sections of an image. The project features a specialized AI background remover for isolating subjects and an AI image upscaler that employs super-resolution models for general photos and anime artwork. The software covers a broad range of capabilities including image segmentation for object isolation, face restoration for improving facial details, and t
This is a PyTorch semantic segmentation library designed for building image masking frameworks. It provides a collection of over 500 pretrained convolutional and transformer-based encoders and various decoder architectures to perform binary and multiclass pixel-level classification. The library features a modular backbone integration that decouples encoder choice from decoder logic. It supports custom input channel configurations and encoder depth tuning, allowing the modification of input layers to accept non-standard channel counts while preserving pretrained weights. Some configurations al
Perfect Green Screen Keys
MMSegmentation is an open-source semantic segmentation toolbox built on PyTorch that provides a modular, configurable framework for building, training, evaluating, and deploying segmentation models. At its core, it offers a config-driven pipeline that assembles training, evaluation, and inference workflows by parsing hierarchical configuration files, with a modular component registry that enables plug-and-play composition of neural network modules, optimizers, datasets, and metrics. The framework supports the full model lifecycle through a unified runner interface that controls training, testi
Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification. By utilizing a modular architecture, the platform allows users to swap model components to balance inference speed and accuracy requirements for diverse applications. The framework distinguishes itself through its support for real-time processing and flexible deployment. It in
Swin-Transformer is a deep learning framework designed for training and deploying hierarchical vision transformer models. It serves as a research library and toolkit for computer vision tasks, providing the infrastructure to build models that replace standard convolution operations with sliding window self-attention mechanisms. By utilizing a multi-scale feature hierarchy, the framework enables the processing of visual data at varying resolutions and spatial scales. The project distinguishes itself through its implementation of shifted window partitioning, which facilitates global information
MiDaS is a PyTorch computer vision library and monocular depth estimation model designed to predict scene depth from single images. It functions as a scene depth predictor that computes distance maps to determine object proximity to the camera. The project enables zero-shot depth transfer, allowing the model to be applied to new datasets or environments without additional training data. It focuses on relative depth regression to predict scale-invariant depth maps. The library includes a real-time depth visualizer for capturing live camera feeds and displaying corresponding depth maps. It als
This project is a computer vision system for object segmentation and tracking across images and videos. It employs models capable of identifying and masking objects using text prompts, bounding boxes, click points, or image exemplars. The system differentiates itself through memory-based video tracking and shared-memory architectures that maintain consistent object identities over time. It supports multi-object processing in single computation passes to increase frame throughput and utilizes iterative refinement to correct segmentation boundaries through sequential prompts. The software also
smartcrop.js is a JavaScript image processing tool and library designed for content-aware image cropping. It provides a face-aware cropping algorithm that calculates optimal crop coordinates to preserve the most important visual content within an image. The project prioritizes human faces to ensure people remain the central focus of the crop. It utilizes a content-aware approach to determine the best coordinates for a target width and height, allowing for dynamic resizing across different screen sizes and aspect ratios. The toolset includes a command line interface for automating the resizin
This project is a TensorFlow and Keras implementation of the Mask R-CNN architecture. It provides a framework for performing simultaneous object detection and instance segmentation, transforming raw images into segmented masks and bounding boxes for individual object identification. The toolset enables custom computer vision training through fine-tuning pre-trained weights and integrating user-provided datasets. It includes capabilities for distributed GPU training to accelerate the optimization of large vision models. The framework covers model evaluation using standard precision metrics an
Avatarify-python is a real-time face animation tool that uses a PyTorch-based neural network to map facial movements from a live camera feed onto a static image. It creates photorealistic animated avatars that mimic a user's movements for use in video software. The project includes a remote GPU inference client that offloads heavy computational workloads to a remote server, allowing high-performance animations to run on low-spec hardware. It also features a virtual webcam driver to route synthetic video streams into video conferencing applications as a standard camera device. The system prov