30 open-source projects similar to puzzledqs/bbox-label-tool, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best BBox Label Tool alternative.
VoTT is a computer vision annotation software and machine learning dataset preparation tool. It is a desktop application designed for drawing bounding boxes and assigning tags to objects in images and videos to create training datasets for object detection models. The application utilizes a cross-platform desktop interface to manage image and video assets. It features a local-first storage integration to handle large media assets directly from the host machine's file system and includes frame-rate controlled video sampling to extract specific images from video streams for labeling. The softw
labelImg is a desktop image annotation tool and dataset preparation utility used to create labeled datasets for computer vision training. It provides a graphical interface for drawing bounding boxes around objects in images and assigning them class labels to build ground truth data for machine learning models. The software specifically supports the Pascal VOC XML annotation format, exporting image coordinates and class names into standard XML or text structures. It allows users to load predefined class lists from text files to standardize naming across an entire project. Beyond initial label
labelImg is a computer vision labeling tool and image bounding box annotator used to create training datasets for machine learning models. It functions as a desktop utility for drawing rectangular labels on images and saving object coordinates and class names in common machine learning formats. The tool is specifically designed to generate and edit PascalVOC formatted XML files and create image labels in the text-based format required by YOLO object detection pipelines. The software covers object detection annotation and training data preparation, including the ability to manage label catego
This project is a computer vision dataset and image annotation repository designed for training and evaluating machine learning models. It provides a large collection of labeled images, serving as an object detection benchmark and a source of pixel-level segmentation data. The repository distinguishes itself as a multimodal visual dataset by pairing images with synchronized voice, text, and mouse traces to support narrative understanding. It further enables the analysis of model fairness through the inclusion of demographic attributes and exhaustive annotations. The dataset covers a broad ra
Cloud Annotations is a web-based platform designed for collaborative image annotation and the preparation of computer vision datasets. It provides an interface for teams to draw bounding boxes and polygons over digital media, transforming raw images into structured training data for machine learning models. The platform distinguishes itself through a real-time synchronization engine that allows multiple users to edit the same image simultaneously. By utilizing browser-based local storage and standardized data serialization, it supports offline workflows and ensures that exported annotations r
CVAT is an open-source computer vision annotation tool and visual dataset management platform. It provides a self-hosted interface for labeling images, videos, and 3D data to create datasets for vision AI models. The platform features AI-assisted data labeling to automate the creation of masks and bounding boxes, utilizing a plug-in system to connect external machine learning models. It includes a consensus-based quality assurance system that verifies label accuracy by comparing independent annotations. The system covers collaborative team management, project organization through task decomp
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
X-AnyLabeling is an AI-assisted annotation platform and computer vision labeling tool. It provides an interface for annotating images and videos using polygons and rectangles to create training sets for machine learning models. The project distinguishes itself through the integration of external AI models via a plugin-based inference backend, allowing for automated generation of candidate labels and the execution of specialized tasks like pose estimation and object detection. It also functions as an optical character recognition tool for extracting text and layout information from document im
This project is a computer vision training pipeline and image classification framework. It provides a workflow for preparing custom image datasets and fine-tuning pre-trained neural networks to recognize user-defined categories. The system includes a model interpretability toolkit that generates saliency maps to highlight influential image regions and uses dimensionality reduction to project high-dimensional semantic features into 2D or 3D visualizations. The framework covers the full lifecycle of model development, including dataset preparation with proportional class splitting, performance
mmaction2 is a PyTorch video understanding toolbox designed for training and evaluating deep learning models. It serves as a framework for action recognition, temporal localization, and spatio-temporal action detection, providing specialized tools for both pixel-based video analysis and skeleton-based action recognition. The project distinguishes itself through a modular architecture featuring registry-based component discovery and hierarchical, config-driven model assembly. It supports multi-modal feature fusion, integrating RGB frames, optical flow, and audio, and includes capabilities for
This project is a neural network image classifier and a set of tools for building and training convolutional neural networks to recognize and categorize images. It serves as a machine learning educational guide, providing a practical resource for learning neural network fundamentals through an onboarding process. The system includes a dedicated workflow for pretrained model fine-tuning, allowing existing network weights to be adapted to new image categories. This is supported by a transfer learning pipeline that replaces final classification layers and adjusts weights through targeted retrain
img2dataset is a high-performance image dataset pipeline and preprocessing tool designed to download and process millions of images from URLs for machine learning training. It functions as a distributed image downloader and cloud storage data exporter, moving large visual datasets from web sources directly into structured formats. The system prioritizes high-throughput data acquisition by distributing workloads across multiple CPU cores and machines. It integrates directly with remote cloud storage buckets and employs a manifest-based tracking system to resume interrupted downloads without re
Deepchecks is a machine learning model validation framework and MLOps testing library. It serves as an AI data quality suite and performance evaluator designed to verify the integrity and performance of models and datasets from research through production. The project functions as a model monitoring tool for tracking data drift and performance degradation in production environments. It allows for the creation of custom validation suites and utilizes a pluggable check architecture to automate quality checks within continuous integration pipelines. The framework covers a broad range of capabil
Gluon-CV is an MXNet computer vision library that provides a comprehensive collection of pre-implemented vision architectures and training pipelines. It serves as a deep learning research toolkit and a model zoo containing state-of-the-art pre-trained weights for image and video analysis. The project includes a specialized human pose estimation library and a model compression toolkit. These tools allow for the pruning and quantization of deep learning models to increase inference speed and facilitate deployment on constrained edge hardware. The library covers a broad range of vision capabili
Nullboard is a local-first productivity tool and browser-based note organizer. It functions as a Kanban task board for tracking workflows and organizing tasks into draggable notes and lists within a minimalist interface. The project features a customizable Kanban UI that allows users to adjust visual preferences, including themes, font families, font sizes, and layout widths. The system manages data through local browser storage for offline access, supplemented by board file imports and exports and token-based remote synchronization for backups. It includes a revision-based undo system for
This project is an AI-powered visual canvas and collaborative whiteboard framework. It functions as a customizable vector drawing engine and a tool for converting hand-drawn interface sketches and wireframes into functional code using artificial intelligence. The system distinguishes itself through the integration of AI agents that can read, modify, and generate visual diagrams directly on the canvas. It also provides a node-based workflow editor for building automation pipelines and data processing flows by connecting multimodal components. The platform covers a broad range of capabilities,
LLaMA-Factory is a comprehensive suite for dataset preparation, model fine-tuning, memory optimization, and standardized API deployment. It provides a unified platform for the supervised and reward-based fine-tuning of large language models and vision-language models. The framework includes a specialized toolkit for training vision-language models and a model serving interface that deploys trained models through high-performance APIs. It utilizes precision tuning and quantization techniques to reduce the hardware requirements and memory footprint of large models. The system covers data pipel
OpenSeadragon is a JavaScript library and tiled image rendering engine designed for high-resolution image viewing. It functions as a deep zoom image viewer that renders massive images using a tiled pyramid approach, enabling smooth panning and zooming without requiring the full image file to be loaded. The project distinguishes itself through broad support for standardized image retrieval protocols, including the International Image Interoperability Framework (IIIF), IIPImage, Iris, and OpenStreetMap. It provides a hardware-accelerated rendering layer via WebGL to apply real-time filters and
mmocr is a PyTorch-based optical character recognition framework designed for training and deploying text detection, recognition, and key information extraction models. It serves as a comprehensive toolbox for scene text detection and recognition, providing specialized libraries for locating text regions and converting visual text into machine-encoded strings. The project distinguishes itself through a research framework for key information extraction and advanced text spotting capabilities. These include point-based spotting using transformers and the use of parameterized Bezier curves to id
EpicEditor is a JavaScript-based Markdown editor designed as an embeddable UI component for web applications. It functions as a local-first content manager, utilizing browser local storage for automatic draft saving and offline content persistence. The editor is distinguished by its use of iframes to isolate styles, preventing CSS leakage between the editor and the host application. It features a customizable Markdown parser that allows developers to replace the default parsing engine with custom functions to transform text into specific output formats. The system provides a split-pane inter
This is a PyTorch object detection framework that implements the Single Shot MultiBox Detector for identifying and localizing multiple objects within images and video. The project provides a neural network architecture designed for single-shot object detection, which predicts bounding boxes and class labels in one pass. The implementation includes a real-time object detector capable of processing live video streams to track and label objects across sequential frames. It also features a complete computer vision training pipeline for preparing image datasets and training model weights. The fra
vue-fabric-editor is a web-based graphic design tool and vector graphics editor built with Vue and Fabric.js. It provides a canvas manipulation framework for creating visual compositions, layouts, and custom vector illustrations. The project features a template-based design system for creating reusable layouts and a bulk image generator that automatically produces multiple image files using data tables or external network interfaces. It also includes a modular plugin system for extending tools and keyboard shortcuts. The editor covers a broad range of capabilities including digital asset man
kohya_ss is a graphical user interface and workbench for fine-tuning diffusion models, specifically designed for Stable Diffusion. It provides a suite of tools for training generative AI models, including specialized interfaces for creating Low-Rank Adaptation weights and training ControlNet spatial control networks. The project distinguishes itself through integrated VRAM usage optimization and hardware acceleration, featuring specific support for Intel GPUs via XPU-accelerated libraries. It implements parameter-efficient training methods and memory-saving techniques like gradient checkpoint
This project is a PyTorch implementation of 3D residual networks designed for video action recognition. It provides a spatiotemporal architecture that analyzes both spatial frames and temporal motion to classify human activities within video clips. The system includes a distributed model training framework to accelerate learning across multiple compute nodes. It supports the deployment and fine-tuning of pre-trained model weights, allowing the adaptation of existing networks to specific new datasets. The codebase covers the full pipeline for spatiotemporal learning, including video dataset p
This project is a deep learning library built for single-image super-resolution and visual enhancement. It provides a framework for training and deploying neural network architectures designed to reconstruct high-resolution images from low-resolution sources, effectively recovering fine details and removing artifacts caused by downscaling or compression. The library distinguishes itself through the implementation of generative adversarial networks and residual block architectures, which work together to improve the realism and clarity of upscaled outputs. It supports training through both pix
Superdesign is an AI-powered design platform that generates UI mockups, wireframes, and multi-page user flows from natural language prompts within a collaborative canvas environment. It functions as a design-to-code exporter, producing production-ready HTML, ZIP archives, or Shopify Liquid templates for direct implementation, and includes an OpenAPI specification importer that automatically generates API documentation and client code from schema definitions. The platform distinguishes itself through a branch-based design exploration system that creates independent design variations from a sin
This project is a web-based graphic design editor and online poster designer. It provides a browser-based environment for creating professional visual layouts, e-commerce graphics, and social media covers using a canvas with drag-and-drop elements. The toolkit includes a specialized PSD template converter that parses Photoshop design files into editable web templates. It also features a custom QR code generator capable of producing styled codes with gradients and embedded logos, alongside a browser-based image manipulation tool for cropping assets and removing backgrounds. The editor covers
An open-source online platform for collaborative image labeling.
FIAT enables image data annotation, data augmentation, data extraction, and result visualisation/validation.