VoTT is a computer vision annotation and machine learning dataset preparation tool. It is a desktop application for drawing bounding boxes and assigning tags to objects in images and videos to create training datasets for object detection models. The application uses a cross-platform desktop interface to manage image and video assets. It features local-first storage integration to handle large media assets directly from the host machine's file system, and frame-rate-controlled video sampling to extract specific images from video streams for labeling. The softw
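The frame-rate controlled sampling mentioned for VoTT can be illustrated with a small sketch. This is not VoTT's actual implementation; the function name `sample_frame_indices` and its parameters are hypothetical, and the sketch only shows one plausible way to pick which source frames to extract at a chosen sampling rate.

```python
def sample_frame_indices(total_frames: int, video_fps: float, sample_fps: float) -> list[int]:
    """Return indices of source frames to extract at roughly `sample_fps`.

    Hypothetical helper for illustration only; not taken from VoTT.
    """
    if total_frames < 0 or video_fps <= 0 or sample_fps <= 0:
        raise ValueError("total_frames must be >= 0 and both rates must be > 0")

    # Number of source frames that pass between two sampled frames.
    step = video_fps / sample_fps

    indices: list[int] = []
    k = 0
    while True:
        idx = int(k * step)  # truncate to the nearest earlier source frame
        if idx >= total_frames:
            break
        # Skip repeats, which occur when sample_fps exceeds video_fps.
        if not indices or idx != indices[-1]:
            indices.append(idx)
        k += 1
    return indices
```

For a 3-second clip at 30 fps, sampling at 1 fps would select one frame per second of video, which keeps the number of images to label manageable.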
labelImg is a desktop image annotation tool and dataset preparation utility used to create labeled datasets for computer vision training. It provides a graphical interface for drawing bounding boxes around objects in images and assigning them class labels to build ground truth data for machine learning models. The software specifically supports the Pascal VOC XML annotation format, exporting image coordinates and class names into standard XML or text structures. It allows users to load predefined class lists from text files to standardize naming across an entire project. Beyond initial label
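To make the Pascal VOC XML structure concrete, here is a minimal sketch that builds an annotation with Python's standard library. It includes only a subset of the fields commonly found in Pascal VOC files (`filename`, `size`, and `object` entries with a `bndbox`); the function name `build_voc_annotation` is hypothetical, and files written by labelImg itself may contain additional elements.

```python
import xml.etree.ElementTree as ET


def build_voc_annotation(filename, width, height, boxes, depth=3):
    """Build a minimal Pascal VOC style annotation as an XML string.

    `boxes` is a list of (class_name, xmin, ymin, xmax, ymax) tuples in pixels.
    Hypothetical helper for illustration; not labelImg's own writer.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    # Image dimensions.
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)

    # One <object> element per labeled bounding box.
    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)

    return ET.tostring(root, encoding="unicode")
```

Calling `build_voc_annotation("img.jpg", 640, 480, [("dog", 10, 20, 110, 220)])` returns a single XML string that can be written next to the image file.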
labelImg is a computer vision labeling tool and image bounding box annotator used to create training datasets for machine learning models. It functions as a desktop utility for drawing rectangular bounding boxes on images and saving object coordinates and class names in common machine learning formats. The tool is specifically designed to generate and edit Pascal VOC-formatted XML files and to create image labels in the text-based format required by YOLO object detection pipelines. The software covers object detection annotation and training data preparation, including the ability to manage label catego
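The difference between the two formats is easiest to see in a conversion. The sketch below turns a Pascal VOC style pixel box into one line of the YOLO text format, which stores a class index followed by the box center and size, each normalized by the image dimensions. The function name `voc_box_to_yolo_line` is hypothetical and the six-decimal precision is an arbitrary choice for this example.

```python
def voc_box_to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-coordinate box to a YOLO label line.

    Output layout: "<class_id> <x_center> <y_center> <width> <height>",
    with the four box values normalized to the range [0, 1].
    Hypothetical helper for illustration; not labelImg's own code.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")

    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

The class index refers to a position in the project's class list, which is why a shared, predefined list of class names matters when several people label the same dataset.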
This project is a computer vision dataset and image annotation repository designed for training and evaluating machine learning models. It provides a large collection of labeled images, serving as an object detection benchmark and a source of pixel-level segmentation data. The repository distinguishes itself as a multimodal visual dataset by pairing images with synchronized voice, text, and mouse traces to support narrative understanding. It further enables the analysis of model fairness through the inclusion of demographic attributes and exhaustive annotations. The dataset covers a broad ra