The visitor is looking for software tools designed to label, annotate, and manage datasets for machine learning and computer vision model training.

opencv/cvat is the closest match: CVAT is a comprehensive, self-hostable platform that provides robust tools for image and video annotation, team collaboration, and AI-assisted labeling, making it a flagship solution for managing machine learning datasets. Other strong matches: heartexlabs/label-studio, humansignal/label-studio, wkentaro/labelme, cvat-ai/cvat.

Why does opencv/cvat match “a tool for data labeling and annotation”?

CVAT is a comprehensive, self-hostable platform that provides robust tools for image and video annotation, team collaboration, and AI-assisted labeling, making it a flagship solution for managing machine learning datasets.

Why does heartexlabs/label-studio match “a tool for data labeling and annotation”?

Label Studio is a comprehensive, self-hostable data annotation platform that supports image and video labeling, team collaboration, automated pre-labeling, and active learning workflows, making it a complete solution for your machine learning dataset management needs.

Why does humansignal/label-studio match “a tool for data labeling and annotation”?

Label Studio is a comprehensive, self-hostable data annotation platform that supports multi-modal labeling, team collaboration, automated pre-labeling, and active learning workflows, making it a flagship solution for this category.

Why does wkentaro/labelme match “a tool for data labeling and annotation”?

Labelme is a desktop-based image and video annotation tool that provides robust support for semantic segmentation and format exporting, though it lacks the built-in team collaboration and server-side management features found in enterprise-grade platforms.

Why does cvat-ai/cvat match “a tool for data labeling and annotation”?

CVAT is a comprehensive, self-hostable platform that provides robust tools for image and video annotation, team collaboration, automated labeling via AI integration, and flexible data export, making it a flagship solution for machine learning dataset management.

Data Labeling and Annotation Tools

Open-source software for labeling, annotating, and preparing datasets to train machine learning and computer vision models.


opencv/cvat
16,086 stars on GitHub
CVAT is an open-source computer vision annotation tool and visual dataset management platform. It provides a self-hosted interface for labeling images, videos, and 3D data to create datasets for vision AI models. The platform features AI-assisted data labeling to automate the creation of masks and bounding boxes, utilizing a plug-in system to connect external machine learning models. It includes a consensus-based quality assurance system that verifies label accuracy by comparing independent annotations. The system covers collaborative team management, project organization through task decomposition, and remote cloud storage integration. It also provides a REST API for programmatic workflow control and the import and export of data in industry-standard formats.
CVAT is a comprehensive, self-hostable platform that provides robust tools for image and video annotation, team collaboration, and AI-assisted labeling, making it a flagship solution for managing machine learning datasets.
Python, AI-Assisted Labeling, Annotation Project Management, Image Annotation
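The consensus-based quality check described above compares independent annotations of the same object. One common way to score such a comparison is intersection-over-union (IoU). The sketch below assumes axis-aligned boxes given as (x1, y1, x2, y2) and an illustrative agreement threshold of 0.5; the function names are hypothetical and are not part of CVAT's API.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def boxes_agree(box_a, box_b, threshold=0.5):
    """True when two annotators' boxes overlap at least `threshold` IoU (assumed cutoff)."""
    return iou(box_a, box_b) >= threshold
```

A pair of boxes that fails the threshold would be a candidate for review by a third annotator.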
heartexlabs/label-studio
27,626 stars on GitHub
Label Studio is a multi-type data labeling tool and data annotation workspace designed to prepare datasets for machine learning training. It functions as a cloud-integrated data pipeline that imports raw data from storage, manages the annotation process, and exports labels into standardized formats. The platform features a machine learning model integration framework that connects to external model servers. This enables model-assisted annotation and active learning, allowing the system to perform pre-labeling and refine predictions based on human feedback. The software provides project management tools for organizing datasets and assigning tasks to users via role-based access. It supports various data types and utilizes backend-agnostic storage adapters to connect with local filesystems or cloud storage providers. The application can be installed via manual setup or one-click deployments on cloud infrastructure.
Label Studio is a comprehensive, self-hostable data annotation platform that supports image and video labeling, team collaboration, automated pre-labeling, and active learning workflows, making it a complete solution for your machine learning dataset management needs.
TypeScript, Annotation Project Management, Model-Assisted Labelers
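The active learning workflow mentioned above is often built on uncertainty sampling: the tasks the model is least confident about are sent to human annotators first. The sketch below is generic and does not use Label Studio's SDK. It assumes a hypothetical dictionary mapping task ids to a model confidence between 0 and 1.

```python
def rank_for_labeling(confidences, budget):
    """Return up to `budget` task ids, least-confident predictions first.

    `confidences` maps a task id to the model's confidence in its own
    prediction (an assumed input shape, not a Label Studio structure).
    """
    ranked = sorted(confidences, key=lambda task_id: confidences[task_id])
    return ranked[:budget]
```

After annotators correct the selected tasks, the model would be retrained and the remaining tasks re-scored.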
HumanSignal/label-studio
27,619 stars on GitHub
Label Studio is a multi-modal data annotation platform designed to create and manage high-quality training datasets for machine learning. It functions as a self-hosted, containerized environment that supports secure, private deployments, including air-gapped configurations. The platform provides a centralized workspace for labeling diverse media types, such as images, text, audio, and time-series data, to support supervised and reinforcement learning workflows. The platform distinguishes itself through deep integration with machine learning backends, enabling active learning loops, automated pre-labeling, and real-time model-assisted annotation. It features a declarative interface configuration system that uses markup to define custom labeling tools, alongside plugin-based extensibility that allows for the injection of custom logic. To support enterprise-scale operations, it includes granular role-based access control, collaborative feedback tools, and automated task distribution management. The system covers a broad capability surface, including automated data ingestion from cloud storage, programmatic pipeline management via REST APIs, and comprehensive data export options. It also provides built-in observability tools to monitor annotator performance, inter-annotator agreement, and model quality. The application is packaged as a portable, container-ready microservice designed for deployment in scalable, cloud-native environments.
Label Studio is a comprehensive, self-hostable data annotation platform that supports multi-modal labeling, team collaboration, automated pre-labeling, and active learning workflows, making it a flagship solution for this category.
TypeScript, Annotation Project Management, Model-Assisted Labelers, Automated Visual Data Annotation
wkentaro/labelme
15,984 stars on GitHub
Labelme is a Python-based image annotation tool used to create computer vision datasets. It serves as a visual editor for semantic segmentation, allowing users to define object boundaries using polygons, rectangles, points, and circles. The application also functions as a multispectral image annotator, supporting high-bit depth TIFF files used in satellite and scientific imagery. The tool incorporates AI-assisted labeling capabilities to automate the creation of masks and polygons. These features allow for shape generation driven by text prompts or interactive point selections, which propose boundaries based on user-placed positive and negative points. The software covers a broad range of data management and annotation tasks, including the creation of dense pixel masks, rotated bounding boxes, and video frame sequencing. It includes a pipeline for translating internal JSON state persistence into standard dataset formats such as COCO and Pascal VOC. Additional capabilities include image-level classification flags, geometry refinement tools, and batch image importing.
Labelme is a desktop-based image and video annotation tool that provides robust support for semantic segmentation and format exporting, though it lacks the built-in team collaboration and server-side management features found in enterprise-grade platforms.
Python, AI-Assisted Labeling, Image Annotation, Image Annotation Tools
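The JSON-to-COCO translation described above can be illustrated with a small standalone sketch. It is not Labelme's own export code. It assumes a per-image record with the keys `shapes`, `label`, `points`, `shape_type`, `imagePath`, `imageHeight`, and `imageWidth`, and it handles polygon shapes only.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def labelme_to_coco(record, image_id=1):
    """Convert one Labelme-style record (assumed layout) to a COCO-style dict."""
    categories = {}  # label name -> category id, assigned in order of appearance
    annotations = []
    for shape in record["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # other shape types are skipped in this sketch
        points = shape["points"]
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        category_id = categories.setdefault(shape["label"], len(categories) + 1)
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": image_id,
            "category_id": category_id,
            # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list.
            "segmentation": [[coord for p in points for coord in p]],
            # COCO boxes are [x, y, width, height].
            "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
            "area": polygon_area(points),
            "iscrowd": 0,
        })
    return {
        "images": [{
            "id": image_id,
            "file_name": record["imagePath"],
            "height": record["imageHeight"],
            "width": record["imageWidth"],
        }],
        "annotations": annotations,
        "categories": [{"id": cid, "name": name} for name, cid in categories.items()],
    }
```

A real dataset would need category ids that stay consistent across images, which this single-record sketch does not attempt.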
cvat-ai/cvat
15,317 stars on GitHub
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports complex collaborative workflows by providing role-based access control, organizational workspace management, and consensus-based quality assurance tools that allow teams to merge diverse labeling opinions and resolve annotation conflicts. Beyond manual and automated labeling, the system provides a comprehensive suite of administrative and integration capabilities. It includes support for cloud-native storage mounting, programmatic interaction via a RESTful API, and automated event notifications. The platform is built for scalability, utilizing a microservices architecture that can be deployed across containerized environments or Kubernetes clusters to handle large-scale data processing and distributed annotation tasks.
CVAT is a comprehensive, self-hostable platform that provides robust tools for image and video annotation, team collaboration, automated labeling via AI integration, and flexible data export, making it a flagship solution for machine learning dataset management.
Python, Annotation Project Management, Automated Annotation Tools, Model-Assisted Labelers
CVHub520/X-AnyLabeling
8,193 stars on GitHub
X-AnyLabeling is an AI-assisted annotation platform and computer vision labeling tool. It provides an interface for annotating images and videos using polygons and rectangles to create training sets for machine learning models. The project distinguishes itself through the integration of external AI models via a plugin-based inference backend, allowing for automated generation of candidate labels and the execution of specialized tasks like pose estimation and object detection. It also functions as an optical character recognition tool for extracting text and layout information from document images. The platform includes capabilities for dataset format conversion, translating annotations between various industry-standard formats to ensure cross-platform compatibility. It further supports visual data annotation and textual analysis through specialized workflows.
This is a comprehensive computer vision annotation platform that supports image and video labeling, automated AI-assisted annotation, and dataset format conversion, though it lacks built-in team collaboration features.
Python, AI-Assisted Labeling, Visual Annotation Tools
lightly-ai/lightly
3,684 stars on GitHub
Lightly is a self-supervised learning framework and computer vision data curation tool designed to manage large image datasets and train models on unlabeled data. It functions as a PyTorch vision library and dataset management SDK, providing tools to convert raw images into high-dimensional vectors for similarity search, visualization, and feature extraction. The project implements a variety of self-supervised architectures, including MoCo, SimCLR, VICReg, Barlow Twins, and masked image modeling. It distinguishes itself by combining these learning frameworks with active learning capabilities, allowing users to identify high-value samples for manual annotation and reduce dataset bias. The platform covers a broad range of capabilities, including multi-view augmentation pipelines, distributed multi-GPU training, and embedding quality monitoring. It also provides utilities for cloud storage integration, label format conversion, and the fine-tuning of pre-trained backbones for downstream tasks. A command-line interface is provided to execute model training, generate embeddings, and synchronize data between local storage and remote platforms.
Lightly is a specialized computer vision data curation and active learning framework that helps you manage and select high-value samples for annotation, though it functions more as a dataset management SDK than a full-featured manual labeling interface.
Python, Active Learning Curation
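Selecting high-value samples from image embeddings, as described above, is often done by favoring diversity. The sketch below uses greedy farthest-point sampling over plain Python tuples. It illustrates the general idea only: it is not Lightly's API, and the choice to start from index 0 is an arbitrary assumption.

```python
import math


def farthest_point_sample(embeddings, k):
    """Greedily pick up to k indices whose embeddings are far apart."""
    if not embeddings or k <= 0:
        return []
    selected = [0]  # arbitrary starting point (assumption)
    # Distance from every embedding to its nearest already-selected one.
    min_dist = [math.dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(k, len(embeddings)):
        nxt = max(range(len(embeddings)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[nxt]))
    return selected
```

Near-duplicate images end up close together in embedding space, so this kind of selection tends to skip them and spread the annotation budget across the dataset.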
doccano/doccano
10,674 stars on GitHub
Doccano is a collaborative data labeling platform and machine learning dataset management system. It provides a web-based interface for teams to import raw text, mark datasets, and export structured annotations for model training. The project specifically supports text annotation for classification and named entity recognition tasks. It enables teams to coordinate multiple users on a single project to maintain consistent labeling guidelines and increase the speed of dataset creation. The system includes tools for data management and team coordination, providing the ability to import raw datasets and export completed annotations into structured formats. A programmatic interface allows for the retrieval of results and the integration of external systems. The application can be installed on cloud infrastructure using automated deployment templates.
This is a collaborative data labeling platform that supports team management, data import/export, and self-hosting, though it is specialized for text rather than the image and video annotation requested.
Python, Data Labeling Interfaces, Data Labeling Platforms, Data Labeling Tools
ConardLi/easy-dataset
13,394 stars on GitHub
Easy-dataset is a comprehensive platform designed for the end-to-end management of machine learning datasets, specifically tailored for language and vision model fine-tuning. It functions as a centralized environment for the entire data lifecycle, encompassing the automated generation of synthetic training data, the structural organization of document collections, and the systematic annotation of individual data points. The platform distinguishes itself through its integrated evaluation and orchestration capabilities. It provides a dedicated suite for benchmarking models, featuring blind side-by-side human testing and automated grading to ensure objective performance metrics. Users can orchestrate complex data pipelines that transform raw documents into structured formats through recursive segmentation, automated taxonomy classification, and customizable text refinement. Beyond core generation and management, the system supports a wide range of data processing tasks, including visual document extraction, content augmentation, and the creation of multi-turn conversational datasets. It offers flexible configuration for model connections and generation parameters, allowing for fine-grained control over output quality and consistency. The platform is designed for local deployment to maintain data privacy and security. It includes built-in tools for programmatic quality assessment and supports the export of processed datasets into standard formats compatible with various fine-tuning pipelines.
This platform provides a centralized environment for managing and annotating datasets with support for local deployment and automated workflows, making it a suitable tool for machine learning data preparation despite its primary focus on language and document-based tasks.
JavaScript, AI Model Benchmarking, Model Evaluation Suites, Synthetic Data Generation
