CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export.
The platform distinguishes itself through deep integration with machine learning models: users can deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports collaborative workflows through role-based access control, organizational workspace management, and consensus-based quality assurance tools that let teams merge annotations from multiple annotators and resolve conflicts between them.
Beyond manual and automated labeling, the system provides administrative and integration capabilities: mounting cloud storage as a data source, programmatic interaction via a RESTful API, and automated event notifications. The platform is built for scalability, using a microservices architecture that can be deployed in containerized environments or on Kubernetes clusters to handle large-scale data processing and distributed annotation tasks.
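As a rough illustration of the programmatic interaction mentioned above, the sketch below builds (but does not send) an HTTP request that would list annotation tasks. It assumes a server reachable at a placeholder base URL, a task listing endpoint at `api/tasks`, token authentication through an `Authorization: Token ...` header, and a `page_size` query parameter; check these against the API schema of your own deployment. The helper name `build_task_list_request` is hypothetical and not part of CVAT.

```python
import urllib.request
from urllib.parse import urlencode, urljoin


def build_task_list_request(
    base_url: str, token: str, page_size: int = 10
) -> urllib.request.Request:
    """Build a GET request for the task list without sending it.

    Assumptions (verify against your server's API schema):
      - tasks are listed at <base_url>/api/tasks
      - the server accepts "Authorization: Token <key>"
      - pagination is controlled by a "page_size" query parameter
    """
    query = urlencode({"page_size": page_size})
    # Normalize the base URL so urljoin appends rather than replaces the path.
    url = urljoin(base_url.rstrip("/") + "/", "api/tasks") + "?" + query
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder host and token; replace with real values before sending.
    request = build_task_list_request("http://localhost:8080", "YOUR_TOKEN")
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the call against a live server. Building the request separately from sending it keeps the URL and headers easy to inspect before any network traffic occurs.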