Doccano is a collaborative data labeling platform and dataset management system for machine learning. Its web-based interface lets teams import raw text, annotate it, and export structured annotations for model training.
The project supports text annotation for classification and named entity recognition tasks. Multiple users can work on a single project, which helps teams keep labeling guidelines consistent and speeds up dataset creation.
The system includes tools for data management and team coordination: raw datasets can be imported, and completed annotations can be exported in structured formats. A programmatic interface lets external systems retrieve results and integrate with the platform.
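
As a rough illustration of working with exported annotations, the sketch below reads a named entity recognition export and pulls out each labeled span. It assumes a JSON Lines layout with a `text` field and a `label` field holding `[start, end, label]` character offsets. Those field names, the sample records, and the helper `read_spans` are assumptions made for this example, so check them against the files your own instance produces.

```python
import json
from collections import Counter

# Hypothetical export: one JSON object per line, holding the raw text and a
# list of [start, end, label] character spans. Field names are assumptions.
SAMPLE_EXPORT = """\
{"text": "Alice moved to Paris.", "label": [[0, 5, "PERSON"], [15, 20, "LOCATION"]]}
{"text": "Bob works at Acme.", "label": [[0, 3, "PERSON"], [13, 17, "ORG"]]}
"""


def read_spans(jsonl_text):
    """Return (surface_text, label) pairs for every span in the export."""
    pairs = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        text = record["text"]
        # Records without annotations simply contribute no spans.
        for start, end, label in record.get("label", []):
            pairs.append((text[start:end], label))
    return pairs


spans = read_spans(SAMPLE_EXPORT)
# Count how often each label occurs, e.g. to spot imbalance before training.
label_counts = Counter(label for _, label in spans)
```

Slicing the text with the stored offsets is a quick way to confirm that spans line up with the words annotators intended, and the per-label counts give an early view of class balance.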
The application can be installed on cloud infrastructure using automated deployment templates.