30 open-source projects similar to doccano/doccano, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Doccano alternative.
Doccano is a collaborative labeling platform and text annotation tool designed to create training data for machine learning. It provides a specialized interface for performing sequence labeling and text classification on natural language datasets. The system functions as a supervised learning dataset manager, allowing multiple users to coordinate within a shared workspace to label datasets for natural language processing tasks. It supports the preparation of raw text data for model training by converting unstructured documents into structured labeled examples. The platform includes capabilit
Argilla is a collaborative AI feedback tool and data curation management system. It serves as a human-in-the-loop dataset platform designed to coordinate workforce annotators and domain experts in labeling, rating, and refining data samples for machine learning projects. The platform focuses on large language model dataset curation and reinforcement learning from human feedback workflows. It provides a shared workspace for integrating human expertise into AI development to validate model outputs and correct data errors. The system manages the end-to-end machine learning data pipeline, includ
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
CVAT is an open-source computer vision annotation tool and visual dataset management platform. It provides a self-hosted interface for labeling images, videos, and 3D data to create datasets for vision AI models. The platform features AI-assisted data labeling to automate the creation of masks and bounding boxes, utilizing a plug-in system to connect external machine learning models. It includes a consensus-based quality assurance system that verifies label accuracy by comparing independent annotations. The system covers collaborative team management, project organization through task decomp
Akaunting is a modular business enterprise resource planning system and self-hosted accounting software. It provides a comprehensive platform for small business financial management, centering on a double-entry bookkeeping system with a general ledger and chart of accounts. The platform is designed for extensibility through a module-based architecture and a dedicated marketplace for procuring third-party applications. It supports multi-tenant data isolation and utilizes role-based access control to manage granular user permissions. Its capability surface covers a wide range of business opera
InvenTree is an open-source inventory management platform built on Django, designed for tracking parts, stock levels, and supply chain operations through a web interface and REST API. The system uses barcodes—including QR codes, 1D barcodes, and Data Matrix codes—as primary identifiers for scanning, linking, and triggering inventory actions, and extends core functionality through a Python plugin framework supporting custom actions, UI panels, barcode handlers, and scheduled tasks. The platform distinguishes itself through a comprehensive plugin-based extensibility system that allows custom in
Mealie is a self-hosted recipe management platform designed for personal data ownership and household meal planning. It functions as a digital kitchen assistant that allows users to import, organize, and digitize culinary content from websites, images, and videos into a structured, searchable database. The application supports multi-user collaboration through household management, enabling shared access to recipes and meal plans while maintaining distinct permissions. The platform distinguishes itself through extensive automation and integration capabilities. It features a programmatic interf
Label Studio is a multi-modal data annotation platform designed to create and manage high-quality training datasets for machine learning. It functions as a self-hosted, containerized environment that supports secure, private deployments, including air-gapped configurations. The platform provides a centralized workspace for labeling diverse media types, such as images, text, audio, and time-series data, to support supervised and reinforcement learning workflows. The platform distinguishes itself through deep integration with machine learning backends, enabling active learning loops, automated
Dokploy is a self-hosted platform-as-a-service designed to simplify the deployment and management of containerized applications and databases. It provides a centralized control plane that decouples administrative management from application workloads, allowing users to oversee infrastructure across multiple server nodes through a unified web interface or a command-line tool. The platform distinguishes itself through an extensive library of pre-configured application templates, enabling the rapid deployment of databases, identity providers, and various productivity or development tools. It sup
Solidtime is time tracking software designed for freelancers and agencies to record work durations, manage billable hours, and monitor labor allocation. It serves as a professional services automation tool that organizes work into clients and projects while managing team member assignments. The system features a billable rate manager that defines hourly costs at the organizational, member, and project levels using hierarchical overrides to calculate total billing. It includes a project management tool for organizing clients and tasks into hierarchies with role-based access permissions. The p
UnoPim is an AI-powered product information management system that serves as a centralized repository for managing product attributes, categories, and variations. It functions as a containerized product repository and a multi-channel data distributor, synchronizing consistent product information and pricing across diverse external sales platforms and marketplaces. The platform distinguishes itself through an LLM-based catalog manager that provides a conversational interface for executing data management tasks. This allows users to perform item creation, content enrichment, and quality scans u
DocsGPT is a retrieval-augmented generation platform and private knowledge base used to build AI agents that perform grounded search and analysis. It functions as a multi-model AI orchestrator and enterprise agent builder, allowing for the integration of various local and cloud language models to customize reasoning and text generation. The project provides a visual environment for developing automated assistants using conditional logic and third-party API connectivity. It enables the creation of private AI agents capable of performing enterprise search and detailed document analysis using pr
WooCommerce is a comprehensive eCommerce framework for WordPress that transforms websites into fully functional online stores for physical and digital goods. It serves as a digital storefront manager for product catalogs, inventory, and customer orders across retail and wholesale business models. The system functions as a payment gateway integrator, connecting shops to diverse processors for credit cards, digital wallets, and subscriptions. It also operates as an order fulfillment system for calculating shipping rates, generating labels, and coordinating delivery via third-party couriers, whi
This project is an open-source, privacy-focused web analytics platform designed for high-throughput data ingestion and multi-tenant data management. It provides a cookie-less tracking engine that captures visitor interactions using ephemeral request metadata, ensuring comprehensive traffic visibility while maintaining strict privacy standards. The architecture utilizes an event-driven ingestion pipeline and aggregated metric storage to decouple data collection from processing, enabling efficient long-term retrieval and responsive dashboard performance. What distinguishes this platform is its
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
This project is a comprehensive dataset and archive of classical Chinese poetry, prose, and Confucian classics. It serves as a digital humanities corpus, providing machine-readable access to hundreds of thousands of poems and detailed poet biographies, specifically spanning the Tang and Song dynasties. The collection is distinguished by its scholarly depth, incorporating textual variation annotations to track disputed characters across different source editions. It also includes tonal pattern mapping to describe the rhythmic and phonetic structures of the verse, alongside a popularity ranking
Wekan is an open-source, self-hosted Kanban project management tool used for organizing workflows through boards, lists, and cards. It is a real-time web application that allows teams to manage tasks on private infrastructure. The platform distinguishes itself with extensive data migration tools, specifically for importing boards and cards from Trello. It supports enterprise-grade identity integration via LDAP, OpenID Connect, and OAuth2, and offers flexible storage options including PostgreSQL as a primary relational backend and pluggable cloud storage for attachments. The system covers a w
Filament is a full-stack framework for building administrative panels and management interfaces within the Laravel ecosystem. It provides a declarative, component-based architecture that allows developers to construct complex, data-driven applications using server-side configuration objects rather than manual HTML. By inspecting database model structures and relationships, the framework automates the generation of CRUD interfaces, forms, and data tables, significantly reducing boilerplate code. The project distinguishes itself through a highly modular and extensible design that supports custo
PeerTube is a decentralized, open-source video hosting platform that enables users to operate independent, interoperable servers. By utilizing the ActivityPub protocol, it connects these servers into a global, federated network where users can follow channels, discover content, and interact across different instances. The platform is designed to function as a self-hosted video content management system, providing a community-driven alternative to centralized media services. What distinguishes PeerTube is its hybrid approach to content delivery and infrastructure management. It integrates peer
Kanboard is a self-hosted Kanban project management tool and productivity suite designed for tracking software tasks and team collaboration. It provides a visual system for managing workflows using boards, columns, and cards. The project features an extensible plugin framework and a comprehensive API for programmatic task and project administration. It includes specialized identity management through LDAP integration, which synchronizes user accounts and group permissions from directory servers. The system covers a wide range of capabilities, including event-dri
Jetstream is an application scaffold for Laravel that provides a pre-built identity system and team collaboration framework. It serves as a starter kit that integrates user authentication, profile management, and organizational tools into a unified project structure. The project is distinguished by its comprehensive team management capabilities, which include shared workspace organization, member invitation workflows, and role-based access control. It also features an integrated API token manager for issuing and controlling secure access tokens for external clients. The platform covers a bro
VoTT is computer vision annotation software and a machine learning dataset preparation tool. It is a desktop application for drawing bounding boxes and assigning tags to objects in images and videos to create training datasets for object detection models. The application uses a cross-platform desktop interface to manage image and video assets. It features local-first storage integration to handle large media assets directly from the host machine's file system and includes frame-rate-controlled video sampling to extract specific images from video streams for labeling. The softw
Label Studio is a multi-type data labeling tool and data annotation workspace designed to prepare datasets for machine learning training. It functions as a cloud-integrated data pipeline that imports raw data from storage, manages the annotation process, and exports labels into standardized formats. The platform features a machine learning model integration framework that connects to external model servers. This enables model-assisted annotation and active learning, allowing the system to perform pre-labeling and refine predictions based on human feedback. The software provides project manag
Kaneo is an open-source project management platform built around a kanban board interface for organizing tasks into columns with drag-and-drop status management. It functions as a self-hosted task manager that supports multiple workspaces, organizations, and role-based access control, with all persistent data stored in a PostgreSQL relational database and exposed through a RESTful JSON API. The platform distinguishes itself through deep external integration capabilities, connecting project workflows to GitHub, Gitea, Slack, Discord, and Telegram with automated event-driven actions. A webhook
Cloud Annotations is a web-based platform designed for collaborative image annotation and the preparation of computer vision datasets. It provides an interface for teams to draw bounding boxes and polygons over digital media, transforming raw images into structured training data for machine learning models. The platform distinguishes itself through a real-time synchronization engine that allows multiple users to edit the same image simultaneously. By utilizing browser-based local storage and standardized data serialization, it supports offline workflows and ensures that exported annotations r
Easy-dataset is a comprehensive platform designed for the end-to-end management of machine learning datasets, specifically tailored for language and vision model fine-tuning. It functions as a centralized environment for the entire data lifecycle, encompassing the automated generation of synthetic training data, the structural organization of document collections, and the systematic annotation of individual data points. The platform distinguishes itself through its integrated evaluation and orchestration capabilities. It provides a dedicated suite for benchmarking models, featuring blind side
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
Apostrophe is an open-source Node.js headless content management system that delivers structured content through REST APIs while providing a visual in-context page editor for live editing. It is built on a module-based plugin architecture that extends CMS functionality through reusable modules, each encapsulating logic, configuration, and templates. The system uses schema-driven content modeling to define data structures and validation rules through configurable schemas and custom field types, with all content stored as flexible JSON-like documents in MongoDB. The platform distinguishes itsel
Dependency-Track is a software composition analysis tool and vulnerability management system designed to track dependencies and supply chain risk. It functions as a platform for ingesting and analyzing CycloneDX software bills of materials to identify known vulnerabilities and license compliance issues within third-party software components. The system distinguishes itself by mirroring external vulnerability databases locally to enable fast offline analysis and using VEX documents to differentiate between technical vulnerabilities and actual contextual risks. It also integrates with identity
CRMEB is a comprehensive e-commerce platform built on ThinkPHP 6, designed as a headless system that delivers standardized APIs to various frontend clients. It provides a unified backend to synchronize product catalogs, orders, and customer data across web browsers, mobile applications, and mini-programs. The platform supports diverse commerce models, including multi-vendor marketplaces where independent merchants manage their own stores, centralized chain store networks, and social commerce frameworks featuring affiliate distribution and community group buying. It also integrates specialized