Hatchet is an open-source durable workflow engine and task orchestration platform. It provides a framework for building and executing fault-tolerant, multi-step pipelines as directed acyclic graphs (DAGs), with automatic retries, scheduling, and real-time observability. The system is built around durable task checkpointing: execution state is persisted after each step, so work can resume from the last checkpoint after a worker crash or restart. It also supports event-driven task resumption, in which a task pauses until a matching external event arrives.
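The checkpointing idea can be illustrated with a minimal sketch in plain Python. This is not Hatchet's API or implementation: the function name `run_with_checkpoints`, the JSON file used as the state store, and the step signature are all assumptions made for illustration. It only shows the general pattern of persisting state after each step so a rerun skips steps that already finished.

```python
import json
import os
from typing import Any, Callable

# Hypothetical step signature: a step receives the outputs recorded so far.
Step = Callable[[dict[str, Any]], Any]


def run_with_checkpoints(steps: list[tuple[str, Step]], path: str) -> dict[str, Any]:
    """Run named steps in order, persisting state after each one.

    On a rerun with the same checkpoint file, steps that already have a
    recorded result are skipped, so work resumes after the last checkpoint.
    """
    state: dict[str, Any] = {}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)

    for name, fn in steps:
        if name in state:
            continue  # finished in an earlier run; do not repeat it
        state[name] = fn(state)
        # Write to a temporary file and rename it, so a crash in the middle
        # of the write cannot leave a half-written checkpoint behind.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
    return state
```

If a step raises, the results of the earlier steps are already on disk, and calling the function again with the same path continues from the failed step. A real engine would keep this state in a database rather than a local file.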
Workers connect to the orchestration services over gRPC, so task code can be written in any language and scaled independently of those services. Workflows are modeled as DAGs with typed data passing between dependent tasks, parallel execution, and conditional skipping or cancellation of tasks based on parent output. Hatchet also supports multi-step human-in-the-loop workflows, which pause for human input or external events and resume from checkpoints without custom recovery logic. Durable tasks can be exposed to AI agents as callable tools, through the Model Context Protocol (MCP) or the SDKs, with retries and observability.
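The DAG model described here can be sketched in a few lines of plain Python. Everything in this example is hypothetical and is not the Hatchet SDK: the `Task` dataclass, the `skip_if` predicate, the `SKIPPED` marker, and the rule that a task is skipped when any of its parents was skipped are assumptions chosen for illustration. The sketch runs tasks sequentially in dependency order; it does not show parallel execution.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable, Optional

SKIPPED = object()  # marker stored as the output of a skipped task


@dataclass
class Task:
    name: str
    # fn receives the outputs of the task's parents, keyed by parent name.
    fn: Callable[[dict[str, Any]], Any]
    parents: list[str] = field(default_factory=list)
    # Optional predicate on parent outputs; True means "skip this task".
    skip_if: Optional[Callable[[dict[str, Any]], bool]] = None


def run_dag(tasks: list[Task]) -> dict[str, Any]:
    """Run tasks in dependency order, passing parent outputs to children."""
    by_name = {t.name: t for t in tasks}
    # TopologicalSorter takes node -> predecessors and yields parents first;
    # it raises graphlib.CycleError if the graph is not acyclic.
    order = TopologicalSorter({t.name: t.parents for t in tasks}).static_order()

    outputs: dict[str, Any] = {}
    for name in order:
        task = by_name[name]
        inputs = {p: outputs[p] for p in task.parents}
        parent_skipped = any(v is SKIPPED for v in inputs.values())
        if parent_skipped or (task.skip_if is not None and task.skip_if(inputs)):
            outputs[name] = SKIPPED  # assumption: skips propagate downstream
            continue
        outputs[name] = task.fn(inputs)
    return outputs
```

For example, two tasks can branch on the output of a shared parent by giving each a `skip_if` predicate that inspects that parent's value; only the branch whose predicate returns False runs, along with anything downstream of it.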
A web-based observability dashboard shows workflow runs, logs, metrics, and traces, with real-time status for debugging. Tasks can be triggered by external webhooks, Slack commands, and custom events, or run on a schedule, either once or on a recurring cron-based schedule. Hatchet can be self-hosted on Kubernetes or Docker, with PostgreSQL as the primary state store and RabbitMQ as an optional message queue.
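Event-driven triggering can be pictured as a registry that maps event keys to task functions. The sketch below is a generic in-process illustration, not Hatchet's event API: the `EventRouter` class, its `on` and `push` methods, and the `user:created` key are hypothetical names, and a real deployment would receive events over the network and queue the resulting runs instead of calling handlers inline.

```python
from collections import defaultdict
from typing import Any, Callable

# Hypothetical handler signature: a handler receives the event payload.
Handler = Callable[[dict[str, Any]], Any]


class EventRouter:
    """Map event keys to the task functions they should trigger."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event_key: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a function for the given event key."""
        def register(fn: Handler) -> Handler:
            self._handlers[event_key].append(fn)
            return fn
        return register

    def push(self, event_key: str, payload: dict[str, Any]) -> list[Any]:
        """Run every handler registered for event_key and collect results."""
        return [fn(payload) for fn in self._handlers.get(event_key, [])]


router = EventRouter()


@router.on("user:created")
def send_welcome(payload: dict[str, Any]) -> str:
    return "welcome " + payload["name"]
```

Pushing an event whose key has no registered handlers returns an empty list, so unknown events are ignored instead of raising.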