30 open-source projects similar to dotflow-io/dotflow, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Dotflow alternative.
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does.
Dagster is a data orchestration platform designed to manage the entire lifecycle of data assets through declarative modeling and version-controlled code. It functions as a workflow engine that treats data assets as first-class primitives, allowing teams to define, schedule, and monitor complex pipelines while maintaining clear visibility into lineage, dependencies, and data quality. The platform distinguishes itself by using a code-as-configuration framework that enables standard software engineering practices, such as unit testing and local mocking, to be applied directly to data workflows.
Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
Pinball is a scalable workflow manager
Luigi is a Python framework designed for building and managing complex batch data pipelines. It functions as a workflow orchestration engine that organizes tasks into directed acyclic graphs, ensuring that jobs execute in the correct logical order based on their dependencies. By utilizing a centralized scheduler, the system coordinates task execution across distributed environments, tracks global workflow state, and prevents redundant processing by verifying the existence of output targets before triggering any work. The project distinguishes itself through a robust state-tracking mechanism t
Kestra is a declarative workflow orchestrator designed to manage complex task dependencies and automated processes through versioned configuration files. It functions as a distributed platform that decouples task scheduling from execution by offloading computational workloads to a fleet of worker nodes. The system uses a reactive, event-driven engine to initiate workflows automatically in response to external signals, webhooks, schedules, or file system changes. The platform distinguishes itself through a modular plugin architecture that allows for the integration of custom tasks and external
Kedro is a data science pipeline framework and production toolbox designed to build reproducible, modular workflows using software engineering best practices. It functions as a data engineering orchestrator and catalog manager, bridging the gap between interactive analysis and maintainable production pipelines. The framework distinguishes itself by using a data catalog to decouple data access from processing logic and providing tools to transition analysis from interactive notebooks into structured workflows. It includes a workflow visualization tool that generates visual maps of data pipelin
Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments. The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external
Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
Argo Workflows is a container-native workflow engine that functions as a Kubernetes custom resource controller. It orchestrates complex sequences of containerized tasks by executing them as directed acyclic graphs, allowing for dependency management and parallel processing within a cluster. The system extends the native Kubernetes control plane to manage the full lifecycle of automated processes, from initial triggering to final resource cleanup. The platform distinguishes itself through its controller-pattern reconciliation, which continuously monitors workflow states to align them with desi
Argo is a cloud native CI/CD platform and Kubernetes workflow engine. It functions as a container pipeline orchestrator and job scheduler, managing multi-step sequences of containers as jobs using directed acyclic graphs within a cluster. The system acts as a progressive delivery controller, reducing release risk through automated Canary and Blue-Green deployment strategies. It provides declarative GitOps synchronization to mirror the state of a git repository directly into the cluster environment for continuous delivery automation. The platform covers a broad range of capabilities including
Airflow is a workflow orchestration platform for authoring, scheduling, and monitoring complex data pipelines as code using Python. It employs a DAG-based task scheduler to manage execution timing and dependencies via directed acyclic graphs, utilizing a distributed task execution engine to run workloads across a cluster of worker nodes. The platform provides a data pipeline monitor for tracking the health and execution history of programmatic workflows. This includes a web interface for workflow progress visualization and health monitoring to identify and troubleshoot pipeline failures. The
An engine for managing the execution of container-based workflows.
Cocoindex is an incremental data processing engine that builds and maintains live indexes for AI agents, with a core focus on codebase indexing and knowledge graph extraction. The engine uses a function-graph execution model where user-defined Python functions are composed into a directed acyclic graph, and it processes data incrementally so only changed source records or code paths are re-computed, avoiding full recomputation at any scale. It supports automatic schema inference from transformation pipeline type annotations and provides full data lineage tracing, tagging every output record wi
The open agent control plane. Govern autonomous AI agents with pre-execution policy enforcement, approval gates, and audit trails. Works with LangChain, CrewAI, MCP, and any framework.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Build applications that make decisions (chatbots, agents, simulations, etc...). Monitor, trace, persist, and execute on your own infrastructure.
Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
A framework (comand line tool libraries) for creating flexible compute pipelines
Vendor-agnostic orchestration for training, inference and agentic workloads across NVIDIA, AMD, TPU, and Tenstorrent on clouds, Kubernetes, and bare metal.