30 open-source projects similar to alpha-unito/streamflow, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Streamflow alternative.
Pythonic tool for orchestrating machine-learning/high-performance/quantum-computing workflows in heterogeneous compute environments.
Airflow is a workflow orchestration platform for authoring, scheduling, and monitoring complex data pipelines as code using Python. It employs a DAG-based task scheduler to manage execution timing and dependencies via directed acyclic graphs, utilizing a distributed task execution engine to run workloads across a cluster of worker nodes. The platform provides a data pipeline monitor for tracking the health and execution history of programmatic workflows. This includes a web interface for workflow progress visualization and health monitoring to identify and troubleshoot pipeline failures. The
Scientific workflow engine designed for simplicity & scalability. Trivially transition between one-off use cases and massive-scale production environments
An engine for managing the execution of container-based workflows.
Cocoindex is an incremental data processing engine that builds and maintains live indexes for AI agents, with a core focus on codebase indexing and knowledge graph extraction. The engine uses a function-graph execution model where user-defined Python functions are composed into a directed acyclic graph, and it processes data incrementally so only changed source records or code paths are re-computed, avoiding full recomputation at any scale. It supports automatic schema inference from transformation pipeline type annotations and provides full data lineage tracing, tagging every output record wi
Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
Dagster is a data orchestration platform designed to manage the entire lifecycle of data assets through declarative modeling and version-controlled code. It functions as a workflow engine that treats data assets as first-class primitives, allowing teams to define, schedule, and monitor complex pipelines while maintaining clear visibility into lineage, dependencies, and data quality. The platform distinguishes itself by using a code-as-configuration framework that enables standard software engineering practices, such as unit testing and local mocking, to be applied directly to data workflows.
Build applications that make decisions (chatbots, agents, simulations, etc.). Monitor, trace, persist, and execute on your own infrastructure.
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
🎲 Dotflow turns an idea into flow! — Lightweight Python library for execution pipelines
A framework (command-line tool and libraries) for creating flexible compute pipelines
GNU-Make-like utility for managing builds and complex workflows
Workflows for Fission: Fast, reliable and lightweight function composition for serverless functions
A Scala-based DSL and framework for writing and executing bioinformatics pipelines as directed acyclic graphs