30 open-source projects similar to apache/dolphinscheduler, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Dolphinscheduler alternative.
Prefect is a workflow orchestration platform designed to define, schedule, and monitor complex data pipelines as Python code. It functions as a container-native engine that wraps individual tasks in isolated environments, ensuring consistent dependencies and resource allocation across diverse infrastructure. By utilizing a state-machine-based orchestration model, the system tracks execution progress through discrete transitions and persistent event logs to maintain reliable and observable task processing. The platform distinguishes itself through a decoupled worker-API architecture, which sep
Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments. The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external
Luigi is a Python framework designed for building and managing complex batch data pipelines. It functions as a workflow orchestration engine that organizes tasks into directed acyclic graphs, ensuring that jobs execute in the correct logical order based on their dependencies. By utilizing a centralized scheduler, the system coordinates task execution across distributed environments, tracks global workflow state, and prevents redundant processing by verifying the existence of output targets before triggering any work. The project distinguishes itself through a robust state-tracking mechanism t
Hatchet is an open-source durable workflow engine and task orchestration platform. It provides a framework for building and executing fault-tolerant, multi-step pipelines as directed acyclic graphs (DAGs), with automatic retries, scheduling, and real-time observability. The system is built around durable task checkpointing, which persists execution state after each step so work can resume from the last checkpoint after a worker crash or restart, and it supports event-driven task resumption that pauses a task until a matching external event arrives. The platform distinguishes itself through it
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Nextflow is a dataflow workflow engine and distributed computing framework used to build and execute data-intensive pipelines. It serves as a scientific workflow language that allows users to define reproducible data processing sequences, supporting any scripting language through shebang declarations. The system functions as a containerized pipeline orchestrator, utilizing container technologies to ensure software dependencies remain consistent across different environments. It decouples workflow logic from the underlying infrastructure, enabling the same pipeline to run on local machines, cl
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
Argo Workflows is a container-native workflow engine that functions as a Kubernetes custom resource controller. It orchestrates complex sequences of containerized tasks by executing them as directed acyclic graphs, allowing for dependency management and parallel processing within a cluster. The system extends the native Kubernetes control plane to manage the full lifecycle of automated processes, from initial triggering to final resource cleanup. The platform distinguishes itself through its controller-pattern reconciliation, which continuously monitors workflow states to align them with desi
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
This project is a learning curriculum and programming guide for Apache Spark, providing a structured set of educational resources and practical code examples for mastering distributed data processing. It serves as a course for building scalable data workflows and big data engineering pipelines. The repository provides practical source code and project layouts that demonstrate how to connect external data stores, process streaming data, and organize code for distributed environments. It includes implementation examples for scaling machine learning algorithms across clusters to handle large tra
Enterprise job scheduling middleware with distributed computing ability.
Dagster is a data orchestration platform designed to manage the entire lifecycle of data assets through declarative modeling and version-controlled code. It functions as a workflow engine that treats data assets as first-class primitives, allowing teams to define, schedule, and monitor complex pipelines while maintaining clear visibility into lineage, dependencies, and data quality. The platform distinguishes itself by using a code-as-configuration framework that enables standard software engineering practices, such as unit testing and local mocking, to be applied directly to data workflows.
This project is a Python workflow orchestration platform and programmatic data pipeline engine used to author, schedule, and monitor complex data pipelines. It functions as a directed acyclic graph manager and scheduler, allowing users to define data movement and transformation tasks as code to ensure precise execution order and maintainability. The platform distinguishes itself by treating workflows as code, enabling pipelines to be versioned and tested through a standard programming language. It utilizes a system of extensible operators to encapsulate integration logic and employs a templat
Flower is a monitoring and administration tool for Celery task queues. It provides a real-time web dashboard and a REST API to monitor distributed task clusters, manage worker instances, and observe message broker health. The project distinguishes itself by offering centralized control over the task lifecycle, allowing users to trigger, revoke, or terminate tasks and apply execution rate limits. It also includes a Prometheus metrics exporter to surface internal performance and status data for external monitoring and alerting systems. The tool covers a broad range of observability and managem
Azkaban is a distributed workflow manager and DAG-based job orchestrator designed as an enterprise batch processor. It serves as a Java-based workflow engine that schedules and executes complex job sequences across a cluster of executor servers, with specific functionality for managing big data workloads on Hadoop clusters. The system distinguishes itself through a distributed executor model that coordinates state via a shared database to ensure high availability. It employs a plugin-based architecture that allows for custom job types and system functionality extensions, including the ability
Otter is a distributed database synchronization system and change data capture tool designed to replicate data between databases across multiple geographic regions. It functions as a synchronization orchestrator and ETL data pipeline that mirrors records and associated files in real time. The system employs incremental log parsing to capture database changes and utilizes a consistency-based convergence algorithm and loop-avoidance logic to manage bi-directional replication. It processes data through a pipeline of selection, extraction, transformation, and loading to handle joins and format co
Great Expectations is a data quality testing framework and observability platform designed to monitor the reliability of data pipelines. It provides a structured environment for defining, documenting, and automating data quality assertions, allowing teams to validate datasets against expected structure and content before they move through downstream processes. The project distinguishes itself through a declarative domain-specific language that stores quality rules as version-controlled configuration files. It utilizes an execution engine abstraction to translate these high-level assertions in
Inngest is a durable execution framework and event-driven automation engine designed to orchestrate background workflows. It enables developers to build resilient, stateful processes by memoizing function steps, ensuring that long-running tasks can automatically resume from the last successful operation after failures, timeouts, or infrastructure restarts. The platform distinguishes itself through its event-driven architecture, which uses a schema-validated bus to trigger functions and coordinate complex, multi-step logic. It employs an onion-model middleware approach for cross-cutting concer
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
Temporal is a distributed workflow orchestration engine designed to manage fault-tolerant, stateful, and long-running background processes. It functions as a platform for coordinating complex cross-service operations, ensuring consistency and reliability in distributed environments by decoupling workflow orchestration from task execution. The platform distinguishes itself through a deterministic, event-sourced execution model that reconstructs workflow state by re-executing code from an immutable event log. This approach isolates non-deterministic side effects into managed activities, allowin
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Conductor is a durable workflow engine designed to orchestrate complex, long-running business processes and autonomous agent loops. It functions as a stateful execution platform that persists the entire history of a process, ensuring that workflows remain reliable and recoverable across infrastructure failures, system restarts, and transient network errors. By managing task lifecycles, worker polling, and state transitions, it provides a centralized coordination layer for distributed systems. The platform distinguishes itself through its specialized support for AI agent orchestration, allowin
Ajenti is a Linux server management panel that provides a centralized web-based dashboard for administering system configurations, services, and hardware resources. It functions as a system monitor and a remote administration tool for managing server infrastructure through a browser. The project includes a web-based terminal emulator for executing shell commands and a file manager for browsing and organizing the server filesystem. It also features a dedicated manager for deploying and controlling the lifecycle of Docker containers. The platform covers a broad range of administrative capabili
This project is a modular, terminal-based dashboard framework designed to aggregate and display real-time information within a grid-aligned interface. It functions as a centralized monitoring tool that translates data from local system resources, infrastructure services, and external web APIs into a unified, text-based display. The dashboard is distinguished by its plugin-based architecture, which allows users to encapsulate distinct data sources and display logic into isolated, independently managed modules. Users define their workspace through declarative configuration files or an interacti
Elsa Core is a workflow engine framework designed for defining, executing, and managing long-running business processes. It functions as a distributed workflow orchestrator and event-driven trigger system, capable of operating as a multi-tenant platform with secure data isolation. The project distinguishes itself through a flexible approach to workflow definitions, supporting a visual drag-and-drop designer, programmatic C# definitions, and portable JSON specifications. It provides a highly extensible architecture allowing for the development of custom activities and the use of a dynamic expr
This project is a Python framework for building autonomous, event-driven agent systems. It provides a unified runtime for orchestrating multi-agent workflows, managing persistent conversation state, and executing code within secure, isolated sandbox environments. The framework is designed to handle complex task delegation, allowing agents to invoke other agents as tools while maintaining context across multi-turn interactions. The framework distinguishes itself through its deep integration with the Model Context Protocol, enabling agents to connect to external data sources and remote services
Quarkus is a Kubernetes-native Java framework designed for building high-performance, memory-efficient applications. It utilizes ahead-of-time native compilation to transform Java code into standalone, optimized binaries that eliminate the need for a virtual machine, enabling rapid startup and reduced memory consumption. By performing code augmentation during the build phase, it shifts heavy processing tasks away from runtime, ensuring that applications are optimized for cloud-native environments. The framework distinguishes itself through a unified approach to reactive and imperative program
FreeScout is a self-hosted, open-source help desk and ticket management system built with PHP. It functions as a customer support platform that enables teams to manage shared inboxes and organize customer email conversations into tickets. The platform distinguishes itself as a multi-channel support dashboard that integrates customer relationship management and a knowledge base system. It provides tools for tracking detailed customer profiles and organizational records while offering a programmable interface to manage and expose support articles. The system covers a broad range of support cap
Forem is an open-source platform designed for building and managing technical communities. It functions as a social publishing engine that enables members to share long-form content, participate in threaded discussions, and engage through social interactions. The platform provides tools for organizations to maintain branded profiles, host community hackathons, and facilitate collaborative learning through structured educational tracks. Beyond its social features, Forem integrates advanced capabilities for AI agent workflow orchestration and codebase knowledge graphing. It allows developers to