1 repo
Systems for orchestrating and distributing complex data processing workflows across computing clusters.
Distinguishing note: Focuses on data pipeline orchestration and DAG-based task scheduling rather than generic system-level job scheduling.
Explore 1 awesome GitHub repository matching data & databases · Distributed Task Schedulers. Refine with filters or upvote what's useful.
Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments. The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external
Distributing and managing the execution of batch processing jobs across large clusters to ensure reliable data transformation and efficient resource utilization.