2 repos
Platforms for automating complex sequences of data processing tasks.
Distinguishing note: Focuses on data-specific orchestration rather than general workflow automation.
2 GitHub repositories in Data & Databases · Data Pipeline Orchestrators.
Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments. The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external…
A platform that schedules, monitors, and manages complex sequences of data processing tasks across distributed computing environments.
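To illustrate how a directed acyclic graph fixes task execution order, here is a minimal sketch that uses only Python's standard-library graphlib module (Python 3.9+), not Airflow's own API. The task names and the execution_order helper are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return one valid run order for the tasks in deps.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is why the graph has to be acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # ['extract', 'transform', 'load', 'report']
```

Because every task in this sketch has exactly one upstream dependency, only one ordering is valid. An orchestrator applies the same idea to larger graphs, where independent branches may run in parallel.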
This project is an open-source educational curriculum designed to provide comprehensive training in data engineering. It focuses on building scalable data pipelines and managing cloud-native infrastructure through a structured, self-paced program that combines technical explanations with hands-on practical exercises. The curriculum distinguishes itself by emphasizing industry-standard methodologies, specifically teaching students how to implement infrastructure as code and manage data workflows through orchestration tools. By utilizing container-based environment isolation and declarative con…
Teaches the use of automated workflow tools to schedule, monitor, and manage the execution of complex data processing tasks.