1 repo
Systems designed to distribute and manage large-scale analytical queries and batch transformation jobs across computing clusters.
Distinguishing note: Focuses on the submission and management of remote batch jobs rather than local data processing.
Explore 1 awesome GitHub repository matching data & databases · Distributed Processing Engines. Refine with filters or upvote what's useful.
Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments. The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external
Submit and manage analytical queries and batch transformation jobs on remote clusters to handle large-scale data workloads efficiently and reliably.