# apache/airflow

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/apache-airflow).**

44,326 stars · 16,514 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/apache/airflow
- Homepage: https://airflow.apache.org/
- awesome-repositories: https://awesome-repositories.com/repository/apache-airflow.md

## Topics

`airflow` `apache` `apache-airflow` `automation` `dag` `data-engineering` `data-integration` `data-orchestrator` `data-pipelines` `data-science` `elt` `etl` `machine-learning` `mlops` `orchestration` `python` `scheduler` `workflow` `workflow-engine` `workflow-orchestration`

## Description

Airflow is a platform for programmatically authoring, scheduling, and monitoring data pipelines. It acts as a workflow automation engine that manages recurring business processes by executing tasks whose dependencies are defined in code. Because workflows are represented as directed acyclic graphs (DAGs), the execution order of tasks and the flow of data between them are explicit and can be enforced consistently across distributed computing environments.
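
To make the directed acyclic graph idea concrete, the sketch below derives a run order from task dependencies using only Python's standard library. It is not Airflow's API: the task names and the `dependencies` mapping are hypothetical, and `graphlib.TopologicalSorter` merely stands in for the dependency resolution a scheduler performs.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(graph):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the list of nodes forming the cycle.
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc


order = execution_order(dependencies)
print(order)
```

A graph that contains a cycle, such as two tasks that depend on each other, has no valid order, which is why the sketch rejects it instead of guessing.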

The platform uses a modular, provider-based architecture that decouples core orchestration logic from external service integrations. Users can connect cloud services, databases, and storage systems through provider packages and custom plugins. A distributed task queue enables horizontal scaling, while a centralized scheduler and execution state tracked in a metadata database provide fault tolerance and visibility across large-scale infrastructure.
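
The task-queue pattern described above can be sketched in a few lines. This is an illustration under stated assumptions, not Airflow's executor code: threads in a single process stand in for distributed workers, and `run_workers`, the task names, and the state labels are hypothetical.

```python
import queue
import threading


def run_workers(tasks, num_workers=3):
    """Dispatch callables from a shared queue to a pool of worker threads."""
    pending = queue.Queue()
    state = {}  # records the outcome of each task
    lock = threading.Lock()

    for task_id, fn in tasks.items():
        state[task_id] = "queued"
        pending.put((task_id, fn))

    def worker():
        # Each worker pulls tasks until the queue is drained.
        while True:
            try:
                task_id, fn = pending.get_nowait()
            except queue.Empty:
                return
            try:
                fn()
                outcome = "success"
            except Exception:
                # A failing task is recorded; it does not stop the worker.
                outcome = "failed"
            with lock:
                state[task_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state


def always_fails():
    raise RuntimeError("simulated task failure")


tasks = {"a": lambda: None, "b": always_fails, "c": lambda: None}
final_state = run_workers(tasks)
print(final_state)
```

Adding capacity in this model means raising `num_workers`; the queue and the recorded outcomes are unaffected, which is the property that makes the pattern scale horizontally.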

Beyond core scheduling, the project provides a web-based interface for pipeline visualization, status tracking, and source code inspection. It integrates with external secret management services and offers administrative control through both a command-line interface and a programmatic API. The system is designed for containerized deployment and provides tools for building optimized images and managing complex dependency environments.
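
As a rough illustration of external secret lookup, the sketch below tries a chain of backends in order and returns the first connection URI it finds. The class names, the `get_conn_uri` method, the URIs, and the lookup order are assumptions made for this example rather than Airflow's actual interfaces; only the `AIRFLOW_CONN_<CONN_ID>` environment-variable naming follows Airflow's documented convention.

```python
import os


class EnvironmentBackend:
    """Hypothetical backend that reads AIRFLOW_CONN_<CONN_ID> variables."""

    def __init__(self, environ=None):
        # Accept an injected mapping so the example does not depend on the real environment.
        self.environ = os.environ if environ is None else environ

    def get_conn_uri(self, conn_id):
        return self.environ.get(f"AIRFLOW_CONN_{conn_id.upper()}")


class InMemoryBackend:
    """Hypothetical stand-in for an external secret manager."""

    def __init__(self, secrets):
        self.secrets = dict(secrets)

    def get_conn_uri(self, conn_id):
        return self.secrets.get(conn_id)


def resolve_connection(conn_id, backends):
    """Return the first URI found, trying each backend in order."""
    for backend in backends:
        uri = backend.get_conn_uri(conn_id)
        if uri is not None:
            return uri
    raise KeyError(f"connection {conn_id!r} not found in any backend")


backends = [
    InMemoryBackend({"warehouse": "postgresql://vault-user@db.example.com/analytics"}),
    EnvironmentBackend({
        "AIRFLOW_CONN_WAREHOUSE": "postgresql://env-user@localhost/dev",
        "AIRFLOW_CONN_REPORTS": "sqlite:///reports.db",
    }),
]

warehouse_uri = resolve_connection("warehouse", backends)  # found in the first backend
reports_uri = resolve_connection("reports", backends)      # falls through to the environment
```

Putting the external service first means credentials stored there take precedence over local fallbacks, so a task never needs to know which store actually answered.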

## Tags

### Data & Databases

- [Data Pipeline Orchestrators](https://awesome-repositories.com/f/data-databases/data-pipeline-orchestrators.md) — A platform that schedules, monitors, and manages complex sequences of data processing tasks across distributed computing environments. ([source](https://cdn.jsdelivr.net/gh/apache/airflow@main/README.md))
- [Workflow Orchestration](https://awesome-repositories.com/f/data-databases/workflow-orchestration.md) — Describe the sequence and dependencies of automated tasks in Python code to manage complex business processes across distributed environments. ([source](https://github.com/apache/airflow))
- [Workflow Orchestration Engines](https://awesome-repositories.com/f/data-databases/workflow-orchestration-engines.md) — Manages the lifecycle of recurring business processes by executing code-defined task dependencies and handling state persistence across distributed environments.
- [Workflow Orchestrators](https://awesome-repositories.com/f/data-databases/workflow-orchestrators.md) — Provides a platform for authoring, scheduling, and monitoring complex data pipelines using directed acyclic graphs.
- [Batch Processing Schedulers](https://awesome-repositories.com/f/data-databases/batch-processing-schedulers.md) — Define and monitor complex data pipelines using code-based configurations that support dynamic task generation to automate recurring business processes. ([source](https://airflow.apache.org/docs/apache-airflow/stable/index.html))
- [Data Processing Workflows](https://awesome-repositories.com/f/data-databases/data-processing-workflows.md) — Run graph traversals and analytical queries against Apache TinkerPop-compatible graph stores and feed the results into automated data processing workflows. ([source](https://airflow.apache.org/docs/apache-airflow-providers-apache-tinkerpop/stable/index.html))
- [Distributed Task Schedulers](https://awesome-repositories.com/f/data-databases/distributed-task-schedulers.md) — Distributes and manages the execution of batch processing jobs across large clusters to ensure reliable data transformation and efficient resource utilization.
- [Batch Processing Engines](https://awesome-repositories.com/f/data-databases/batch-processing-engines.md) — Orchestrates batch workflows defined as code with centralized monitoring. ([source](https://airflow.apache.org/docs/apache-airflow/stable/))
- [Data Integration Tools](https://awesome-repositories.com/f/data-databases/data-integration-tools.md) — Automate the movement of information between disparate storage locations and distributed file systems to ensure data pipelines remain consistent and up to date. ([source](https://airflow.apache.org/docs/apache-airflow-providers-airbyte/stable/index.html))
- [Distributed Processing Engines](https://awesome-repositories.com/f/data-databases/distributed-processing-engines.md) — Submit and manage analytical queries and batch transformation jobs on remote clusters to handle large-scale data workloads efficiently and reliably. ([source](https://airflow.apache.org/docs/apache-airflow-providers-apache-livy/stable/index.html))
- [Database Connectors](https://awesome-repositories.com/f/data-databases/database-connectors.md) — Execute queries and perform data operations across multi-model and distributed database instances to interact with persistent storage layers during task execution. ([source](https://airflow.apache.org/docs/apache-airflow-providers-arangodb/stable/index.html))
- [Connection Management](https://awesome-repositories.com/f/data-databases/connection-management.md) — Create custom connection types with specialized forms and field handling logic to manage external service credentials and configuration settings securely. ([source](http://airflow.apache.org/docs/apache-airflow-providers/index.html))
- [Data Lake Management](https://awesome-repositories.com/f/data-databases/data-lake-management.md) — Perform data retrieval and metadata operations within distributed file systems and data lakes to maintain organized and accessible information repositories. ([source](https://airflow.apache.org/docs/apache-airflow-providers-apache-hdfs/stable/index.html))
- [Metadata Management Systems](https://awesome-repositories.com/f/data-databases/metadata-management-systems.md) — Ensures fault tolerance and state persistence by tracking task execution status in a relational database.
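
The last item above, tracking task state in a relational database, can be illustrated with a small sketch. It assumes a hypothetical `task_state` table and uses an in-memory SQLite database from the standard library; Airflow's real metadata schema is different and considerably richer.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the hypothetical task_state table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_state ("
        " run_id TEXT, task_id TEXT, state TEXT,"
        " PRIMARY KEY (run_id, task_id))"
    )
    return conn


def set_state(conn, run_id, task_id, state):
    """Insert or overwrite the state of one task within a run."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO task_state (run_id, task_id, state)"
            " VALUES (?, ?, ?)",
            (run_id, task_id, state),
        )


def unfinished(conn, run_id):
    """List tasks that still need to run, or re-run, for the given run."""
    rows = conn.execute(
        "SELECT task_id FROM task_state"
        " WHERE run_id = ? AND state != 'success' ORDER BY task_id",
        (run_id,),
    )
    return [row[0] for row in rows]


conn = open_store()
for task_id in ("extract", "transform", "load"):
    set_state(conn, "run_1", task_id, "queued")
set_state(conn, "run_1", "extract", "success")
set_state(conn, "run_1", "transform", "failed")

remaining = unfinished(conn, "run_1")
print(remaining)
```

Because the state lives in the database and not in a process's memory, a restarted scheduler could query `unfinished` and resume from where the previous process stopped.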

### DevOps & Infrastructure

- [Task Schedulers](https://awesome-repositories.com/f/devops-infrastructure/task-schedulers.md) — A distributed execution environment that manages task distribution and resource allocation across containerized clusters and cloud-native infrastructure.
- [Distributed Task Queues](https://awesome-repositories.com/f/devops-infrastructure/distributed-task-queues.md) — Enables horizontal scaling by dispatching tasks to a pool of distributed workers.
- [Cloud Infrastructure Orchestration](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure-orchestration.md) — Authenticate and manage resource allocation across cloud infrastructure providers to control remote computing tasks from a single centralized point. ([source](https://airflow.apache.org/docs/apache-airflow-providers-alibaba/stable/index.html))
- [Integration Frameworks](https://awesome-repositories.com/f/devops-infrastructure/integration-frameworks.md) — Connects diverse external cloud services, databases, and storage systems through a modular architecture that supports custom plugins and provider packages.
- [Cloud Service Integrations](https://awesome-repositories.com/f/devops-infrastructure/cloud-service-integrations.md) — Connects automated workflows to diverse cloud services and managed platforms to handle resource allocation, data movement, and job execution.
- [Provider Integrations](https://awesome-repositories.com/f/devops-infrastructure/provider-integrations.md) — Contribute to provider packages within the monorepo by understanding distribution structures, dependency management, and the integration of optional extras into the core system. ([source](https://github.com/apache/airflow/blob/main/contributing-docs/README.rst))

### Software Engineering & Architecture

- [Plugin Architectures](https://awesome-repositories.com/f/software-engineering-architecture/plugin-architectures.md) — Build custom commands, task links, and connection types to add specialized features and third-party service integrations that meet unique operational requirements. ([source](https://airflow.apache.org/docs/apache-airflow-providers/index.html))
- [Plugin Frameworks](https://awesome-repositories.com/f/software-engineering-architecture/plugin-frameworks.md) — Decouples core logic from external services using a modular provider-based framework.

### Development Tools & Productivity

- [Workflow Authoring Frameworks](https://awesome-repositories.com/f/development-tools-productivity/workflow-authoring-frameworks.md) — Define workflows and execute tasks in isolated subprocesses using native interfaces that separate business logic from the underlying execution environment. ([source](https://airflow.apache.org/docs/))
- [Command Line Interfaces](https://awesome-repositories.com/f/development-tools-productivity/command-line-interfaces.md) — Perform system operations and monitor workflow status using a command-line interface to control environments directly from a terminal window. ([source](https://airflow.apache.org/docs/apache-airflow-ctl/stable/index.html))

### Security & Cryptography

- [Secret Management Integrations](https://awesome-repositories.com/f/security-cryptography/secret-management-integrations.md) — Connect external secret management services to securely store and retrieve credentials and configuration settings instead of using an internal database. ([source](http://airflow.apache.org/docs/apache-airflow-providers/index.html))
- [Secret Management](https://awesome-repositories.com/f/security-cryptography/secret-management.md) — Retrieve and handle sensitive credentials from external security services during task execution to ensure authentication tokens remain protected throughout the workflow lifecycle. ([source](https://airflow.apache.org/docs/apache-airflow-providers-akeyless/stable/index.html))

### System Administration & Monitoring

- [Workflow Monitoring Systems](https://awesome-repositories.com/f/system-administration-monitoring/workflow-monitoring-systems.md) — Tracks the status of automated processes through centralized logging, custom alert notifications, and system dashboards for improved visibility and troubleshooting.
- [Pipeline Monitoring Dashboards](https://awesome-repositories.com/f/system-administration-monitoring/pipeline-monitoring-dashboards.md) — Provides a web interface for visualizing pipeline status, asset dependencies, and source code. ([source](https://cdn.jsdelivr.net/gh/apache/airflow@main/README.md))
- [Administrative APIs](https://awesome-repositories.com/f/system-administration-monitoring/administrative-apis.md) — Exposes comprehensive programmatic interfaces for managing system operations and workflow configurations.
- [Distributed Logging](https://awesome-repositories.com/f/system-administration-monitoring/distributed-logging.md) — Write and retrieve task execution logs using external storage services instead of local file systems to simplify log management across distributed environments. ([source](http://airflow.apache.org/docs/apache-airflow-providers/index.html))
- [Alerting Systems](https://awesome-repositories.com/f/system-administration-monitoring/alerting-systems.md) — Define custom notification channels to receive automated alerts and status updates regarding the execution progress of tasks and workflows. ([source](http://airflow.apache.org/docs/apache-airflow-providers/index.html))
- [Centralized Logging Systems](https://awesome-repositories.com/f/system-administration-monitoring/centralized-logging-systems.md) — Save and retrieve task execution logs using centralized external services to simplify troubleshooting and log management across complex distributed computing environments. ([source](https://airflow.apache.org/docs/apache-airflow-providers/index.html))
