# spotify/luigi

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/spotify-luigi).**

18,676 stars · 2,450 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/spotify/luigi
- awesome-repositories: https://awesome-repositories.com/repository/spotify-luigi.md

## Topics

`hadoop` `luigi` `orchestration-framework` `python` `scheduling`

## Description

Luigi is a Python framework designed for building and managing complex batch data pipelines. It functions as a workflow orchestration engine that organizes tasks into directed acyclic graphs, ensuring that jobs execute in the correct logical order based on their dependencies. By utilizing a centralized scheduler, the system coordinates task execution across distributed environments, tracks global workflow state, and prevents redundant processing by verifying the existence of output targets before triggering any work.

The project distinguishes itself through its state tracking: a task counts as done when its output targets exist, and the built-in file system abstractions write those targets atomically so a partially written file is never mistaken for finished output. Tasks are defined by typed parameters that Luigi parses from the command line or configuration files, which makes it easy to run the same job with different inputs. To keep large deployments stable, tasks can declare shared resources whose token limits throttle how many instances run at once, and a web-based dashboard visualizes dependency graphs and shows pipeline progress in real time.

Beyond core orchestration, the framework integrates with distributed storage systems, relational databases and cluster compute engines, for example HDFS, Hive, Spark and SQLAlchemy-backed databases through its `luigi.contrib` modules. It covers the full pipeline lifecycle with event-driven hooks, automatic retries for transient failures, and a historical record of task executions for auditing. The architecture is extensible: custom file system implementations and specialized job types can be plugged into existing workflows.

## Tags

### Data & Databases

- [Python Data Pipeline Frameworks](https://awesome-repositories.com/f/data-databases/python-data-pipeline-frameworks.md) — Provides a Python-based framework for building complex batch workflows and managing task dependencies.
- [Workflow Orchestration Engines](https://awesome-repositories.com/f/data-databases/workflow-orchestration-engines.md) — Coordinates multi-step data processing tasks, handles retries, and ensures atomic output generation.
- [Batch Processing Schedulers](https://awesome-repositories.com/f/data-databases/batch-processing-schedulers.md) — Automates and manages the execution of complex batch data processing pipelines across distributed environments. ([source](https://cdn.jsdelivr.net/gh/spotify/luigi@master/README.md))
- [Data Pipeline Orchestration](https://awesome-repositories.com/f/data-databases/data-pipeline-orchestration.md) — Orchestrates complex sequences of data processing tasks by defining dependencies and automating execution. ([source](https://luigi.readthedocs.io/en/stable/_sources/index.rst.txt))
- [Distributed Task Schedulers](https://awesome-repositories.com/f/data-databases/distributed-task-schedulers.md) — Acts as a centralized service for tracking dependencies and scheduling distributed batch tasks. ([source](https://luigi.readthedocs.io/en/stable/example_top_artists.html))
- [Atomic File Operations](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-management-governance/data-integrity-validation/data-integrity/atomic-file-operations.md) — Writes outputs to a temporary file that is moved into place only when writing finishes, so a partial file is never treated as completed work.
- [Atomic Write Normalizers](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-management-governance/data-integrity-validation/data-integrity/atomic-file-operations/atomic-write-normalizers.md) — Ensures atomic data writes by finalizing outputs only after successful completion to prevent downstream consumption of corrupted data. ([source](https://luigi.readthedocs.io/en/stable/luigi_patterns.html))
- [Batch Processing Utilities](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/batch-processing-systems/batch-processing-utilities.md) — Ensures data integrity through atomic output handling and automated retry logic for batch processing.
- [Data Processing Tasks](https://awesome-repositories.com/f/data-databases/data-processing-tasks.md) — Encapsulates units of computation by specifying input requirements and output targets. ([source](https://luigi.readthedocs.io/en/stable/workflows.html))
- [Data Storage Layers](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage/data-storage-layers.md) — Provides a unified abstraction layer for interacting with various storage systems including local files and databases. ([source](https://luigi.readthedocs.io/en/stable/workflows.html))
- [Data Exporters](https://awesome-repositories.com/f/data-databases/data-exporters.md) — Facilitates exporting processed data into relational databases with support for schema reflection. ([source](https://luigi.readthedocs.io/en/latest/api/luigi.contrib.sqla.html))
- [Data Processing Workflows](https://awesome-repositories.com/f/data-databases/data-processing-workflows.md) — Tracks data table and partition existence to coordinate dependencies within complex data processing workflows. ([source](https://luigi.readthedocs.io/en/latest/api/luigi.contrib.hive.html))
- [Distributed Storage](https://awesome-repositories.com/f/data-databases/distributed-storage.md) — Integrates with distributed storage systems to read and write files within automated batch processing tasks. ([source](https://luigi.readthedocs.io/en/latest/api/luigi.contrib.hdfs.html))
- [External Storage Integrations](https://awesome-repositories.com/f/data-databases/external-storage-integrations.md) — Connects to cloud storage and databases to maintain consistent data access across environments. ([source](https://luigi.readthedocs.io/en/stable/py-modindex.html))

### Software Engineering & Architecture

- [Workflow Schedulers](https://awesome-repositories.com/f/software-engineering-architecture/system-internals/centralization-patterns/workflow-schedulers.md) — Coordinates task execution and tracks global workflow state through a centralized server.
- [Directed Acyclic Graph Engines](https://awesome-repositories.com/f/software-engineering-architecture/directed-acyclic-graph-engines.md) — Organizes tasks into dependency graphs to determine execution order based on upstream data requirements.
- [Workflow Execution Managers](https://awesome-repositories.com/f/software-engineering-architecture/system-internals/centralization-patterns/workflow-execution-managers.md) — Coordinates task execution through a central server that tracks dependencies and prevents the same task from running more than once at a time. ([source](https://luigi.readthedocs.io/en/stable/execution_model.html))
- [Dynamic Task Graphs](https://awesome-repositories.com/f/software-engineering-architecture/dynamic-task-graphs.md) — Constructs and executes task graphs at runtime based on logic and data dependencies. ([source](https://luigi.readthedocs.io/en/stable/tasks.html))
- [File System Abstractions](https://awesome-repositories.com/f/software-engineering-architecture/file-system-abstractions.md) — Implements file system abstractions that ensure atomic operations and prevent data corruption. ([source](https://luigi.readthedocs.io/en/stable/design_and_limitations.html))
- [Task Execution Engines](https://awesome-repositories.com/f/software-engineering-architecture/task-execution-engines.md) — Triggers defined data processing tasks from the command line or programmatically. ([source](https://luigi.readthedocs.io/en/stable/running_luigi.html))
- [Workflow Task Definitions](https://awesome-repositories.com/f/software-engineering-architecture/workflow-task-definitions.md) — Enables dynamic configuration by injecting type-checked parameters into task definitions.
- [Pipeline Task Grouping](https://awesome-repositories.com/f/software-engineering-architecture/asynchronous-task-managers/task-group-orchestration/pipeline-task-grouping.md) — Wraps multiple independent tasks into a single parent task to trigger complex dependency chains. ([source](https://luigi.readthedocs.io/en/stable/luigi_patterns.html))
- [Concurrent Task Runners](https://awesome-repositories.com/f/software-engineering-architecture/concurrent-task-runners.md) — Batches pending tasks that differ only in a batchable parameter into a single execution run to improve throughput and resource efficiency. ([source](https://luigi.readthedocs.io/en/stable/luigi_patterns.html))
- [Concurrent Task Limiters](https://awesome-repositories.com/f/software-engineering-architecture/concurrent-task-runners/concurrent-task-limiters.md) — Limits concurrent execution using shared tokens to prevent infrastructure overload.
- [Failure Handling Policies](https://awesome-repositories.com/f/software-engineering-architecture/failure-handling-policies.md) — Detects and handles job interruptions to allow for recovery in multi-step workflows. ([source](https://luigi.readthedocs.io/en/stable/))
- [Pipeline Lifecycle Hooks](https://awesome-repositories.com/f/software-engineering-architecture/pipeline-lifecycle-hooks.md) — Registers custom callbacks to monitor and respond to specific lifecycle events during data processing. ([source](https://luigi.readthedocs.io/en/stable/api/luigi.html))
- [Task Retry Policies](https://awesome-repositories.com/f/software-engineering-architecture/task-retry-policies.md) — Defines automatic retry strategies for failed tasks to improve pipeline resilience. ([source](https://luigi.readthedocs.io/en/stable/configuration.html))
- [Event-Driven Hooks](https://awesome-repositories.com/f/software-engineering-architecture/event-driven-hooks.md) — Triggers custom logic through registered callbacks during specific stages of the task lifecycle.
- [Event Handling](https://awesome-repositories.com/f/software-engineering-architecture/event-handling.md) — Registers callbacks for lifecycle events like success or failure to trigger custom logic upon task completion. ([source](https://luigi.readthedocs.io/en/stable/tasks.html))

### Development Tools & Productivity

- [Task Dependency Managers](https://awesome-repositories.com/f/development-tools-productivity/task-dependency-managers.md) — Manages task requirements and completion status to enforce logical execution order in multi-step workflows. ([source](https://luigi.readthedocs.io/en/stable/execution_model.html))
- [Workflow State Managers](https://awesome-repositories.com/f/development-tools-productivity/build-tooling/build-orchestration-logic/build-orchestration-configuration/build-automation-systems/workflow-orchestration/workflow-state-managers.md) — Tracks task completion using atomic file targets to ensure data integrity and prevent redundant execution of finished work. ([source](https://luigi.readthedocs.io/en/stable/example_top_artists.html))
- [Task Dependency Management](https://awesome-repositories.com/f/development-tools-productivity/task-dependency-management.md) — Specifies upstream task dependencies to resolve complex execution graphs automatically. ([source](https://luigi.readthedocs.io/en/stable/example_top_artists.html))
- [Task Pipeline Managers](https://awesome-repositories.com/f/development-tools-productivity/task-pipeline-managers.md) — Encapsulates units of work by specifying input dependencies, computation logic, and output targets. ([source](https://luigi.readthedocs.io/en/stable/api/luigi.html))
- [State Tracking Utilities](https://awesome-repositories.com/f/development-tools-productivity/change-tracking/state-tracking-utilities.md) — Tracks task completion by checking for the existence of output targets to prevent redundant work.
- [Workflow Schedulers](https://awesome-repositories.com/f/development-tools-productivity/workflow-schedulers.md) — Supports recurring, time-partitioned tasks and backfilling of historical data through range tasks, while the periodic trigger itself typically comes from an external scheduler such as cron. ([source](https://luigi.readthedocs.io/en/stable/luigi_patterns.html))
- [Notification Integrations](https://awesome-repositories.com/f/development-tools-productivity/notification-integrations.md) — Integrates pipeline status updates with external messaging platforms for automated failure and completion alerts. ([source](https://luigi.readthedocs.io/en/latest/))
- [Execution Logic Overrides](https://awesome-repositories.com/f/development-tools-productivity/parallel-execution/custom-parallel-task-execution/execution-logic-overrides.md) — Supports overriding default worker and scheduler implementations for specialized requirements. ([source](https://luigi.readthedocs.io/en/stable/running_luigi.html))

### System Administration & Monitoring

- [Pipeline Monitoring Dashboards](https://awesome-repositories.com/f/system-administration-monitoring/pipeline-monitoring-dashboards.md) — Provides a web-based dashboard for visualizing dependency graphs and monitoring real-time pipeline execution status. ([source](https://cdn.jsdelivr.net/gh/spotify/luigi@master/README.md))
- [Workflow Monitoring Systems](https://awesome-repositories.com/f/system-administration-monitoring/workflow-monitoring-systems.md) — Tracks the progress and failure history of distributed workflows through a centralized monitoring interface. ([source](https://luigi.readthedocs.io/en/stable/))
- [Task Dependency Visualizers](https://awesome-repositories.com/f/system-administration-monitoring/task-monitoring/task-dependency-visualizers.md) — Provides a web-based interface for visualizing task dependency graphs and managing execution locking. ([source](https://luigi.readthedocs.io/en/stable/design_and_limitations.html))
- [Activity Progress Monitors](https://awesome-repositories.com/f/system-administration-monitoring/activity-monitors/activity-progress-monitors.md) — Reports real-time progress and status updates for long-running tasks to the central scheduler. ([source](https://luigi.readthedocs.io/en/stable/tasks.html))
- [Task Progress Monitors](https://awesome-repositories.com/f/system-administration-monitoring/activity-monitors/activity-progress-monitors/task-progress-monitors.md) — Tracks and visualizes the real-time status and execution flow of active data processing tasks. ([source](https://luigi.readthedocs.io/en/stable/central_scheduler.html))
- [Execution History Auditors](https://awesome-repositories.com/f/system-administration-monitoring/execution-history-auditors.md) — Archives detailed task execution metadata in a database for historical analysis and auditing. ([source](https://luigi.readthedocs.io/en/stable/central_scheduler.html))
- [Resource Constraints](https://awesome-repositories.com/f/system-administration-monitoring/resource-constraints.md) — Enforces resource constraints via shared tokens to maintain system stability during parallel job execution. ([source](https://luigi.readthedocs.io/en/stable/luigi_patterns.html))
- [Metric and Performance Monitors](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors.md) — Collects and reports performance metrics like execution time and memory usage during task lifecycles. ([source](https://luigi.readthedocs.io/en/stable/luigi_patterns.html))
- [Search Filters](https://awesome-repositories.com/f/system-administration-monitoring/real-time-monitoring-dashboards/search-filters.md) — Allows searching and filtering through active or historical jobs within the workflow monitoring interface. ([source](https://luigi.readthedocs.io/en/latest/))

### Business & Productivity Software

- [Task Execution Controllers](https://awesome-repositories.com/f/business-productivity-software/activity-trackers/task-execution-controllers.md) — Provides administrative control over task execution, including concurrency limits and retry policies for complex data pipelines. ([source](https://luigi.readthedocs.io/en/stable/api/luigi.html))
- [Workflow Parameterization](https://awesome-repositories.com/f/business-productivity-software/workflow-parameterization.md) — Enables dynamic data handling within workflows by injecting runtime parameters into task logic. ([source](https://luigi.readthedocs.io/en/stable/workflows.html))

### DevOps & Infrastructure

- [Distributed Task Schedulers](https://awesome-repositories.com/f/devops-infrastructure/distributed-task-schedulers.md) — Prevents multiple instances of the same task from running simultaneously across distributed environments. ([source](https://luigi.readthedocs.io/en/stable/central_scheduler.html))
- [Workflow Orchestration](https://awesome-repositories.com/f/devops-infrastructure/workflow-orchestration.md) — Manages dependencies between multiple data processing tasks to ensure correct execution order and automatic failure handling. ([source](https://luigi.readthedocs.io/en/stable/py-modindex.html))
- [Task & Job Management](https://awesome-repositories.com/f/devops-infrastructure/automation-orchestration/task-execution-frameworks/task-job-management.md) — Manages task outputs and tracks completion status to verify dependencies before triggering downstream work. ([source](https://luigi.readthedocs.io/en/stable/tasks.html))
- [Job Concurrency Controllers](https://awesome-repositories.com/f/devops-infrastructure/job-concurrency-controllers.md) — Limits concurrent execution of tasks using shared resource keys to prevent infrastructure overload. ([source](https://luigi.readthedocs.io/en/stable/configuration.html))
- [Task Schedulers](https://awesome-repositories.com/f/devops-infrastructure/automation-orchestration/task-execution-frameworks/task-job-management/task-schedulers.md) — Allows assigning priorities to tasks to influence the order in which the scheduler hands runnable jobs to workers. ([source](https://luigi.readthedocs.io/en/stable/tasks.html))
- [Job Execution Engines](https://awesome-repositories.com/f/devops-infrastructure/job-execution-engines.md) — Orchestrates the submission of applications to clusters by mapping task parameters to execution arguments. ([source](https://luigi.readthedocs.io/en/latest/api/luigi.contrib.spark.html))
- [Remote Workspace Command Execution](https://awesome-repositories.com/f/devops-infrastructure/execution-environments/remote-workspace-command-execution.md) — Executes shell commands on remote machines as if they were local processes. ([source](https://luigi.readthedocs.io/en/latest/api/luigi.contrib.ssh.html))
- [Remote File System Mounts](https://awesome-repositories.com/f/devops-infrastructure/remote-file-system-mounts.md) — Enables standard file operations on remote hosts through a unified interface. ([source](https://luigi.readthedocs.io/en/latest/api/luigi.contrib.ssh.html))

### Artificial Intelligence & ML

- [Declarative Task Signatures](https://awesome-repositories.com/f/artificial-intelligence-ml/declarative-task-signatures.md) — Supports declarative task signatures with type checking for input variables and argument parsing. ([source](https://luigi.readthedocs.io/en/stable/parameters.html))
- [Execution Parameter Configurators](https://awesome-repositories.com/f/artificial-intelligence-ml/model-parameter-configurations/execution-parameter-configurators.md) — Enables configuration of task parameters via command line or external files to override default behaviors. ([source](https://luigi.readthedocs.io/en/stable/example_top_artists.html))

### User Interface & Experience

- [Workflow Extenders](https://awesome-repositories.com/f/user-interface-experience/customizable-workspaces/workflow-extenders.md) — Provides a modular architecture for extending workflow capabilities with custom file systems and job types. ([source](https://luigi.readthedocs.io/en/stable/design_and_limitations.html))

### Programming Languages & Runtimes

- [Task Parameter Validators](https://awesome-repositories.com/f/programming-languages-runtimes/language-features-paradigms/type-system-tools/type-safety/static-type-validation/task-parameter-validators.md) — Provides a mypy plugin that statically type-checks the parameters passed when tasks are instantiated. ([source](https://luigi.readthedocs.io/en/stable/mypy.html))
