# dagster-io/dagster

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/dagster-io-dagster).**

14,974 stars · 1,986 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/dagster-io/dagster
- Homepage: https://dagster.io
- awesome-repositories: https://awesome-repositories.com/repository/dagster-io-dagster.md

## Topics

`analytics` `dagster` `data-engineering` `data-integration` `data-orchestrator` `data-pipelines` `data-science` `etl` `metadata` `mlops` `orchestration` `python` `scheduler` `workflow` `workflow-automation`

## Description

Dagster is a data orchestration platform designed to manage the entire lifecycle of data assets through declarative modeling and version-controlled code. It functions as a workflow engine that treats data assets as first-class primitives, allowing teams to define, schedule, and monitor complex pipelines while maintaining clear visibility into lineage, dependencies, and data quality.
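To make the idea of assets with tracked dependencies and lineage concrete, here is a minimal, library-free sketch. It does not use Dagster's API. The asset names and the `execution_order` and `upstream_lineage` helpers are hypothetical, and the ordering relies only on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each asset name maps to the upstream assets it depends on.
ASSET_DEPS = {
    "raw_orders": set(),
    "raw_customers": set(),
    "cleaned_orders": {"raw_orders"},
    "customer_orders": {"cleaned_orders", "raw_customers"},
    "daily_revenue": {"customer_orders"},
}


def execution_order(deps):
    """Return an order in which every asset comes after all of its upstream assets."""
    return list(TopologicalSorter(deps).static_order())


def upstream_lineage(asset, deps):
    """Return every asset that the given asset depends on, directly or transitively."""
    seen = set()
    stack = list(deps[asset])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps[current])
    return seen


order = execution_order(ASSET_DEPS)
lineage = upstream_lineage("daily_revenue", ASSET_DEPS)
```

Declaring only the dependencies, and letting the ordering fall out of the graph, is the essence of the declarative model described above: no step in the sketch states when an asset should run.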

The platform follows a configuration-as-code approach, so standard software engineering practices such as unit testing and local mocking apply directly to data workflows. A pluggable execution engine decouples orchestration logic from the underlying compute, allowing tasks to run across cloud-native, serverless, and containerized environments. It also supports partition-aware scheduling, which enables incremental processing and efficient management of high-volume datasets.
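The incremental processing that partitioning enables can be sketched in plain Python. This is an illustration of the idea only, not Dagster's partitioning API. The `daily_partition_keys` and `missing_partitions` helpers, the dates, and the set of already-processed keys are all assumptions made for the example.

```python
from datetime import date, timedelta


def daily_partition_keys(start, end):
    """List one ISO-formatted key per day in the half-open range [start, end)."""
    days = (end - start).days
    return [(start + timedelta(days=offset)).isoformat() for offset in range(days)]


def missing_partitions(all_keys, materialized):
    """Return the partition keys that have not been processed yet, in order."""
    return [key for key in all_keys if key not in materialized]


# Hypothetical state: two of five daily partitions were already processed.
keys = daily_partition_keys(date(2024, 1, 1), date(2024, 1, 6))
already_done = {"2024-01-01", "2024-01-02"}
to_run = missing_partitions(keys, already_done)
```

Because each day is addressed by its own key, a scheduler only has to process the slices in `to_run` rather than recompute the whole dataset, and a single bad day can be re-run on its own.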

Beyond core orchestration, the system provides tools for data platform management, including automated quality governance, infrastructure cost optimization, and centralized asset cataloging. It integrates with enterprise identity providers for access control and offers observability features, such as streaming logs and visual lineage tracking, for monitoring system health and supporting compliance.

The platform supports a variety of deployment models, ranging from self-hosted and hybrid configurations to a fully managed control plane. It includes specialized utilities for migrating legacy pipelines and operationalizing interactive scripts into production-ready components.

## Tags

### Data & Databases

- [Data Pipeline Orchestration](https://awesome-repositories.com/f/data-databases/data-pipeline-orchestration.md) — Builds and schedules complex data workflows as version-controlled code to ensure reliable execution.
- [Workflow Orchestration Engines](https://awesome-repositories.com/f/data-databases/workflow-orchestration-engines.md) — Coordinates distributed tasks and data dependencies across heterogeneous cloud environments and external infrastructure.
- [Data Asset Lifecycle Management](https://awesome-repositories.com/f/data-databases/asset-management/data-asset-lifecycle-management.md) — Models data assets as first-class primitives to manage lineage, dependencies, and quality from ingestion through delivery.
- [Data Asset Modeling](https://awesome-repositories.com/f/data-databases/data-asset-modeling.md) — Models data as declarative assets to track lineage, quality, and dependencies throughout the entire lifecycle.
- [Data Ingestion](https://awesome-repositories.com/f/data-databases/data-ingestion.md) — Provides modular tools to ingest and transform raw information from various sources into usable assets for downstream analysis. ([source](https://dagster.io/use-case/retail-e-commerce))
- [Data Partitioning](https://awesome-repositories.com/f/data-databases/data-partitioning.md) — Divides large datasets into logical slices to enable incremental processing, targeted re-runs, and efficient management of high-volume workflows.
- [Data Quality Frameworks](https://awesome-repositories.com/f/data-databases/data-quality-frameworks.md) — Integrates validation checks and automated policies directly into pipelines to ensure data integrity throughout the asset lifecycle. ([source](https://dagster.io/vs/dagster-vs-prefect))
- [Data Storage](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage/data-storage.md) — Connects to cloud object stores, data warehouses, and lakehouse architectures to read, write, and version data assets securely. ([source](https://dagster.io/integrations))
- [Event-Driven Data Pipelines](https://awesome-repositories.com/f/data-databases/data-integration-synchronization/event-driven-data-pipelines.md) — Executes data tasks based on custom schedules or external events to automate pipeline runs in response to real-time data changes. ([source](https://dagster.io/vs/dagster-vs-azure-data-factory))
- [Distributed Computing](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing/distributed-processing-frameworks/distributed-computing.md) — Launches and manages code execution across cloud-native, serverless, and containerized infrastructure to scale processing power. ([source](https://dagster.io/integrations))

### Web Development

- [Declarative Orchestration](https://awesome-repositories.com/f/web-development/web-infrastructure-deployment/asset-management-build-tools/asset-lifecycle-orchestration/declarative-orchestration.md) — Models data pipelines as a graph of versioned assets where the system automatically determines execution order based on dependency requirements.

### Artificial Intelligence & ML

- [Data Lineage](https://awesome-repositories.com/f/artificial-intelligence-ml/data-lineage.md) — Provides visual and searchable mapping of data flows to document relationships between source inputs and final downstream assets. ([source](https://dagster.io/solutions/etl-elt-pipleines))
- [Data Workflow Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/integrated-development-platforms/machine-learning-platforms/data-workflow-integrations.md) — Incorporates conversational models and lifecycle management tools directly into data workflows to automate complex processing. ([source](https://dagster.io/integrations))

### DevOps & Infrastructure

- [Data Pipeline Definitions](https://awesome-repositories.com/f/devops-infrastructure/infrastructure/infrastructure-as-code/orchestration-and-workflows/infrastructure-as-code-workflows/data-pipeline-definitions.md) — Expresses data workflows, resources, and scheduling logic as version-controlled code to enable standard software engineering practices. ([source](https://dagster.io/vs/dagster-vs-azure-data-factory))
- [Orchestration Engines](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure/cloud-computing-serverless/serverless-execution-environments/orchestration-engines.md) — Decouples the orchestration logic from the compute layer to allow tasks to run across diverse environments like containers or serverless.
- [Managed Control Planes](https://awesome-repositories.com/f/devops-infrastructure/enterprise-hosting-platforms/managed-control-planes.md) — Provides a hosted control plane with enterprise-grade features and insights for teams requiring fully managed infrastructure. ([source](https://dagster.io/vs/dagster-vs-dbt-cloud))
- [Infrastructure as Code](https://awesome-repositories.com/f/devops-infrastructure/infrastructure-as-code.md) — Defines deployment environments and infrastructure configurations using declarative files to integrate with existing workflows. ([source](https://dagster.io/blog))
- [Infrastructure as Code Practices](https://awesome-repositories.com/f/devops-infrastructure/infrastructure-as-code-practices.md) — Manages data pipelines and infrastructure configurations using software engineering practices like testing and CI/CD.
- [Deployment Environments](https://awesome-repositories.com/f/devops-infrastructure/deployment-management-strategies/execution-platforms-and-targets/deployment-environments.md) — Supports self-hosted, hybrid, and managed infrastructure deployment models to avoid vendor lock-in. ([source](https://dagster.io/vs/dagster-vs-aws-step-functions))
- [Resource Cost Management](https://awesome-repositories.com/f/devops-infrastructure/resource-cost-management.md) — Tracks and attributes resource consumption and compute expenses to specific data assets and pipeline runs.
- [Secret Management](https://awesome-repositories.com/f/devops-infrastructure/secret-management.md) — Retrieves and rotates sensitive API keys and configuration parameters from secure vaults automatically during pipeline execution. ([source](https://dagster.io/integrations))
- [Cloud Storage](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure/cloud-computing-serverless/cloud-storage.md) — Reads and writes data objects to cloud storage buckets to provide a resilient persistence layer for pipeline execution. ([source](https://dagster.io/use-case/software-technology))

### Software Engineering & Architecture

- [Configuration-as-Code Frameworks](https://awesome-repositories.com/f/software-engineering-architecture/configuration-as-code-frameworks.md) — Expresses entire data workflows and infrastructure resources as version-controlled code to enable standard software engineering practices.
- [Automated Quality Workflows](https://awesome-repositories.com/f/software-engineering-architecture/automated-quality-workflows.md) — Embeds validation checks and automated testing directly into data pipelines to ensure data integrity.
- [Legacy Migration Strategies](https://awesome-repositories.com/f/software-engineering-architecture/legacy-migration-strategies.md) — Converts existing workflow definitions into modern data assets while maintaining backward compatibility during system transitions. ([source](https://dagster.io/solutions/data-modernization))

### System Administration & Monitoring

- [Observability Platforms](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms.md) — Monitors the health, performance, and lineage of data workflows and assets through a unified interface.
- [Pipeline Observability Tools](https://awesome-repositories.com/f/system-administration-monitoring/pipeline-observability-tools.md) — Streams execution logs and performance metrics to external monitoring platforms to maintain unified observability and system health. ([source](https://dagster.io/integrations))
- [Execution Metadata](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/distributed-tracing-execution-analysis/execution-metadata/execution-metadata.md) — Captures and indexes execution logs, lineage, and asset state in a centralized store to provide unified visibility.
- [Automated Alerting Workflows](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/operational-health-alerting/automated-alerting-workflows.md) — Notifies teams immediately when data quality checks fail or budget thresholds are exceeded. ([source](https://dagster.io/platform-overview/data-quality))

### Development Tools & Productivity

- [Data Catalogs](https://awesome-repositories.com/f/development-tools-productivity/open-source-software/data-catalogs.md) — Maintains a centralized view of data assets, workflows, and lineage to help teams discover and reuse components. ([source](https://dagster.io/platform-overview))
- [Isolated Execution Environments](https://awesome-repositories.com/f/development-tools-productivity/isolated-execution-environments.md) — Creates ephemeral, production-mirroring sandboxes for pull requests to validate changes end-to-end. ([source](https://dagster.io/vs/dagster-vs-airflow))
- [Local Development Tools](https://awesome-repositories.com/f/development-tools-productivity/local-development-tools.md) — Provides isolated interfaces and testing frameworks that allow developers to simulate production environments and validate pipeline logic locally.
- [Production Operationalization](https://awesome-repositories.com/f/development-tools-productivity/platform-script-execution/production-operationalization.md) — Converts interactive notebooks and shell scripts into production-ready pipeline components to eliminate manual deployment overhead. ([source](https://dagster.io/integrations))
- [Local Development Environments](https://awesome-repositories.com/f/development-tools-productivity/local-development-environments.md) — Provides isolated interfaces for building, debugging, and mocking data workflows locally before production deployment. ([source](https://dagster.io/solutions/etl-elt-pipleines))

### Security & Cryptography

- [Data Access Governance](https://awesome-repositories.com/f/security-cryptography/data-access-governance.md) — Centralizes visibility across teams and environments while enforcing fine-grained access controls and maintaining audit-ready lineage for compliance. ([source](https://dagster.io/solutions/data-products))
- [Audit Logging](https://awesome-repositories.com/f/security-cryptography/audit-logging.md) — Maintains comprehensive logs of user actions and system changes to ensure compliance and visibility into operational history. ([source](https://dagster.io/enterprise))
- [User Identity Management](https://awesome-repositories.com/f/security-cryptography/user-identity-management.md) — Integrates with enterprise identity providers to enforce role-based permissions and automate user provisioning through standard authentication protocols. ([source](https://dagster.io/enterprise))

### Part of an Awesome List

- [Data Analysis and Processing](https://awesome-repositories.com/f/awesome-lists/data/data-analysis-and-processing.md) — Orchestration for data assets and pipelines.
- [Data Pipelines](https://awesome-repositories.com/f/awesome-lists/data/data-pipelines.md) — Orchestrates data pipelines for machine learning and ETL.
- [Data Processing](https://awesome-repositories.com/f/awesome-lists/data/data-processing.md) — Data orchestrator for ML and ETL workflows.
- [Workflow Orchestration](https://awesome-repositories.com/f/awesome-lists/data/workflow-orchestration.md) — Library for building and orchestrating data-intensive applications.
- [Data Engineering](https://awesome-repositories.com/f/awesome-lists/devops/data-engineering.md) — Data orchestrator for machine learning and ETL workflows.
- [Job Schedulers](https://awesome-repositories.com/f/awesome-lists/devops/job-schedulers.md) — Orchestration platform for data assets.
- [Scheduling](https://awesome-repositories.com/f/awesome-lists/devops/scheduling.md) — Orchestrates data pipelines for analytics and machine learning.
- [General Purpose Orchestration](https://awesome-repositories.com/f/awesome-lists/devtools/general-purpose-orchestration.md) — Data orchestrator for machine learning, analytics, and ETL pipelines.

### Testing & Quality Assurance

- [Unit Testing](https://awesome-repositories.com/f/testing-quality-assurance/software-testing/testing-frameworks/unit/unit-testing.md) — Verifies the logic and reliability of data processing code using unit, integration, and mock testing frameworks before deployment. ([source](https://dagster.io/learn/data-engineering))
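
As a rough illustration of mock-based testing of data processing code, the sketch below checks a small transformation against a mocked dependency using only the standard library's `unittest.mock`. The `load_active_users` function and the `fetch_rows` method on the warehouse client are hypothetical names invented for this example. They are not part of Dagster.

```python
from unittest.mock import Mock


def load_active_users(warehouse):
    """Fetch user rows from a warehouse client and keep only the active ones."""
    rows = warehouse.fetch_rows("users")
    return [row for row in rows if row["active"]]


# Replace the real warehouse client with a mock so the test needs no network access.
fake_warehouse = Mock()
fake_warehouse.fetch_rows.return_value = [
    {"id": 1, "active": True},
    {"id": 2, "active": False},
]

result = load_active_users(fake_warehouse)
```

Passing the external client in as an argument is what makes the logic testable locally: the same function can receive a real client in production and a mock in tests.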

### Networking & Communication

- [Data Synchronization and Consistency](https://awesome-repositories.com/f/networking-communication/distributed-systems-p2p/distributed-computing/data-synchronization-consistency.md) — Ensures reliable and uniform data outputs across different sources and timing intervals to maintain a single source of truth. ([source](https://dagster.io/use-case/retail-e-commerce))
