Dagster | Awesome Repository

Dagster is a data orchestration platform designed to manage the entire lifecycle of data assets through declarative modeling and version-controlled code. It functions as a workflow engine that treats data assets as first-class primitives, allowing teams to define, schedule, and monitor complex pipelines while maintaining clear visibility into lineage, dependencies, and data quality.

The platform distinguishes itself by using a code-as-configuration framework that enables standard software engineering practices, such as unit testing and local mocking, to be applied directly to data workflows. Its architecture is built on a pluggable execution engine that decouples orchestration logic from the underlying compute, allowing tasks to run across diverse cloud-native, serverless, and containerized environments. Furthermore, it supports partition-aware scheduling, which enables incremental processing and efficient management of high-volume datasets.
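As a rough illustration of the partition-aware, incremental idea described above, the sketch below uses only the Python standard library and is not Dagster's API. The helper names (`daily_partitions`, `missing_partitions`) and the example dates are hypothetical. It assumes a dataset split into one partition per day and shows how only the partitions not yet processed would be selected for a run.

```python
from datetime import date, timedelta


def daily_partitions(start: date, end: date) -> list[str]:
    """Return one ISO-formatted partition key per day in [start, end)."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days)]


def missing_partitions(all_keys: list[str], materialized: set[str]) -> list[str]:
    """Return the keys that still need processing, in chronological order."""
    return [key for key in all_keys if key not in materialized]


# Hypothetical example: four daily partitions, two already processed.
keys = daily_partitions(date(2024, 1, 1), date(2024, 1, 5))
done = {"2024-01-01", "2024-01-02"}
todo = missing_partitions(keys, done)
```

Because the helpers are plain functions with no external state, they can be unit-tested locally, which is the kind of software engineering practice the paragraph above refers to.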

Beyond core orchestration, the system provides tools for data platform management, including automated quality governance, infrastructure cost optimization, and centralized asset cataloging. It integrates with enterprise identity providers for access control and offers observability features, such as streaming logs and visual lineage tracking, for monitoring system health and compliance.

The platform supports a variety of deployment models, ranging from self-hosted and hybrid configurations to a fully managed control plane. It includes specialized utilities for migrating legacy pipelines and operationalizing interactive scripts into production-ready components.

Features

Data Pipeline Orchestration - Builds and schedules complex data workflows as version-controlled code to ensure reliable execution.
Workflow Orchestration Engines - Coordinates distributed tasks and data dependencies across heterogeneous cloud environments and external infrastructure.
Declarative Orchestration - Models data pipelines as a graph of versioned assets where the system automatically determines execution order based on dependency requirements.
Data Lineage - Provides visual and searchable mapping of data flows to document relationships between source inputs and final downstream assets.
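
The declarative model in the list above, where execution order is derived from asset dependencies rather than written out by hand, can be sketched with a topological sort. This is a minimal standard-library illustration, not Dagster's implementation; the asset names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each asset maps to the set of assets it depends on.
deps = {
    "raw_orders": set(),
    "clean_orders": {"raw_orders"},
    "daily_revenue": {"clean_orders"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every asset comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(deps)
```

The same dependency mapping also serves as a simple lineage record: walking it from `daily_revenue` back to `raw_orders` traces a downstream asset to its source input.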
