datastacktv/data-engineer-roadmap — A structured roadmap providing a clear path for individuals to learn the skills required for data engineering.

datatalksclub/data-engineering-zoomcamp — A comprehensive, open-source curriculum specifically designed for training in data engineering and pipeline construction.

argoproj/argo-workflows — A container-native workflow engine that is a standard tool for orchestrating complex data engineering pipelines.

Data Engineering

Data engineering tools for building scalable pipelines, distributed processing engines, workflow orchestration, and large-scale data transformation systems.


datastacktv/data-engineer-roadmap
12,747 stars
Roadmap to becoming a data engineer in 2021
AI and Data Science, Data Engineering
DataTalksClub/data-engineering-zoomcamp
42,483 stars
This project is an open-source educational curriculum designed to provide comprehensive training in data engineering. It focuses on building scalable data pipelines and managing cloud-native infrastructure through a structured, self-paced program that combines technical explanations with hands-on practical exercises. The curriculum distinguishes itself by emphasizing industry-standard methodologies, specifically teaching students how to implement infrastructure as code and manage data workflows through orchestration tools. By utilizing container-based environment isolation and declarative configuration, the program ensures that learners gain experience with reproducible deployments and consistent development environments across distributed systems. The training covers a broad range of technical topics, including the design of automated data processing tasks and the configuration of cloud resources. The materials are organized into modular, progressive units that build foundational knowledge before advancing to complex engineering workflows. The course materials are hosted in a centralized repository, which facilitates community-supported updates and collaborative improvements to the educational assets.
Jupyter Notebook, Data Engineering
argoproj/argo-workflows
16,466 stars
Argo Workflows is a container-native workflow engine that functions as a Kubernetes custom resource controller. It orchestrates complex sequences of containerized tasks by executing them as directed acyclic graphs, allowing for dependency management and parallel processing within a cluster. The system extends the native Kubernetes control plane to manage the full lifecycle of automated processes, from initial triggering to final resource cleanup. The platform distinguishes itself through its controller-pattern reconciliation, which continuously monitors workflow states to align them with desired configurations. It supports event-driven execution, enabling workflows to trigger based on external signals or time-based schedules. Users can define reusable operational patterns through a centralized template management system, ensuring consistency across distributed environments. The engine provides a comprehensive suite of tools for managing multi-step pipelines, including sidecar-based artifact management for data transfer between steps and external storage providers. It includes built-in administrative interfaces for visualizing execution progress, monitoring performance metrics, and enforcing security through standard authentication and authorization protocols. The system is designed to handle diverse operational requirements, ranging from automated batch processing and data engineering to infrastructure maintenance and software delivery pipelines.
Go, Data Engineering Pipelines, Data Pipelines
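The Argo Workflows entry above describes running tasks as a directed acyclic graph with dependency management and parallel processing. As a rough illustration of that idea only, the sketch below groups tasks into "waves" that could run concurrently once their dependencies have finished. It is plain Python using the standard-library `graphlib` module (Python 3.9+), not Argo's API; Argo itself defines workflows as Kubernetes resources. The function name `execution_waves` and the task names in `pipeline` are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group tasks into waves; every task in a wave has all of its
    dependencies satisfied by earlier waves, so a wave could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with no unmet dependencies
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unblocking dependents
    return waves


# Hypothetical pipeline: task name -> set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "validate": {"clean"},
    "load": {"aggregate", "validate"},
}

waves = execution_waves(pipeline)
for number, tasks in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(tasks)}")
```

In this sketch `aggregate` and `validate` land in the same wave because both depend only on `clean`, which is the kind of parallelism a DAG-based engine can exploit. A real engine would also handle retries, failures, and resource cleanup, which are left out here.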
