What are the best awesome GitHub repositories for data processing workflows?

Question 1

Accepted Answer

Systems for defining, scheduling, and executing complex sequences of data analysis and transformation tasks.

**Distinguishing note:** Focuses on the orchestration of analytical queries within automated pipelines rather than the storage engine itself.

Explore 10 awesome GitHub repositories in the Data & Databases · Data Processing Workflows category. Top picks: apache/airflow, lfnovo/open-notebook, spotify/luigi, heibaiying/BigData-Notes, mikefarah/yq, apa…

Question 2

Why is apache/airflow a recommended repository for data processing workflows?

Accepted Answer

Lets you author, schedule, and monitor workflows programmatically: each pipeline is a Python-defined DAG (directed acyclic graph) of tasks, so complex data analysis and transformation steps run automatically in dependency order.
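The DAG idea behind Airflow can be illustrated without Airflow itself. The sketch below is a stdlib-only illustration, not Airflow's API: the task names (`extract`, `transform`, `load`) and the `run_dag` helper are hypothetical, and it relies on Python's `graphlib.TopologicalSorter` (Python 3.9+) so that each task runs only after its upstream tasks.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical tasks; in Airflow these would be operators inside a DAG.
def extract() -> None:
    print("extract")

def transform() -> None:
    print("transform")

def load() -> None:
    print("load")

def run_dag(tasks: Dict[str, Callable[[], None]],
            upstream: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical helper (not Airflow's API): run each task after its upstream tasks."""
    # Include every task, even ones with no dependencies.
    graph = {name: upstream.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order

tasks = {"extract": extract, "transform": transform, "load": load}
# Maps each task to the set of tasks it depends on.
upstream = {"transform": {"extract"}, "load": {"transform"}}
run_order = run_dag(tasks, upstream)
```

Airflow adds what this sketch leaves out, such as time-based scheduling, retries, and a web UI for monitoring runs.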

Question 3

Why is lfnovo/open-notebook a recommended repository for data processing workflows?

Accepted Answer

Organizes research sources and notes into notebooks and runs AI-assisted processing over them, serving as an open-source alternative to Google's NotebookLM.

Question 4

Why is spotify/luigi a recommended repository for data processing workflows?

Accepted Answer

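Builds pipelines of batch jobs as Python tasks and resolves their dependencies by checking whether each task's output targets (files, database tables, partitions) already exist, which coordinates execution within complex data processing workflows.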
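Luigi's core idea, that a task counts as complete once its output exists, can be sketched with the standard library. This is not Luigi's API: the `Task` base class, the `build` function, and the `RawData`/`SortedData` tasks are hypothetical, with method names (`requires`, `output`, `run`, `complete`) only loosely modeled on Luigi's.

```python
import os
import tempfile
from typing import List

class Task:
    """Hypothetical Luigi-style task: complete once its output file exists."""
    def requires(self) -> List["Task"]:
        return []

    def output(self) -> str:
        raise NotImplementedError

    def run(self) -> None:
        raise NotImplementedError

    def complete(self) -> bool:
        return os.path.exists(self.output())

def build(task: Task, log: List[str]) -> None:
    """Build dependencies first, then run the task only if its output is missing."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)

WORKDIR = tempfile.mkdtemp()

class RawData(Task):
    def output(self) -> str:
        return os.path.join(WORKDIR, "raw.txt")

    def run(self) -> None:
        with open(self.output(), "w") as f:
            f.write("3\n1\n2\n")

class SortedData(Task):
    def requires(self) -> List[Task]:
        return [RawData()]

    def output(self) -> str:
        return os.path.join(WORKDIR, "sorted.txt")

    def run(self) -> None:
        with open(RawData().output()) as f:
            numbers = sorted(int(line) for line in f)
        with open(self.output(), "w") as f:
            f.write("\n".join(map(str, numbers)))

first_run: List[str] = []
build(SortedData(), first_run)   # runs RawData, then SortedData
second_run: List[str] = []
build(SortedData(), second_run)  # both outputs exist, so nothing runs
```

Because completion is derived from outputs, rerunning the pipeline skips finished work, which lets it resume after a partial failure.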

Question 5

Why is heibaiying/BigData-Notes a recommended repository for data processing workflows?

Accepted Answer

Study notes (in Chinese) covering how to define and execute batch and stream processing tasks with distributed computing engines such as Hadoop, Spark, and Flink.
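The batch/stream distinction can be shown in miniature. This stdlib sketch is an illustration, not code from the repository: the hypothetical `batch_word_count` processes a complete, bounded dataset at once, while the hypothetical `stream_word_count` consumes events one at a time and emits counts per tumbling window of `window_size` events, similar in spirit to windowed aggregations in streaming engines.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

def batch_word_count(lines: List[str]) -> Dict[str, int]:
    """Batch: the whole (bounded) input is available before processing starts."""
    return dict(Counter(word for line in lines for word in line.split()))

def stream_word_count(events: Iterable[str], window_size: int) -> Iterator[Dict[str, int]]:
    """Stream: consume events one by one and emit counts per tumbling window."""
    window: Counter = Counter()
    seen = 0
    for line in events:
        window.update(line.split())
        seen += 1
        if seen == window_size:
            yield dict(window)
            window, seen = Counter(), 0
    if seen:  # flush the final partial window when the stream ends
        yield dict(window)

lines = ["a b", "b c", "a a", "c"]
batch_result = batch_word_count(lines)
stream_result = list(stream_word_count(iter(lines), window_size=2))
```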

Question 6

Why is mikefarah/yq a recommended repository for data processing workflows?

Accepted Answer

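Provides a portable command-line processor for YAML, JSON, XML, and other structured formats with jq-like syntax, so data can be queried, transformed, and updated inside shell-based scripting workflows.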
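The kind of path-based read and update that yq performs from the shell (for example `.spec.replicas`) can be sketched in Python on already-parsed data. The helpers `parse_path`, `get_path`, and `set_path` are hypothetical and are not yq's implementation: they work on JSON from the standard library and use a simplified dotted notation in which list indexes are written as `.0` rather than yq's `[0]`.

```python
import json
from typing import Any, List, Union

def parse_path(path: str) -> List[Union[str, int]]:
    """Hypothetical: turn '.spec.containers.0.image' into ['spec', 'containers', 0, 'image']."""
    parts = [p for p in path.strip(".").split(".") if p]
    return [int(p) if p.isdigit() else p for p in parts]

def get_path(doc: Any, path: str) -> Any:
    """Follow each key or index in turn (like reading `.a.b` in yq)."""
    for key in parse_path(path):
        doc = doc[key]
    return doc

def set_path(doc: Any, path: str, value: Any) -> None:
    """Assign in place at the last key (like `.a.b = value` in yq)."""
    *parents, last = parse_path(path)
    for key in parents:
        doc = doc[key]
    doc[last] = value

doc = json.loads('{"spec": {"replicas": 1, "containers": [{"image": "app:1.0"}]}}')
set_path(doc, ".spec.replicas", 3)
image = get_path(doc, ".spec.containers.0.image")
```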

Question 7

Why is apache/dolphinscheduler a recommended repository for data processing workflows?

Accepted Answer

Provides a distributed, extensible workflow scheduler with a visual DAG interface that coordinates large-scale data processing jobs across diverse infrastructure and keeps data moving reliably.
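Reliable coordination usually includes retrying failed tasks; DolphinScheduler tasks can be configured with a number of failure retries and a retry interval. The stdlib sketch below only illustrates the general retry-with-delay idea and is not DolphinScheduler's code or configuration format: `run_with_retries`, its `max_retries`/`delay_seconds` parameters, and `flaky_job` are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def run_with_retries(task: Callable[[], T], max_retries: int,
                     delay_seconds: float) -> T:
    """Hypothetical helper: call `task`; on an exception, wait and retry up to `max_retries` more times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last failure
            attempt += 1
            time.sleep(delay_seconds)

# Hypothetical flaky job that fails twice before succeeding.
calls = {"count": 0}

def flaky_job() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"

result = run_with_retries(flaky_job, max_retries=3, delay_seconds=0.01)
```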

Question 8

Why is automatisch/automatisch a recommended repository for data processing workflows?

Accepted Answer

Acts as an open-source alternative to Zapier: each workflow starts from a trigger event and runs a series of actions, passing data from one workflow step to the next.
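The trigger-and-action model can be sketched as a trigger payload that flows through a chain of action steps, each receiving the previous step's output. The `run_flow` function, the step functions, and the payload fields are hypothetical; this is not Automatisch's internal API.

```python
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]
Step = Callable[[Payload], Payload]

def run_flow(trigger_event: Payload, steps: List[Step]) -> Payload:
    """Hypothetical: pass the trigger's payload through each action step in order."""
    data = trigger_event
    for step in steps:
        data = step(data)
    return data

# Hypothetical actions: normalize a new signup, then build a notification message.
def normalize_email(data: Payload) -> Payload:
    return {**data, "email": data["email"].strip().lower()}

def build_message(data: Payload) -> Payload:
    return {**data, "message": f"New signup: {data['email']}"}

trigger_event = {"email": "  Ada@Example.COM "}
result = run_flow(trigger_event, [normalize_email, build_message])
```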

Question 9

Why is great-expectations/great_expectations a recommended repository for data processing workflows?

Accepted Answer

Integrates data validation steps (declarative "Expectations") directly into data processing workflows, so scheduled jobs can catch bad data before it propagates.
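Validation inside a pipeline can be sketched as named checks evaluated on a batch of records before the next step runs. This stdlib sketch is not Great Expectations' API: `expect_not_null`, `expect_between`, and `validate` are hypothetical helpers, loosely named after expectations such as `expect_column_values_to_not_be_null`.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]

def expect_not_null(column: str) -> Check:
    """Hypothetical: pass only if no row has a missing value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)

def expect_between(column: str, low: float, high: float) -> Check:
    """Hypothetical: pass only if every value in `column` lies within [low, high]."""
    return lambda rows: all(low <= row[column] <= high for row in rows)

def validate(rows: List[Row], checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every check and report pass/fail per check name."""
    return {name: check(rows) for name, check in checks.items()}

rows = [{"id": 1, "age": 34}, {"id": 2, "age": 130}]
checks = {
    "id_not_null": expect_not_null("id"),
    "age_in_range": expect_between("age", 0, 120),
}
report = validate(rows, checks)
pipeline_may_continue = all(report.values())  # gate the next scheduled step
```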

Question 10

Why is cue-lang/cue a recommended repository for data processing workflows?

Accepted Answer

Defines, validates, and generates configuration and data through constraint unification; its scripting layer (`cue cmd`) can also run sequences of data processing steps.
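Constraint unification combines partial specifications into one value and fails when they conflict. The Python sketch below is a heavily simplified model, not CUE's implementation: Python types stand in for constraints, concrete values must satisfy them, and structs unify field by field. The `unify` function and `Conflict` exception are hypothetical.

```python
from typing import Any

class Conflict(Exception):
    """Raised when two values cannot be unified."""

def unify(a: Any, b: Any) -> Any:
    """Hypothetical, simplified unification: result satisfies both a and b, or raises Conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Structs unify field by field; fields present on only one side are kept.
        merged = dict(a)
        for key, value in b.items():
            merged[key] = unify(a[key], value) if key in a else value
        return merged
    if isinstance(a, type) and isinstance(b, type):
        if a is b:
            return a
        raise Conflict(f"{a.__name__} & {b.__name__}")
    if isinstance(a, type):
        # A type constraint unified with a concrete value keeps the value if it fits.
        if isinstance(b, a):
            return b
        raise Conflict(f"{b!r} is not {a.__name__}")
    if isinstance(b, type):
        return unify(b, a)
    if a == b:
        return a
    raise Conflict(f"{a!r} & {b!r}")

schema = {"name": str, "replicas": int}
config = {"name": "web", "replicas": 3}
result = unify(schema, config)
```

In CUE the same operation is written `schema & config`, and the real language also covers bounds, disjunctions, and defaults.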

Question 11

Why is databricks/learning-spark a recommended repository for data processing workflows?

Accepted Answer

Provides the example code for O'Reilly's Learning Spark book, guiding the definition and execution of complex sequences of data analysis and transformation tasks with Apache Spark.
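Spark distinguishes lazy transformations (such as `map` and `filter`) from actions (such as `collect`) that trigger execution. The stdlib class below mimics that split on a single machine; `LazyDataset` is a hypothetical name and is not PySpark's API.

```python
from typing import Any, Callable, Iterable, List, Optional

Op = Callable[[Iterable[Any]], Iterable[Any]]

class LazyDataset:
    """Hypothetical: records transformations and runs them only when an action is called."""

    def __init__(self, source: Iterable[Any], ops: Optional[List[Op]] = None) -> None:
        self.source = list(source)
        self.ops = ops or []

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: returns a new dataset with one more pending step.
        return LazyDataset(self.source, self.ops + [lambda xs: map(fn, xs)])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: also deferred until an action runs.
        return LazyDataset(self.source, self.ops + [lambda xs: filter(pred, xs)])

    def collect(self) -> List[Any]:
        # Action: apply all pending steps in order and materialize the result.
        data: Iterable[Any] = self.source
        for op in self.ops:
            data = op(data)
        return list(data)

squares_of_evens = LazyDataset(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
result = squares_of_evens.collect()
```

Deferring work this way lets Spark see the whole chain of transformations before running it and plan execution across a cluster.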

Awesome GitHub Repositories: Data Processing Workflows

apache/airflow

lfnovo/open-notebook

spotify/luigi

heibaiying/BigData-Notes

mikefarah/yq

apache/dolphinscheduler

automatisch/automatisch

great-expectations/great_expectations

cue-lang/cue

databricks/learning-spark
