1 repo
Systems designed to automate the movement, transformation, and synchronization of data between disparate storage locations and distributed environments.
Distinguishing note: Focuses on the orchestration of data movement and synchronization pipelines rather than raw database storage or query execution.
Explore 1 awesome GitHub repository matching data & databases · Data Integration Tools. Refine with filters or upvote what's useful.
Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments. The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external
Automate the movement of information between disparate storage locations and distributed file systems to ensure data pipelines remain consistent and up to date.