ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, with every step containerized so that it behaves consistently across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
DataX Web is a web-based management platform for building, scheduling, executing, and monitoring distributed data synchronization jobs powered by DataX. It provides a visual console for creating and managing DataX tasks without hand-written JSON configuration, and a distributed executor cluster in which worker nodes register automatically and tasks are distributed according to configurable routing and blocking strategies. The platform offers cron-based task scheduling in which tasks can be started, stopped, or have their status changed dynamically with immediate effect, along with incremental sync capabilities that pass dynamic parameters to extract only new o
Azkaban is a distributed workflow manager and DAG-based job orchestrator designed as an enterprise batch processor. It serves as a Java-based workflow engine that schedules and executes complex job sequences across a cluster of executor servers, with specific functionality for managing big data workloads on Hadoop clusters. The system distinguishes itself through a distributed executor model that coordinates state via a shared database to ensure high availability. It employs a plugin-based architecture that allows for custom job types and system functionality extensions, including the ability
dlt is a Python data ingestion tool and ETL pipeline framework designed to fetch data from diverse sources and persist it into structured destinations. It functions as a schema inference engine that automatically detects data types and flattens nested JSON structures into relational tables, moving data from sources to lakehouses, warehouses, or vector databases. The project distinguishes itself through AI-powered pipeline generation, using large language models to scaffold extraction code and connectors for REST APIs. It also supports multimodal vector storage and specialized population of ve
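The flattening of nested JSON into relational tables described for dlt can be illustrated with a minimal, standard-library-only sketch. This is not dlt's implementation or API: the function names (`flatten_columns`, `normalize`), the double-underscore naming, and the `_id` / `_parent_id` columns are assumptions chosen for illustration only.

```python
from typing import Any, Optional

Row = dict[str, Any]
Tables = dict[str, list[Row]]


def flatten_columns(obj: Row, prefix: str = "") -> tuple[Row, dict[str, list]]:
    """Split a record into flat scalar columns and nested lists.

    Nested dicts become columns joined with "__" (hypothetical convention);
    lists are returned separately so they can become child tables.
    """
    columns: Row = {}
    lists: dict[str, list] = {}
    for key, value in obj.items():
        name = f"{prefix}__{key}" if prefix else key
        if isinstance(value, dict):
            sub_columns, sub_lists = flatten_columns(value, name)
            columns.update(sub_columns)
            lists.update(sub_lists)
        elif isinstance(value, list):
            lists[name] = value
        else:
            columns[name] = value
    return columns, lists


def normalize(record: Row, table: str, tables: Optional[Tables] = None,
              parent_id: Optional[int] = None) -> Tables:
    """Write one nested record into `tables`, one row list per table name."""
    tables = {} if tables is None else tables
    columns, lists = flatten_columns(record)
    rows = tables.setdefault(table, [])
    row_id = len(rows)  # row index within its table serves as a simple key
    row: Row = {"_id": row_id, **columns}
    if parent_id is not None:
        row["_parent_id"] = parent_id  # link back to the parent row
    rows.append(row)
    for name, items in lists.items():
        child_table = f"{table}__{name}"
        for item in items:
            # Scalars inside a list are wrapped so every child row is a dict.
            child = item if isinstance(item, dict) else {"value": item}
            normalize(child, child_table, tables, parent_id=row_id)
    return tables


record = {
    "id": 1,
    "user": {"name": "Ada", "address": {"city": "London"}},
    "tags": ["a", "b"],
}
tables = normalize(record, "events")
```

In this sketch the nested `user` object ends up as prefixed columns on the `events` row, while the `tags` list becomes a separate `events__tags` table whose rows point back to their parent through `_parent_id`.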