# Machine Learning Pipeline Orchestrators

> Search results for `orchestrate machine learning pipelines as DAGs` on awesome-repositories.com. 117 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/orchestrate-machine-learning-pipelines-as-dags

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/orchestrate-machine-learning-pipelines-as-dags).**

## Results

- [jack-cherish/machine-learning](https://awesome-repositories.com/repository/jack-cherish-machine-learning.md) (10,333 ⭐) — This project is a collection of supervised and unsupervised machine learning algorithms implemented from scratch using Python. It serves as an educational resource for studying model training, parameter optimization, and the implementation of core predictive models.

The library provides a variety of supervised learning tools, including linear and logistic regression, decision trees, and support vector machines. It also features unsupervised learning capabilities for discovering patterns in unlabeled datasets through clustering algorithms.

Broad capability areas include ensemble learning thro
- [aladdinpersson/machine-learning-collection](https://awesome-repositories.com/repository/aladdinpersson-machine-learning-collection.md) (8,465 ⭐) — This project is a machine learning educational repository providing a collection of implementations and guides for machine learning and deep learning algorithms. It serves as a deep learning model library and a reference for training workflows, covering foundational machine learning, convolutional, recurrent, and transformer architectures.

The collection includes a generative adversarial network suite for synthesizing realistic images and performing image-to-image translation. It also functions as a computer vision implementation guide for object detection and semantic segmentation, alongside
- [deepset-ai/haystack](https://awesome-repositories.com/repository/deepset-ai-haystack.md) (24,253 ⭐) — Haystack is an orchestration framework designed for building complex search and generative AI pipelines. It functions as an agentic workflow engine, enabling the construction of automated sequences that allow AI agents to perform multi-step reasoning and data analysis.

The framework utilizes a modular, component-based architecture that connects processing steps into directed acyclic graphs. By employing a provider-agnostic integration layer, it decouples core logic from specific external AI services and vector databases, allowing for the flexible exchange of underlying technologies. This desi
- [google-ai-edge/mediapipe](https://awesome-repositories.com/repository/google-ai-edge-mediapipe.md) (35,660 ⭐) — MediaPipe is a cross-platform machine learning framework designed for deploying vision, audio, and text processing models across mobile, desktop, and web environments. It functions as an on-device inference engine that executes complex models locally on edge hardware, ensuring low latency and privacy without requiring a constant internet connection.

The framework utilizes a graph-based pipeline orchestration system where data flows through a directed network of modular calculators to ensure synchronized and deterministic processing. It distinguishes itself through a unified runtime that provi
- [gokumohandas/made-with-ml](https://awesome-repositories.com/repository/gokumohandas-made-with-ml.md) (48,343 ⭐) — Made-With-ML is an automated documentation generator and developer experience platform designed to transform source code into structured, searchable reference websites. It functions as a codebase intelligence tool that parses implementation details to provide clear explanations of logic and data requirements.

The system distinguishes itself by leveraging language-level type annotations and structured code comments to generate interface specifications. By utilizing static analysis to extract metadata, it automates the transformation of docstrings into web-ready documentation, ensuring that tec
- [kubeflow/pipelines](https://awesome-repositories.com/repository/kubeflow-pipelines.md) (4,154 ⭐) — Machine Learning Pipelines for Kubeflow
- [ethen8181/machine-learning](https://awesome-repositories.com/repository/ethen8181-machine-learning.md) (3,445 ⭐) — Machine learning tutorials (mainly in Python 3)
- [flyteorg/flyte](https://awesome-repositories.com/repository/flyteorg-flyte.md) (7,095 ⭐) — Flyte is a Kubernetes-based machine learning orchestrator and containerized pipeline manager designed for coordinating AI workflows and data pipelines. It functions as an engine for defining and executing resilient pipelines, utilizing a data lineage tracker to maintain immutable execution states and ensure reproducible outputs.

The platform distinguishes itself by packaging individual tasks into separate containers to ensure dependency isolation and environment consistency. It provides specialized capabilities for machine learning, including the transformation of trained models into scalable
- [aws/aws-cdk](https://awesome-repositories.com/repository/aws-aws-cdk.md) (12,817 ⭐) — The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane.

The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It
- [mastra-ai/mastra](https://awesome-repositories.com/repository/mastra-ai-mastra.md) (21,221 ⭐) — Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention.

The framework distinguishes itself through its focus on observability and secure, isolated execut
- [avelino/awesome-go](https://awesome-repositories.com/repository/avelino-awesome-go.md) (175,576 ⭐) — This project serves as a comprehensive language ecosystem index, functioning as a centralized, community-curated directory for the Go programming language. It organizes a vast landscape of software components, libraries, and development tools into a structured, navigable hierarchy, enabling developers to efficiently discover resources tailored to specific functional domains.

The repository distinguishes itself through a decentralized contribution model, where community-driven updates ensure the index remains current with the rapidly evolving software landscape. Beyond simple resource listing,
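- [shunliz/machine-learning](https://awesome-repositories.com/repository/shunliz-machine-learning.md) (1,424 ⭐) — A compilation of notes on machine learning principles. GitBook: https://shunliz.gitbooks.io/machine-learning/content/ The first half focuses on mathematical foundations and the theory of machine learning and deep learning, with detailed formula derivations. The second half focuses on engineering practice and applied theory.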
- [jeff1evesque/machine-learning](https://awesome-repositories.com/repository/jeff1evesque-machine-learning.md) (258 ⭐) — Web interface and REST API for classification and regression (https://jeff1evesque.github.io/machine-learning.docs)
- [flowiseai/flowise](https://awesome-repositories.com/repository/flowiseai-flowise.md) (53,641 ⭐) — Flowise is a low-code platform designed for building and deploying complex language model workflows through a visual, node-based interface. It functions as an orchestrator for autonomous multi-agent systems, allowing users to construct conversational pipelines by connecting language models, memory stores, and external tools on a drag-and-drop canvas.

The platform distinguishes itself through its support for sophisticated agentic patterns, including supervisor-worker delegation and iterative reasoning strategies. Users can design directed acyclic graphs to manage conditional branching, state p
- [techindicium/dbt-dag-monitoring](https://awesome-repositories.com/repository/techindicium-dbt-dag-monitoring.md) (30 ⭐) — This package allows you to easily monitor your DAGs from well known orchestration tools, providing helpful info to improve your data pipeline.
- [vllm-project/llm-compressor](https://awesome-repositories.com/repository/vllm-project-llm-compressor.md) (2,764 ⭐) — llm-compressor is a quantization toolkit and post-training library designed to reduce the memory footprint and size of large language models. It provides a framework for compressing models using weight and activation quantization to enable more efficient deployment.

The project distinguishes itself through a distributed quantization framework that utilizes data-parallel processing and disk-based weight offloading to handle massive model checkpoints that exceed available system memory. It includes specialized compressors for diverse architectures, including Mixture-of-Experts, Vision-Language,
- [harvard-edge/cs249r_book](https://awesome-repositories.com/repository/harvard-edge-cs249r-book.md) (20,217 ⭐) — This project is a comprehensive educational framework designed to teach the design, deployment, and performance optimization of machine learning systems. It provides a structured curriculum that covers the full stack of artificial intelligence engineering, ranging from the construction of core framework components like tensors and automatic differentiation engines to the orchestration of large-scale distributed training clusters.

The platform distinguishes itself through its integration of physics-grounded systems modeling and interactive simulation environments. Users can experiment with dis
- [instillai/machine-learning-course](https://awesome-repositories.com/repository/instillai-machine-learning-course.md) (7,043 ⭐) — A Machine Learning Course with Python
- [thedotmack/claude-mem](https://awesome-repositories.com/repository/thedotmack-claude-mem.md) (82,698 ⭐) — Claude-mem is an agentic memory persistence system designed to provide AI assistants with long-term context across multiple development sessions. It functions as a background orchestrator that captures, summarizes, and indexes interaction history, allowing models to maintain continuity and recall technical decisions from past tasks. By utilizing a vector-augmented context engine, the system injects relevant historical observations into active sessions, ensuring that AI agents remain informed without exceeding finite token budgets.

The project distinguishes itself through an endless memory arc
- [allegroai/clearml](https://awesome-repositories.com/repository/allegroai-clearml.md) (6,733 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the entire machine learning lifecycle. It functions as an experiment tracking tool, a data versioning system, and a pipeline orchestrator, while providing infrastructure for GPU cluster management and model serving.

The platform is distinguished by its ability to handle hybrid-cloud compute scheduling and fractional GPU allocation, allowing multiple workloads to share a single hardware accelerator. It employs a metadata-based approach to data versioning, using virtual views to track large datasets and artifacts without duplicating r
- [machinelearningmindset/machine-learning-course](https://awesome-repositories.com/repository/machinelearningmindset-machine-learning-course.md) (7,043 ⭐) — A Machine Learning Course with Python
- [clearml/clearml](https://awesome-repositories.com/repository/clearml-clearml.md) (6,740 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the end-to-end machine learning lifecycle, from initial experimentation to production deployment. It provides a suite of integrated tools including a pipeline orchestrator for automating workflows, an experiment tracking tool for logging hyperparameters and metrics, and a metadata-driven data versioning system for managing large-scale datasets and model artifacts.

The platform is distinguished by its advanced compute management and serving capabilities. It features a GPU compute manager that supports fractional resource slicing and
- [netflix/metaflow](https://awesome-repositories.com/repository/netflix-metaflow.md) (9,764 ⭐) — Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments.

The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
- [crewaiinc/crewai](https://awesome-repositories.com/repository/crewaiinc-crewai.md) (53,687 ⭐) — CrewAI is a multi-agent orchestration framework designed for building autonomous systems that execute complex, multi-step workflows. It provides a development platform where specialized agents are defined with specific roles, goals, and tool sets to perform tasks collaboratively. By leveraging a declarative workflow engine, the system manages task dependencies, state transitions, and execution logic, allowing for the creation of structured, stateful sequences of operations.

The framework distinguishes itself through its hierarchical management capabilities, which utilize manager agents to coo
- [ajaymache/machine-learning-yearning](https://awesome-repositories.com/repository/ajaymache-machine-learning-yearning.md) (1,135 ⭐) — Machine Learning Yearning book by Andrew Ng
- [gfx-rs/wgpu](https://awesome-repositories.com/repository/gfx-rs-wgpu.md) (17,382 ⭐) — This project is a cross-platform graphics and compute framework that provides a unified, hardware-agnostic abstraction layer for rendering and parallel processing. It enables developers to build high-performance applications that execute consistently across diverse operating systems and hardware backends, including Vulkan, Metal, and DirectX. By mapping high-level graphics commands to native APIs, it serves as a portable foundation for both real-time 3D rendering and general-purpose GPU computing.

The framework distinguishes itself through a robust architecture that supports both native deskt
- [dataelement/bisheng](https://awesome-repositories.com/repository/dataelement-bisheng.md) (11,455 ⭐) — Bisheng is an enterprise AI framework and LLM DevOps platform designed to manage the full lifecycle of large language models. It provides a unified system for dataset curation, supervised fine-tuning, model versioning, and performance evaluation.

The platform features a visual workflow orchestrator for building retrieval-augmented generation pipelines and complex task sequences using flowcharts with conditional logic and human intervention points. It also includes an AI agent framework that uses a specialized guidance language to embed domain expertise and professional business logic into aut
- [huggingface/transformers](https://awesome-repositories.com/repository/huggingface-transformers.md) (161,630 ⭐) — Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference.

The library features extensive support for model optimization and
- [timzhang642/3d-machine-learning](https://awesome-repositories.com/repository/timzhang642-3d-machine-learning.md) (10,176 ⭐) — A resource repository for 3D machine learning
- [dagster-io/dagster](https://awesome-repositories.com/repository/dagster-io-dagster.md) (14,974 ⭐) — Dagster is a data orchestration platform designed to manage the entire lifecycle of data assets through declarative modeling and version-controlled code. It functions as a workflow engine that treats data assets as first-class primitives, allowing teams to define, schedule, and monitor complex pipelines while maintaining clear visibility into lineage, dependencies, and data quality.

The platform distinguishes itself by using a code-as-configuration framework that enables standard software engineering practices, such as unit testing and local mocking, to be applied directly to data workflows.
- [dformoso/machine-learning-mindmap](https://awesome-repositories.com/repository/dformoso-machine-learning-mindmap.md) (6,254 ⭐) — A Mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.
- [labring/fastgpt](https://awesome-repositories.com/repository/labring-fastgpt.md) (27,132 ⭐) — FastGPT is a comprehensive platform for building, deploying, and managing context-aware artificial intelligence applications. It provides a unified environment that integrates custom data sources with language models, utilizing a retrieval-augmented generation engine to ground responses in accurate, domain-specific information. The system is designed for enterprise-scale use, featuring multi-tenant architecture, administrative controls, and secure authentication protocols including OAuth 2.0 and custom single sign-on integration.

The platform distinguishes itself through a visual, node-based
- [bristolmyerssquibb/blockr.dag](https://awesome-repositories.com/repository/bristolmyerssquibb-blockr-dag.md) (0 ⭐) — CRAN package: https://CRAN.R-project.org/package=blockr.dag
- [datawhalechina/prompt-engineering-for-developers](https://awesome-repositories.com/repository/datawhalechina-prompt-engineering-for-developers.md) (24,267 ⭐) — This project is a technical curriculum and development guide focused on large language model prompt engineering, fine-tuning, and the creation of retrieval augmented generation applications. It serves as a comprehensive resource for developers to master crafting precise instructions and textual patterns to improve the quality and predictability of model outputs.

The material covers the end-to-end workflow of adapting open-source models to specific datasets and integrating language models with vector databases to generate responses based on private information. It also provides a systematic ap
- [bytebytegohq/system-design-101](https://awesome-repositories.com/repository/bytebytegohq-system-design-101.md) (83,491 ⭐) — This project is a centralized engineering knowledge repository that provides a structured curriculum for mastering system design, architectural patterns, and fundamental software development workflows. It serves as a professional development resource for engineers, offering foundational knowledge and real-world case studies to support the design of scalable, secure, and efficient distributed systems.

The repository distinguishes itself through a visual-first approach to knowledge synthesis, distilling complex technical concepts into high-density graphical diagrams and succinct illustrations.
- [openbmb/ultrarag](https://awesome-repositories.com/repository/openbmb-ultrarag.md) (5,220 ⭐) — UltraRAG is an LLM RAG orchestration platform and AI agent research framework designed to coordinate complex retrieval-augmented generation workflows. It functions as a multimodal RAG engine capable of retrieving and generating responses using text, images, and diverse data types, while providing tools for vector database management and RAG performance evaluation.

The platform features a visual RAG pipeline builder that uses a canvas interface to construct and debug data flows, synchronizing visual designs directly with underlying code. It distinguishes itself through an autonomous research s
- [awslabs/machine-learning-samples](https://awesome-repositories.com/repository/awslabs-machine-learning-samples.md) (881 ⭐) — Sample applications built using AWS' Amazon Machine Learning.
- [josephmisiti/machine-learning-module](https://awesome-repositories.com/repository/josephmisiti-machine-learning-module.md) (477 ⭐) — the best machine learning tutorials on the web
- [hkuds/lightrag](https://awesome-repositories.com/repository/hkuds-lightrag.md) (36,651 ⭐) — LightRAG is a graph-based retrieval framework designed to build retrieval-augmented generation pipelines. It structures unstructured text into knowledge graphs, enabling multi-hop reasoning and complex query synthesis across large document collections. By integrating dense vector embeddings with structured knowledge graphs, the system facilitates both similarity-based and relationship-aware information retrieval.

The framework distinguishes itself through a dual-level retrieval strategy that combines low-level keyword matching with high-level semantic graph traversal to capture both specific
- [clickhouse/clickhouse](https://awesome-repositories.com/repository/clickhouse-clickhouse.md) (48,229 ⭐) — ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring.

The platform distinguishes itself through ad
- [serengil/deepface](https://awesome-repositories.com/repository/serengil-deepface.md) (22,226 ⭐) — Deepface is a comprehensive deep learning library for facial recognition and demographic analysis. It provides a modular pipeline that handles the entire lifecycle of facial processing, including detection, geometric alignment, and the transformation of facial images into high-dimensional numerical vector embeddings for identity verification and similarity comparison.

The library distinguishes itself through a model ensemble approach, which combines predictions from multiple pre-trained neural networks to improve classification accuracy and reduce bias. It also integrates advanced security fe
- [ente-io/ente](https://awesome-repositories.com/repository/ente-io-ente.md) (27,281 ⭐) — Ente is a privacy-focused platform for end-to-end encrypted storage and two-factor authentication management. It functions as a zero-knowledge identity provider, ensuring that all cryptographic operations, key derivation, and data encryption occur locally on the user's device. By maintaining this architecture, the service provider remains unable to access or decrypt any stored personal information or authentication credentials.

The platform distinguishes itself through a combination of on-device intelligence and resilient data distribution. It utilizes a local machine learning engine to perfo
- [1094401996/machine-learning-coursera](https://awesome-repositories.com/repository/1094401996-machine-learning-coursera.md) (95 ⭐) — Lecture notes and assignments for coursera machine learning class
- [agentscope-ai/agentscope](https://awesome-repositories.com/repository/agentscope-ai-agentscope.md) (26,895 ⭐) — Agentscope is a comprehensive toolkit for developing and orchestrating autonomous multi-agent systems. It provides a unified framework for building agents that can reason, execute tools, and manage memory, enabling the creation of complex, collaborative workflows where multiple specialized agents interact to solve multi-step objectives.

The platform distinguishes itself through a robust orchestration engine that supports both sequential and concurrent agent pipelines. It utilizes a centralized event bus for real-time telemetry, allowing developers to track agent reasoning, tool usage, and sys
- [deeppavlov/deeppavlov](https://awesome-repositories.com/repository/deeppavlov-deeppavlov.md) (6,985 ⭐) — DeepPavlov is a conversational AI framework and deep learning NLP library designed for building end-to-end dialogue systems and chatbots. It functions as an NLP pipeline orchestrator that allows users to compose pre-trained models and text processing components into sequential data flows for complex linguistic tasks.

The system is distinguished by its ability to act as a chatbot deployment server, exposing trained conversational models as web services via REST and Socket APIs. It utilizes JSON-based pipeline configurations and dynamic variable interpolation to decouple model logic from infras
- [sahith02/machine-learning-algorithms](https://awesome-repositories.com/repository/sahith02-machine-learning-algorithms.md) (376 ⭐) — A curated list of all machine learning algorithms and deep learning algorithms grouped by category.
- [alirezadir/machine-learning-interviews](https://awesome-repositories.com/repository/alirezadir-machine-learning-interviews.md) (8,455 ⭐) — This project is a comprehensive machine learning interview guide and technical study resource designed for individuals preparing for machine learning and AI engineering roles. It provides a collection of materials and practice problems covering core algorithms, theoretical fundamentals, and the implementation of neural network architectures.

The resource serves as a technical reference for generative AI development, focusing on the design and optimization of large language models and diffusion systems. It includes frameworks for system design, covering the architecture of production machine l
- [rvc-boss/gpt-sovits](https://awesome-repositories.com/repository/rvc-boss-gpt-sovits.md) (58,724 ⭐) — GPT-SoVITS is a text-to-speech synthesis engine and voice cloning toolkit designed for generating natural-sounding human speech. It functions as a neural audio processing pipeline that maps input text to high-fidelity audio waveforms, utilizing conditional variational autoencoders and flow-based decoders to ensure expressive output.

The platform distinguishes itself through its ability to perform few-shot voice cloning and cross-lingual speech generation, allowing users to maintain a specific speaker's vocal identity and emotional delivery across multiple languages. By employing cross-modal l
- [jakevdp/pythondatasciencehandbook](https://awesome-repositories.com/repository/jakevdp-pythondatasciencehandbook.md) (48,561 ⭐) — This project is an interactive data science environment that combines code execution, rich media visualization, and narrative documentation into a persistent, browser-based platform. It serves as a comprehensive educational resource for scientific computing, providing a framework for iterative data analysis and machine learning prototyping.

The environment is distinguished by its focus on high-performance numerical computing, utilizing vectorized array operations and memory-mapped data structures to handle large-scale computations efficiently. It features a unified estimator interface that st
- [hashicorp/terraform](https://awesome-repositories.com/repository/hashicorp-terraform.md) (48,720 ⭐) — Terraform is a declarative infrastructure-as-code tool designed to manage the lifecycle of cloud and on-premises resources. It functions as a workflow engine that reconciles a defined desired state against real-world infrastructure, using a persistent state-tracking layer to maintain consistency and visibility across distributed environments. By mapping infrastructure components into a directed acyclic graph, the system calculates the optimal order for provisioning, updating, or destroying resources.

The platform is distinguished by its extensible plugin-based architecture, which decouples co
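
Several entries above describe connecting steps into a directed acyclic graph and deriving a valid execution order from it. The snippet below is a minimal sketch of that idea using only Python's standard-library `graphlib` module (Python 3.9+). It is not taken from any listed project: the step names (`ingest`, `featurize`, `train`, `evaluate`), the `run_pipeline` helper, and the toy computations are all hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run each step once, only after all of its upstream steps have run.

    steps:        maps a step name to a callable taking a dict of upstream results.
    dependencies: maps a step name to the set of step names it depends on.
    """
    # static_order() yields every node after all of its predecessors;
    # it raises graphlib.CycleError if the graph is not acyclic.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = steps[name](upstream)
    return order, results


# Hypothetical four-step pipeline with toy computations.
steps = {
    "ingest": lambda up: [1.0, 2.0, 3.0, 4.0],
    "featurize": lambda up: [x * 2 for x in up["ingest"]],
    "train": lambda up: sum(up["featurize"]) / len(up["featurize"]),
    "evaluate": lambda up: abs(up["train"] - 5.0),
}
dependencies = {
    "ingest": set(),
    "featurize": {"ingest"},
    "train": {"featurize"},
    "evaluate": {"train", "featurize"},
}

order, results = run_pipeline(steps, dependencies)
print(order)
print(results["evaluate"])
```

Real orchestrators layer scheduling, retries, caching, and distributed execution on top of this ordering step; the sketch covers only the dependency resolution.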
