30 open-source projects similar to iterative/cml, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Cml alternative.
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
DVC is a data versioning tool and pipeline orchestrator designed to track large datasets and machine learning models. It functions as a system for managing large data artifacts by storing lightweight metadata in version control while keeping the actual binaries in a separate cache. The project serves as an experiment tracker and remote storage synchronizer, enabling the execution and comparison of machine learning iterations based on hyperparameters and performance metrics. It provides a bridge for pushing and pulling these large data artifacts between local environments and cloud or on-premi
🏕️ Reproducible development environment for humans and agents
TransformerLab is an MLOps orchestration platform and research environment designed for the training, fine-tuning, and evaluation of large language models. It serves as a centralized control plane for managing machine learning jobs and coordinating distributed GPU compute across hybrid cloud and on-premise providers. The platform distinguishes itself through agent-driven model optimization, using AI assistants to analyze metrics and automatically propose and queue hyperparameter experiments. It provides a remote development environment that allows users to launch interactive notebooks, code e
ClearML is a comprehensive MLOps platform designed to manage the entire machine learning lifecycle. It functions as an experiment tracking tool, a data versioning system, and a pipeline orchestrator, while providing infrastructure for GPU cluster management and model serving. The platform is distinguished by its ability to handle hybrid-cloud compute scheduling and fractional GPU allocation, allowing multiple workloads to share a single hardware accelerator. It employs a metadata-based approach to data versioning, using virtual views to track large datasets and artifacts without duplicating r
Prefect is a workflow orchestration platform designed to define, schedule, and monitor complex data pipelines as Python code. It functions as a container-native engine that wraps individual tasks in isolated environments, ensuring consistent dependencies and resource allocation across diverse infrastructure. By utilizing a state-machine-based orchestration model, the system tracks execution progress through discrete transitions and persistent event logs to maintain reliable and observable task processing. The platform distinguishes itself through a decoupled worker-API architecture, which sep
Inngest is a durable execution framework and event-driven automation engine designed to orchestrate background workflows. It enables developers to build resilient, stateful processes by memoizing function steps, ensuring that long-running tasks can automatically resume from the last successful operation after failures, timeouts, or infrastructure restarts. The platform distinguishes itself through its event-driven architecture, which uses a schema-validated bus to trigger functions and coordinate complex, multi-step logic. It employs an onion-model middleware approach for cross-cutting concer
🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞
The Data Science Lifecycle Process is a process for taking data science teams from Idea to Value repeatedly and sustainably. The process is documented in this repo.
Curated set of transformers that make your work with steppy faster and more effective :telescope:
Examples of Machine Learning code using Comet.ml
Template repository for data science lifecycle project
Comet LLM is an observability platform and evaluation framework designed for large language model applications and agentic workflows. It functions as a system for tracing, monitoring, and debugging execution flows while providing tools for prompt optimization and the enforcement of AI safety guardrails. The platform distinguishes itself through a combination of model-based scoring and heuristic metrics to quantify output quality and detect hallucinations. It includes a dedicated prompt and agent optimizer with an interactive playground for refining templates and tool configurations. For retri
Hopsworks - Data-Intensive AI platform with a Feature Store
Lightweight, Python library for fast and reproducible experimentation :microscope:
🛠 All-in-one web-based IDE specialized for machine learning and data science.
Axiom is a cloud infrastructure orchestrator and distributed security scanning framework. It serves as a manager for deploying, snapshotting, and destroying disposable virtual machine fleets across multiple cloud providers and regions. The project distinguishes itself by automating the provisioning of vulnerability toolsets and security auditing software across these remote servers. It features a mechanism for distributing security scans by sharding target lists across a fleet of instances and aggregating the resulting data into unified files and HTML reports. The system covers a broad range
Tau is a unified development and operations environment designed as a DevOps coordination platform. It functions as a fullstack development workspace that synchronizes coding and infrastructure management tasks for both human users and automated machine processes. The system acts as a hybrid infrastructure orchestrator, employing modular drivers to deploy and manage software across both public cloud providers and self-hosted private servers. It provides a shared workspace that integrates command line interface operations with infrastructure deployment to unify development and operational work
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Osmedeus is a security workflow orchestration engine that coordinates AI agents, shell commands, and scanning tools through declarative YAML pipelines. It functions as a distributed security scanner, a declarative workflow automator, and an AI agent framework for security, enabling automated multi-step security analysis with conditional branching, parallel execution, and distributed workers. The engine distinguishes itself through a hybrid runner model that executes workflow steps on the local host, inside Docker containers, or over SSH to remote machines, selected per step or module. It supp
Allure is a test reporting framework that normalizes execution data from multiple test frameworks across different programming languages into a common intermediate format. It aggregates results from multiple sources into a shared directory of JSON files and generates self-contained HTML reports through a modular plugin pipeline. The architecture includes a hierarchical step tree model to represent test execution, metadata annotation injection to enrich results at runtime, and directory-watch incremental rendering that regenerates reports in real time as new data arrives. Unlike generic report
PyCaret is a Python AutoML platform and MLOps lifecycle manager designed to automate machine learning workflows. It functions as a low-code environment that leverages a scikit-learn native engine to execute preprocessing, training, and evaluation for tabular data. The platform distinguishes itself as an LLM-powered ML copilot, using large language model agents to analyze datasets, design experiment configurations, and explain model results. It also serves as a Kubernetes ML orchestrator and model registry, enabling the versioning of trained pipelines and their promotion to production API endp
Aim is an open-source platform for logging, visualizing, and comparing machine learning training runs and LLM traces. It provides a remote tracking server and a comparison UI, functioning as an ML experiment tracker, AI workflow logger, and LLM trace recorder that captures prompts, generations, and tool calls from AI applications. The platform distinguishes itself through a run-based data model with local SQLite storage, real-time metric streaming, and a plugin-based explorer system that supports specialized visual analysis of metrics, images, audio, and text. It offers a Python SDK with cont
This project is a containerized build automation system and self-hosted DevOps platform provided as a Docker image. It serves as a distributed build orchestrator and a Dockerized continuous integration and delivery server, ensuring consistent execution environments across different infrastructure. The system distinguishes itself through a distributed execution model that separates a primary controller from multiple remote agents connected via SSH, TCP, or web sockets. It utilizes a modular extensibility framework that allows the core system functionality to be augmented through the installati
Ludwig is a multimodal machine learning platform and low-code framework designed for building, training, and deploying neural networks. It enables the construction of models that process text, images, audio, and tabular data through a unified interface using declarative configuration files rather than custom code. The system features a specialized low-code framework for large language models, supporting supervised fine-tuning, preference alignment, and a constrained decoding tool to force structured data output via logit extraction. It also includes an automated model architecture search to i
Airflow is a platform for programmatically authoring, scheduling, and monitoring complex data pipelines. It functions as a workflow automation engine that manages the lifecycle of recurring business processes by executing code-defined task dependencies. By representing workflows as directed acyclic graphs, the system ensures that task execution order and data flow are explicitly defined and reliably maintained across distributed computing environments. The platform distinguishes itself through a highly modular, provider-based architecture that decouples core orchestration logic from external
Cleanlab is a data-centric AI library and toolkit designed to improve machine learning model performance by detecting label errors and increasing overall dataset quality. It implements a confident learning framework that iteratively refines label noise estimates by comparing model predictions with estimated label probabilities to identify mislabeled examples. The project provides specialized utilities for active learning optimization, allowing for the selection of the most impactful examples for labeling or re-labeling. It also includes an outlier detection tool to identify atypical data poin
Little Ball of Fur - A graph sampling extension library for NetworKit and NetworkX (CIKM 2020)
The official implementation of "The Shapley Value of Classifiers in Ensemble Games" (CIKM 2021).