# Machine Learning Experiment Tracking

> Search results for `experiment tracking for ML training runs` on awesome-repositories.com. 115 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/experiment-tracking-for-ml-training-runs

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/experiment-tracking-for-ml-training-runs).**

## Results

- [microsoft/ml-for-beginners](https://awesome-repositories.com/repository/microsoft-ml-for-beginners.md) (86,919 ⭐) — This project is an open-source educational curriculum designed to provide a structured path for developers to master machine learning and generative AI. It functions as a technical skill development platform, offering comprehensive study materials that guide learners through fundamental concepts, algorithms, and the practical implementation of artificial intelligence models from scratch.

The curriculum distinguishes itself through a pedagogy centered on interactive Jupyter Notebooks, which allow students to execute code cells directly within narrative documents for immediate visual feedback.
- [allegroai/clearml](https://awesome-repositories.com/repository/allegroai-clearml.md) (6,733 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the entire machine learning lifecycle. It functions as an experiment tracking tool, a data versioning system, and a pipeline orchestrator, while providing infrastructure for GPU cluster management and model serving.

The platform is distinguished by its ability to handle hybrid-cloud compute scheduling and fractional GPU allocation, allowing multiple workloads to share a single hardware accelerator. It employs a metadata-based approach to data versioning, using virtual views to track large datasets and artifacts without duplicating r
- [abhineet123/deep-learning-for-tracking-and-detection](https://awesome-repositories.com/repository/abhineet123-deep-learning-for-tracking-and-detection.md) (2,508 ⭐) — This project is a curated research repository and structured index focused on deep learning techniques for object detection and tracking. It serves as a centralized archive for academic papers, datasets, and software implementations, providing a cohesive resource for studying methodologies used in image and video analysis.

The repository distinguishes itself through a systematic approach to knowledge management, utilizing hierarchical file organization and metadata-driven tagging to categorize technical literature. By indexing domain-specific datasets and cross-referencing academic resources,
- [facebookresearch/audiocraft](https://awesome-repositories.com/repository/facebookresearch-audiocraft.md) (23,379 ⭐) — Audiocraft is a deep learning audio library and machine learning framework designed for training, fine-tuning, and evaluating generative models for music and sound effects. It functions as a text-to-music generative model and a neural audio codec, providing the tools necessary to compress audio signals into discrete representations and synthesize high-fidelity waveforms from textual descriptions.

The framework is distinguished by its ability to combine multiple conditioning signals, allowing for the generation of audio based on text prompts, melodic excerpts, or style-based audio clips. It al
- [fastshift/x-track](https://awesome-repositories.com/repository/fastshift-x-track.md) (6,250 ⭐) — X-Track is a firmware project for an embedded bicycle computer that combines GPS-based speed and ride metrics with offline map navigation. It functions as a GPS bicycle speedometer, displaying speed, distance, altitude, and other ride data on a handlebar-mounted screen, while also serving as an offline map viewer that renders locally stored map tiles without an internet connection.

The project distinguishes itself by including a firmware emulator that runs the embedded code on a PC, enabling development and testing without physical hardware. It also provides GPS-based clock calibration to aut
- [eleutherai/gpt-neox](https://awesome-repositories.com/repository/eleutherai-gpt-neox.md) (7,392 ⭐) — gpt-neox is a distributed training system and framework for building large-scale autoregressive language models. It implements the transformer architecture and provides a toolkit for training models with billions of parameters by distributing weights across compute clusters.

The framework distinguishes itself through extensive support for distributed model parallelism, including pipeline and sequence parallelism, to overcome single-device memory limits. It further supports sparse model architectures using a mixture of experts system with Sinkhorn-based routing.

The project covers a broad ran
- [gokumohandas/made-with-ml](https://awesome-repositories.com/repository/gokumohandas-made-with-ml.md) (48,343 ⭐) — Made-With-ML is an educational resource for learning how to combine machine learning with software engineering in order to design, develop, deploy, and iterate on production-grade ML applications.
- [paulescu/hands-on-train-and-deploy-ml](https://awesome-repositories.com/repository/paulescu-hands-on-train-and-deploy-ml.md) (885 ⭐) — Train and Deploy an ML REST API to predict crypto prices, in 10 steps
- [apple/ml-ferret](https://awesome-repositories.com/repository/apple-ml-ferret.md) (8,680 ⭐) — ml-ferret is a multimodal large language model framework and visual reasoning engine designed to reason about images and user interfaces. It functions as a UI grounding model and referring expression comprehension tool that maps natural language descriptions to precise pixel coordinates.

The system focuses on high-resolution image analysis to identify and locate specific interface components. It employs multi-resolution image processing and region-aware visual encoding to preserve detail across different aspect ratios, enabling the model to analyze spatial relationships and functional layouts
- [volcengine/verl](https://awesome-repositories.com/repository/volcengine-verl.md) (22,015 ⭐) — verl is a distributed training system designed for large language model alignment and reinforcement learning. It provides a framework for executing post-training pipelines, including supervised fine-tuning and reinforcement learning from human feedback, to refine model behavior and agentic capabilities.

The system utilizes a hybrid training and inference engine that optimizes memory and communication when switching between model generation and gradient updates. It supports multi-modal reinforcement learning for models processing both image and text data, and implements algorithms such as PPO
- [run-house/kubetorch](https://awesome-repositories.com/repository/run-house-kubetorch.md) (1,212 ⭐) — Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.
- [mastra-ai/mastra](https://awesome-repositories.com/repository/mastra-ai-mastra.md) (21,221 ⭐) — Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention.

The framework distinguishes itself through its focus on observability and secure, isolated execut
- [meta-llama/llama-cookbook](https://awesome-repositories.com/repository/meta-llama-llama-cookbook.md) (18,375 ⭐) — This project is a collection of implementation guides, recipes, and developer resources for building applications with Llama models. It serves as a comprehensive kit for developing autonomous agents, establishing retrieval-augmented generation systems, and executing model fine-tuning.

The resource provides specific patterns for multimodal workflows that process text, images, and audio. It includes specialized guidance on adapting pre-trained model weights for targeted tasks and implementing tool-calling orchestration to connect models with external APIs and functions.

The codebase covers a b
- [hubspot/react-experiments](https://awesome-repositories.com/repository/hubspot-react-experiments.md) (319 ⭐) — React components for implementing UI experiments
- [hiyouga/llama-factory](https://awesome-repositories.com/repository/hiyouga-llama-factory.md) (72,241 ⭐) — LLaMA-Factory is a comprehensive suite for dataset preparation, model fine-tuning, memory optimization, and standardized API deployment. It provides a unified platform for the supervised and reward-based fine-tuning of large language models and vision-language models.

The framework includes a specialized toolkit for training vision-language models and a model serving interface that deploys trained models through high-performance APIs. It utilizes precision tuning and quantization techniques to reduce the hardware requirements and memory footprint of large models.

The system covers data pipel
- [tracksapp/tracks](https://awesome-repositories.com/repository/tracksapp-tracks.md) (1,235 ⭐) — Tracks is a GTD™ web application, built with Ruby on Rails
- [hiyouga/llamafactory](https://awesome-repositories.com/repository/hiyouga-llamafactory.md) (72,213 ⭐) — LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface.

The project distinguishes itself by offering a low-code visual dashboard that enables users to configure experiments and monitor performance metrics in real time without writing extensive custom scripts. It also features a configuration-driven orchestration system that decouples experim
- [clearml/clearml](https://awesome-repositories.com/repository/clearml-clearml.md) (6,740 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the end-to-end machine learning lifecycle, from initial experimentation to production deployment. It provides a suite of integrated tools including a pipeline orchestrator for automating workflows, an experiment tracking tool for logging hyperparameters and metrics, and a metadata-driven data versioning system for managing large-scale datasets and model artifacts.

The platform is distinguished by its advanced compute management and serving capabilities. It features a GPU compute manager that supports fractional resource slicing and
- [raaminz/training](https://awesome-repositories.com/repository/raaminz-training.md) (28 ⭐) — This Repository is all about my training classes
- [ageron/handson-ml](https://awesome-repositories.com/repository/ageron-handson-ml.md) (25,608 ⭐) — This is a machine learning educational repository consisting of a collection of notebooks and code examples. It provides practical implementations of diverse machine learning algorithms and workflows, ranging from traditional scientific computing to deep learning.

The project features specific implementations of Scikit-Learn models, such as decision trees, random forests, and support vector machines, as well as TensorFlow examples for building neural networks, convolutional layers, and recurrent architectures. It also includes tutorials on reinforcement learning development and the creation o
- [hiyouga/llama-efficient-tuning](https://awesome-repositories.com/repository/hiyouga-llama-efficient-tuning.md) (72,239 ⭐) — This project is a fine-tuning framework and training pipeline designed to optimize and adapt large language and vision models. It provides a specialized toolkit for parameter-efficient tuning and supervised learning, serving as both a trainer for multimodal models and a deployment tool for serving fine-tuned models via high-performance inference engines.

The framework focuses on reducing memory and compute requirements by updating a small subset of model parameters. It supports a wide range of adaptation strategies, including vision-language model training to align text, image, video, and aud
- [maquannene/track](https://awesome-repositories.com/repository/maquannene-track.md) (268 ⭐) — Track is a thread-safe cache written in Swift. It is composed of DiskCache and MemoryCache, both of which support LRU.
- [rednaga/training](https://awesome-repositories.com/repository/rednaga-training.md) (431 ⭐) — Training materials crafted and publicly provided by Red Naga members
- [datahub-project/datahub](https://awesome-repositories.com/repository/datahub-project-datahub.md) (12,141 ⭐) — DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations.

The platform distinguishes itself through its focus on grounding artificial intelligence and autono
- [ai4finance-foundation/finrl](https://awesome-repositories.com/repository/ai4finance-foundation-finrl.md) (13,964 ⭐) — FinRL is a reinforcement learning framework designed for the development, training, and backtesting of automated trading strategies. It functions as a quantitative finance toolkit that integrates deep learning algorithms with financial market simulations to address complex portfolio management and asset allocation tasks. The platform provides an end-to-end pipeline for transforming raw market data into actionable trading models.

The project distinguishes itself through a layered, modular architecture that separates data processing, environment simulation, and agent training. This design allow
- [ultralytics/ultralytics](https://awesome-repositories.com/repository/ultralytics-ultralytics.md) (58,468 ⭐) — Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification. By utilizing a modular architecture, the platform allows users to swap model components to balance inference speed and accuracy requirements for diverse applications.

The framework distinguishes itself through its support for real-time processing and flexible deployment. It in
- [huggingface/ml-intern](https://awesome-repositories.com/repository/huggingface-ml-intern.md) (10,521 ⭐) — This project is an autonomous AI agent framework and workflow orchestrator designed to automate machine learning engineering. It functions as a reasoning engine that reads research papers and writes code to train and deploy machine learning models through iterative reasoning loops and tool execution.

The system distinguishes itself by integrating a GPU-accelerated sandboxed execution environment, allowing it to run and verify machine learning scripts in isolated remote containers. It utilizes a model provider integration gateway to route inference requests across various hosted or local endpo
- [hyperopt/hyperopt](https://awesome-repositories.com/repository/hyperopt-hyperopt.md) (7,582 ⭐) — Hyperopt is a Python library for hyperparameter optimization designed to minimize scalar-valued objective functions. It operates as a stochastic search space engine that finds optimal input parameters by searching through real-valued, discrete, and conditional spaces.

The framework distinguishes itself through its support for complex search space configurations, allowing for conditional parameter hierarchies where specific hyperparameters are sampled only if their parent parameters meet certain criteria. It is built as an asynchronous optimization framework, decoupling the generation of searc
- [trekhleb/machine-learning-experiments](https://awesome-repositories.com/repository/trekhleb-machine-learning-experiments.md) (1,814 ⭐) — This is a collection of interactive machine-learning experiments. Each experiment consists of a 🏋️ Jupyter/Colab notebook (to see how a model was trained) and a 🎨 demo page (to see a model in action right in your browser).
- [born-ml/born](https://awesome-repositories.com/repository/born-ml-born.md) (100 ⭐) — Born is a modern ML framework for Go — train and deploy models as single binaries. Pure Go, zero CGO, GPU accelerated.
- [dlr-rm/rl-baselines3-zoo](https://awesome-repositories.com/repository/dlr-rm-rl-baselines3-zoo.md) (2,725 ⭐) — This project is a collection of pretrained reinforcement learning agents and training scripts built on Stable Baselines3 and Gymnasium. It provides a framework for training agents to solve specific tasks, managing experiment reproducibility, and deploying pretrained models.

The system includes a specialized benchmarking suite and optimization tools for tuning agent settings. It utilizes automated search spaces and distributed trials to maximize performance, while employing bootstrap sampling to generate statistically robust performance metrics and confidence intervals.

Broad capabilities cov
- [eugeneyan/applied-ml](https://awesome-repositories.com/repository/eugeneyan-applied-ml.md) (29,783 ⭐) — This project is a comprehensive, curated knowledge base designed to support the development and maintenance of production-grade machine learning systems. It serves as a centralized repository of industry-standard technical literature, engineering case studies, and research papers, providing a structured reference for practitioners navigating the complexities of modern data science and machine learning engineering.

The resource distinguishes itself through a cross-domain approach that bridges the gap between academic research and practical implementation. By synthesizing proven industry archit
- [mrdbourke/pytorch-deep-learning](https://awesome-repositories.com/repository/mrdbourke-pytorch-deep-learning.md) (17,195 ⭐) — This project is a structured educational resource and training platform designed for mastering deep learning development. It provides a comprehensive curriculum focused on building, evaluating, and refining predictive models through hands-on coding exercises and standard industry workflows.

The curriculum emphasizes practical implementation, guiding users through the construction of neural network architectures and the application of transfer learning to adapt pretrained models for custom tasks. It includes methodologies for tracking and comparing model experiment results, allowing for the sy
- [raphamorim/canvas-experiments](https://awesome-repositories.com/repository/raphamorim-canvas-experiments.md) (51 ⭐) — Canvas experiments from raphamorim.com. The goal is to keep things as simple as possible, so the project uses only plain HTML, CSS, and JS.
- [mlflow/mlflow](https://awesome-repositories.com/repository/mlflow-mlflow.md) (26,554 ⭐)
- [rudnik275/tracked-instance](https://awesome-repositories.com/repository/rudnik275-tracked-instance.md) (5 ⭐) — Build large forms and track all changes
- [gaomingqi/track-anything](https://awesome-repositories.com/repository/gaomingqi-track-anything.md) (6,936 ⭐) — Track-Anything is an AI-driven video object segmentation and tracking system. It utilizes the Segment Anything Model to isolate and mask multiple objects across video frames, providing tools for automated mask propagation and background-filling inpainting.

The system distinguishes itself through a multi-object segmentation pipeline that can follow several distinct targets simultaneously. It includes a video inpainting utility to remove tracked objects and replace them with synthesized background content, as well as temporal mask refinement to correct tracking drift.

The project covers broad
- [orchestra-research/ai-research-skills](https://awesome-repositories.com/repository/orchestra-research-ai-research-skills.md) (3,641 ⭐) — This project is an LLM research orchestrator and autonomous AI agent framework designed to automate the scientific lifecycle. It functions as an end-to-end research pipeline and model training toolkit, managing everything from initial literature reviews and hypothesis testing to the final drafting of academic papers.

The system is distinguished by its ability to convert unstructured academic PDFs into machine-executable knowledge layers, allowing agents to reproduce and extend research findings. It employs a two-loop orchestration architecture and a specialized research engineering skill libr
- [crewaiinc/crewai](https://awesome-repositories.com/repository/crewaiinc-crewai.md) (53,687 ⭐) — CrewAI is a multi-agent orchestration framework designed for building autonomous systems that execute complex, multi-step workflows. It provides a development platform where specialized agents are defined with specific roles, goals, and tool sets to perform tasks collaboratively. By leveraging a declarative workflow engine, the system manages task dependencies, state transitions, and execution logic, allowing for the creation of structured, stateful sequences of operations.

The framework distinguishes itself through its hierarchical management capabilities, which utilize manager agents to coo
- [azure/machinelearningnotebooks](https://awesome-repositories.com/repository/azure-machinelearningnotebooks.md) (4,354 ⭐) — Azure Machine Learning Notebooks is a cloud-based environment for developing and executing interactive Jupyter notebooks within a managed machine learning workspace. It provides managed machine learning compute through cloud-based workstations and containerized environments pre-configured with GPU drivers and kernels for high-performance model training.

The project functions as a distributed GPU training platform and an ML experiment tracking system to monitor training metrics and version data assets. It also serves as an MLOps pipeline orchestrator for automating modular workflows and a mode
- [gilbox/react-track](https://awesome-repositories.com/repository/gilbox-react-track.md) (342 ⭐) — Track the position of DOM elements. Create cool animations.
- [microsoft/onnxruntime](https://awesome-repositories.com/repository/microsoft-onnxruntime.md) (19,347 ⭐) — This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management.

The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computation
- [eduardolundgren/tracking.js](https://awesome-repositories.com/repository/eduardolundgren-tracking-js.md) (9,472 ⭐) — tracking.js is a browser computer vision library written in JavaScript for performing real-time image analysis and object tracking directly within a web browser. It functions as a real-time object tracker, a color tracking tool, and a face detection utility.

The library enables the detection and monitoring of specific color ranges, human faces, and known visual patterns across consecutive video frames. It extracts visual features and descriptors from images to identify distinct landmarks for matching and tracking.

The project covers broad computer vision capabilities, including the ability t
- [livekit/livekit](https://awesome-repositories.com/repository/livekit-livekit.md) (19,358 ⭐) — LiveKit is a comprehensive framework for building and orchestrating real-time, multimodal AI agents that interact with users through voice, video, and text. It provides a centralized, event-driven architecture to manage the entire lifecycle of automated participants, from initialization and session state management to graceful shutdown. By utilizing a selective forwarding unit, the platform efficiently routes media streams between participants and agents, ensuring low-latency communication and secure, token-based authentication for all connections.

The platform distinguishes itself through it
- [niuiic/track.nvim](https://awesome-repositories.com/repository/niuiic-track-nvim.md) (26 ⭐) — Neovim plugin to track the thought process of reading source code.
- [axolotl-ai-cloud/axolotl](https://awesome-repositories.com/repository/axolotl-ai-cloud-axolotl.md) (12,059 ⭐) — Axolotl is a configuration-driven framework designed for the fine-tuning, evaluation, and quantization of large language models. It functions as a comprehensive orchestrator for distributed training, enabling users to manage complex workflows across multi-node and multi-GPU environments. By utilizing structured configuration files, the platform streamlines the setup of training parameters, dataset paths, and hardware distribution strategies.

The project distinguishes itself through its support for diverse training methodologies, including full-parameter tuning, parameter-efficient adaptation,
- [netflix/metaflow](https://awesome-repositories.com/repository/netflix-metaflow.md) (9,764 ⭐) — Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments.

The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
- [kubernetes/kubernetes](https://awesome-repositories.com/repository/kubernetes-kubernetes.md) (123,197 ⭐) — Kubernetes is a distributed container orchestration platform that automates the deployment, scaling, and management of containerized applications across clusters of computing nodes. It functions as a declarative infrastructure controller, utilizing a control loop architecture that continuously monitors the current system state against user-defined configurations to ensure desired operational outcomes. The system relies on a centralized API-driven interface and a replicated key-value store to maintain a consistent source of truth for all cluster objects.

The platform distinguishes itself throu
- [ultralytics/yolov3](https://awesome-repositories.com/repository/ultralytics-yolov3.md) (10,571 ⭐) — This is a real-time object detection framework built on the YOLOv3 architecture, implemented in PyTorch. It provides a complete pipeline for identifying and localizing objects in images and video using a single neural network pass, combining a Darknet-53 backbone with multi-scale feature pyramids and anchor-based bounding box prediction.

The framework extends beyond basic detection to include instance segmentation, human pose estimation, and multi-object tracking across video frames. It offers a model export toolkit that converts trained models through ONNX to CoreML, TensorFlow Lite, and Ten
- [actransitorg/actransit.training](https://awesome-repositories.com/repository/actransitorg-actransit-training.md) (8 ⭐) — AC Transit Training and Education Department (TED) application
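
Several entries above (for example ClearML and Metaflow) describe experiment tracking as logging hyperparameters and metrics for each training run so that runs can be compared later. The sketch below illustrates that idea using only the Python standard library. `RunTracker`, `best_run`, and `demo` are hypothetical names invented for this illustration and are not the API of any project listed here; the "training" loop is a stand-in that simply shrinks a number.

```python
import json
import tempfile
from pathlib import Path


class RunTracker:
    """Hypothetical minimal tracker: stores one JSON file per training run."""

    def __init__(self, root, run_name, params):
        self.path = Path(root) / f"{run_name}.json"
        # Hyperparameters are recorded once; metrics accumulate over steps.
        self.record = {"run": run_name, "params": dict(params), "metrics": {}}

    def log_metric(self, name, value, step):
        history = self.record["metrics"].setdefault(name, [])
        history.append({"step": step, "value": value})

    def finish(self):
        # Persist the run so it can be compared with other runs later.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.record, indent=2))


def best_run(root, metric):
    """Return (run_name, final_value) for the run with the lowest final metric."""
    results = []
    for file in sorted(Path(root).glob("*.json")):
        record = json.loads(file.read_text())
        history = record["metrics"].get(metric)
        if history:
            results.append((record["run"], history[-1]["value"]))
    return min(results, key=lambda item: item[1])


def demo():
    root = tempfile.mkdtemp()
    for lr in (0.1, 0.01):
        tracker = RunTracker(root, f"lr-{lr}", {"learning_rate": lr})
        loss = 1.0
        for step in range(3):
            loss = loss * (1 - lr)  # stand-in for a real training step
            tracker.log_metric("loss", loss, step)
        tracker.finish()
    return best_run(root, "loss")


if __name__ == "__main__":
    print(demo())
```

Real trackers add much more (artifact storage, dashboards, remote servers), but the core pattern is the same: record parameters once per run, append metrics per step, and query across runs afterwards.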