Open-source tools for logging, visualizing, and comparing metrics and parameters from machine learning training runs.
This project is a collection of utilities designed for machine learning experiment tracking, data versioning, and the observability of large language model applications. It provides a client for recording hyperparameters and metrics during training to visualize performance trends and compare different model versions. The tool includes a model evaluation framework that uses custom scorers and automated judges to assess the quality of generated text outputs. It also provides observability tools to monitor and debug the execution flow and runtime behavior of language model applications. The sys
This is an experiment tracking system that provides a Python SDK, metric logging, hyperparameter tracking, and artifact versioning for monitoring and visualizing work across the machine learning lifecycle.
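The core idea of recording hyperparameters and metrics per run and then comparing runs can be illustrated with a minimal, standard-library-only sketch. It is not the API of any tool listed here: `RunLogger`, `best_run`, the JSON layout, and the sample loss values are all hypothetical and exist only for illustration.

```python
import json
import tempfile
from pathlib import Path


class RunLogger:
    """Hypothetical minimal tracker: one JSON file per training run."""

    def __init__(self, root, run_name):
        self.path = Path(root) / f"{run_name}.json"
        self.record = {"run": run_name, "params": {}, "metrics": {}}

    def log_param(self, key, value):
        self.record["params"][key] = value

    def log_metric(self, key, value, step):
        # Keep the full history so a trend can be plotted later.
        self.record["metrics"].setdefault(key, []).append(
            {"step": step, "value": value}
        )

    def close(self):
        self.path.write_text(json.dumps(self.record, indent=2))


def best_run(root, metric, lower_is_better=True):
    """Return (run name, final metric value) for the best run under root."""
    results = []
    for file in sorted(Path(root).glob("*.json")):
        record = json.loads(file.read_text())
        history = record["metrics"].get(metric)
        if history:
            results.append((record["run"], history[-1]["value"]))
    pick = min if lower_is_better else max
    return pick(results, key=lambda item: item[1])


# Made-up example data: two runs with different learning rates.
root = tempfile.mkdtemp()
runs = [("run_a", 0.1, [0.9, 0.6, 0.5]), ("run_b", 0.01, [0.9, 0.7, 0.3])]
for name, lr, losses in runs:
    logger = RunLogger(root, name)
    logger.log_param("learning_rate", lr)
    for step, loss in enumerate(losses):
        logger.log_metric("loss", loss, step)
    logger.close()

winner = best_run(root, "loss")
print(winner)
```

Real trackers add a server, a dashboard, and artifact storage on top of this pattern, but the per-run record of parameters plus a stepped metric history is the common denominator.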
MLflow is a platform for the machine learning lifecycle that provides metric logging, hyperparameter tracking, artifact versioning, and a built-in visualization dashboard, which makes it a widely used tool for experiment tracking.
Azure Machine Learning Notebooks is a cloud-based environment for developing and executing interactive Jupyter notebooks within a managed machine learning workspace. It provides managed machine learning compute through cloud-based workstations and containerized environments pre-configured with GPU drivers and kernels for high-performance model training. The project functions as a distributed GPU training platform and an ML experiment tracking system to monitor training metrics and version data assets. It also serves as an MLOps pipeline orchestrator for automating modular workflows and a mode
This repository provides a collection of notebooks and examples for using the Azure Machine Learning platform, which includes built-in experiment tracking, metric logging, and artifact versioning capabilities for machine learning workflows.
Wandb is a centralized platform for machine learning experiment tracking, model registry management, and workflow orchestration. It provides a comprehensive suite of tools for logging, visualizing, and versioning training metrics, model artifacts, and hyperparameter sweeps to ensure reproducibility across development cycles. The platform also functions as an observability tool for large language model applications, enabling the tracing of execution steps, token usage, and reasoning processes. The project distinguishes itself through its event-driven automation capabilities, which allow users
This is an experiment tracking platform that provides a Python SDK for logging metrics, tracking hyperparameters, versioning artifacts, and visualizing training data.
DVC is a data versioning tool and pipeline orchestrator designed to track large datasets and machine learning models using external storage and metadata pointers. It integrates with Git by utilizing placeholders to keep heavy artifacts out of the repository while maintaining a versioned link between code and data. The system manages remote data caches through a synchronization layer that connects local environments to cloud storage or network filesystems. It also functions as an experiment tracker, recording hyperparameters and metrics to compare the performance of different model iterations.
DVC is a machine learning experiment tracking system that provides robust artifact versioning, metric logging, and hyperparameter tracking through its Python SDK and Git-integrated workflow.
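The "metadata pointer" idea described for DVC, where a small text file stands in for a large artifact that lives in a separate cache, can be sketched with the standard library alone. This is an illustration of the general technique, not DVC's actual file format or cache layout: `track`, `restore`, the `.pointer.json` suffix, and the field names are assumptions made for this example.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track(data_file, cache_dir):
    """Copy data_file into a content-addressed cache and write a pointer file."""
    data_file, cache_dir = Path(data_file), Path(cache_dir)
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(data_file, cache_dir / digest)
    # The pointer is small enough to commit to version control.
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    meta = {
        "path": data_file.name,
        "sha256": digest,
        "size": data_file.stat().st_size,
    }
    pointer.write_text(json.dumps(meta))
    return pointer


def restore(pointer_file, cache_dir):
    """Recreate the data file next to its pointer from the cache."""
    pointer_file = Path(pointer_file)
    meta = json.loads(pointer_file.read_text())
    target = pointer_file.with_name(meta["path"])
    shutil.copyfile(Path(cache_dir) / meta["sha256"], target)
    return target


workdir = Path(tempfile.mkdtemp())
cache = workdir / "cache"
dataset = workdir / "train.bin"
dataset.write_bytes(b"x" * 10_000)  # stand-in for a large dataset

pointer = track(dataset, cache)
dataset.unlink()  # simulate a fresh checkout that contains only the pointer
restored = restore(pointer, cache)
print(pointer.stat().st_size, restored.stat().st_size)
```

Because the cache key is the content hash, two versions of a dataset get two cache entries, and checking out an older pointer file is enough to recover the matching data.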
This project is a machine learning experiment tracker and event file generator that enables the recording of scalars, images, and histograms to monitor model performance. It functions as an integration bridge that allows training metrics from PyTorch to be logged into files compatible with the TensorBoard dashboard. The system includes a remote log synchronizer designed to stream experiment data to cloud services. This allows for the remote management and analysis of training results and the comparison of datasets across different training runs. The utility covers a broad range of monitoring
This tool functions as an experiment tracker by generating the event files necessary to visualize and compare training metrics and artifacts within the TensorBoard dashboard. While it relies on an external dashboard for the UI, it provides the essential Python SDK and logging capabilities required to track metrics and hyperparameters during model training.
This project is a collection of pretrained reinforcement learning agents and training scripts built on Stable Baselines3 and Gymnasium. It provides a framework for training agents to solve specific tasks, managing experiment reproducibility, and deploying pretrained models. The system includes a specialized benchmarking suite and optimization tools for tuning agent settings. It utilizes automated search spaces and distributed trials to maximize performance, while employing bootstrap sampling to generate statistically robust performance metrics and confidence intervals. Broad capabilities cov
This is a reinforcement learning training and benchmarking framework rather than a general-purpose experiment tracking system, though it integrates with external trackers to log its metrics.
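The bootstrap sampling mentioned in the description can be illustrated with a percentile-bootstrap confidence interval for a mean score. This is a generic sketch of the technique, not the project's own benchmarking code: `bootstrap_ci` is a hypothetical helper and the episode returns are made-up numbers.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of scores."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical evaluation returns from ten episodes of one agent.
episode_returns = [198.0, 210.5, 187.2, 205.1, 220.4,
                   193.3, 201.8, 215.0, 190.6, 208.9]

low, high = bootstrap_ci(episode_returns)
print(f"mean={statistics.fmean(episode_returns):.1f}  95% CI=({low:.1f}, {high:.1f})")
```

Reporting the interval alongside the mean makes it easier to tell whether a difference between two agents is larger than the variation between evaluation episodes.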
DVC is a data versioning tool and pipeline orchestrator designed to track large datasets and machine learning models. It functions as a system for managing large data artifacts by storing lightweight metadata in version control while keeping the actual binaries in a separate cache. The project serves as an experiment tracker and remote storage synchronizer, enabling the execution and comparison of machine learning iterations based on hyperparameters and performance metrics. It provides a bridge for pushing and pulling these large data artifacts between local environments and cloud or on-premi
DVC is a robust tool for experiment tracking and artifact versioning that integrates directly with Python workflows to manage metrics, hyperparameters, and model states.
tensorboardX is a machine learning experiment tracking library used to log metrics and visual data from training processes. It enables the creation of event files that store scalars, images, audio, and graphs for monitoring model performance and behavior. The project provides framework-agnostic logging, allowing users to write visualization data from PyTorch, NumPy, or Chainer. It decouples data recording from specific deep learning engines by using a standardized set of writers to generate binary protobuf files. The library supports model visualization and training data analysis, including
This is a logging library designed to generate event files for TensorBoard rather than a comprehensive experiment tracking system that includes its own built-in visualization dashboard and artifact management.