1 repo
Infrastructure support for executing model training across single-node, multi-GPU, and multi-node clusters.
Distinguishing note: Focuses on the execution environment and hardware scaling for training, rather than the training logic itself.
Explore 1 awesome GitHub repository matching artificial intelligence & ml · Distributed Training Runtimes. Refine with filters or upvote what's useful.
This project is a modular research toolkit designed for developing, training, and evaluating deep learning models for object detection, segmentation, and video instance tracking. It provides a flexible training engine that manages complex neural network execution, including distributed training, custom lifecycle hooks, and weight optimization. The framework is built around a hierarchical configuration system that allows users to define architectures, data pipelines, and training hyperparameters through composable, inheritable files. The project distinguishes itself through its highly modular
Executes training across various hardware environments including single GPUs, multi-GPU clusters, and multi-node setups.