awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Training Diagnostic Tools · Awesome GitHub Repositories

1 repo

Utilities for testing and identifying optimal training hyperparameters before full-scale model execution.

Distinguishing note: Focuses on the diagnostic phase of finding stable hyperparameters rather than the execution of the training itself.

Explore 1 awesome GitHub repository matching Artificial Intelligence & ML · Training Diagnostic Tools.

Awesome Training Diagnostic Tools GitHub Repositories

  • deepspeedai/DeepSpeed

    41,638 · View on GitHub↗

    DeepSpeed is a high-performance library designed to scale deep learning model training and inference across massive clusters of GPUs and compute nodes. It provides a comprehensive suite of tools for distributed training, enabling the execution of models that exceed the memory capacity of single devices through advanced parameter partitioning, pipeline-based model parallelism, and memory-efficient state offloading. The framework distinguishes itself through specialized communication-efficient optimizers and hardware-aware acceleration techniques. By utilizing gradient compression, quantization

    DeepSpeed includes a learning rate range test that helps identify the largest learning rate at which training remains stable, which supports faster convergence and effective use of large batch sizes.

    Python · billion-parameters · compression · data-parallelism
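
To make the idea of finding a maximum stable learning rate concrete, here is a minimal sketch of a learning rate range test on a toy one-dimensional quadratic. It does not use DeepSpeed's API. The names `lr_range_test`, `quad_loss`, and `quad_grad`, the geometric sweep, and the `blowup` divergence threshold are all assumptions made for illustration.

```python
import math


def lr_range_test(grad_fn, loss_fn, w0, min_lr=1e-4, max_lr=10.0,
                  num_steps=60, blowup=4.0):
    """Take one gradient step per learning rate while sweeping it upward.

    Hypothetical helper: stops when the loss is non-finite or exceeds
    `blowup` times the best loss seen so far.
    Returns (history, last_stable_lr); history holds (lr, loss) pairs.
    """
    w = w0
    # Geometric growth factor so the sweep covers [min_lr, max_lr].
    growth = (max_lr / min_lr) ** (1.0 / (num_steps - 1))
    best = loss_fn(w)
    history = []
    last_stable_lr = None
    lr = min_lr
    for _ in range(num_steps):
        w = w - lr * grad_fn(w)
        loss = loss_fn(w)
        history.append((lr, loss))
        if not math.isfinite(loss) or loss > blowup * best:
            break  # divergence detected; keep the previous lr as the answer
        best = min(best, loss)
        last_stable_lr = lr
        lr *= growth
    return history, last_stable_lr


# Toy problem (assumed for illustration): loss = 0.5 * a * w^2 with a = 2.
# Plain gradient descent on it is stable only for lr < 2 / a = 1.0.
A = 2.0


def quad_loss(w):
    return 0.5 * A * w * w


def quad_grad(w):
    return A * w


history, lr_star = lr_range_test(quad_grad, quad_loss, w0=1.0)
print(f"last stable lr: {lr_star:.3f} after {len(history)} steps")
```

On this toy problem the reported value should land near the theoretical limit of 1.0, typically a little above it, because the loss has to grow for a step or two before the threshold registers divergence. In practice one would usually pick a learning rate somewhat below the reported value.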