1 repository
Techniques to reduce communication overhead and memory bottlenecks specifically within gradient-based optimizers.
Distinguishing note: Focuses on optimizer-specific performance (fused and distributed optimizer implementations) rather than general software performance optimization.
One GitHub repository is listed under Artificial Intelligence & ML · Optimizer Performance Optimizations.
Apex is a high-performance toolkit for PyTorch designed to coordinate distributed training, execute fused GPU kernels, manage mixed precision, and implement optimized distributed optimizers. It provides specialized tools for scaling model training across multiple GPUs and nodes to increase processing speed and throughput. The library features high-performance implementations of Adam and LAMB optimizers to reduce synchronization overhead and memory bottlenecks. It utilizes fused CUDA kernels to combine neural network operations, reducing memory overhead and increasing execution speed. The too
Reduces synchronization overhead and memory bottlenecks using fused and distributed versions of Adam and LAMB.
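To make the "fused" idea above concrete, here is a minimal pure-Python sketch. It is not Apex's implementation, which runs as CUDA kernels. It only illustrates the general pattern of batching the elementwise Adam update for all parameter tensors into one pass instead of one pass per tensor. The function names, the nested-list "tensors", and the launch counter are hypothetical and exist for illustration only.

```python
import math


def adam_step_per_tensor(params, grads, m, v, step,
                         lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Unfused baseline: one update pass (one 'launch') per parameter tensor."""
    launches = 0
    c1, c2 = 1 - b1 ** step, 1 - b2 ** step  # bias-correction terms
    for p, g, m_t, v_t in zip(params, grads, m, v):
        launches += 1
        for i in range(len(p)):
            m_t[i] = b1 * m_t[i] + (1 - b1) * g[i]
            v_t[i] = b2 * v_t[i] + (1 - b2) * g[i] * g[i]
            p[i] -= lr * (m_t[i] / c1) / (math.sqrt(v_t[i] / c2) + eps)
    return launches


def _flatten(tensors):
    """Concatenate a list of 1-D 'tensors' into one flat buffer."""
    return [x for t in tensors for x in t]


def _scatter(flat, tensors):
    """Copy a flat buffer back into the original per-tensor lists, in place."""
    pos = 0
    for t in tensors:
        t[:] = flat[pos:pos + len(t)]
        pos += len(t)


def adam_step_fused(params, grads, m, v, step,
                    lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Fused-style sketch: a single pass over every element of every tensor."""
    fp, fg, fm, fv = (_flatten(x) for x in (params, grads, m, v))
    c1, c2 = 1 - b1 ** step, 1 - b2 ** step
    for i in range(len(fp)):  # the one and only 'launch'
        fm[i] = b1 * fm[i] + (1 - b1) * fg[i]
        fv[i] = b2 * fv[i] + (1 - b2) * fg[i] * fg[i]
        fp[i] -= lr * (fm[i] / c1) / (math.sqrt(fv[i] / c2) + eps)
    _scatter(fp, params)
    _scatter(fm, m)
    _scatter(fv, v)
    return 1


def make_state():
    """Toy model with three parameter tensors of different sizes."""
    params = [[0.5, -1.0], [2.0], [0.1, 0.2, 0.3]]
    grads = [[0.1, -0.2], [0.3], [-0.4, 0.5, -0.6]]
    m = [[0.0] * len(p) for p in params]
    v = [[0.0] * len(p) for p in params]
    return params, grads, m, v


unfused_state = make_state()
fused_state = make_state()
launches_unfused = adam_step_per_tensor(*unfused_state, step=1)
launches_fused = adam_step_fused(*fused_state, step=1)
```

Both functions produce the same parameter values; only the number of passes differs (three versus one for this toy model). On a GPU each pass corresponds roughly to a kernel launch, so batching matters most for models with many small parameter tensors.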