A memory-efficient training technique that shards model parameters, gradients, and optimizer states across data-parallel processes instead of replicating them on every process.
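As a rough illustration of the sharding idea only, the sketch below splits a flat list of parameters into one equal-sized shard per process and then reassembles it, which is roughly what an all-gather does before a layer is computed. It is a single-process simulation in plain Python, not a real distributed implementation; the names `shard_flat_params` and `all_gather`, the zero-padding scheme, and the example sizes are all hypothetical choices made for this sketch.

```python
def shard_flat_params(flat_params, world_size):
    """Split a flat parameter list into equal-sized shards, one per rank.

    Hypothetical helper: zero-pads the tail so every rank holds the same
    number of elements.
    """
    shard_size = -(-len(flat_params) // world_size)  # ceiling division
    pad = shard_size * world_size - len(flat_params)
    padded = flat_params + [0.0] * pad
    return [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


def all_gather(shards, original_length):
    """Simulate an all-gather: concatenate every rank's shard, drop the padding."""
    full = [x for shard in shards for x in shard]
    return full[:original_length]


# Assumed example: 10 parameters sharded across 4 simulated ranks.
params = [0.1 * i for i in range(10)]
shards = shard_flat_params(params, world_size=4)
restored = all_gather(shards, len(params))

# Each rank stores only its own shard between uses.
per_rank_elements = [len(s) for s in shards]
```

In this toy setup each simulated rank keeps 3 of the 10 values (after padding to 12) rather than a full copy. The same splitting would apply to gradients and optimizer states, which this sketch does not model.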
No GitHub repositories are listed yet under Artificial Intelligence & ML · Fully Sharded Data Parallelism. Submit a GitHub URL to be the first.