1 repository
Quantization workloads are distributed across multiple processors using data-parallel processing to accelerate the compression of large models.
Distinct from Model Quantization: that category does not capture the distributed, data-parallel nature of the quantization process.
Explore 1 GitHub repository matching Artificial Intelligence & ML · Distributed Quantization Processing.
llm-compressor is a quantization toolkit and post-training library designed to reduce the memory footprint and size of large language models. It provides a framework for compressing models using weight and activation quantization to enable more efficient deployment. The project distinguishes itself through a distributed quantization framework that utilizes data-parallel processing and disk-based weight offloading to handle massive model checkpoints that exceed available system memory. It includes specialized compressors for diverse architectures, including Mixture-of-Experts, Vision-Language,
Uses distributed data-parallel processing to accelerate the quantization of massive models.
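To make the data-parallel idea concrete, here is a minimal sketch of how calibration for quantization could be split across workers. It is not llm-compressor's API: the function names (`shard`, `local_range`, `global_range`, `quantize_int8`, `dequantize`) are hypothetical, threads stand in for separate processes or GPUs, and a simple min/max observer with 8-bit asymmetric quantization is assumed. Each worker sees only its own shard of the calibration data, and the per-worker ranges are then reduced into one shared range that every worker uses to quantize.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(samples, num_workers):
    """Split calibration samples round-robin, one shard per worker."""
    return [samples[i::num_workers] for i in range(num_workers)]


def local_range(batch):
    """A worker observes only its own shard and reports (min, max)."""
    return min(batch), max(batch)


def global_range(samples, num_workers=4):
    """Reduce per-worker ranges into one shared range.

    This reduction is a stand-in for a min/max all-reduce in a real
    multi-process setup (an assumption, not the library's method).
    """
    shards = [s for s in shard(samples, num_workers) if s]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        ranges = list(pool.map(local_range, shards))
    return min(lo for lo, _ in ranges), max(hi for _, hi in ranges)


def quantize_int8(values, lo, hi):
    """Asymmetric 8-bit quantization using the shared range."""
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant data
    zero_point = round(-lo / scale) - 128
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate real values."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    calibration = [0.5, -1.0, 0.25, 1.0, -0.75]
    lo, hi = global_range(calibration, num_workers=2)
    codes, scale, zp = quantize_int8(calibration, lo, hi)
    print(lo, hi, codes, dequantize(codes, scale, zp))
```

Because only the small per-shard statistics are exchanged, the expensive forward passes over calibration data can run independently on each worker. In a multi-GPU setting the reduction step would typically be a collective operation rather than a Python `min`/`max`.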