Software libraries providing various precision reduction methods for neural network weights and optimizer states.
Distinct from Quantized Training: shortlisted candidates here are distinguished by the specific quantization types they offer (block-wise or vector-wise), not by the identity of the tool as a whole.
Explore 2 awesome GitHub repositories matching Artificial Intelligence & ML · Deep Learning Quantization Tools.
bitsandbytes is a deep learning quantization library designed to reduce the memory footprint of large language models. It acts as a GPU memory optimizer and quantization framework, compressing model weights and features to 8-bit and 4-bit precision so that inference and training fit on hardware with limited memory. It also supports low-rank adaptation of quantized models, in which frozen 4-bit weights are combined with small trainable low-rank matrices for fine-tuning. It further distinguishes itself through memory paging, which moves optimizer states between CPU and GPU memory…
Provides a comprehensive set of vector-wise and block-wise quantization methods for memory-efficient inference and training.
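The difference between the block-wise and vector-wise schemes can be sketched in plain Python. This is a minimal illustration, assuming symmetric linear int8 quantization with one absmax scale per group; bitsandbytes' own CUDA kernels use other code maps (for example dynamic 8-bit and the 4-bit NF4/FP4 data types), and the function names below (`absmax_quantize`, `blockwise_quantize`, `vectorwise_matmul`) are hypothetical, not part of its API.

```python
from typing import List, Tuple

Block = Tuple[List[int], float]  # (quantized integers, scale); hypothetical helper type


def absmax_quantize(values: List[float], bits: int = 8) -> Block:
    """Symmetric linear quantization of one group, scaled by its absolute maximum."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit
    absmax = max((abs(v) for v in values), default=0.0)
    scale = absmax / qmax if absmax > 0 else 1.0  # avoid division by zero for all-zero groups
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(block: Block) -> List[float]:
    q, scale = block
    return [x * scale for x in q]


def blockwise_quantize(flat: List[float], block_size: int = 64, bits: int = 8) -> List[Block]:
    """Block-wise: split a flattened tensor into fixed-size chunks, one scale per chunk."""
    return [absmax_quantize(flat[i:i + block_size], bits) for i in range(0, len(flat), block_size)]


def blockwise_dequantize(blocks: List[Block]) -> List[float]:
    out: List[float] = []
    for block in blocks:
        out.extend(dequantize(block))
    return out


def vectorwise_matmul(a: List[List[float]], b: List[List[float]], bits: int = 8) -> List[List[float]]:
    """Vector-wise: one scale per row of `a` and one per column of `b`.

    Products are accumulated in integers, then rescaled by the outer
    product of the row scales and column scales.
    """
    qa = [absmax_quantize(row, bits) for row in a]
    qb = [absmax_quantize(list(col), bits) for col in zip(*b)]
    return [
        [sum(x * y for x, y in zip(qrow, qcol)) * sa * sb for qcol, sb in qb]
        for qrow, sa in qa
    ]


# Usage: a single large outlier among small weights.
weights = [0.1, -0.2, 0.3, -0.4, 50.0, 0.5, -0.6, 0.7]
per_tensor = dequantize(absmax_quantize(weights))  # one global scale
per_block = blockwise_dequantize(blockwise_quantize(weights, block_size=4))
print("per-tensor error on 0.1:", abs(per_tensor[0] - 0.1))
print("block-wise error on 0.1:", abs(per_block[0] - 0.1))
```

With one global scale, the outlier stretches the quantization step so far that 0.1 rounds to zero; block-wise scaling confines that damage to the outlier's own block. Vector-wise scaling applies the same idea along matrix rows and columns, so the integer dot products can be rescaled with a single multiplication per output element.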
Neural Compressor is a deep learning model compression toolkit and AI inference acceleration engine. It functions as an automated model quantization tool and hardware-aware model compiler designed to reduce the memory footprint of neural networks and lower execution latency. The project provides specialized frameworks for optimizing large language models, using weight-only quantization and hardware-specific kernels to make generative AI workloads run more efficiently. It maps neural network operators to specialized CPU and GPU vector instructions to accelerate model execution…
Provides a comprehensive library of precision reduction methods for neural network weights and optimizer states.
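Weight-only quantization stores the weights in low precision while activations stay in floating point. The sketch below assumes plain round-to-nearest (RTN), symmetric 4-bit quantization with one scale per group of input channels; it is not Neural Compressor's API, its function names (`rtn_quantize_weights`, `weight_only_linear`) are hypothetical, and production toolkits typically also offer calibration-based algorithms such as GPTQ or AWQ.

```python
from typing import List, Tuple


def rtn_quantize_weights(
    weight: List[List[float]], bits: int = 4, group_size: int = 32
) -> Tuple[List[List[int]], List[List[float]]]:
    """Round-to-nearest weight-only quantization (hypothetical helper).

    Each row (output channel) is split into groups along the input
    dimension, and each group gets its own symmetric absmax scale.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    qweight, scales = [], []
    for row in weight:
        qrow, srow = [], []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            absmax = max(abs(w) for w in group)
            scale = absmax / qmax if absmax > 0 else 1.0
            qrow.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
            srow.append(scale)
        qweight.append(qrow)
        scales.append(srow)
    return qweight, scales


def weight_only_linear(
    x: List[float], qweight: List[List[int]], scales: List[List[float]], group_size: int = 32
) -> List[float]:
    """y = W x with W dequantized group by group; x stays in floating point.

    `group_size` must match the value used when quantizing.
    """
    out = []
    for qrow, srow in zip(qweight, scales):
        acc = 0.0
        for i, (xi, qi) in enumerate(zip(x, qrow)):
            acc += xi * qi * srow[i // group_size]  # dequantize on the fly
        out.append(acc)
    return out


# Usage: a 2x8 weight matrix quantized in groups of 4 input channels.
w = [[0.1 * (i - 4) for i in range(8)], [0.05 * i for i in range(8)]]
x = [1.0, -0.5, 0.25, 2.0, -1.0, 0.5, 1.5, -2.0]
qw, sc = rtn_quantize_weights(w, bits=4, group_size=4)
print(weight_only_linear(x, qw, sc, group_size=4))
```

Keeping activations in floating point avoids calibrating activation ranges, which is one reason weight-only schemes are popular for large language models, where weight memory rather than arithmetic often limits inference speed.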