1 repo
Utilities that reduce the numerical precision of model weights to cut memory usage and speed up inference.
Llama.cpp is an inference engine designed for the local execution of text-based and multimodal language models on consumer hardware. It provides a core environment for running models that process both text and image inputs, utilizing hardware-accelerated backends to optimize performance across diverse CPU and GPU architectures…
Compresses model weights into quantized formats to significantly reduce memory footprint and boost inference speed.
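To make the idea of weight quantization concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is a generic illustration only and does not reproduce the quantized formats Llama.cpp actually uses; the function names `quantize_int8` and `dequantize_int8` are hypothetical and chosen for this example.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale.

    Illustrative only: real tools typically quantize in small blocks
    and may use fewer than 8 bits per weight.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 when all weights are zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer, then clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Each quantized weight fits in one byte instead of the four bytes of a 32-bit float, at the cost of a rounding error of at most about half the scale per weight.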