AutoGPTQ

AutoGPTQ - quantize LLMs for faster inference | Awesome Repos

Open-source alternatives to AutoGPTQ

Similar open-source projects, ranked by how many features they share with AutoGPTQ.

intel/neural-compressor
2,585 stars
Neural Compressor is a deep learning model compression toolkit and AI inference acceleration engine. It functions as an automated model quantization tool and hardware-aware model compiler designed to reduce the memory footprint of neural networks and decrease execution latency. The project provides specialized frameworks for optimizing large language models, using weight-only quantization and hardware-specific kernels to make generative AI workloads run more efficiently. It maps neural network operators to specialized CPU and GPU vector instructions to accelerate model execution…
Python · auto-tuning · awq · fp4
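Weight-only quantization, as mentioned above, stores weights as low-bit integers plus a few floating-point parameters while activations stay in higher precision. The sketch below shows the simplest form, asymmetric round-to-nearest (RTN) with one scale and zero-point per group of weights. It is an illustration under assumptions, not Neural Compressor's API: the function names, the group size of 4, and the choice to widen each group's range to include zero are all hypothetical choices made for this example.

```python
def quantize_groupwise(weights, group_size=4, bits=4):
    """Asymmetric round-to-nearest quantization, one (scale, zero) per group.

    Hypothetical helper for illustration; real toolkits use much larger
    groups (e.g. 32 to 128 weights) and packed integer storage.
    """
    maxq = 2 ** bits - 1
    codes, params = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Widen the range to include 0 so the zero-point lands inside [0, maxq].
        lo, hi = min(min(group), 0.0), max(max(group), 0.0)
        scale = (hi - lo) / maxq if hi > lo else 1.0
        zero = round(-lo / scale)
        for w in group:
            # Integer code, clamped to the representable range.
            codes.append(min(max(round(w / scale) + zero, 0), maxq))
        params.append((scale, zero))
    return codes, params


def dequantize_groupwise(codes, params, group_size=4):
    """Map integer codes back to approximate floating-point weights."""
    out = []
    for i, q in enumerate(codes):
        scale, zero = params[i // group_size]
        out.append(scale * (q - zero))
    return out
```

Each reconstructed weight is off by at most half a quantization step (`scale / 2`) for its group, which is why smaller groups usually give better accuracy at the cost of storing more scales.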
AutoGPTQ/AutoGPTQ
5,070 stars
AutoGPTQ is a model compression toolkit and post-training quantization framework designed to reduce the memory footprint of large language models. It uses the GPTQ algorithm to compress neural network weights, lowering hardware requirements and reducing VRAM usage. The project also serves as an inference accelerator by providing optimized kernels that increase token generation speed. It supports model architecture extensibility, allowing quantization to be added to new model structures through configurable patterns. The framework covers a comprehensive quantization pipeline, including…
Python
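What distinguishes GPTQ from plain rounding is error compensation: weights in a row are quantized one column at a time, and each column's rounding error is pushed onto the columns not yet quantized, weighted by the inverse Hessian H⁻¹ of the layer's calibration inputs (H = Σ x xᵀ). The sketch below writes that sequential update naively for a single row in pure Python. It is a simplified illustration, not AutoGPTQ's implementation, which (per the GPTQ paper) works on whole matrices with a Cholesky factorization of H⁻¹, lazy block updates, and optional group-wise grids and activation reordering. All function names and the 1% damping factor are assumptions for this example.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for tiny demo matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def hessian_from_inputs(inputs, damp=0.01):
    """H = sum over calibration vectors x of x x^T, plus hypothetical diagonal damping."""
    d = len(inputs[0])
    h = [[sum(x[a] * x[b] for x in inputs) for b in range(d)] for a in range(d)]
    mean_diag = sum(h[a][a] for a in range(d)) / d
    for a in range(d):
        h[a][a] += damp * mean_diag
    return h


def grid_params(w, maxq=15):
    """Asymmetric min/max grid for one row: (scale, integer zero-point)."""
    lo, hi = min(min(w), 0.0), max(max(w), 0.0)
    scale = (hi - lo) / maxq if hi > lo else 1.0
    return scale, round(-lo / scale)


def quantize_value(x, scale, zero, maxq=15):
    """Round one value onto the grid and return its dequantized value."""
    q = min(max(round(x / scale) + zero, 0), maxq)
    return scale * (q - zero)


def gptq_row(w, hessian, scale, zero, maxq=15):
    """Quantize one weight row left to right with GPTQ-style error compensation."""
    w = list(w)  # working copy: later columns absorb earlier rounding errors
    d = len(w)
    hinv = invert(hessian)
    q = [0.0] * d
    for i in range(d):
        q[i] = quantize_value(w[i], scale, zero, maxq)
        err = (w[i] - q[i]) / hinv[i][i]
        # Shift the remaining weights so the row's output on the
        # calibration inputs changes as little as possible.
        for j in range(i + 1, d):
            w[j] -= err * hinv[i][j]
        # Eliminate column i from the inverse Hessian (OBS-style downdate).
        pivot_row, pivot = hinv[i][:], hinv[i][i]
        for r in range(d):
            factor = hinv[r][i] / pivot
            for c in range(d):
                hinv[r][c] -= factor * pivot_row[c]
    return q
```

With an identity Hessian (uncorrelated, equal-scale inputs) no error is propagated and the result reduces to plain round-to-nearest; correlated calibration inputs are what make the compensation step matter.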
huggingface/peft
21,274 stars
This library provides a framework for parameter-efficient fine-tuning, enabling large pretrained models to be adapted by training only a small subset of parameters. It functions as a distributed model training system and optimization toolkit, designed to reduce the computational and memory requirements typically associated with full model fine-tuning. The project distinguishes itself through a suite of methods for modular adapter composition, including low-rank matrix decomposition and activation-based scaling. It supports the integration of multiple task-specific adapter modules, allowing…
Python · adapter · diffusion · fine-tuning
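The low-rank matrix decomposition mentioned above is the LoRA idea: the pretrained weight W (d_out × d_in) stays frozen, and a trainable update B·A is added, with A of shape r × d_in and B of shape d_out × r for a small rank r. The sketch below shows the forward pass, merging the adapter back into W, and the resulting parameter count. It is a plain-Python illustration, not PEFT's code; the function names are hypothetical, and the alpha / r scaling follows the LoRA paper's convention, which to our understanding PEFT's `LoraConfig` (`r`, `lora_alpha`) also uses by default.

```python
def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lora_forward(x, w_frozen, lora_a, lora_b, alpha, r):
    """y = W x + (alpha / r) * B (A x); only A and B would receive gradients."""
    base = matvec(w_frozen, x)
    update = matvec(lora_b, matvec(lora_a, x))
    scaling = alpha / r
    return [y + scaling * u for y, u in zip(base, update)]


def merge_lora(w_frozen, lora_a, lora_b, alpha, r):
    """Fold the adapter into the base weight: W' = W + (alpha / r) * B A."""
    scaling = alpha / r
    return [[w_frozen[i][j] + scaling * sum(lora_b[i][k] * lora_a[k][j] for k in range(r))
             for j in range(len(w_frozen[0]))]
            for i in range(len(w_frozen))]


def trainable_fraction(d_in, d_out, r):
    """Adapter parameters relative to the full d_out x d_in weight matrix."""
    return r * (d_in + d_out) / (d_in * d_out)
```

For a 4096 × 4096 projection with r = 8, the adapter has 65,536 parameters, 1/256 of the full matrix. Initializing B to zeros makes the adapted model start out identical to the base model, and merging afterwards removes the extra matrix multiply at inference time.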
intel-analytics/BigDL
8,845 stars
BigDL is a PyTorch acceleration framework and distributed inference engine designed for large language models. It provides a toolkit for running models on Intel hardware, integrating quantization tools and libraries for parameter-efficient fine-tuning. The project distinguishes itself through the use of pipeline parallelism to distribute model workloads across multiple hardware accelerators. It uses low-bit integer quantization and speculative decoding to reduce memory footprints and decrease text generation latency. The system covers broad capabilities in model optimization, including…
Python
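Speculative decoding, mentioned above, lowers latency by letting a small draft model propose several tokens that the large target model then verifies, so one expensive target pass can yield more than one token. The sketch below shows the greedy variant, whose output is exactly what greedy decoding with the target alone would produce. It is a toy illustration, not BigDL's implementation: "models" are hypothetical functions from a token list to the next token, target calls are made one position at a time for clarity, and sampling-based speculative decoding uses a probabilistic accept/reject rule instead of exact token matching.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token sequence to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding with a draft window of up to k tokens."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(min(k, goal - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target verifies the proposal left to right (a real system
        #    scores all proposed positions in one batched forward pass).
        for tok in proposal:
            expected = target(seq)
            if tok != expected:
                seq.append(expected)  # target's token replaces the first rejected draft token
                break
            seq.append(tok)
        else:
            # 3. Every draft token was accepted: the target adds one bonus token.
            if len(seq) < goal:
                seq.append(target(seq))
    return seq
```

Every token appended is one the target itself would have chosen, so the draft model only affects speed: the closer its guesses are to the target's, the more tokens each verification pass accepts.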

See all 30 alternatives to AutoGPTQ

PanQiWei/AutoGPTQ · Archived
