This project is a quantized fine-tuning framework for large language models. It combines a low-rank adaptation (LoRA) library with a four-bit quantizer to reduce the GPU memory needed to fine-tune large models.
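The core idea can be sketched in a few lines. The base weight is frozen and stored only as 4-bit codes with one scale per block, and it is dequantized on the fly. A trainable low-rank update (alpha / r) * B A is added to its output. This is a minimal illustration under stated assumptions, not the project's implementation. It uses a simple symmetric absmax integer grid in [-7, 7] rather than a specialized 4-bit data type such as NF4, and a tiny block size of 4 for readability. The names `quantize_4bit`, `dequantize_4bit` and `QuantizedLoRALinear` are hypothetical.

```python
import random

# Assumptions for illustration only: tiny block size and a symmetric integer grid.
BLOCK_SIZE = 4  # hypothetical; practical 4-bit quantizers use larger blocks (e.g. 64)
QMAX = 7        # signed 4-bit codes restricted to [-7, 7] in this sketch


def quantize_4bit(values, block_size=BLOCK_SIZE):
    """Absmax-quantize a flat list of floats to 4-bit integer codes, one scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # avoid division by zero for all-zero blocks
        scales.append(scale)
        codes.extend(max(-QMAX, min(QMAX, round(v / scale * QMAX))) for v in block)
    return codes, scales


def dequantize_4bit(codes, scales, block_size=BLOCK_SIZE):
    """Map 4-bit codes back to approximate floats using each block's scale."""
    return [c * scales[i // block_size] / QMAX for i, c in enumerate(codes)]


class QuantizedLoRALinear:
    """Hypothetical layer: frozen 4-bit base weight W plus trainable (alpha / r) * B @ A."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        self.out_features = len(weight)
        self.in_features = len(weight[0])
        flat = [w for row in weight for w in row]
        # The full-precision base weight is discarded; only codes and scales are kept.
        self.codes, self.scales = quantize_4bit(flat)
        rng = random.Random(seed)
        # A (rank x in_features) gets a small random init; B (out_features x rank) starts
        # at zero, so the adapter contributes nothing until B is trained.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.in_features)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.out_features)]
        self.scaling = alpha / rank

    def forward(self, x):
        # Dequantize the frozen base weight for this matmul only.
        flat = dequantize_4bit(self.codes, self.scales)
        n = self.in_features
        base = [sum(flat[i * n + j] * x[j] for j in range(n)) for i in range(self.out_features)]
        # Low-rank path: first project x down to `rank` dims with A, then back up with B.
        ax = [sum(a_j * x_j for a_j, x_j in zip(row, x)) for row in self.A]
        update = [sum(b_k * ax_k for b_k, ax_k in zip(row, ax)) for row in self.B]
        return [b + self.scaling * u for b, u in zip(base, update)]


# Usage: a 2x4 weight matrix applied to one input vector.
W = [[0.5, -1.0, 0.25, 2.0], [1.5, 0.0, -0.75, 0.1]]
layer = QuantizedLoRALinear(W, rank=2, alpha=4.0)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))
```

Only A and B would receive gradients during fine-tuning. The trainable parameter count is therefore rank * (in_features + out_features) rather than in_features * out_features. Because B starts at zero, the layer initially reproduces the quantized base model exactly.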
By storing the base model's weights in four bits and training only small low-rank adapters, the framework makes fine-tuning feasible on consumer-grade hardware. It reduces the memory footprint further through double quantization, which also quantizes the quantization constants themselves, and a paged optimizer that offloads optimizer states to system RAM.
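The savings from double quantization come down to simple arithmetic on the per-block quantization constants. The sketch below assumes one 32-bit constant per block of 64 weights. With double quantization, it assumes those constants are re-quantized to 8 bits, with one 32-bit scale per 256 constants. These block sizes and bit widths are illustrative assumptions, and the framework's actual settings may differ. The function names are hypothetical.

```python
def constant_bits_per_param(block_size=64, const_bits=32, double_quant=False,
                            dq_const_bits=8, dq_block_size=256, dq_scale_bits=32):
    """Average bits spent on quantization constants per weight (assumed block layout)."""
    if not double_quant:
        # One full-precision constant shared by every block of `block_size` weights.
        return const_bits / block_size
    # Second level: each first-level constant shrinks to `dq_const_bits`, and each group of
    # `dq_block_size` constants needs one extra `dq_scale_bits` scale.
    return dq_const_bits / block_size + dq_scale_bits / (block_size * dq_block_size)


def quantized_weight_gb(num_params, weight_bits=4, **kwargs):
    """Approximate storage in GB for 4-bit weights plus their quantization constants."""
    total_bits = num_params * (weight_bits + constant_bits_per_param(**kwargs))
    return total_bits / 8 / 1e9


print(constant_bits_per_param())                   # 0.5
print(constant_bits_per_param(double_quant=True))  # 0.126953125
# Savings for a hypothetical 65-billion-parameter model, in GB (roughly 3 GB):
print(quantized_weight_gb(65e9) - quantized_weight_gb(65e9, double_quant=True))
```

Under these assumptions, double quantization cuts constant overhead from 0.5 to about 0.127 bits per parameter. That saves roughly 0.37 bits per weight, which matters most at large parameter counts. The paged optimizer addresses a different cost: it holds optimizer state, not weights.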
The system supports distributed training across multiple GPUs for larger models and includes utilities for loading custom datasets. It also provides automated scoring of generated outputs to evaluate model performance against benchmarks.
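To illustrate how custom data loading and generation scoring might fit together, here is a small sketch. It reads benchmark records from JSON Lines and scores model outputs by normalized exact match. Exact match is only one possible metric, swapped in here for illustration. The project's own scoring may use different metrics or a model-based judge. The record fields `prompt` and `reference` and the function names are hypothetical, and distributed training is not shown.

```python
import json
import string


def load_jsonl(lines):
    """Parse JSONL text lines into (prompt, reference) pairs; field names are hypothetical."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        obj = json.loads(line)
        records.append((obj["prompt"], obj["reference"]))
    return records


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace before comparing."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match_score(generations, references):
    """Fraction of generations that equal their reference after normalization."""
    if len(generations) != len(references):
        raise ValueError("generations and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(g) == normalize(r) for g, r in zip(generations, references))
    return hits / len(references)


data = load_jsonl([
    '{"prompt": "Capital of France?", "reference": "Paris"}',
    '{"prompt": "2 + 2 =", "reference": "4"}',
])
generations = ["paris.", "5"]  # stand-in model outputs
print(exact_match_score(generations, [ref for _, ref in data]))  # 0.5
```

Normalizing before comparison keeps trivial differences in case or punctuation from counting as errors. Stricter or more lenient normalization changes the score, so it should match whatever the benchmark expects.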