# squeezeailab/kvquant

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/squeezeailab-kvquant).**

427 stars · 46 forks · Python

## Links

- GitHub: https://github.com/SqueezeAILab/KVQuant
- Homepage: https://arxiv.org/abs/2401.18079
- awesome-repositories: https://awesome-repositories.com/repository/squeezeailab-kvquant.md

## Description

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
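The sketch below shows, in general terms, what quantizing a KV cache means. It does not reproduce the repository's code. It applies plain asymmetric uniform quantization to a toy key/value cache, using one scale and zero point per key channel and one per value token. This is a simplified stand-in. The linked paper describes its own, more elaborate method. The function names (`quantize_vector`, `quantize_keys_per_channel`, etc.) and the toy data are hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy layout: Matrix[token][channel], one attention head.
Matrix = List[List[float]]


def quantize_vector(xs: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Asymmetric uniform quantization of one group of floats to `bits`-bit codes."""
    lo, hi = min(xs), max(xs)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid divide-by-zero on constant groups
    # Clamp guards against float round-off pushing a code just past the range.
    codes = [min(levels, max(0, round((x - lo) / scale))) for x in xs]
    return codes, scale, lo


def dequantize_vector(codes: List[int], scale: float, zero: float) -> List[float]:
    """Map integer codes back to approximate floats."""
    return [c * scale + zero for c in codes]


def quantize_keys_per_channel(keys: Matrix, bits: int):
    """Keys: one (scale, zero) pair per channel, shared across all tokens."""
    channels = [list(col) for col in zip(*keys)]
    return [quantize_vector(ch, bits) for ch in channels]


def dequantize_keys(qchannels) -> Matrix:
    cols = [dequantize_vector(*q) for q in qchannels]
    return [list(row) for row in zip(*cols)]  # transpose back to [token][channel]


def quantize_values_per_token(values: Matrix, bits: int):
    """Values: one (scale, zero) pair per token vector."""
    return [quantize_vector(v, bits) for v in values]


def dequantize_values(qtokens) -> Matrix:
    return [dequantize_vector(*q) for q in qtokens]


if __name__ == "__main__":
    # Channel 1 of the keys has much larger magnitude than the others (toy data).
    keys = [[0.1, 5.0, -1.2], [0.3, 4.0, -0.8], [-0.2, 6.5, -1.0]]
    values = [[0.5, -0.5, 0.25], [1.0, 0.0, -1.0], [0.2, 0.4, 0.6]]
    k_hat = dequantize_keys(quantize_keys_per_channel(keys, bits=4))
    v_hat = dequantize_values(quantize_values_per_token(values, bits=4))
    err = max(
        abs(a - b)
        for row, row_hat in zip(keys + values, k_hat + v_hat)
        for a, b in zip(row, row_hat)
    )
    print(f"max abs reconstruction error: {err:.4f}")
```

Grouping keys by channel stops a single large-magnitude channel (index 1 in the toy data) from inflating the step size of every other channel. Each reconstructed value lies within half a quantization step of the original. Check the repository source to see how its actual kernels group and store the cache.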

## Tags

### Part of an Awesome List

- [Attention Optimization](https://awesome-repositories.com/f/awesome-lists/ai/attention-optimization.md) — Quantizes KV cache to support extremely long context lengths.
