SqueezeAILab/KVQuant
427 stars · 46 forks · Python
arxiv.org/abs/2401.18079

KVQuant

Features

Attention Optimization - Quantizes KV cache to support extremely long context lengths.
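
To make the idea of quantizing a KV cache concrete, here is a minimal sketch of plain uniform 4-bit quantization applied to one channel of cached values. It is an illustration only and is not taken from the KVQuant code base, whose actual scheme may differ. The helper names `quantize_channel` and `dequantize_channel` are hypothetical.

```python
import random


def quantize_channel(values, n_bits=4):
    """Uniformly quantize a list of floats to integer codes in [0, 2**n_bits - 1].

    Illustrative helper (hypothetical name), not part of KVQuant.
    """
    levels = 2 ** n_bits - 1
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 when every value in the channel is identical.
    scale = (hi - lo) / levels or 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_channel(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]


random.seed(0)
# Stand-in for one channel of cached key or value activations across 128 tokens.
channel = [random.gauss(0.0, 1.0) for _ in range(128)]

codes, scale, lo = quantize_channel(channel, n_bits=4)
recon = dequantize_channel(codes, scale, lo)

# Rounding to the nearest level keeps the error within half a quantization step.
max_err = max(abs(a - b) for a, b in zip(channel, recon))
```

Storing 4-bit codes in place of 16-bit floats shrinks the cached values by roughly 4x, before counting the per-channel `scale` and `lo` that must be kept alongside them. That saving is what lets a longer context fit in the same memory budget.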