# thu-ml/sageattention

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/thu-ml-sageattention).**

3,425 stars · 434 forks · Cuda · Apache-2.0

## Links

- GitHub: https://github.com/thu-ml/SageAttention
- Homepage: https://arxiv.org/abs/2410.02367
- awesome-repositories: https://awesome-repositories.com/repository/thu-ml-sageattention.md

## Description

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
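To give a rough feel for what "quantized attention" means, the sketch below computes attention probabilities twice, once in floating point and once with Q and K rounded to 8-bit integers, and compares the results. It is an illustrative toy in pure Python, not the repository's CUDA implementation. The helper names (`quantize_int8`, `attention_probs_fp`, `attention_probs_int8`), the symmetric per-row scaling scheme, and the random test data are all assumptions made for this example.

```python
import math
import random


def quantize_int8(row):
    """Symmetric per-row quantization (an assumed scheme for illustration).

    Returns integer values in [-127, 127] and the float scale that maps
    them back to the original range.
    """
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(x / scale) for x in row], scale


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs_fp(q, k):
    """Reference softmax(Q K^T / sqrt(d)) in floating point."""
    d = len(q[0])
    return [
        softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        for qi in q
    ]


def attention_probs_int8(q, k):
    """Same computation, but the Q K^T dot products use 8-bit integers."""
    d = len(q[0])
    q_quant = [quantize_int8(row) for row in q]
    k_quant = [quantize_int8(row) for row in k]
    probs = []
    for qi, q_scale in q_quant:
        logits = []
        for kj, k_scale in k_quant:
            acc = sum(a * b for a, b in zip(qi, kj))  # integer accumulation
            # Dequantize the integer result with the two row scales.
            logits.append(acc * q_scale * k_scale / math.sqrt(d))
        probs.append(softmax(logits))
    return probs


random.seed(0)
q = [[random.gauss(0, 1) for _ in range(64)] for _ in range(4)]
k = [[random.gauss(0, 1) for _ in range(64)] for _ in range(6)]

p_fp = attention_probs_fp(q, k)
p_int8 = attention_probs_int8(q, k)
max_err = max(
    abs(a - b) for row_fp, row_q in zip(p_fp, p_int8) for a, b in zip(row_fp, row_q)
)
print(f"max abs difference in attention probabilities: {max_err:.6f}")
```

The integer dot product is the part that low-precision hardware can execute quickly; the scales restore the original magnitude before the softmax. On real models, keeping accuracy generally takes more care than this toy shows.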

## Tags

### Part of an Awesome List

- [Attention Optimization](https://awesome-repositories.com/f/awesome-lists/ai/attention-optimization.md) — Accurate 8-bit attention for plug-and-play acceleration.
