# mit-han-lab/qserve

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/mit-han-lab-qserve).**

844 stars · 65 forks · C++ · Apache-2.0

## Links

- GitHub: https://github.com/mit-han-lab/qserve
- awesome-repositories: https://awesome-repositories.com/repository/mit-han-lab-qserve.md

## Description

- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
- [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
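In "W4A8KV4", W4 means 4-bit weights, A8 means 8-bit activations, and KV4 means a 4-bit KV cache. The sketch below shows only what those bit widths mean. It uses generic symmetric round-to-nearest quantization with a single scale per tensor. It is not QServe's actual quantization scheme or kernel code. The helpers `quantize_symmetric` and `dequantize` and the sample values are hypothetical. Practical systems usually use finer-grained (per-channel or per-group) scales.

```python
# Hypothetical illustration of the bit widths named by "W4A8KV4".
# Generic symmetric round-to-nearest quantization, one scale per tensor;
# NOT the QServe algorithm.

def quantize_symmetric(values, bits):
    """Map floats to signed integers in [-qmax, qmax] sharing one scale."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest grid point, then clamp into the symmetric range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


# W4A8KV4: 4-bit weights, 8-bit activations, 4-bit KV cache.
BITS = {"weights": 4, "activations": 8, "kv_cache": 4}

weights = [0.7, -0.3, 0.1, -0.7]          # hypothetical sample values
activations = [1.5, -2.54, 0.0, 0.6]
kv_cache = [0.25, -1.0, 0.5, 0.125]

w_codes, w_scale = quantize_symmetric(weights, BITS["weights"])
a_codes, a_scale = quantize_symmetric(activations, BITS["activations"])
kv_codes, kv_scale = quantize_symmetric(kv_cache, BITS["kv_cache"])

for name, codes, scale in [("W4", w_codes, w_scale),
                           ("A8", a_codes, a_scale),
                           ("KV4", kv_codes, kv_scale)]:
    print(name, codes, dequantize(codes, scale))
```

Because rounding moves each value by at most half a grid step, the reconstruction error per element is bounded by `scale / 2`. The 4-bit grids (W4, KV4) are therefore much coarser than the 8-bit activation grid.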

## Tags

### Part of an Awesome List

- [Model Quantization Tools](https://awesome-repositories.com/f/awesome-lists/ai/model-quantization-tools.md) — W4A8KV4 quantization and system co-design for serving.
