# ist-daslab/marlin

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/ist-daslab-marlin).**

1,088 stars · 88 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/IST-DASLab/marlin
- awesome-repositories: https://awesome-repositories.com/repository/ist-daslab-marlin.md

## Description

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
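
One way to read the "~4x" figure is as a memory-traffic bound. The sketch below assumes that, at small batch sizes, runtime is dominated by reading weights from memory, so the best possible speedup over FP16 is the ratio of bits stored per weight. The function names and the optional per-group scale parameters (`group_size`, `scale_bits`) are hypothetical and only illustrate the arithmetic; they are not part of the Marlin API.

```python
from typing import Optional


def effective_bits_per_weight(
    weight_bits: int,
    group_size: Optional[int] = None,
    scale_bits: int = 16,
) -> float:
    """Average bits stored per weight.

    If group_size is given, one scale of scale_bits is assumed to be
    shared by every group_size weights (an illustrative assumption).
    """
    if group_size is None:
        return float(weight_bits)
    return weight_bits + scale_bits / group_size


def ideal_speedup(
    baseline_bits: int = 16,
    weight_bits: int = 4,
    group_size: Optional[int] = None,
) -> float:
    """Upper bound on speedup when time is dominated by weight reads."""
    return baseline_bits / effective_bits_per_weight(weight_bits, group_size)


if __name__ == "__main__":
    # FP16 baseline vs. INT4 weights, ignoring any scale overhead.
    print(ideal_speedup())
    # Same comparison with one FP16 scale per 128 weights (assumed grouping).
    print(round(ideal_speedup(group_size=128), 2))
```

Under these assumptions the bound is exactly 4x without scales and slightly below 4x once per-group scales are counted, which is consistent with the description's "near-ideal ~4x" wording. At larger batch sizes compute starts to matter and this simple bound no longer applies.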

## Tags

### Part of an Awesome List

- [Tensor Core Optimization](https://awesome-repositories.com/f/awesome-lists/ai/tensor-core-optimization.md) — Mixed-precision kernels for parallel autoregressive inference on large models.
