# fminference/flexgen

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/fminference-flexgen).**

9,366 stars · 591 forks · Python · Apache-2.0 · archived

## Links

- GitHub: https://github.com/FMInference/FlexGen
- awesome-repositories: https://awesome-repositories.com/repository/fminference-flexgen.md

## Description

Running large language models on a single GPU for throughput-oriented scenarios.

## Tags

### Part of an Awesome List

- [Hardware Optimized Inference](https://awesome-repositories.com/f/awesome-lists/ai/hardware-optimized-inference.md) — High-throughput generative inference optimized for single GPU environments.
- [Inference Frameworks](https://awesome-repositories.com/f/awesome-lists/ai/inference-frameworks.md) — High-throughput generative inference on single-GPU systems.
- [Model Serving Engines](https://awesome-repositories.com/f/awesome-lists/ai/model-serving-engines.md) — Throughput-oriented inference engine for running models on single GPUs.
