FasterDecoding/Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Features

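Speculative Decoding - Adds multiple decoding heads to the base model so that several future tokens are proposed per step and then verified, which reduces the number of sequential decoding steps.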
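The idea of extra decoding heads can be sketched in a few lines. This is a toy illustration, not Medusa's actual code: `medusa_step`, `toy_base`, and `toy_heads` are hypothetical names, tokens are plain integers, and verification is done with repeated calls to the base model where a real system would verify all candidates in one batched forward pass.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def medusa_step(prefix: List[Token], base_next: NextFn, heads: List[NextFn]) -> List[Token]:
    """Run one decoding step and return the tokens accepted in that step."""
    # The base model's own next token is always kept.
    accepted = [base_next(prefix)]
    # Every head guesses one further-ahead token from the same prefix.
    guesses = [head(prefix) for head in heads]
    for guess in guesses:
        # Keep a guess only if the base model would have produced it greedily.
        if base_next(prefix + accepted) != guess:
            break
        accepted.append(guess)
    return accepted


# Toy base model (assumption): the next token is the last token plus one, mod 10.
def toy_base(seq: List[Token]) -> Token:
    return (seq[-1] + 1) % 10


# Toy heads (assumption): the first two guess correctly, the third is always wrong.
toy_heads: List[NextFn] = [
    lambda seq: (seq[-1] + 2) % 10,
    lambda seq: (seq[-1] + 3) % 10,
    lambda seq: (seq[-1] + 9) % 10,
]

step_tokens = medusa_step([1, 2, 3], toy_base, toy_heads)
```

In this toy run, one step accepts three tokens instead of one, because the first two head guesses agree with the base model and the third does not.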

Open-source alternatives to Medusa

pytorch-labs/gpt-fast

gpt-fast is a PyTorch transformer inference engine designed for low-latency text generation. It combines distributed GPU inference, quantized model execution, and speculative decoding. In its speculative decoding workflow, a small draft model predicts token sequences that a larger model then verifies, which accelerates generation. Quantized execution reduces the memory footprint, and tensor parallelism splits computation across multiple GPUs. The project includes a standardized evaluation harness to measure the accuracy an…
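The draft-and-verify workflow described above can be sketched as follows. This is a minimal greedy variant under stated assumptions, not gpt-fast's implementation: `speculative_generate`, `greedy_generate`, `toy_target`, and `toy_draft` are hypothetical names, both models are stand-in functions over integer tokens, and the target model is called once per position where a real engine would score the whole proposal in a single forward pass.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def greedy_generate(prefix: List[Token], target_next: NextFn, max_new_tokens: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    out = list(prefix)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


def speculative_generate(prefix: List[Token], draft_next: NextFn, target_next: NextFn,
                         k: int, max_new_tokens: int) -> List[Token]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies them."""
    out = list(prefix)
    limit = len(prefix) + max_new_tokens
    while len(out) < limit:
        # 1. The small draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # 2. The target model checks the proposal position by position.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target_next(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch with the target's token
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target contributes one extra token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[:limit]


# Toy models (assumptions): the draft agrees with the target except after token 4.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] * 3 + 1) % 7


def toy_draft(seq: List[Token]) -> Token:
    return 0 if seq[-1] == 4 else toy_target(seq)


spec_out = speculative_generate([1], toy_draft, toy_target, k=4, max_new_tokens=10)
plain_out = greedy_generate([1], toy_target, max_new_tokens=10)
```

Because every emitted token is either confirmed or supplied by the target model, this greedy variant produces the same sequence as plain greedy decoding with the target alone; the speedup in a real system comes from verifying several draft tokens per target pass.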

meta-pytorch/gpt-fast

6,223 stars

gpt-fast is a PyTorch transformer inference engine for text generation, implemented directly on the native tensor library. It provides a runtime for executing large language models without external C++ extensions. The project implements speculative decoding, in which a small draft model predicts tokens and a larger model verifies them, to accelerate generation. It further improves performance with a compiled prefill stage and a multi-GPU tensor parallelism library that shards linear layers across GPUs. Memory efficiency is managed throu…
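Sharding a linear layer across devices can be illustrated without any GPU. The sketch below is a pure-Python illustration with hypothetical helper names (`linear`, `shard_rows`, `sharded_linear`), not gpt-fast's API: it assumes a bias-free layer whose weight is split by output feature, so that each shard computes a slice of the output and the slices are concatenated.

```python
from typing import List

Vector = List[int]
Matrix = List[List[int]]  # shape: (out_features, in_features)


def linear(x: Vector, weight: Matrix) -> Vector:
    """Bias-free linear layer: one dot product per output feature."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def shard_rows(weight: Matrix, num_shards: int) -> List[Matrix]:
    """Split the weight by output feature into equally sized shards."""
    if len(weight) % num_shards != 0:
        raise ValueError("out_features must be divisible by num_shards")
    size = len(weight) // num_shards
    return [weight[i * size:(i + 1) * size] for i in range(num_shards)]


def sharded_linear(x: Vector, shards: List[Matrix]) -> Vector:
    """Each shard stands in for one device; partial outputs are concatenated."""
    out: Vector = []
    for shard in shards:
        out.extend(linear(x, shard))
    return out


weight = [
    [1, 0, 2],
    [0, 3, 1],
    [4, 1, 0],
    [2, 2, 2],
]
x = [1, 2, 3]

full_out = linear(x, weight)
sharded_out = sharded_linear(x, shard_rows(weight, num_shards=2))
```

Each shard only needs its own rows of the weight matrix plus the full input, which is why this split lets the per-device memory and compute shrink with the number of devices.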

hao-ai-lab/LookaheadDecoding

1,336 stars

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Infini-AI-Lab/MagicDec

152 stars

[ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
