LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Features

Speculative Decoding - Breaks sequential dependencies to parallelize token generation.

pytorch-labs/gpt-fast

gpt-fast is a PyTorch transformer inference engine designed for low-latency text generation. It functions as a distributed GPU inference library, a quantized model runner, and a speculative decoding framework. The system utilizes a speculative decoding workflow where a small draft model predicts token sequences for verification by a larger model to accelerate generation. It supports quantized model execution to reduce memory footprint and implements tensor parallelism to split computations across multiple GPUs. The project includes a standardized evaluation harness to measure the accuracy an
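The draft-then-verify workflow described above can be illustrated with a minimal greedy sketch. This is not gpt-fast's code: the names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, the "models" are toy functions over integer tokens, and verification is written as a loop even though a real system would score all drafted positions in a single forward pass of the larger model.

```python
from typing import Callable, List

Token = int
# Assumption: a "model" is any function mapping a token context to its greedy next token.
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextTokenFn,
                     target_next: NextTokenFn, k: int) -> List[Token]:
    """Run one draft-then-verify round and return the tokens to append."""
    # 1. The small draft model proposes k tokens autoregressively.
    proposal: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The large target model checks each proposed position in order.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target_next(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prompt: List[Token], draft_next: NextTokenFn,
             target_next: NextTokenFn, max_new_tokens: int, k: int = 4) -> List[Token]:
    """Repeat speculative rounds until max_new_tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prompt) + max_new_tokens]


def toy_target(ctx: List[Token]) -> Token:
    """Toy target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Toy draft model: agrees with the target except right after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([3], toy_draft, toy_target, max_new_tokens=5))
```

In this greedy variant the output always matches what the target model would produce on its own, because every kept token is either confirmed or supplied by the target. The speedup comes only from how many draft tokens are accepted per round.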

meta-pytorch/gpt-fast

6,223 stars

gpt-fast is a PyTorch transformer inference engine designed for text generation using a native tensor library implementation. It provides a runtime for executing large language models without the need for external C++ extensions. The project implements speculative decoding to accelerate generation by using a small draft model for token prediction and a larger model for verification. It further optimizes performance through a compiled prefill stage and a multi-GPU tensor parallelism library that shards linear layers across multiple graphics processing units. Memory efficiency is managed throu
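The idea of sharding a linear layer across devices, mentioned above, can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: the helper names `matvec`, `shard_rows`, and `parallel_linear` are hypothetical, devices are simulated by list slices, and the final concatenation stands in for the all-gather a real multi-GPU setup would perform.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Dense linear layer without bias: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def shard_rows(weight: Matrix, num_devices: int) -> List[Matrix]:
    """Split the output features (rows) into contiguous shards, one per device."""
    per_device = (len(weight) + num_devices - 1) // num_devices
    return [weight[i:i + per_device] for i in range(0, len(weight), per_device)]


def parallel_linear(weight: Matrix, x: Vector, num_devices: int) -> Vector:
    """Compute the layer shard by shard, then concatenate the partial outputs."""
    shards = shard_rows(weight, num_devices)
    # Each shard would run on its own GPU; here the work is simply done in turn.
    partials = [matvec(shard, x) for shard in shards]
    # Concatenation plays the role of an all-gather across devices.
    return [y for part in partials for y in part]


if __name__ == "__main__":
    W = [[1, 2], [3, 4], [5, 6], [7, 8]]
    x = [1, 1]
    print(parallel_linear(W, x, num_devices=2))
```

Because each device holds only its slice of the weight rows, per-device memory drops roughly in proportion to the number of devices, at the cost of one communication step to reassemble the output.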

FasterDecoding/Medusa

2,751 stars

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Infini-AI-Lab/MagicDec

152 stars

[ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding

hao-ai-lab/LookaheadDecoding

Open-source alternatives to LookaheadDecoding

pytorch-labs/gpt-fast

meta-pytorch/gpt-fast

FasterDecoding/Medusa

Infini-AI-Lab/MagicDec
