Liger-Kernel is a collection of fused Triton kernels and patching utilities for accelerating large language model training. It provides drop-in replacements for common LLM operations such as RMSNorm, cross-entropy loss, and attention, increasing throughput and reducing memory usage while keeping gradients exact (no approximations are introduced). The kernels can be used individually to compose custom model architectures, or applied to existing models with minimal code changes.
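For orientation, here is a minimal, unfused reference of what an RMSNorm operation computes, written in plain Python. It is a sketch for illustration only, not Liger-Kernel's Triton implementation; the function name `rms_norm` and the default `eps` value are assumptions made for this example.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Unfused reference RMSNorm over a single hidden vector (illustrative only)."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the hidden dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    # Reciprocal root-mean-square; eps guards against division by zero.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Normalize each element, then apply the learned per-channel scale.
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

Written this way, the reduction, the normalization, and the scaling are separate passes over the data. A fused kernel aims to perform them in a single pass on the GPU, which is where the throughput and memory savings would come from.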
A distinguishing feature is runtime model surgery via monkey-patching: specific layers in Hugging Face or Megatron-LM models can be swapped for optimized kernels without altering the models' source code. The project also fuses multiple operations into single GPU kernels; its fused preference-optimization losses reduce memory usage during fine-tuning by up to 80%. In addition, multi-stream residual wrapping can wrap arbitrary layers with doubly-stochastic residual streams, which is intended to stabilize the training of deep networks.
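The monkey-patching idea can be sketched with a toy example in plain Python. Everything here is hypothetical: `modeling`, `ReferenceRMSNorm`, `OptimizedRMSNorm`, `build_model`, and `apply_patch` are stand-ins invented for illustration and are not part of Liger-Kernel or any other library.

```python
import math
import types


class ReferenceRMSNorm:
    """Hypothetical stand-in for a library's original layer."""

    impl = "reference"

    def __init__(self, eps=1e-6):
        self.eps = eps

    def __call__(self, x):
        mean_sq = sum(v * v for v in x) / len(x)
        return [v / math.sqrt(mean_sq + self.eps) for v in x]


class OptimizedRMSNorm(ReferenceRMSNorm):
    """Hypothetical stand-in for an optimized layer with the same interface."""

    impl = "optimized"

    def __call__(self, x):
        # Compute the reciprocal once and reuse it for every element.
        inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + self.eps)
        return [v * inv_rms for v in x]


# Toy "modeling module"; in practice this would be an imported Python module.
modeling = types.SimpleNamespace(RMSNorm=ReferenceRMSNorm)


def build_model(num_layers):
    # The class is looked up on the module at construction time,
    # so rebinding the attribute beforehand changes what gets built.
    return [modeling.RMSNorm() for _ in range(num_layers)]


def apply_patch(module):
    """Rebind the module attribute so later constructions use the optimized class."""
    module.RMSNorm = OptimizedRMSNorm
```

In this sketch, calling `apply_patch(modeling)` before `build_model(...)` yields layers of the optimized class, while layers built earlier keep the original one. That is why this style of patching is normally applied before the model is instantiated; swapping layers on an already-built model would require replacing the instances themselves.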
The kernel library covers a broad range of operations, including RMS layer normalization, rotary position embeddings, softmax and sparsemax, multi-token attention, and fused linear cross-entropy. It supports alignment losses for methods such as DPO, ORPO, SimPO, and CPO, as well as distillation losses for knowledge distillation. The project works with the distributed training frameworks FSDP, DeepSpeed, and DDP without additional configuration.
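To illustrate why combining the final linear projection with cross-entropy can save memory, the sketch below computes the loss over chunks of tokens so that only one chunk's logits exist at a time. It is a forward-only, pure-Python illustration under stated assumptions: the function name `chunked_linear_cross_entropy`, the `chunk_size` parameter, and the chunking strategy itself are chosen for this example and are not taken from Liger-Kernel's implementation, which would also need to handle gradients.

```python
import math
from typing import List, Sequence


def _log_sum_exp(values: Sequence[float]) -> float:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def chunked_linear_cross_entropy(
    hidden_states: Sequence[Sequence[float]],  # (tokens, hidden)
    weight: Sequence[Sequence[float]],         # (vocab, hidden)
    targets: Sequence[int],                    # (tokens,)
    chunk_size: int = 2,
) -> float:
    """Mean cross-entropy of (hidden_states @ weight^T) against targets."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(hidden_states) != len(targets):
        raise ValueError("hidden_states and targets must have the same length")

    total = 0.0
    for start in range(0, len(hidden_states), chunk_size):
        stop = start + chunk_size
        # Logits for this chunk only: shape (chunk, vocab).
        chunk_logits: List[List[float]] = [
            [sum(h * w for h, w in zip(hidden, row)) for row in weight]
            for hidden in hidden_states[start:stop]
        ]
        # Per-token loss: log-sum-exp of the logits minus the target logit.
        for logits, target in zip(chunk_logits, targets[start:stop]):
            total += _log_sum_exp(logits) - logits[target]
        # chunk_logits is rebound on the next iteration, so the full
        # (tokens, vocab) logits matrix is never held in memory at once.
    return total / len(hidden_states)
```

The result does not depend on `chunk_size`; only the peak size of the intermediate logits changes, from tokens × vocab down to chunk × vocab. With large vocabularies, that intermediate is typically the dominant memory cost of the loss computation.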