# linkedin/liger-kernel

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/linkedin-liger-kernel).**

6,148 stars · 489 forks · Python · BSD-2-Clause

## Links

- GitHub: https://github.com/linkedin/Liger-Kernel
- Homepage: https://linkedin.github.io/Liger-Kernel/
- awesome-repositories: https://awesome-repositories.com/repository/linkedin-liger-kernel.md

## Topics

`finetuning` `gemma2` `hacktoberfest` `llama` `llama3` `llm-training` `llms` `mistral` `phi3` `triton` `triton-kernels`

## Description

Liger-Kernel is a collection of pre-built fused Triton kernels and patching utilities designed to accelerate large language model training. It provides drop-in kernel replacements for common LLM operations such as RMSNorm, cross-entropy loss, and attention, increasing throughput and reducing memory usage while preserving bitwise-exact gradients. The project serves as a toolkit both for composing custom model architectures from individual optimized kernels and for patching existing models with minimal code changes.

Its distinguishing feature is runtime monkey-patching: specific layers in Hugging Face or Megatron-LM models can be swapped for optimized kernels without altering the model's source code. It also fuses multiple operations into single GPU kernels; its fused preference-optimization losses reduce memory usage during fine-tuning by up to 80%. Multi-stream residual wrapping, which wraps arbitrary layers with doubly-stochastic residual streams, is provided to stabilize deep network training.
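The monkey-patching idea can be illustrated with a toy, pure-Python sketch. It is not Liger-Kernel code: `SlowRMSNorm`, `fast_forward`, and `apply_fast_kernels` are hypothetical names invented for this example, and the "fast" path is ordinary Python rather than a Triton kernel. It only shows how rebinding a method on a class at runtime changes the behavior of existing layer instances without editing the class's source.

```python
"""Toy illustration of runtime layer swapping (monkey-patching)."""
import math


class SlowRMSNorm:
    """Stand-in for a framework's stock normalization layer (hypothetical)."""

    def __init__(self, weight, eps=1e-6):
        self.weight = list(weight)
        self.eps = eps

    def forward(self, x):
        # RMSNorm: scale each element by weight / sqrt(mean(x^2) + eps).
        mean_sq = sum(v * v for v in x) / len(x)
        rms = math.sqrt(mean_sq + self.eps)
        return [w * v / rms for w, v in zip(self.weight, x)]


def fast_forward(self, x):
    """Stand-in for an optimized kernel (hypothetical): same math,
    with the reciprocal of the RMS computed once and reused."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + self.eps)
    return [w * v * inv_rms for w, v in zip(self.weight, x)]


def apply_fast_kernels(layer_cls):
    """Hypothetical patch function: rebinds `forward` on the class."""
    layer_cls.forward = fast_forward


layer = SlowRMSNorm(weight=[1.0, 2.0, 0.5])
x = [3.0, -4.0, 12.0]
before = layer.forward(x)

# Patch the class; the already-constructed `layer` picks up the new method.
apply_fast_kernels(SlowRMSNorm)
after = layer.forward(x)

# Largest elementwise difference between the two implementations.
max_diff = max(abs(a - b) for a, b in zip(before, after))
```

In this toy the two implementations agree only up to floating-point rounding, so `max_diff` is compared against a small tolerance rather than zero. Patching the class rather than individual instances is what lets a single call affect every layer of that type in a model.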

The kernel library covers a broad range of operations including RMS layer normalization, rotary position embeddings, softmax and sparsemax computation, multi-token attention, and fused linear cross-entropy. It supports alignment loss computation for methods such as DPO, ORPO, SimPO, and CPO, as well as distillation loss computation for knowledge distillation tasks. The project integrates with distributed training frameworks including FSDP, DeepSpeed, and DDP without additional configuration.
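To make the memory argument behind fused linear cross-entropy concrete, here is a minimal pure-Python sketch of the forward pass only. It assumes a chunking strategy in which logits are produced and consumed one chunk of tokens at a time, so the full tokens-by-vocabulary logit matrix never exists in memory at once. The function names and the `chunk_size` parameter are illustrative assumptions, not the library's API, and the actual Triton kernels also handle gradients, which this sketch omits.

```python
import math


def linear(h, W):
    """Logits for one hidden vector; W has one row per vocabulary entry."""
    return [sum(w_j * h_j for w_j, h_j in zip(row, h)) for row in W]


def cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]


def unfused_loss(H, W, targets):
    """Baseline: materialize all logits, then compute the mean loss."""
    all_logits = [linear(h, W) for h in H]  # tokens x vocab, all held at once
    return sum(cross_entropy(z, t) for z, t in zip(all_logits, targets)) / len(H)


def chunked_loss(H, W, targets, chunk_size=2):
    """Fused-style variant: only one chunk of logits is alive at a time."""
    total = 0.0
    for start in range(0, len(H), chunk_size):
        chunk_logits = [linear(h, W) for h in H[start:start + chunk_size]]
        chunk_targets = targets[start:start + chunk_size]
        total += sum(
            cross_entropy(z, t) for z, t in zip(chunk_logits, chunk_targets)
        )
    return total / len(H)


# Five tokens, hidden size 2, vocabulary size 3 (made-up numbers).
H = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0], [0.0, 1.0], [2.0, -0.5]]
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
targets = [0, 2, 1, 1, 0]

full = unfused_loss(H, W, targets)
chunked = chunked_loss(H, W, targets)
```

Both functions compute the same mean loss up to floating-point summation order. The difference is peak memory: the baseline holds logits for every token, while the chunked variant holds at most `chunk_size` rows, which matters when the vocabulary is large.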

## Tags

### Artificial Intelligence & ML

- [Triton Kernels](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-kernel-implementations/triton-kernels.md) — Provides a collection of pre-built fused Triton kernels that combine multiple LLM operations into single GPU kernels.
- [Transformer Training Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-training-accelerators.md) — Provides GPU-optimized Triton kernels that accelerate transformer training throughput by up to 20%. ([source](https://cdn.jsdelivr.net/gh/linkedin/liger-kernel@main/README.md))
- [Bitwise-Exact Gradient Kernels](https://awesome-repositories.com/f/artificial-intelligence-ml/gradient-computation/bitwise-exact-gradient-kernels.md) — Guarantees bitwise-identical gradients when replacing standard operations with fused kernels. ([source](https://linkedin.github.io/Liger-Kernel/))
- [Preference Optimization Losses](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/neural-network-components/loss-functions/perceptual-loss/content-loss-calculators/focal-loss-calculators/detection-loss-calculators/softmax-loss-calculators/preference-optimization-losses.md) — Calculates fused linear losses for preference optimization methods including DPO, ORPO, SimPO, and CPO. ([source](https://linkedin.github.io/Liger-Kernel/Low-Level-APIs/))
- [Fused Linear Alignment Losses](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/neural-network-components/loss-functions/perceptual-loss/content-loss-calculators/focal-loss-calculators/detection-loss-calculators/softmax-loss-calculators/preference-optimization-losses/fused-linear-alignment-losses.md) — Fuses linear transformations with alignment loss calculations to reduce memory usage during fine-tuning by up to 80%.
- [Preference Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/preference-optimization.md) — Computes fused linear losses for alignment methods such as DPO, ORPO, SimPO, and CPO to reduce memory usage during fine-tuning.
- [Multi-Query Attention Scoring](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-scoring-functions/multi-query-attention-scoring.md) — Ships an optimized fused multi-token attention kernel for multiple query and key inputs. ([source](https://linkedin.github.io/Liger-Kernel/Low-Level-APIs/))
- [Megatron-LM Kernel Patches](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-kernel-implementations/triton-kernels/megatron-lm-kernel-patches.md) — Replaces RMSNorm and cross-entropy loss in Megatron-LM with faster Triton kernels. ([source](https://linkedin.github.io/Liger-Kernel/High-Level-APIs/))
- [Megatron-LM Patches](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-kernel-implementations/triton-kernels/megatron-lm-patches.md) — Provides optimized Triton kernel replacements for Megatron-LM training setups targeting RMSNorm and cross-entropy loss operations.
- [Divergence Loss Kernels](https://awesome-repositories.com/f/artificial-intelligence-ml/knowledge-distillation/divergence-loss-kernels.md) — Provides optimized kernel implementations for calculating KL divergence and Jensen-Shannon divergence losses for knowledge distillation.
- [Distillation Loss Calculators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/neural-network-components/loss-functions/perceptual-loss/content-loss-calculators/focal-loss-calculators/detection-loss-calculators/distillation-loss-calculators.md) — Implements optimized KL divergence and Jensen-Shannon divergence kernels for knowledge distillation. ([source](https://linkedin.github.io/Liger-Kernel/Low-Level-APIs/))
- [Layer-Wise Model Assembly](https://awesome-repositories.com/f/artificial-intelligence-ml/model-compatibility-layers/layer-wise-model-assembly.md) — Ships individual optimized fused kernels that can be imported and combined as modular building blocks for custom architectures.
- [Modular Layer Compositions](https://awesome-repositories.com/f/artificial-intelligence-ml/model-composition-architectures/hybrid-layer-compositions/modular-layer-compositions.md) — Offers individual optimized fused kernels that can be imported and combined into user-defined neural network architectures. ([source](https://cdn.jsdelivr.net/gh/linkedin/liger-kernel@main/README.md))
- [Multi-GPU Training Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-gpu-training-utilities.md) — Integrates with FSDP, DeepSpeed, and DDP for multi-GPU training without additional configuration. ([source](https://linkedin.github.io/Liger-Kernel/))
- [Rotary Positional Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/positional-embedding-techniques/rotary-positional-embeddings.md) — Implements an optimized rotary position embedding kernel for transformer models. ([source](https://linkedin.github.io/Liger-Kernel/Low-Level-APIs/))
- [Cross-Entropy Loss Functions](https://awesome-repositories.com/f/artificial-intelligence-ml/prediction-visualization/loss-function-calculators/binary-cross-entropy-calculators/cross-entropy-loss-functions.md) — Ships an optimized fused cross-entropy loss kernel for large-vocabulary classification tasks. ([source](https://linkedin.github.io/Liger-Kernel/Low-Level-APIs/))
- [Fused Linear Cross-Entropy Kernels](https://awesome-repositories.com/f/artificial-intelligence-ml/prediction-visualization/loss-function-calculators/binary-cross-entropy-calculators/cross-entropy-loss-functions/fused-linear-cross-entropy-kernels.md) — Combines linear transformation and cross-entropy loss into a single fused GPU kernel. ([source](https://linkedin.github.io/Liger-Kernel/Low-Level-APIs/))
- [Doubly-Stochastic Residual Streams](https://awesome-repositories.com/f/artificial-intelligence-ml/residual-networks/doubly-stochastic-residual-streams.md) — Implements multi-stream residual wrapping with doubly-stochastic matrices to stabilize deep network training.
- [RMS Normalizations](https://awesome-repositories.com/f/artificial-intelligence-ml/rms-normalizations.md) — Provides an optimized fused RMS normalization kernel for transformer layers. ([source](https://linkedin.github.io/Liger-Kernel/Low-Level-APIs/))
- [Megatron-LM Kernel Patches](https://awesome-repositories.com/f/artificial-intelligence-ml/training-optimizations/megatron-lm-kernel-patches.md) — Accelerates Megatron-LM training by patching RMSNorm and cross-entropy loss with faster, more memory-efficient Triton kernels.

### Data & Databases

- [Training Memory Optimizers](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers.md) — Reduces LLM training memory footprint through efficient fused kernel implementations that minimize intermediate storage. ([source](https://linkedin.github.io/Liger-Kernel/acknowledgement/))
- [Alignment Loss Optimizers](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers/alignment-loss-optimizers.md) — Provides fused kernels that cut memory usage by up to 80% during preference alignment fine-tuning. ([source](https://cdn.jsdelivr.net/gh/linkedin/liger-kernel@main/README.md))

### DevOps & Infrastructure

- [Drop-In Kernel Optimizers](https://awesome-repositories.com/f/devops-infrastructure/model-conversion/hugging-face/drop-in-kernel-optimizers.md) — Provides drop-in kernel replacements for Hugging Face transformer models that increase throughput and reduce memory without changing model code.
- [Layer Optimization Patches](https://awesome-repositories.com/f/devops-infrastructure/model-conversion/hugging-face/layer-optimization-patches.md) — Optimizes Hugging Face transformer models by swapping standard layers for memory-efficient Triton kernels with a single function call.
- [Runtime Kernel Swaps](https://awesome-repositories.com/f/devops-infrastructure/model-conversion/hugging-face/runtime-kernel-swaps.md) — Swaps standard Hugging Face model layers for optimized Triton kernels with a single function call, preserving exact computation. ([source](https://cdn.jsdelivr.net/gh/linkedin/liger-kernel@main/README.md))

### Scientific & Mathematical Computing

- [Gradient-Preserving Fusions](https://awesome-repositories.com/f/scientific-mathematical-computing/mathematical-algorithms/mathematical-sequences/bitwise-shifting-methods/bitwise-manipulation/gradient-preserving-fusions.md) — Provides fused kernels that guarantee bitwise-exact gradients, enabling safe drop-in replacement of standard operations.
- [Softmax Normalization](https://awesome-repositories.com/f/scientific-mathematical-computing/numerical-mathematical-foundations/statistics-probability/probability-distributions/softmax-normalization.md) — Ships an optimized softmax kernel that converts raw scores to probability distributions. ([source](https://linkedin.github.io/Liger-Kernel/Low-Level-APIs/))

### Software Engineering & Architecture

- [Model Layer Swapping Patches](https://awesome-repositories.com/f/software-engineering-architecture/localization-patch-sets/environment-compatibility-patches/performance-compatibility-patches/model-layer-swapping-patches.md) — Swaps Hugging Face model layers for optimized kernels via monkey-patching with a single function call. ([source](https://linkedin.github.io/Liger-Kernel/Getting-Started/))

### Testing & Quality Assurance

- [Model Layer Patching](https://awesome-repositories.com/f/testing-quality-assurance/function-call-tracking/function-behavior-replacement/runtime-method-patching/model-layer-patching.md) — Enables runtime monkey-patching of Hugging Face and Megatron-LM model layers with optimized Triton kernels.

### Part of an Awesome List

- [Inference and Serving](https://awesome-repositories.com/f/awesome-lists/ai/inference-and-serving.md) — Efficient Triton kernels for optimized LLM training and inference.
- [Computation and Optimization](https://awesome-repositories.com/f/awesome-lists/devtools/computation-and-optimization.md) — Triton kernels optimized for large language model training.