# lyogavin/airllm

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/lyogavin-airllm).**

11,508 stars · 1,049 forks · Jupyter Notebook · apache-2.0

## Links

- GitHub: https://github.com/lyogavin/airllm
- awesome-repositories: https://awesome-repositories.com/repository/lyogavin-airllm.md

## Topics

`chinese-llm` `chinese-nlp` `finetune` `generative-ai` `instruct-gpt` `instruction-set` `llama` `llm` `lora` `open-models` `open-source` `open-source-models` `qlora`

## Description

AirLLM is a framework for running and fine-tuning large language models on consumer-grade hardware. It splits a model into individual layers and loads them into memory only as they are needed, so it can run models whose weights would otherwise exceed the available system or video memory.
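The layer-wise idea can be illustrated with a minimal, self-contained sketch. It is not AirLLM's code. The helpers `save_layers`, `apply_layer` and `layerwise_forward` are hypothetical, each "layer" is a toy dense matrix followed by ReLU, and pickle files stand in for the per-layer weight files a real implementation would read from disk.

```python
import os
import pickle
import tempfile


def save_layers(layers, directory):
    """Write each layer's weights to its own file so layers can be loaded one at a time."""
    paths = []
    for i, weights in enumerate(layers):
        path = os.path.join(directory, f"layer_{i:03d}.pkl")
        with open(path, "wb") as f:
            pickle.dump(weights, f)
        paths.append(path)
    return paths


def apply_layer(weights, x):
    """Toy layer: dense matrix-vector product followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def layerwise_forward(layer_paths, x):
    """Run the model layer by layer, holding only one layer's weights in memory at a time."""
    for path in layer_paths:
        with open(path, "rb") as f:
            weights = pickle.load(f)  # load just this layer from disk
        x = apply_layer(weights, x)
        del weights  # release the layer before the next one is loaded
    return x


# Two toy 2x2 layers: identity, then a scale/negate layer whose negative output ReLU zeroes.
toy_layers = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, -1.0]]]
with tempfile.TemporaryDirectory() as tmp:
    result = layerwise_forward(save_layers(toy_layers, tmp), [3.0, 4.0])
print(result)  # [6.0, 0.0]
```

Peak weight memory is bounded by the largest single layer rather than the whole model; the price is that every forward pass re-reads the weights from storage, which is the latency the prefetching technique described below tries to hide.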

The project combines several optimizations that trade memory footprint against speed. Block-wise weight quantization shrinks the stored weights, and asynchronous layer prefetching loads upcoming layers while the current one is computing, hiding data-transfer latency. The framework also supports long-context processing for inputs of up to 100,000 tokens and provides tools for fine-tuning and aligning models with low-rank adaptation (LoRA) and direct preference optimization.
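As an illustration of block-wise quantization, the sketch below stores each block of weights as 8-bit integer codes plus one floating-point scale per block (absmax scaling). The int8 format, the block size of 4 and the function names are illustrative assumptions, not AirLLM's implementation, which may use different bit widths and a different storage layout.

```python
def quantize_blockwise(values, block_size=4):
    """Quantize floats to int8 codes, computing one absmax scale per block of values."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / 127 if absmax > 0 else 1.0  # map [-absmax, absmax] onto [-127, 127]
        scales.append(scale)
        codes.extend(max(-127, min(127, round(v / scale))) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=4):
    """Approximately recover the floats by multiplying each code by its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


# Small weights in the first block, large ones in the second: each block gets its own scale,
# so the large values do not wash out the precision of the small ones.
weights = [0.02, -0.01, 0.03, 0.015, 4.0, -2.5, 1.0, 0.5]
codes, scales = quantize_blockwise(weights)
restored = dequantize_blockwise(codes, scales)
print(codes)
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

The rounding error of each value is at most half of its block's scale, so smaller blocks give finer precision at the cost of storing more scales.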

The platform offers a unified interface for deployment on both Linux and Apple Silicon. It detects model types automatically, so a model can be initialized without specifying an architecture-specific class, and it supports distributed training across multiple GPUs to accommodate larger architectures.
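Automatic model-type detection can be sketched as a registry lookup. This assumes a Hugging Face-style `config.json` whose `architectures` field names the model class; the registry, the placeholder classes and `auto_load` are hypothetical and only illustrate the dispatch idea, not AirLLM's loader.

```python
import json


class LlamaLikeModel:
    """Hypothetical placeholder for a Llama-family implementation."""

    def __init__(self, config):
        self.config = config


class MistralLikeModel:
    """Hypothetical placeholder for a Mistral-family implementation."""

    def __init__(self, config):
        self.config = config


# Maps the architecture name found in a checkpoint's config to the class that handles it.
MODEL_REGISTRY = {
    "LlamaForCausalLM": LlamaLikeModel,
    "MistralForCausalLM": MistralLikeModel,
}


def auto_load(config_json):
    """Pick the model class from the config instead of requiring the caller to name it."""
    config = json.loads(config_json)
    architectures = config.get("architectures") or []
    for name in architectures:
        cls = MODEL_REGISTRY.get(name)
        if cls is not None:
            return cls(config)
    raise ValueError(f"unsupported architectures: {architectures}")


model = auto_load('{"architectures": ["MistralForCausalLM"], "num_hidden_layers": 32}')
print(type(model).__name__)  # MistralLikeModel
```

Adding support for a new architecture then means registering one more entry, while callers keep using the same entry point.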

## Tags

### Artificial Intelligence & ML

- [Layer-Decomposition Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/large-language-model-optimization/layer-decomposition-engines.md) — Executes massive language models on consumer hardware by decomposing layers and managing memory usage during inference.
- [Memory-Constrained Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/large-language-model-optimization/memory-constrained-inference.md) — Runs massive language models on consumer hardware by optimizing memory usage to fit within the constraints of available graphics or system memory.
- [Cross-Platform Inference Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/cross-platform-inference-frameworks.md) — Offers a unified interface for deploying and running large language models across Linux and Apple Silicon desktop environments. ([source](https://github.com/lyogavin/airllm#readme))
- [Large Language Model Fine-Tuning Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/integrated-development-platforms/machine-learning-platforms/large-language-model-fine-tuning-frameworks.md) — Provides a platform for fine-tuning and aligning large language models on limited hardware using low-rank adaptation and memory-efficient training methods.
- [Apple Hardware Acceleration](https://awesome-repositories.com/f/artificial-intelligence-ml/apple-hardware-acceleration.md) — Executes models in local desktop environments by using native hardware acceleration libraries for the host processor architecture, such as Apple's MLX on Apple Silicon. ([source](https://github.com/lyogavin/airllm/tree/main/air_llm))
- [Layer Decomposition Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/layer-wise-optimization-strategies/layer-decomposition-engines.md) — Loads individual model layers into memory sequentially during inference to allow massive models to run on hardware with limited capacity.
- [Long Context Training Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/long-context-training-optimizations.md) — Analyzes and answers questions based on massive text inputs up to 100,000 tokens by utilizing memory-efficient sequence processing techniques. ([source](https://github.com/lyogavin/airllm/tree/main/anima_100k))
- [Large Language Model Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/large-language-model-optimization.md) — Executes massive language models on consumer hardware by decomposing model layers and managing memory usage to fit within limited video memory capacity. ([source](https://github.com/lyogavin/airllm#readme))
- [Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning.md) — Provides tools for memory-efficient fine-tuning of pre-trained models on consumer-grade hardware. ([source](https://github.com/lyogavin/airllm/tree/main/training))
- [Memory Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/memory-optimization.md) — Reduces VRAM usage for large models using attention optimizations and parameter-efficient techniques to enable execution on consumer hardware. ([source](https://github.com/lyogavin/airllm/tree/main/anima_100k))
- [Quantization Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/memory-optimization-techniques/quantization-toolkits.md) — Reduces the memory footprint of large models through block-wise quantization and efficient layer loading techniques.
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization.md) — Applies block-wise weight quantization to decrease memory footprint and accelerate inference performance. ([source](https://github.com/lyogavin/airllm#readme))
- [Parameter Efficient Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/parameter-efficient-fine-tuning.md) — Updates only a small subset of model parameters during training to minimize memory usage while aligning models with specific preferences.
- [Preference Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/preference-optimization.md) — Uses direct preference optimization to align model outputs with human preferences on limited hardware. ([source](https://github.com/lyogavin/airllm/tree/main/training))
- [Preference-Based Model Alignments](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/fine-tuning-and-alignment/preference-based-model-alignments.md) — Implements direct preference optimization to align language models with human preferences. ([source](https://github.com/lyogavin/airllm/tree/main/rlhf))
- [Hardware Abstraction Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/hardware-abstraction-layers.md) — Provides a consistent interface that maps high-level operations to native acceleration libraries across different processor architectures and operating systems.
- [Context Memory Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/model-fine-tuning-adaptation/context-memory-optimizations.md) — Reduces the memory overhead of long-context processing by applying efficient sequence management techniques to handle massive input token windows.
- [Inference Latency Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-latency-optimizers.md) — Prefetches model layers during inference to hide data transfer latency and improve execution speed. ([source](https://github.com/lyogavin/airllm/blob/main/README.md))
- [Model Loading](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/data-and-checkpointing/model-loading.md) — Detects model types automatically to simplify initialization without requiring manual class specification for different architectures. ([source](https://github.com/lyogavin/airllm/blob/main/README.md))
- [Asynchronous Prefetchers](https://awesome-repositories.com/f/artificial-intelligence-ml/training-data-prefetchers/asynchronous-prefetchers.md) — Overlaps data transfer and computation by loading upcoming model layers into memory before they are needed to hide latency.
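The prefetching pattern behind the Inference Latency Optimizers and Asynchronous Prefetchers tags can be sketched with one background thread: while layer `i` computes, the transfer of layer `i + 1` is already running. The loader, the compute step and the timings are hypothetical stand-ins (`time.sleep` simulates disk or host-to-GPU transfers and layer execution), not AirLLM functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TRANSFER_S = 0.02  # hypothetical time to move one layer's weights into memory
COMPUTE_S = 0.02   # hypothetical time to run one layer


def load_layer(i):
    """Hypothetical loader: simulates the transfer and returns toy 'weights' (a multiplier)."""
    time.sleep(TRANSFER_S)  # sleep releases the GIL, so it can overlap with compute
    return i + 1


def compute(weights, x):
    """Hypothetical layer computation on the toy activations."""
    time.sleep(COMPUTE_S)
    return x * weights


def forward_sequential(num_layers, x):
    """Baseline: each layer is loaded only after the previous one has finished computing."""
    for i in range(num_layers):
        x = compute(load_layer(i), x)
    return x


def forward_with_prefetch(num_layers, x):
    """Start loading layer i + 1 in a background thread while layer i computes."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_layer, 0)
        for i in range(num_layers):
            weights = pending.result()  # blocks only if the transfer is not finished yet
            if i + 1 < num_layers:
                pending = pool.submit(load_layer, i + 1)  # overlap the next transfer
            x = compute(weights, x)
    return x


timings = {}
for fn in (forward_sequential, forward_with_prefetch):
    start = time.perf_counter()
    out = fn(8, 1)
    timings[fn.__name__] = (out, time.perf_counter() - start)
    print(fn.__name__, out, f"{timings[fn.__name__][1]:.2f}s")
```

Prefetching only one layer ahead keeps the extra memory cost to a single additional layer while still hiding most of the transfer time whenever transfer and compute take similar amounts of time.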

### Data & Databases

- [Long-Context Sequence Processors](https://awesome-repositories.com/f/data-databases/text-processing-pipelines/long-context-sequence-processors.md) — Analyzes and answers questions based on massive text inputs up to 100,000 tokens by utilizing memory-efficient sequence processing techniques.

### DevOps & Infrastructure

- [Model Inference Deployment](https://awesome-repositories.com/f/devops-infrastructure/deployment-management/model-inference-deployment.md) — Runs language models on both Linux and Apple Silicon desktops through a unified interface, simplifying local model deployment across operating systems.
