# google-ai-edge/litert

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/google-ai-edge-litert).**

2,561 stars · 344 forks · C++ · Apache-2.0

## Links

- GitHub: https://github.com/google-ai-edge/LiteRT
- Homepage: https://ai.google.dev/edge/litert/next/overview
- awesome-repositories: https://awesome-repositories.com/repository/google-ai-edge-litert.md

## Description

LiteRT is a runtime and API for executing machine learning and generative AI models on mobile, desktop, and IoT hardware. It consists of an inference engine and a specialized environment for running quantized large language and diffusion models locally on edge hardware.

The system includes an ahead-of-time (AOT) model compiler that translates models into hardware-specific bytecode, reducing startup latency and memory overhead. It provides a unified interface for Neural Processing Units (NPUs) that automatically falls back to the CPU or GPU for any subgraph the NPU cannot execute. An edge model converter transforms trained models into optimized formats for deployment on resource-constrained devices.
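The fallback routing described above can be pictured as a graph-partitioning pass. The sketch below is a simplified, hypothetical illustration and not LiteRT's implementation. `Backend`, `Partition`, and `PartitionWithFallback` are invented names. Each op is identified only by a string, a fixed NPU, then GPU, then CPU preference order is assumed, and the CPU is treated as able to run every op. Adjacent ops assigned to the same backend are merged into one subgraph.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical backend identifiers for this sketch; not LiteRT's enum.
enum class Backend { kNpu, kGpu, kCpu };

// A contiguous run of ops that will execute on a single backend.
struct Partition {
  Backend backend;
  std::vector<std::string> ops;
};

// Assigns each op to the first backend in NPU > GPU > CPU order that
// supports it (the CPU is assumed to support everything), then merges
// adjacent ops with the same backend into one partition (subgraph).
std::vector<Partition> PartitionWithFallback(
    const std::vector<std::string>& ops,
    const std::set<std::string>& npu_ops,
    const std::set<std::string>& gpu_ops) {
  std::vector<Partition> partitions;
  for (const std::string& op : ops) {
    Backend chosen = Backend::kCpu;  // universal fallback
    if (npu_ops.count(op) > 0) {
      chosen = Backend::kNpu;
    } else if (gpu_ops.count(op) > 0) {
      chosen = Backend::kGpu;
    }
    // Start a new subgraph whenever the backend changes.
    if (partitions.empty() || partitions.back().backend != chosen) {
      partitions.push_back({chosen, {}});
    }
    partitions.back().ops.push_back(op);
  }
  return partitions;
}
```

With the op sequence `{"conv", "relu", "custom", "conv"}` and an NPU that supports only `conv` and `relu`, this produces three partitions: NPU, CPU, NPU. A production partitioner may also weigh the cost of moving data between backends, which this sketch ignores.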

The project covers model optimization through format conversion and post-training quantization, both of which reduce model size. It manages hardware acceleration through automatic accelerator selection and zero-copy buffer handling, which passes tensor data to accelerators without intermediate CPU memory copies. The framework also supports custom operators through a low-level kernel interface, extending the set of supported operations.
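Post-training quantization is commonly implemented as an affine mapping from float to int8. The following is a minimal sketch of that general technique. It assumes per-tensor asymmetric int8 quantization with `q = round(x / scale) + zero_point`. It is not LiteRT's converter code, and `QuantParams`, `ChooseParams`, `Quantize`, `Dequantize`, and `QuantizeWeights` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-tensor quantization parameters.
struct QuantParams {
  float scale;
  int32_t zero_point;
};

// Chooses asymmetric int8 parameters for the float range [min_val, max_val].
// The range is widened to include 0 so that 0.0f maps to an exact integer.
QuantParams ChooseParams(float min_val, float max_val) {
  assert(min_val <= max_val);
  min_val = std::min(min_val, 0.0f);
  max_val = std::max(max_val, 0.0f);
  const float scale = (max_val - min_val) / 255.0f;  // 256 int8 levels
  if (scale == 0.0f) return {1.0f, 0};               // all-zero tensor
  const auto zp = static_cast<int32_t>(std::lround(-128.0f - min_val / scale));
  return {scale, std::clamp<int32_t>(zp, -128, 127)};
}

// q = clamp(round(x / scale) + zero_point, -128, 127)
int8_t Quantize(float x, const QuantParams& p) {
  const long q = std::lround(x / p.scale) + p.zero_point;
  return static_cast<int8_t>(std::clamp<long>(q, -128, 127));
}

// x is approximately scale * (q - zero_point)
float Dequantize(int8_t q, const QuantParams& p) {
  return p.scale * static_cast<float>(static_cast<int32_t>(q) - p.zero_point);
}

// Quantizes a whole weight tensor and reports the parameters it used.
std::vector<int8_t> QuantizeWeights(const std::vector<float>& weights,
                                    QuantParams* params) {
  assert(!weights.empty());
  const auto [lo, hi] = std::minmax_element(weights.begin(), weights.end());
  *params = ChooseParams(*lo, *hi);
  std::vector<int8_t> out;
  out.reserve(weights.size());
  for (float w : weights) out.push_back(Quantize(w, *params));
  return out;
}
```

Each value is stored in one byte instead of four, which is where the size reduction comes from. The round-trip error per value stays within about one `scale` step.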

## Tags

### Artificial Intelligence & ML

- [Local Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/large-language-model-optimization/local-inference-engines/local-inference-engines.md) — Ships a high-performance inference engine and API for executing machine learning and generative AI models on mobile, desktop, and IoT hardware.
- [On-Device Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-inference-engines.md) — Provides an inference engine optimized for executing machine learning and generative AI models locally on mobile, desktop, and IoT hardware.
- [On-Device Deployments](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment/generative-ai-models/edge-deployment-platforms/on-device-deployments.md) — Enables the deployment and execution of large language models and agentic planning capabilities entirely on-device. ([source](https://ai.google.dev/edge/litert))
- [Generative AI Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment/generative-ai-models/edge-deployment-platforms/on-device-deployments/generative-ai-runtimes.md) — Provides a specialized environment for running quantized large language and diffusion models locally on edge hardware.
- [On-Device LLM Runners](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment/generative-ai-models/edge-deployment-platforms/on-device-deployments/on-device-llm-runners.md) — Provides a specialized environment for running quantized large language and diffusion models locally on edge hardware.
- [Edge Hardware Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/training-algorithms/machine-learning-optimization/ml-performance-profilers/hardware-specific-model-optimizations/edge-hardware-optimizations.md) — Applies quantization and graph optimizations to reduce memory footprint and increase inference speed on resource-constrained hardware.
- [Model Format Converters](https://awesome-repositories.com/f/artificial-intelligence-ml/model-format-converters.md) — Provides tools for transforming trained machine learning models into optimized formats for deployment on resource-constrained edge devices.
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization.md) — Applies quantization and architecture-specific optimizations to enable large language and diffusion models to run locally. ([source](https://cdn.jsdelivr.net/gh/google-ai-edge/litert@main/README.md))
- [NPU Acceleration](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-acceleration.md) — Integrates with specialized Neural Processing Units to optimize model inference performance and energy efficiency.
- [NPU Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-accelerators.md) — Implements a unified interface for executing machine learning inference on Neural Processing Units with automatic CPU and GPU fallbacks.
- [Post-Training Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization/post-training-quantization.md) — Reduces model precision from floating point to integers after training to decrease model size and increase inference speed. ([source](https://ai.google.dev/edge/litert))
- [Hardware-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-accelerated-inference.md) — Executes machine learning models on edge devices using CPU, GPU, or NPU hardware acceleration for high performance. ([source](https://ai.google.dev/edge/litert))
- [Execution Fallbacks](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-support/execution-fallbacks.md) — Automatically redirects computation to a compatible processor if the primary hardware accelerator lacks support for specific operations.
- [On-Device Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-clients/on-device-inference.md) — Provides a high-level inference API and runtime environment to manage model state on edge devices. ([source](https://ai.google.dev/edge/api/litert/c))
- [Hardware Acceleration](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/hardware-acceleration.md) — Enables the execution of machine learning and generative AI models across mobile, desktop, and IoT hardware using CPUs, GPUs, and NPUs. ([source](https://cdn.jsdelivr.net/gh/google-ai-edge/litert@main/README.md))
- [Automatic Accelerator Selection](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/hardware-acceleration/automatic-accelerator-selection.md) — Automatically selects the most efficient hardware accelerator and manages asynchronous execution for machine learning tasks. ([source](https://ai.google.dev/edge/litert))
- [Tensor Memory Management](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/tensor-computing-libraries/tensor-memory-management.md) — Manages the allocation and tracking of memory views for tensors using buffer references to control data flow. ([source](https://ai.google.dev/edge/api/litert/c))
- [Hardware Performance Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/training-algorithms/deep-learning-optimization/hardware-performance-tuning.md) — Tunes execution across CPUs, GPUs, and NPUs using hardware-specific optimizations to achieve peak processing speeds. ([source](https://ai.google.dev/edge/litert))
- [Graph Compilation Caching](https://awesome-repositories.com/f/artificial-intelligence-ml/model-compilation-optimizers/graph-compilation-caching.md) — Stores compiled computation graphs in a local directory to bypass runtime initialization overhead. ([source](https://ai.google.dev/edge/litert/next/npu))
- [NPU Unified Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-unified-interfaces.md) — Provides a unified interface to execute models on NPUs while abstracting vendor-specific compiler and runtime details. ([source](https://ai.google.dev/edge/litert/next/npu))
- [On-Device Compilation](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models/on-device-speech-to-text-sdks/on-device-model-runtimes/on-device-compilation.md) — Translates models into NPU instructions during application initialization to ensure compatibility across diverse hardware platforms. ([source](https://ai.google.dev/edge/litert/next/npu))
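The compilation-caching entries above describe storing compiled artifacts locally so that later launches can skip recompilation. A minimal sketch of that idea follows. It assumes a cache keyed by the model bytes and a hypothetical SoC identifier. `CompiledArtifactCache`, `GetOrCompile`, and the `compile` callback are invented for illustration and do not correspond to LiteRT's API.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical on-disk cache of compiled model artifacts (not LiteRT's API).
class CompiledArtifactCache {
 public:
  explicit CompiledArtifactCache(fs::path dir) : dir_(std::move(dir)) {
    fs::create_directories(dir_);
  }

  // Returns the cached artifact for (model, SoC) if one exists; otherwise
  // runs `compile`, writes the result to disk, and returns it.
  std::string GetOrCompile(
      const std::string& model_bytes, const std::string& soc_id,
      const std::function<std::string(const std::string&)>& compile) {
    const fs::path file = dir_ / KeyFor(model_bytes, soc_id);
    if (fs::exists(file)) {
      std::ifstream in(file, std::ios::binary);
      std::ostringstream contents;
      contents << in.rdbuf();
      return contents.str();
    }
    std::string artifact = compile(model_bytes);
    std::ofstream out(file, std::ios::binary);
    out << artifact;
    return artifact;
  }

 private:
  // std::hash is only guaranteed stable within a single program run; a real
  // persistent cache would use a content hash such as SHA-256.
  static std::string KeyFor(const std::string& model, const std::string& soc) {
    return std::to_string(std::hash<std::string>{}(model)) + "_" + soc + ".bin";
  }

  fs::path dir_;
};
```

The key includes the SoC as well as the model because an artifact compiled for one NPU is generally not valid on another.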

### Part of an Awesome List

- [Model Bytecode Compilation](https://awesome-repositories.com/f/awesome-lists/devtools/cross-platform-compilers/model-bytecode-compilation.md) — Translates high-level machine learning model definitions into optimized low-level bytecode for specific edge hardware architectures.
- [Inference Engines](https://awesome-repositories.com/f/awesome-lists/ai/inference-engines.md) — Framework for efficient ML and GenAI deployment on edge devices.
- [Model Serving & Deployment](https://awesome-repositories.com/f/awesome-lists/ai/model-serving-deployment.md) — Deploys models on mobile and edge devices.

### DevOps & Infrastructure

- [Model Conversion](https://awesome-repositories.com/f/devops-infrastructure/model-conversion.md) — Transforms machine learning models into specialized formats to increase execution speed and reduce memory usage on edge hardware. ([source](https://ai.google.dev/edge/litert))
- [Zero-Copy Buffer Interoperability](https://awesome-repositories.com/f/devops-infrastructure/gpu-acceleration-libraries/zero-copy-buffer-interoperability.md) — Passes tensor data directly to accelerators without copying it through system memory, reducing latency and power consumption. ([source](https://cdn.jsdelivr.net/gh/google-ai-edge/litert@main/README.md))
- [PyTorch](https://awesome-repositories.com/f/devops-infrastructure/model-conversion/pytorch.md) — Provides specialized paths for converting trained PyTorch models into optimized formats for on-device deployment. ([source](https://cdn.jsdelivr.net/gh/google-ai-edge/litert@main/README.md))
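Zero-copy interoperability comes down to letting the runtime wrap memory that the caller (or an accelerator driver) already owns instead of duplicating it. The sketch below contrasts the wrapping and copying paths using a hypothetical `TensorBufferSketch` class. LiteRT's real tensor-buffer types differ and are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tensor buffer for illustration; not LiteRT's TensorBuffer API.
class TensorBufferSketch {
 public:
  // Zero-copy path: wraps memory the caller already owns (for example, a
  // buffer shared with an accelerator). The caller must keep it alive.
  static TensorBufferSketch Wrap(float* data, std::size_t size) {
    TensorBufferSketch b;
    b.external_ = data;
    b.size_ = size;
    return b;
  }

  // Copying path: duplicates the data into memory owned by this object.
  static TensorBufferSketch CopyFrom(const float* data, std::size_t size) {
    TensorBufferSketch b;
    b.owned_.assign(data, data + size);
    b.size_ = size;
    return b;
  }

  // Points at the wrapped memory if present, otherwise at the owned copy.
  float* data() { return external_ != nullptr ? external_ : owned_.data(); }
  std::size_t size() const { return size_; }
  bool is_zero_copy() const { return external_ != nullptr; }

 private:
  float* external_ = nullptr;  // non-owning pointer for the zero-copy path
  std::vector<float> owned_;   // storage for the copying path
  std::size_t size_ = 0;
};
```

In the zero-copy path, writes through the buffer land directly in the caller's original memory, so no copy is made, but the caller must keep that memory alive for as long as the buffer is in use.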

### Programming Languages & Runtimes

- [Build-Time Bytecode Compilation](https://awesome-repositories.com/f/programming-languages-runtimes/build-time-bytecode-compilation.md) — Translates models into bytecode during the build process to reduce runtime initialization and memory overhead on edge devices.
- [Ahead-of-Time Wasm Execution](https://awesome-repositories.com/f/programming-languages-runtimes/runtime-execution-environments/webassembly/embedded-wasm-runtimes/ahead-of-time-wasm-execution.md) — Compiles models into hardware-specific bytecode before deployment to minimize startup latency on constrained devices. ([source](https://ai.google.dev/edge/litert/next/npu))

### Software Engineering & Architecture

- [Computation Subgraph Delegation](https://awesome-repositories.com/f/software-engineering-architecture/hardware-abstraction-layers/delegate-based-hardware-abstraction/computation-subgraph-delegation.md) — Partitions model computation graphs to delegate specific operations to the most compatible CPU, GPU, or NPU backends.

### Data & Databases

- [Hardware Buffer Zero-Copy](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-persistence-storage/data-storage-architectures/zero-copy-memory-mappings/hardware-buffer-zero-copy.md) — Eliminates expensive CPU memory copy operations by passing tensor data directly to the NPU hardware buffer. ([source](https://ai.google.dev/edge/litert/next/npu))
- [Model Artifact Caches](https://awesome-repositories.com/f/data-databases/query-caching-strategies/compilation-caches/model-artifact-caches.md) — Caches pre-compiled model hardware instructions in local storage to eliminate repeated translation during application launches.
