# pytorch/executorch

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/pytorch-executorch).**

4,296 stars · 848 forks · Python · other

## Links

- GitHub: https://github.com/pytorch/executorch
- Homepage: https://executorch.ai
- awesome-repositories: https://awesome-repositories.com/repository/pytorch-executorch.md

## Topics

`deep-learning` `embedded` `gpu` `machine-learning` `mobile` `neural-network` `tensor`

## Description

ExecuTorch is a lightweight C++ runtime for deploying PyTorch models on mobile, embedded, and edge hardware. Its ahead-of-time compilation pipeline exports, quantizes, and lowers model graphs into compact serialized programs, which a minimal runtime then executes with hardware acceleration and support for on-device large language model inference.

The project distinguishes itself through a hardware accelerator delegate system that partitions model subgraphs and offloads computation to specialized backends including NPUs, GPUs, and DSPs from Apple, Arm, Intel, MediaTek, Qualcomm, and Samsung. It supports autoregressive text generation with tokenization, KV cache management, and streaming output, alongside multi-language runtime bindings for Java, Kotlin, Objective-C, and C++. Operator-level profiling and debugging tools capture execution traces and link them back to original source code for performance analysis.
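
The delegate idea described above, where supported subgraphs are split off and handed to an accelerator backend, can be illustrated with a toy model. This is a minimal sketch in plain Python, not ExecuTorch's partitioner API: the `partition` helper, the operator names, and the capability set are all hypothetical, and the graph is simplified to a linear sequence rather than a real DAG.

```python
from itertools import groupby


def partition(ops, supported):
    """Group a linear op sequence into (target, ops) segments.

    Hypothetical helper: contiguous ops that the backend supports form one
    "delegate" segment; everything else stays on the "cpu" fallback.
    """
    segments = []
    for is_supported, run in groupby(ops, key=lambda op: op in supported):
        target = "delegate" if is_supported else "cpu"
        segments.append((target, list(run)))
    return segments


# Made-up operator names and a made-up backend capability set.
graph = ["conv2d", "relu", "custom_nms", "linear", "softmax"]
npu_ops = {"conv2d", "relu", "linear"}

segments = partition(graph, npu_ops)
for target, run in segments:
    print(f"{target}: {run}")
```

Grouping contiguous runs keeps the number of hand-offs between the fallback path and the accelerator small, which is the usual motivation for delegating whole subgraphs rather than single operators.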

The platform covers model export and optimization through PyTorch export, quantization to lower-bit representations, static memory planning, and custom compiler passes. It includes capabilities for image preprocessing, multimodal and audio model inference, and decoding vision model outputs into task-specific results. Tensor management, platform abstraction, and extensibility mechanisms allow adding custom backends, kernels, and compiler passes.
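
To make "quantization to lower-bit representations" concrete, the sketch below shows generic per-tensor affine int8 quantization arithmetic. It is an illustration under stated assumptions, not ExecuTorch's quantizer API: the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Asymmetric (affine) per-tensor quantization of floats to int8."""
    # The representable range must contain 0.0 so that zero maps exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    # Round to the nearest integer step, shift by the zero point, clamp to int8.
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical weights, chosen only for illustration.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

print("int8 codes:", q)
print("max round-trip error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Each float is stored in one byte instead of four, at the cost of a round-trip error bounded by roughly one quantization step (`scale`).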

Documentation covers building from source, cross-compilation for embedded targets and iOS, and integration with Android and iOS frameworks through platform-specific APIs.

## Tags

### Artificial Intelligence & ML

- [Backend-Specific Model Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/backend-specific-model-exports.md) — Delegates model graphs to hardware-specific backends like Core ML and QNN for accelerated execution. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [Cross-Language ML Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/cross-language-ml-toolkits.md) — A toolkit that packages and deploys machine learning models to Android, iOS, and bare-metal RTOS environments with multi-language bindings.
- [On-Device Decoder-Only Runners](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/pretrained-model-integrations/text-generation-inference-integrations/on-device-decoder-only-runners.md) — ExecuTorch loads a decoder-only model and generates text from a prompt, streaming tokens as they are produced. ([source](https://docs.pytorch.org/executorch/main/llm/run-with-c-plus-plus.html))
- [On-Device Text Generation Runners](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/pretrained-model-integrations/text-generation-inference-integrations/on-device-text-generation-runners.md) — ExecuTorch loads a text-generation model, configures its tokenizer, and generates token streams on iOS through a native Objective-C/Swift interface. ([source](https://docs.pytorch.org/executorch/main/llm/run-on-ios.html))
- [Generation Parameter Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/generation-temperature-controls/generation-parameter-configurations.md) — Controls generation behavior through parameters such as token count, temperature, and echo mode. ([source](https://docs.pytorch.org/executorch/main/llm/run-with-c-plus-plus.html))
- [NPU Inference Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-accelerated-inference/npu-inference-execution.md) — Executes inference on CPU, GPU, NPU, or DSP by selecting a backend matching the target hardware. ([source](https://docs.pytorch.org/executorch/main/index.html))
- [Hardware Acceleration Backends](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-backends.md) — Ships a hardware acceleration backend system that offloads model execution to NPUs and other accelerators. ([source](https://docs.pytorch.org/executorch/main/embedded-section.html))
- [Hardware Backend Targeting](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-backend-targeting.md) — Selects and configures target hardware backends to lower model operations for efficient on-device execution. ([source](https://docs.pytorch.org/executorch/main/advanced-topics-section.html))
- [Hardware-Specific Model Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-specific-model-exports.md) — Exports and optimizes models for specific hardware backends, producing specialized files for efficient device execution. ([source](https://docs.pytorch.org/executorch/main/backends-overview.html))
- [On-Device Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-clients/on-device-inference.md) — Executes machine learning models directly on mobile, embedded, and edge hardware for real-time, privacy-preserving applications. ([source](https://docs.pytorch.org/executorch/main/success-stories.html))
- [KV Cache Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/kv-cache-optimizations.md) — Enables key-value caching and scaled dot-product attention to accelerate autoregressive generation. ([source](https://docs.pytorch.org/executorch/main/llm/export-llm.html))
- [Model Execution APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/inference-servers-and-runtimes/model-execution-apis.md) — Executes loaded models with input tensors and retrieves output tensors from inference. ([source](https://docs.pytorch.org/executorch/main/using-executorch-cpp.html))
- [Edge AI Model Deployment](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment.md) — Exports, optimizes, and runs trained models on mobile, embedded, and edge hardware with a lightweight runtime. ([source](https://docs.pytorch.org/executorch/main/index.html))
- [On-Device Deployments](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment/generative-ai-models/edge-deployment-platforms/on-device-deployments.md) — Exports, optimizes, and runs large language models on resource-constrained edge devices for on-device text generation. ([source](https://docs.pytorch.org/executorch/main/_sources/index.md.txt))
- [On-Device LLM Runners](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment/generative-ai-models/edge-deployment-platforms/on-device-deployments/on-device-llm-runners.md) — ExecuTorch exports and runs large language models on edge devices with a dedicated runner for text generation. ([source](https://cdn.jsdelivr.net/gh/pytorch/executorch@main/README.md))
- [iOS Deployments](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment/ios-deployments.md) — ExecuTorch loads and executes a compiled model from Objective-C, integrating inference into iOS applications. ([source](https://docs.pytorch.org/executorch/main/getting-started.html))
- [Inference Execution Models](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/engines-runtimes-servers/inference-execution-models.md) — Executes forward passes or named methods on loaded models, returning output tensors or error codes. ([source](https://docs.pytorch.org/executorch/main/extension-module.html))
- [Hardware-Specific Model Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/training-algorithms/machine-learning-optimization/ml-performance-profilers/hardware-specific-model-optimizations.md) — Optimizes and lowers models for specific hardware backends, producing specialized files for efficient device execution. ([source](https://docs.pytorch.org/executorch/main/backends-overview.html))
- [Edge Hardware Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/training-algorithms/machine-learning-optimization/ml-performance-profilers/hardware-specific-model-optimizations/edge-hardware-optimizations.md) — Applies graph optimizations, quantization, and caching to reduce inference latency and memory footprint for edge devices. ([source](https://docs.pytorch.org/executorch/main/success-stories.html))
- [Model Loading](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/data-and-checkpointing/model-loading.md) — Loads serialized models from file paths with a single constructor call for execution. ([source](https://docs.pytorch.org/executorch/main/using-executorch-cpp.html))
- [On-Device Model Loaders](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/data-and-checkpointing/model-loading/on-device-model-loaders.md) — ExecuTorch loads a pre-trained LLM into memory on an Android device using a Java interface, preparing it for inference. ([source](https://docs.pytorch.org/executorch/main/llm/run-on-android.html))
- [Edge Model Compilers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-compilation-optimizers/edge-model-compilers.md) — Compiles and quantizes PyTorch models into compact binaries tailored for resource-constrained edge hardware.
- [Compilation Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/model-compilation-optimizers/edge-model-compilers/compilation-pipelines.md) — Provides an ahead-of-time compilation pipeline that converts PyTorch models into executable programs for edge hardware. ([source](https://docs.pytorch.org/executorch/main/intro-how-it-works.html))
- [Model Graph Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-graph-optimizers.md) — Applies operator fusion, decomposition, and backend-specific lowering to improve inference performance. ([source](https://docs.pytorch.org/executorch/main/concepts.html))
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/quantization/model-quantization.md) — Applies quantization and optimization to reduce model size and improve inference speed on constrained hardware. ([source](https://docs.pytorch.org/executorch/main/success-stories.html))
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization.md) — Applies quantization strategies during export to reduce model size and speed inference. ([source](https://docs.pytorch.org/executorch/main/advanced-topics-section.html))
- [Canonical Operator Sets](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serialization-formats/graph-serialization-formats/canonical-operator-sets.md) — Defines a canonical operator set and graph representation that decouples model authoring from target-hardware execution.
- [Compact Binary Serializations](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/model-exporting/compact-binary-serializations.md) — Serializes trained models into compact binary files that the runtime loads and executes on target hardware. ([source](https://docs.pytorch.org/executorch/main/file-formats-advanced.html))
- [Large Language Model Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/model-exporting/large-language-model-exports.md) — Exports large language models to optimized programs using a high-level API with configuration. ([source](https://docs.pytorch.org/executorch/main/llm/export-llm.html))
- [On-Device Inference Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/model-exporting/production-inference-exports/on-device-inference-exports.md) — Exports large language models to portable files with built-in optimizations for mobile and edge hardware. ([source](https://docs.pytorch.org/executorch/main/llm/getting-started.html))
- [Text Tokenization](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/text-tokenization.md) — ExecuTorch sets the tokenizer that converts text to model tokens and back, enabling the model to process and generate human-readable text. ([source](https://docs.pytorch.org/executorch/main/llm/run-on-android.html))
- [On-Device Inference Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-inference-engines.md) — Runs large language models directly on mobile and embedded hardware for text generation without cloud connectivity.
- [On-Device Models](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models.md) — Loads and runs compiled programs on mobile or embedded hardware for efficient inference. ([source](https://docs.pytorch.org/executorch/main/intro-how-it-works.html))
- [LLM Runtimes](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models/llm-runtimes.md) — A runtime environment for executing large language models on-device with tokenization, KV cache management, and streaming text generation.
- [Local LLM Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models/local-llm-execution.md) — Loads and executes exported LLMs on Android and iOS through C++ APIs or platform-specific bindings. ([source](https://docs.pytorch.org/executorch/main/llm/getting-started.html))
- [Operator Standardization](https://awesome-repositories.com/f/artificial-intelligence-ml/operator-standardization.md) — Specifies a canonical collection of ATen operators that the runtime supports for consistent behavior across deployments. ([source](https://docs.pytorch.org/executorch/main/compiler-ir-advanced.html))
- [PyTorch Model Export](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-model-export.md) — Captures trained model computation graphs using standard PyTorch export APIs for deployment outside training. ([source](https://cdn.jsdelivr.net/gh/pytorch/executorch@main/README.md))
- [Dialect-Based Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-model-export/dialect-based-exports.md) — Converts PyTorch models through ATen and Edge dialects into serialized binaries for on-device execution. ([source](https://docs.pytorch.org/executorch/main/export-to-executorch-api-reference.html))
- [Edge Device Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-model-export/edge-device-exports.md) — Converts PyTorch programs into portable graph representations using torch.export for edge device deployment. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [Edge Format Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-model-export/edge-format-exports.md) — Converts PyTorch models into portable .pte files for deployment on mobile and embedded hardware. ([source](https://docs.pytorch.org/executorch/main/advanced-topics-section.html))
- [On-Device Inference Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-model-export/on-device-inference-exports.md) — Converts PyTorch eager-mode models into portable intermediate representations for edge hardware execution. ([source](https://docs.pytorch.org/executorch/main/_sources/index.md.txt))
- [Static Graph Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-model-export/static-graph-exports.md) — Captures PyTorch programs as static graphs of standardized operators for deployment on resource-constrained devices. ([source](https://docs.pytorch.org/executorch/main/intro-how-it-works.html))
- [Weight Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization.md) — Converts model weights to lower-precision formats like int8 or int4, configurable per backend. ([source](https://docs.pytorch.org/executorch/main/llm/export-llm-optimum.html))
- [On-Device Text Generation Runners](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-generation/autoregressive-text-generation/on-device-text-generation-runners.md) — Manages tokenization, KV cache, and streaming for on-device large language model inference sessions.
- [Static Graph Compilers](https://awesome-repositories.com/f/artificial-intelligence-ml/static-graph-compilers.md) — Exports, quantizes, and lowers model graphs into executable programs tailored for specific edge hardware targets.
- [Subgraph Backend Delegation](https://awesome-repositories.com/f/artificial-intelligence-ml/subgraph-backend-delegation.md) — Provides a standardized interface for third-party compilers to compile and execute subgraphs on accelerators. ([source](https://docs.pytorch.org/executorch/main/intro-how-it-works.html))
- [Apple Hardware Acceleration](https://awesome-repositories.com/f/artificial-intelligence-ml/apple-hardware-acceleration.md) — Accelerates model inference on iOS devices using Apple-specific hardware backends for execution. ([source](https://docs.pytorch.org/executorch/main/ios-section.html))
- [XNNPACK Backend Exports](https://awesome-repositories.com/f/artificial-intelligence-ml/backend-specific-model-exports/xnnpack-backend-exports.md) — Converts PyTorch models into programs lowered for the XNNPACK delegate during export. ([source](https://docs.pytorch.org/executorch/main/backends/xnnpack/xnnpack-overview.html))
- [Computational Graph Visualizations](https://awesome-repositories.com/f/artificial-intelligence-ml/computational-graph-visualizations.md) — Displays the model as a graph with performance and debug data linked back to original source code. ([source](https://docs.pytorch.org/executorch/main/devtools-overview.html))
- [Model Deployment](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-models/model-deployment.md) — Executes image classification, object detection, and other vision tasks directly on edge hardware. ([source](https://docs.pytorch.org/executorch/main/index.html))
- [Computer Vision Preprocessing](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-preprocessing.md) — ExecuTorch resizes, crops, converts, and normalizes image tensors for model inference on Android and iOS. ([source](https://docs.pytorch.org/executorch/main/advanced-topics-section.html))
- [Dynamic Tensor Shapes](https://awesome-repositories.com/f/artificial-intelligence-ml/dynamic-tensor-shapes.md) — Handles tensor dimension changes between inference runs without requiring model recompilation. ([source](https://docs.pytorch.org/executorch/main/backends/vulkan/vulkan-overview.html))
- [GPU-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-accelerated-inference.md) — Executes machine learning models on compatible GPUs using the cross-platform Vulkan API for accelerated inference. ([source](https://docs.pytorch.org/executorch/main/backends/vulkan/vulkan-overview.html))
- [Half-Precision Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/half-precision-inference.md) — Supports FP16 inference precision to balance model accuracy and memory usage on edge devices. ([source](https://docs.pytorch.org/executorch/main/backends/vulkan/vulkan-overview.html))
- [Hardware-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-accelerated-inference.md) — Delegates model execution to Qualcomm Hexagon or Adreno processors through the QNN SDK for on-device AI. ([source](https://docs.pytorch.org/executorch/main/backends-qualcomm.html))
- [Backend Dialect Translation](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-backends/backend-dialect-translation.md) — Translates portable model graphs into hardware-specific representations for execution on accelerators. ([source](https://docs.pytorch.org/executorch/main/compiler-ir-advanced.html))
- [Image Data Preprocessing](https://awesome-repositories.com/f/artificial-intelligence-ml/image-data-preprocessing.md) — ExecuTorch keeps platform-dependent image work like decoding, resizing, and cropping in the application layer before passing pixels to the exported model. ([source](https://docs.pytorch.org/executorch/main/working-with-cv-models.html))
- [Intel Hardware Export](https://awesome-repositories.com/f/artificial-intelligence-ml/intel-hardware-export.md) — Exports and executes models on Intel CPUs, GPUs, and NPUs using the OpenVINO runtime backend. ([source](https://docs.pytorch.org/executorch/main/build-run-openvino.html))
- [Embedded Deployments](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment/embedded-deployments.md) — Runs ML models on microcontrollers and constrained hardware with DSP and NPU acceleration. ([source](https://docs.pytorch.org/executorch/main/edge-platforms-section.html))
- [Memory-Mapped Loading](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/data-and-checkpointing/model-loading/memory-mapped-loading.md) — Loads program files using memory mapping with optional locking and performance hints. ([source](https://docs.pytorch.org/executorch/main/executorch-runtime-api-reference.html))
- [Hugging Face Converters](https://awesome-repositories.com/f/artificial-intelligence-ml/model-format-converters/hugging-face-converters.md) — Downloads transformer models from Hugging Face Hub and converts them into portable formats for on-device execution. ([source](https://docs.pytorch.org/executorch/main/success-stories.html))
- [Performance Profilers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/profiling-and-benchmarking/performance-profilers.md) — ExecuTorch measures runtime performance of individual operators and layers to identify bottlenecks and guide optimization efforts. ([source](https://cdn.jsdelivr.net/gh/pytorch/executorch@main/README.md))
- [Disk Size Reducers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/quantization/model-quantization-frameworks/disk-size-reducers.md) — Shrinks model file size through quantization-aware training and post-training quantization for edge deployment. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [Input-Output Tensor Assignments](https://awesome-repositories.com/f/artificial-intelligence-ml/model-output-pruning/input-output-tensor-assignments.md) — Assigns input tensors to methods before execution and optionally pre-allocates output tensors. ([source](https://docs.pytorch.org/executorch/main/extension-module.html))
- [CPU Inference Quantizers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization/8-bit-inference-quantizers/cpu-inference-quantizers.md) — Runs models with 8-bit, fp32, or fp16 activations to reduce memory and improve speed on CPU hardware. ([source](https://docs.pytorch.org/executorch/main/backends/xnnpack/xnnpack-overview.html))
- [Quantized Linear Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization/8-bit-inference-quantizers/quantized-linear-layers.md) — Executes linear layers with 8-bit or 4-bit weights to reduce memory and compute during inference. ([source](https://docs.pytorch.org/executorch/main/backends/vulkan/vulkan-overview.html))
- [Hardware-Specific Quantizations](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization/8-bit-inference-quantizers/static-quantization/hardware-specific-quantizations.md) — Applies static 8-bit or 16-bit integer quantization to models so they can run on Samsung Exynos hardware. ([source](https://docs.pytorch.org/executorch/main/backends/samsung/samsung-overview.html))
- [Configurable Bit-Width Quantizers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization/configurable-bit-width-quantizers.md) — Configures model quantization to specific bit-width combinations like A16W16 or A8W4 before deployment. ([source](https://docs.pytorch.org/executorch/main/backends-mediatek.html))
- [Quantization-Aware Training](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization/quantization-aware-training.md) — Supports quantization-aware training and post-training quantization to shrink models for constrained hardware. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [Multimodal Model Runners](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-models/multimodal-model-runners.md) — ExecuTorch loads and executes a multimodal model that accepts text, image, and audio inputs on iOS through a native Objective-C/Swift interface. ([source](https://docs.pytorch.org/executorch/main/llm/run-on-ios.html))
- [On-Device Multimodal Runners](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-models/multimodal-model-runners/on-device-multimodal-runners.md) — Ships a runtime that executes multimodal models on CPU, GPU, and Metal backends for on-device AI tasks. ([source](https://docs.pytorch.org/executorch/main/success-stories.html))
- [NPU Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-accelerators.md) — Exports and lowers PyTorch models to run on MediaTek Neuron Processing Units for on-device inference. ([source](https://docs.pytorch.org/executorch/main/backends-mediatek.html))
- [Samsung Exynos NPU/DSP Delegates](https://awesome-repositories.com/f/artificial-intelligence-ml/npu-accelerators/samsung-exynos-npu-dsp-delegates.md) — ExecuTorch delegates model execution to Samsung's on-device NPU or DSP via the Exynos AI Litecore SDK for hardware-accelerated inference. ([source](https://docs.pytorch.org/executorch/main/backends/samsung/samsung-overview.html))
- [On-Device Model Profilers](https://awesome-repositories.com/f/artificial-intelligence-ml/on-device-models/on-device-speech-to-text-sdks/on-device-model-runtimes/on-device-model-profilers.md) — ExecuTorch measures model load time, operator-level execution, delegate execution, and end-to-end inference latency during on-device execution. ([source](https://docs.pytorch.org/executorch/main/runtime-profiling.html))
- [Precision Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/precision-quantization.md) — Converts tensors to lower-precision types using static, dynamic, or hybrid quantization techniques. ([source](https://docs.pytorch.org/executorch/main/concepts.html))
- [Hardware-Specific Quantizations](https://awesome-repositories.com/f/artificial-intelligence-ml/quantized-inference-runtimes/weight-quantization/post-training-quantization/hardware-specific-quantizations.md) — Applies post-training quantization or quantization-aware training to reduce model precision for efficient execution on Qualcomm accelerators. ([source](https://docs.pytorch.org/executorch/main/backends-qualcomm.html))
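
Several entries above mention autoregressive text generation controlled by a token count and a temperature, with tokens streamed as they are produced. The loop below is a minimal sketch of that pattern in plain Python. It does not use ExecuTorch's runner API: `sample_token`, `generate`, `toy_model`, and the five-token vocabulary are hypothetical stand-ins, and a temperature of 0 is assumed here to mean greedy decoding.

```python
import math
import random


def sample_token(logits, temperature):
    """Pick a token id from raw logits.

    Assumption for this sketch: temperature <= 0 means greedy decoding
    (argmax); larger values flatten the distribution before sampling.
    """
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(step_fn, prompt_ids, max_new_tokens, temperature, on_token):
    """Autoregressive loop: feed tokens back in, stream each new token out."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        next_id = sample_token(step_fn(ids), temperature)
        ids.append(next_id)
        on_token(next_id)  # streaming callback, called once per new token
    return ids


VOCAB = 5


def toy_model(ids):
    # Hypothetical stand-in for a real model: strongly prefers last_id + 1.
    return [10.0 if i == (ids[-1] + 1) % VOCAB else 0.0 for i in range(VOCAB)]


streamed = []
result = generate(toy_model, [0], max_new_tokens=4, temperature=0.0,
                  on_token=streamed.append)
print(result)
```

A real runner would also keep a KV cache so that each step only processes the newest token; this sketch re-feeds the whole sequence to stay short.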

### Part of an Awesome List

- [LLM Quantization Exports](https://awesome-repositories.com/f/awesome-lists/ai/local-llm-execution/quantized-inference-runtimes/llm-quantization-exports.md) — Applies TorchAO or pt2e quantization to reduce model size and improve execution speed on resource-constrained hardware. ([source](https://docs.pytorch.org/executorch/main/llm/export-llm.html))
- [Large Language Model Deployments](https://awesome-repositories.com/f/awesome-lists/ai/local-model-deployment/large-language-model-deployments.md) — Runs large language models on-device for text generation without requiring a cloud connection. ([source](https://docs.pytorch.org/executorch/main/index.html))
- [On-Device LLM Runners](https://awesome-repositories.com/f/awesome-lists/ai/local-model-deployment/large-language-model-deployments/on-device-llm-runners.md) — ExecuTorch loads and executes a text-generation LLM on-device, handling tokenization and autoregressive generation. ([source](https://cdn.jsdelivr.net/gh/pytorch/executorch@main/README.md))
- [Custom Backend Integrations](https://awesome-repositories.com/f/awesome-lists/data/backend-as-a-service/custom-backend-integrations.md) — ExecuTorch integrates a new hardware or software backend into the compilation and runtime pipeline by providing partition, preprocess, and execution functions. ([source](https://docs.pytorch.org/executorch/main/backend-delegates-integration.html))
- [Debugging And Profiling](https://awesome-repositories.com/f/awesome-lists/devtools/debugging-and-profiling.md) — Captures runtime execution traces and links them back to original source code for performance analysis and numerical validation.
- [LLM Profiling Suites](https://awesome-repositories.com/f/awesome-lists/devtools/debugging-and-profiling-tools/llm-profiling-suites.md) — Generates operator delegation tables, memory profiles, and time profiles to diagnose exported LLM performance. ([source](https://docs.pytorch.org/executorch/main/llm/export-llm.html))
- [Model Execution Debuggers](https://awesome-repositories.com/f/awesome-lists/devtools/debugging-and-profiling-tools/pipeline-debugging-and-profiling/model-execution-debuggers.md) — ExecuTorch inspects performance data, visualizes the graph, and correlates runtime behavior back to the original source code during development. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [On-Device Model Debuggers](https://awesome-repositories.com/f/awesome-lists/devtools/debugging-and-profiling-tools/pipeline-debugging-and-profiling/on-device-model-debuggers.md) — Provides developer tools to inspect, visualize, and diagnose the behavior of a model during on-device execution. ([source](https://docs.pytorch.org/executorch/main/concepts.html))
- [Inference Engines](https://awesome-repositories.com/f/awesome-lists/ai/inference-engines.md) — On-device AI deployment framework for mobile and embedded systems.

### Data & Databases

- [Model-as-a-Table Integrations](https://awesome-repositories.com/f/data-databases/model-as-a-table-integrations.md) — ExecuTorch links only the kernels and backend libraries required by a specific model to produce a small, efficient executable for deployment. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [Custom Inference Operations](https://awesome-repositories.com/f/data-databases/collective-communication-operations/custom-operation-definitions/custom-inference-operations.md) — Accelerates inference by replacing standard attention and KV cache operations with faster custom implementations. ([source](https://docs.pytorch.org/executorch/main/llm/export-llm-optimum.html))
- [Runtime Profiling Data Extractors](https://awesome-repositories.com/f/data-databases/data-observability-profilings/runtime-profiling-data-extractors.md) — ExecuTorch captures profiling and debugging data from model execution and exposes it through a structured dump for post-run analysis. ([source](https://docs.pytorch.org/executorch/main/etdump.html))
- [Arbitrary Source Model Loaders](https://awesome-repositories.com/f/data-databases/dynamic-data-model-execution/device-data-model-execution/arbitrary-source-model-loaders.md) — Provides an abstraction to load model files from files, memory, or other data sources. ([source](https://docs.pytorch.org/executorch/main/concepts.html))
- [Profiling Data Inspector APIs](https://awesome-repositories.com/f/data-databases/query-performance-analyzers/execution-performance-analyzers/profiling-data-inspector-apis.md) — ExecuTorch generates profiling artifacts and uses an inspector API to analyze operator-level performance and identify bottlenecks. ([source](https://docs.pytorch.org/executorch/main/devtools-tutorial.html))
- [Model Metadata Retrieval](https://awesome-repositories.com/f/data-databases/retrieval-metadata/platform-metadata-retrievers/model-metadata-retrieval.md) — Retrieves method names and tensor metadata like shape, type, and size from loaded models. ([source](https://docs.pytorch.org/executorch/main/extension-module.html))

### Development Tools & Productivity

- [On-Device LLM Session Runners](https://awesome-repositories.com/f/development-tools-productivity/command-line-model-inferences/interactive-model-inference-sessions/on-device-llm-session-runners.md) — ExecuTorch loads and executes a language model on-device, wrapping the runtime for text generation tasks. ([source](https://docs.pytorch.org/executorch/main/javadoc/org/pytorch/executorch/extension/llm/package-summary.html))
- [LLM Component Customizations](https://awesome-repositories.com/f/development-tools-productivity/component-configuration/llm-component-assignments/llm-component-customizations.md) — ExecuTorch lets you replace the default tokenizer, sampler, or acceleration backend with a user-defined implementation to adapt the inference pipeline. ([source](https://docs.pytorch.org/executorch/main/llm/getting-started.html))
- [Intermediate Output Inspection](https://awesome-repositories.com/f/development-tools-productivity/debugging-profiling-testing/debugging-diagnostics/debugging-inspection-tools/debugging-and-inspection-tools/intermediate-output-inspection.md) — ExecuTorch inspects intermediate outputs and detects numerical discrepancies between ahead-of-time and runtime execution to validate model correctness. ([source](https://docs.pytorch.org/executorch/main/devtools-overview.html))
- [Operator-to-Source Tracebacks](https://awesome-repositories.com/f/development-tools-productivity/source-code-linking/operator-to-source-tracebacks.md) — ExecuTorch links each runtime-executed operator back to the exact line of Python code that produced it, enabling hotspot identification. ([source](https://docs.pytorch.org/executorch/main/runtime-profiling.html))
- [Event-to-Source Mapping](https://awesome-repositories.com/f/development-tools-productivity/source-map-generators/event-to-source-mapping.md) — ExecuTorch associates captured runtime events with model Python source code using an optional export-time record. ([source](https://docs.pytorch.org/executorch/main/model-debugging.html))
- [Bundled Model Output Validators](https://awesome-repositories.com/f/development-tools-productivity/terminal-output-monitors/output-validation/bundled-model-output-validators.md) — ExecuTorch bundles a model with sample inputs and expected outputs to automatically verify that runtime results match expectations. ([source](https://docs.pytorch.org/executorch/main/devtools-overview.html))
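
The bundled-validation idea in the last entry (ship sample inputs together with expected outputs, then compare at runtime) can be illustrated with a minimal sketch. It is plain Python and does not use the ExecuTorch API: `validate`, `run`, and the tolerance values are hypothetical names and defaults chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One bundled test case: (inputs, expected outputs).
Case = Tuple[Sequence[float], Sequence[float]]


def validate(
    run: Callable[[Sequence[float]], Sequence[float]],
    cases: Sequence[Case],
    rtol: float = 1e-5,
    atol: float = 1e-8,
) -> List[int]:
    """Return the indices of the bundled cases whose outputs do not match."""
    failures: List[int] = []
    for index, (inputs, expected) in enumerate(cases):
        actual = run(inputs)
        matches = len(actual) == len(expected) and all(
            math.isclose(a, e, rel_tol=rtol, abs_tol=atol)
            for a, e in zip(actual, expected)
        )
        if not matches:
            failures.append(index)
    return failures


# Stand-in for a model: doubles every input value.
def double(xs: Sequence[float]) -> List[float]:
    return [2.0 * x for x in xs]


bundled_cases = [([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.5])]
failed = validate(double, bundled_cases)
```

Here the second case is deliberately wrong, so `failed` lists only its index.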

### DevOps & Infrastructure

- [Model Executions](https://awesome-repositories.com/f/devops-infrastructure/deployment-management-strategies/execution-platforms-and-targets/hardware-profile-deployments/embedded-hardware-deployment/model-executions.md) — ExecuTorch loads a serialized program and runs it through a lightweight C++ runtime that dispatches operations to the appropriate kernels or backends. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [Model Deployment Platforms](https://awesome-repositories.com/f/devops-infrastructure/model-deployment-platforms.md) — Integrates exported models into Android, iOS, and desktop applications using platform-specific runtime bindings. ([source](https://docs.pytorch.org/executorch/main/pathway-beginner.html))
- [LLM Module Builders](https://awesome-repositories.com/f/devops-infrastructure/configuration-management/application-settings-management/application-module-configuration/module-configuration-stores/module-behavior-configurations/llm-module-builders.md) — ExecuTorch constructs a language model inference module by specifying initialization parameters through a builder pattern. ([source](https://docs.pytorch.org/executorch/main/javadoc/org/pytorch/executorch/extension/llm/package-summary.html))
- [Android Hardware Deployments](https://awesome-repositories.com/f/devops-infrastructure/deployment-management-strategies/execution-platforms-and-targets/hardware-profile-deployments/embedded-hardware-deployment/android-hardware-deployments.md) — Pushes compiled models and runtime libraries to Android devices for inference via the QNN backend. ([source](https://docs.pytorch.org/executorch/main/backends-qualcomm.html))
- [Model Executions](https://awesome-repositories.com/f/devops-infrastructure/deployment-management-strategies/execution-platforms-and-targets/hardware-profile-deployments/embedded-hardware-deployment/android-hardware-deployments/model-executions.md) — Runs exported ML models on Android devices with hardware acceleration for on-device inference. ([source](https://docs.pytorch.org/executorch/main/android-section.html))
- [Single-Command Model Exports](https://awesome-repositories.com/f/devops-infrastructure/deployment-management/model-export-formats/single-command-model-exports.md) — Exports supported large language models to optimized formats using a unified CLI or YAML configuration. ([source](https://docs.pytorch.org/executorch/main/llm/export-llm.html))
- [Model Inference Run Executions](https://awesome-repositories.com/f/devops-infrastructure/workflow-run-management/synchronous-run-executions/model-inference-run-executions.md) — ExecuTorch runs a pre-built program that loads a model file and executes it with random inputs to verify execution on a device. ([source](https://docs.pytorch.org/executorch/main/using-executorch-cpp.html))

### Mobile Development

- [AI Model Execution](https://awesome-repositories.com/f/mobile-development/android-runtime-execution/ai-model-execution.md) — Loads serialized programs and runs inference with a lightweight C++ runtime on edge devices. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [Mobile Model Deployment](https://awesome-repositories.com/f/mobile-development/mobile-model-deployment.md) — Runs exported models on Android or iOS using platform-specific APIs and sample applications. ([source](https://docs.pytorch.org/executorch/main/llm/export-llm-optimum.html))
- [LLM Mobile Deployments](https://awesome-repositories.com/f/mobile-development/mobile-model-deployment/llm-mobile-deployments.md) — ExecuTorch executes a deployed large language model on Android or iOS hardware, including optional acceleration via Qualcomm AI Engine Direct. ([source](https://docs.pytorch.org/executorch/main/llm/working-with-llms.html))
- [Mobile Framework Integrations](https://awesome-repositories.com/f/mobile-development/mobile-model-deployment/mobile-framework-integrations.md) — ExecuTorch bridges the runtime with mobile toolkits so AI models run natively inside mobile applications. ([source](https://docs.pytorch.org/executorch/main/success-stories.html))

### Networking & Communication

- [Token Streaming](https://awesome-repositories.com/f/networking-communication/real-time-event-streams/token-streaming.md) — ExecuTorch produces output tokens one at a time from a loaded model, streaming the generated text as it is produced. ([source](https://docs.pytorch.org/executorch/main/llm/run-on-android.html))
- [Token Generation Callbacks](https://awesome-repositories.com/f/networking-communication/callback-based-data-streaming/token-generation-callbacks.md) — ExecuTorch receives each generated token via a callback function, enabling real-time output display during inference. ([source](https://docs.pytorch.org/executorch/main/llm/run-with-c-plus-plus.html))
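
The callback pattern described above can be sketched in a few lines. This is an illustration only, not the ExecuTorch runner API: `generate` and `on_token` are hypothetical names, and the token list stands in for a real decoding loop.

```python
from typing import Callable, Iterable, List


def generate(tokens: Iterable[str], on_token: Callable[[str], None]) -> int:
    """Hand each token to the callback as soon as it is produced.

    Returns the number of tokens emitted.
    """
    count = 0
    for token in tokens:
        on_token(token)  # the caller can display or buffer the token here
        count += 1
    return count


received: List[str] = []
emitted = generate(["Exe", "cu", "Torch"], received.append)
text = "".join(received)
```

Because the callback fires per token, a UI can append text incrementally instead of waiting for the full response.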

### Operating Systems & Systems Programming

- [Fixed-Size Memory Pools](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/dynamic-memory-allocation/custom-memory-allocators/managed-memory-allocators/model-memory-managers/fixed-size-memory-pools.md) — ExecuTorch provides a fixed-size memory pool that the runtime uses to allocate intermediate tensors during model inference. ([source](https://docs.pytorch.org/executorch/main/executorch-runtime-api-reference.html))
- [Operator Kernel Selective Linking](https://awesome-repositories.com/f/operating-systems-systems-programming/operating-system-kernel-build-tools/static-linking/operator-kernel-selective-linking.md) — ExecuTorch links only the kernels and backend libraries required by the deployed model to minimize the application's binary footprint. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [Operator Kernel Implementations](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/kernel-development/kernel-driver-implementation/operator-kernel-implementations.md) — ExecuTorch lets developers add or replace operator implementations in the kernel library to support specialized hardware or logic. ([source](https://docs.pytorch.org/executorch/main/advanced-topics-section.html))
- [Model Memory Configurations](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/memory-allocation-libraries/low-level-system-operations/low-level-systems-programming/hardware-resource-control/model-memory-configurations.md) — ExecuTorch configures memory allocation, placement, and data loading directly for advanced hardware-specific use cases. ([source](https://docs.pytorch.org/executorch/main/using-executorch-cpp.html))
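
To make the fixed-size pool entry concrete, here is a minimal sketch of a bump allocator over a preallocated buffer. It is a conceptual model in Python, not ExecuTorch's C++ allocator: the class name `FixedPool`, the 16-byte default alignment, and the `used` property are assumptions made for illustration.

```python
class FixedPool:
    """Bump allocator over a buffer whose size is fixed at construction."""

    def __init__(self, size: int) -> None:
        self._buffer = bytearray(size)
        self._offset = 0

    @property
    def used(self) -> int:
        """Bytes consumed so far, including alignment padding."""
        return self._offset

    def allocate(self, nbytes: int, alignment: int = 16) -> memoryview:
        # Round the current offset up to the next multiple of `alignment`.
        start = (self._offset + alignment - 1) // alignment * alignment
        end = start + nbytes
        if end > len(self._buffer):
            raise MemoryError("pool exhausted")
        self._offset = end
        return memoryview(self._buffer)[start:end]

    def reset(self) -> None:
        """Release everything at once; individual frees are not supported."""
        self._offset = 0


pool = FixedPool(64)
first = pool.allocate(10)   # occupies bytes 0..9
second = pool.allocate(10)  # aligned up, occupies bytes 16..25
```

The pool never grows, so a request that does not fit fails immediately rather than falling back to the heap, which is the property that makes this style attractive on constrained devices.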

### Programming Languages & Runtimes

- [Static Memory Planning](https://awesome-repositories.com/f/programming-languages-runtimes/ahead-of-time-compilation/static-memory-planning.md) — Plans memory allocation statically during compilation to avoid dynamic allocation overhead at runtime on edge devices. ([source](https://docs.pytorch.org/executorch/main/getting-started-architecture.html))
- [Model-Specific Memory Planning](https://awesome-repositories.com/f/programming-languages-runtimes/ahead-of-time-compilation/static-memory-planning/model-specific-memory-planning.md) — Pre-allocates memory buffers for tensors at compile time to avoid runtime allocation overhead and reduce peak memory usage. ([source](https://docs.pytorch.org/executorch/main/intro-how-it-works.html))
- [Inference Runtime Binary Reduction](https://awesome-repositories.com/f/programming-languages-runtimes/binary-size-optimizations/wasm-binary-size-reduction/inference-runtime-binary-reduction.md) — ExecuTorch links only the kernels and operators required by a specific model, reducing the binary size for deployment on resource-constrained devices. ([source](https://docs.pytorch.org/executorch/main/concepts.html))
- [C++ Inference Runtimes](https://awesome-repositories.com/f/programming-languages-runtimes/c-inference-runtimes.md) — ExecuTorch links the inference engine into a C++ application and manages model loading, tensor memory, and custom module extensions. ([source](https://docs.pytorch.org/executorch/main/embedded-section.html))
- [C++ Model Interfaces](https://awesome-repositories.com/f/programming-languages-runtimes/c-model-interfaces.md) — ExecuTorch loads and runs a compiled model using C++ APIs, enabling inference on embedded systems with or without dynamic memory allocation. ([source](https://docs.pytorch.org/executorch/main/getting-started.html))
- [Model Program Loading](https://awesome-repositories.com/f/programming-languages-runtimes/model-program-loading.md) — Loads serialized model programs from files or data sources for execution on target devices. ([source](https://docs.pytorch.org/executorch/main/executorch-runtime-api-reference.html))
- [Model Method Initializations](https://awesome-repositories.com/f/programming-languages-runtimes/runtime-initializations/model-method-initializations.md) — Runs named methods on loaded models, passing tensors and retrieving outputs from the runtime. ([source](https://docs.pytorch.org/executorch/main/executorch-runtime-api-reference.html))
- [Backend-Enabled Builds](https://awesome-repositories.com/f/programming-languages-runtimes/source-code-compilers/runtime-builds/backend-enabled-builds.md) — ExecuTorch activates support for specific hardware accelerators by setting build flags for each backend. ([source](https://docs.pytorch.org/executorch/main/using-executorch-building-from-source.html))
- [Backend Binary Compilation](https://awesome-repositories.com/f/programming-languages-runtimes/binary-blob-management/backend-binary-compilation.md) — Transforms tagged model subgraphs into compiled binary blobs for direct loading onto target hardware. ([source](https://docs.pytorch.org/executorch/main/compiler-delegate-and-partitioner.html))
- [Multi-Language Runtime Bindings](https://awesome-repositories.com/f/programming-languages-runtimes/language-interoperability/interoperability/multi-language-runtime-bindings.md) — Provides inference APIs for Java, Kotlin, Objective-C, and C++ to integrate the engine into diverse application environments.
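
The static memory planning entries above describe assigning tensor buffers ahead of time. A minimal sketch of one way to do that is a greedy planner that lets tensors with non-overlapping lifetimes share the same offsets. This is not ExecuTorch's planner: `TensorSpec`, `plan`, and the largest-first ordering are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TensorSpec:
    name: str
    size: int       # bytes
    first_use: int  # index of the first op that touches the tensor
    last_use: int   # index of the last op that touches the tensor


def plan(specs: List[TensorSpec]) -> Tuple[Dict[str, int], int]:
    """Assign each tensor an offset in one arena; return (offsets, arena size)."""
    placed: List[Tuple[int, TensorSpec]] = []
    offsets: Dict[str, int] = {}
    # Place larger tensors first; ties are broken by name for determinism.
    for spec in sorted(specs, key=lambda s: (-s.size, s.name)):
        # Byte ranges held by tensors that are live at the same time.
        busy = sorted(
            (offset, offset + other.size)
            for offset, other in placed
            if not (spec.last_use < other.first_use
                    or other.last_use < spec.first_use)
        )
        offset = 0
        for start, end in busy:
            if offset + spec.size <= start:
                break  # the gap before this range is large enough
            offset = max(offset, end)
        placed.append((offset, spec))
        offsets[spec.name] = offset
    total = max((offset + s.size for offset, s in placed), default=0)
    return offsets, total


specs = [
    TensorSpec("a", 64, first_use=0, last_use=1),
    TensorSpec("b", 32, first_use=1, last_use=2),
    TensorSpec("c", 64, first_use=2, last_use=3),
]
offsets, arena_size = plan(specs)
```

In this example `a` and `c` are never live together, so they share offset 0 and the arena needs 96 bytes instead of the 160 a one-buffer-per-tensor layout would use.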

### Software Engineering & Architecture

- [Intermediate Representations](https://awesome-repositories.com/f/software-engineering-architecture/data-formats/intermediate-representations.md) — Defines a portable graph format that decouples model authoring from target-hardware execution. ([source](https://docs.pytorch.org/executorch/main/compiler-ir-advanced.html))
- [Multi-Platform Mobile Deployers](https://awesome-repositories.com/f/software-engineering-architecture/development-methodologies/application-targets-domains/embedded-systems-development/embedded-ai-deployment/multi-platform-mobile-deployers.md) — Integrates and runs machine learning models on Android, iOS, and microcontrollers with multi-language runtime bindings.
- [Computation Subgraph Delegation](https://awesome-repositories.com/f/software-engineering-architecture/hardware-abstraction-layers/delegate-based-hardware-abstraction/computation-subgraph-delegation.md) — A system that partitions model subgraphs and offloads computation to accelerators like NPUs, GPUs, and DSPs.
- [Platform Abstraction Layers](https://awesome-repositories.com/f/software-engineering-architecture/platform-abstraction-layers.md) — ExecuTorch isolates platform-specific system calls behind a thin abstraction layer so the runtime can be re-targeted without changing model code. ([source](https://docs.pytorch.org/executorch/main/runtime-integration-advanced.html))
- [Model Graph Passes](https://awesome-repositories.com/f/software-engineering-architecture/compiler-optimizations/pass-pipeline-customization/model-graph-passes.md) — Ships a custom pass system for applying user-defined transformations to model graphs during compilation. ([source](https://docs.pytorch.org/executorch/main/compiler-ir-advanced.html))
- [Runtime Module Integrations](https://awesome-repositories.com/f/software-engineering-architecture/integration-extensibility/extensibility/plugin-architectures/developer-authoring-interfaces/custom-module-implementations/module-functionality-extenders/functional-module-integrators/runtime-module-integrations.md) — ExecuTorch lets developers add custom operators or modules to the runtime to support model-specific or hardware-specific logic. ([source](https://docs.pytorch.org/executorch/main/api-section.html))
- [Runtime Feature Extensions](https://awesome-repositories.com/f/software-engineering-architecture/library-extension-modules/optional-module-imports/runtime-feature-extensions.md) — ExecuTorch includes extra runtime features like data loading, tensor management, or LLM support via build options. ([source](https://docs.pytorch.org/executorch/main/using-executorch-building-from-source.html))
- [Runtime Customizations](https://awesome-repositories.com/f/software-engineering-architecture/runtime-abstraction-layers/runtime-customizations.md) — ExecuTorch adapts the on-device runtime through a platform abstraction layer and custom integration code. ([source](https://docs.pytorch.org/executorch/main/advanced-topics-section.html))
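
The subgraph delegation entry above can be illustrated with a deliberately simplified partitioner. Real model graphs are DAGs and real partitioners work on graph nodes; this sketch assumes a linear list of operator names, and `partition`, the `"delegate"`/`"cpu"` labels, and the operator names are hypothetical.

```python
from itertools import groupby
from typing import List, Set, Tuple


def partition(ops: List[str], supported: Set[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive ops by whether the accelerator backend supports them."""
    return [
        ("delegate" if is_supported else "cpu", list(group))
        for is_supported, group in groupby(ops, key=lambda op: op in supported)
    ]


ops = ["conv", "relu", "topk", "conv", "softmax"]
segments = partition(ops, supported={"conv", "relu", "softmax"})
```

Each `"delegate"` segment would be handed to the backend as one unit, while unsupported operators stay on the default kernels between them.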

### Graphics & Multimedia

- [Export Pipeline Profiling](https://awesome-repositories.com/f/graphics-multimedia/video-converters/multi-format-exporters/multi-format-asset-exports/model-export-pipelines/export-pipeline-profiling.md) — ExecuTorch generates detailed logs of operator delegation, memory usage, and export timing to diagnose performance bottlenecks. ([source](https://docs.pytorch.org/executorch/main/llm/export-llm.html))
- [OpenVINO Model Exports](https://awesome-repositories.com/f/graphics-multimedia/video-converters/multi-format-exporters/multi-format-asset-exports/model-export-pipelines/openvino-model-exports.md) — Exports PyTorch models to an optimized format using the OpenVINO toolkit for Intel hardware. ([source](https://docs.pytorch.org/executorch/main/build-run-openvino.html))

### Hardware & IoT

- [Inference Runtimes](https://awesome-repositories.com/f/hardware-iot/arm-ethos-u-npu-delegation/inference-runtimes.md) — Provides a runtime that executes models on Arm Ethos-U NPUs with optimized kernels. ([source](https://docs.pytorch.org/executorch/main/success-stories.html))

### System Administration & Monitoring

- [Execution Tracing](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/execution-tracing-analysis/execution-tracing.md) — Generates trace files during program execution to record debug and performance data for analysis. ([source](https://docs.pytorch.org/executorch/main/runtime-python-api-reference.html))
- [Post-Export Debug Linkers](https://awesome-repositories.com/f/system-administration-monitoring/observability-tracing/batch-export-utilities/trace-exporters/observability-data-exporters/export-debug-artifacts/post-export-debug-linkers.md) — ExecuTorch links runtime performance and debug data back to the original Python source code for post-execution analysis. ([source](https://docs.pytorch.org/executorch/main/etrecord.html))
- [Delegated Execution Mappers](https://awesome-repositories.com/f/system-administration-monitoring/telemetry-correlation/code-telemetry-correlation-tools/assembly-to-source-mappers/delegated-execution-mappers.md) — ExecuTorch maps runtime errors or profiling data from a delegated subgraph back to the original PyTorch code. ([source](https://docs.pytorch.org/executorch/main/compiler-delegate-and-partitioner.html))

### Testing & Quality Assurance

- [Compiled Model Validations](https://awesome-repositories.com/f/testing-quality-assurance/compiled-model-validations.md) — ExecuTorch loads a compiled model file through its Python bindings and runs inference to verify model accuracy before deployment. ([source](https://docs.pytorch.org/executorch/main/getting-started.html))
- [On-Device Runtime Debugging](https://awesome-repositories.com/f/testing-quality-assurance/on-device-runtime-debugging.md) — Captures runtime performance, memory usage, and numerical accuracy of models executing on target hardware.
- [On-Device Model Profilers](https://awesome-repositories.com/f/testing-quality-assurance/on-device-runtime-debugging/on-device-model-profilers.md) — ExecuTorch instruments model execution to collect performance metrics and diagnose issues during inference on the target platform. ([source](https://docs.pytorch.org/executorch/main/_sources/index.md.txt))
- [On-Device Sanity Checks](https://awesome-repositories.com/f/testing-quality-assurance/on-device-sanity-checks.md) — ExecuTorch runs a pre-built executable that loads a model file and executes it with random inputs for quick validation. ([source](https://docs.pytorch.org/executorch/main/using-executorch-cpp.html))

### User Interface & Experience

- [Multimodal Input Processors](https://awesome-repositories.com/f/user-interface-experience/form-and-input-management/input-handling/multimodal-input-processors.md) — ExecuTorch accepts image and audio data alongside a text prompt for models that support multiple input modalities, then generates a response. ([source](https://docs.pytorch.org/executorch/main/llm/run-on-android.html))