Paddle Lite

Features

Inference Execution Engines - Provides a high-performance inference execution engine designed for mobile and edge device environments.
Hardware-Aware Operator Kernels - Implements hardware-aware operator kernels optimized for specific chip architectures to maximize tensor operation throughput.
Deep Learning Inference Engines - Provides a high-performance deep learning inference engine for running models on mobile and edge devices.
Edge AI Runtimes - Offers a decoupled edge AI runtime with a minimal binary footprint for resource-constrained hardware.
Hardware-Accelerated Inference - Implements a framework for executing machine learning models directly on specialized hardware accelerators.
Inference Deployment Engines - Ships a minimal-footprint inference deployment engine for resource-constrained environments without third-party dependencies.
Edge AI Model Deployment - Optimizes and deploys machine learning models to run efficiently on local edge devices and IoT hardware.
Model Quantization Tools - Ships a model quantization tool for reducing precision and size through static and dynamic methods.
Model Graph Optimizers - Includes a model graph optimizer that simplifies execution paths to improve inference performance.
Model Quantization - Provides techniques for reducing model weight precision to decrease memory footprint on mobile hardware.
Dynamic Quantization - Supports both static and dynamic quantization pipelines to reduce model size and inference latency.
Weight Quantization - Compresses model weights into lower-precision formats to accelerate inference speed and reduce memory usage.
Mixed-Accelerator Orchestration - Orchestrates computation tasks across mixed hardware accelerators including CPUs, GPUs, and NPUs.
Graph and Operator Optimizations - Refines computation graphs and fuses operators to lower latency for real-time AI applications on end-user devices.
Hardware-Aware Deployment - Employs hardware-aware deployment to execute deep learning models across diverse CPU, GPU, and NPU backends.
Mobile Inference Deployments - Enables the deployment and execution of pre-trained neural networks on smartphones and tablets.
Intermediate Representation Analysis - Analyzes and prunes the model's intermediate representation to remove redundant nodes and optimize memory allocation.
Kernel Fusion Operations - Merges multiple mathematical operations into single kernels to reduce memory access and improve execution speed.
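
Several of the features above mention compressing model weights into lower-precision formats. As a rough illustration of the general idea only, the sketch below applies symmetric per-tensor int8 quantization to a short list of floats in plain Python. It is not Paddle Lite's API or its actual quantization pipeline; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical and chosen for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale.

    Hypothetical helper for illustration; not part of Paddle Lite.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


# Made-up sample weights, used only to exercise the functions.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The round-trip error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as one signed byte instead of a 32-bit float, at the cost of a rounding error of at most half the scale. Production toolchains typically add per-channel scales and calibration data, which this sketch leaves out.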

Open-source alternatives to Paddle Lite

Similar open-source projects, ranked by how many features they share with Paddle Lite.

pytorch/executorch (4,296 stars on GitHub)
ExecuTorch is a lightweight C++ runtime for deploying PyTorch models on mobile, embedded, and edge hardware. It provides an ahead-of-time compilation pipeline that exports, quantizes, and lowers model graphs into compact serialized programs, then executes them through a minimal runtime with hardware acceleration and on-device large language model inference capabilities. The project distinguishes itself through a hardware accelerator delegate system that partitions model subgraphs and offloads computation to specialized backends including NPUs, GPUs, and DSPs from Apple, Arm, Intel, MediaTek,
Python, deep-learning, embedded, gpu
dusty-nv/jetson-inference (8,734 stars on GitHub)
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
C++, caffe, computer-vision, deep-learning
intel/neural-compressor (2,585 stars on GitHub)
Neural Compressor is a deep learning model compression toolkit and AI inference acceleration engine. It functions as an automated model quantization tool and hardware-aware model compiler designed to reduce the memory footprint of neural networks and decrease execution latency. The project provides specialized frameworks for optimizing large language models, utilizing weight-only quantization and hardware-specific kernels to improve the operational efficiency of generative AI workloads. It maps neural network operators to specialized CPU and GPU vector instructions to accelerate model executi
Python, auto-tuning, awq, fp4
alibaba/MNN (14,242 stars on GitHub)
MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself through a robust model optimization toolkit that supports quantization, compression, and structural graph manipulation to minimize memory footprint and maximize execution speed. It features a modular architecture that abstracts hardware-specific backends, allowing models to run efficiently across diverse
C++, arm, convolution, deep-learning


PaddlePaddle/Paddle-Lite
