ONNX-TensorRT

This project is a deep learning model compiler and parser that converts ONNX models into optimized TensorRT engines. It functions as a bridge that maps standardized ONNX operators to vendor-specific kernels to enable high-performance inference on NVIDIA GPUs.
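The operator-mapping idea can be pictured as a lookup table from ONNX op types to importer functions. The sketch below is a toy illustration only: `IMPORTERS`, `importer`, and `convert` are hypothetical names, and the returned strings merely stand in for the TensorRT layers a real importer would create. It is not the project's actual code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: ONNX op type -> function that emits a backend layer.
IMPORTERS: Dict[str, Callable[[str], str]] = {}


def importer(op_type: str):
    """Register the decorated function as the importer for one ONNX op type."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        IMPORTERS[op_type] = fn
        return fn
    return register


@importer("Conv")
def import_conv(node_name: str) -> str:
    # Placeholder for "create a convolution layer in the backend network".
    return f"convolution_layer({node_name})"


@importer("Relu")
def import_relu(node_name: str) -> str:
    # Placeholder for "create a ReLU activation layer in the backend network".
    return f"activation_layer({node_name}, kind=relu)"


def convert(nodes: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Map (op_type, node_name) pairs to layers; collect unsupported op types."""
    layers: List[str] = []
    unsupported: List[str] = []
    for op_type, name in nodes:
        fn = IMPORTERS.get(op_type)
        if fn is None:
            unsupported.append(op_type)
        else:
            layers.append(fn(name))
    return layers, unsupported


# "MyCustomOp" has no registered importer, so it is reported as unsupported.
layers, unsupported = convert(
    [("Conv", "conv1"), ("Relu", "relu1"), ("MyCustomOp", "custom1")]
)
```

A non-empty `unsupported` list is the kind of signal a compatibility check would surface before any engine is built.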

The system operates as a GPU inference optimizer, selecting hardware-specific kernels and tuning memory allocation to maximize throughput. It transforms neural network graphs into serialized binary execution plans to reduce runtime overhead.
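One common way to produce such a serialized plan is TensorRT's `trtexec` command-line tool. The sketch below only assembles the argument list, so it runs without TensorRT installed. The helper names (`trtexec_command`, `_shape_spec`), the file names, and the tensor name `input` are assumptions made for illustration; the flags are standard `trtexec` options.

```python
from typing import Dict, List, Optional, Tuple

Shape = Tuple[int, ...]


def _shape_spec(shapes: Dict[str, Shape]) -> str:
    """Format {"input": (1, 3, 224, 224)} as "input:1x3x224x224"."""
    return ",".join(
        f"{name}:{'x'.join(map(str, dims))}" for name, dims in shapes.items()
    )


def trtexec_command(
    onnx_path: str,
    engine_path: str,
    fp16: bool = False,
    min_shapes: Optional[Dict[str, Shape]] = None,
    opt_shapes: Optional[Dict[str, Shape]] = None,
    max_shapes: Optional[Dict[str, Shape]] = None,
) -> List[str]:
    """Build a trtexec argument list that parses an ONNX file and saves an engine."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow half-precision kernels
    # Dynamic input sizes are described by a min/opt/max shape range.
    for flag, shapes in (
        ("--minShapes", min_shapes),
        ("--optShapes", opt_shapes),
        ("--maxShapes", max_shapes),
    ):
        if shapes:
            cmd.append(f"{flag}={_shape_spec(shapes)}")
    return cmd


cmd = trtexec_command(
    "model.onnx",    # hypothetical input file
    "model.engine",  # hypothetical output file
    fp16=True,
    min_shapes={"input": (1, 3, 224, 224)},
    opt_shapes={"input": (8, 3, 224, 224)},
    max_shapes={"input": (32, 3, 224, 224)},
)
```

On a machine where `trtexec` is on the `PATH`, the list could be passed to `subprocess.run(cmd, check=True)`; a failed parse or build would then surface as an exception.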

The toolset covers deep learning model deployment and edge AI performance tuning. It includes capabilities for inference engine compilation and model compatibility verification to ensure models can be parsed and executed on target devices.

Features

ONNX-to-TensorRT Conversions - Converts ONNX models into optimized TensorRT engines for high-performance inference on NVIDIA GPUs.

Inference Optimization Tools - Applies hardware-specific kernel selection and memory tuning to maximize machine learning model throughput.

Model Deployment - Prepares machine learning models for production by ensuring they can be parsed and executed efficiently on target devices.

GPU Kernel Selection Heuristics - Automatically selects the most efficient GPU execution kernels based on the target hardware architecture and available memory.

Graph-Based Inference - Translates standardized ONNX model graphs into a format compatible with hardware-accelerated inference backends.

Operator Mappings - Maps generic neural network operators from the ONNX standard to highly optimized vendor-specific kernels.

Inference Engine Compilers - Builds hardware-specific inference engines from model definitions to eliminate runtime parsing overhead.

Ahead-of-Time Kernel Compilation - Compiles model weights and topology ahead of time into binary engine files, so that optimization work is not repeated at run time.

Neural Network Binary Serialization - Transforms neural network graphs into serialized binary execution plans to reduce runtime overhead.

Dynamic Tensor Shapes - Determines optimal memory allocation and tensor dimensions during the build process to support variable input sizes.

Edge AI Runtimes - Optimizes model execution and latency for real-time applications running on embedded or server-grade GPUs.

ONNX-to-TensorRT Compatibility Verifications - Verifies if an ONNX model can be successfully converted to a specific TensorRT version before production deployment.

Model Runtime Compatibility Verifications - Provides command-line tools to verify if a model can be parsed and built into an engine before deployment.

Model Execution Plan Caching - Stores optimized execution plans on disk to enable rapid model loading without repeating the optimization process.

Model-to-Runtime Compatibility Verifications - Tests whether an ONNX model is compatible with specific TensorRT versions before deployment.
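The execution plan caching feature listed above can be sketched as a small on-disk cache. Serialized engines are generally tied to the TensorRT version and GPU they were built for, so this sketch folds both into the cache key. Everything here is an assumption for illustration: the function names, the `.engine` suffix, the placeholder version and GPU strings, and the `fake_build` callback that stands in for a real engine build.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def engine_cache_path(
    cache_dir: Path, onnx_bytes: bytes, trt_version: str, gpu_name: str
) -> Path:
    """Derive a cache file name from the model bytes, TensorRT version, and GPU."""
    key = hashlib.sha256()
    for part in (onnx_bytes, trt_version.encode(), gpu_name.encode()):
        key.update(part)
        key.update(b"\0")  # separator so adjacent fields cannot run together
    return cache_dir / f"{key.hexdigest()}.engine"


def load_or_build(
    cache_dir: Path,
    onnx_bytes: bytes,
    trt_version: str,
    gpu_name: str,
    build: Callable[[bytes], bytes],
) -> bytes:
    """Return a cached plan if one exists; otherwise build it and store it."""
    path = engine_cache_path(cache_dir, onnx_bytes, trt_version, gpu_name)
    if path.exists():
        return path.read_bytes()
    plan = build(onnx_bytes)  # the expensive optimization step
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(plan)
    return plan


calls: List[str] = []


def fake_build(onnx_bytes: bytes) -> bytes:
    # Stand-in for a real engine build; records each invocation.
    calls.append("build")
    return b"plan:" + onnx_bytes


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "engines"
    first = load_or_build(cache, b"model-v1", "10.0", "GPU-A", fake_build)
    second = load_or_build(cache, b"model-v1", "10.0", "GPU-A", fake_build)  # cache hit
    third = load_or_build(cache, b"model-v1", "10.0", "GPU-B", fake_build)   # new GPU, rebuilt
```

In a real setup the version string could come from `tensorrt.__version__`, and `build` would wrap whatever conversion path is in use.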

onnx/onnx-tensorrt
