# nvidia-ai-iot/torch2trt

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/nvidia-ai-iot-torch2trt).**

4,877 stars · 699 forks · Python · MIT

## Links

- GitHub: https://github.com/NVIDIA-AI-IOT/torch2trt
- awesome-repositories: https://awesome-repositories.com/repository/nvidia-ai-iot-torch2trt.md

## Topics

`classification` `inference` `jetson-nano` `jetson-tx2` `jetson-xavier` `pytorch` `tensorrt`

## Description

torch2trt converts PyTorch modules into optimized TensorRT engines to improve inference performance on NVIDIA GPUs. It acts as both a model optimizer and an engine generator, translating neural network layers into a runtime format built for GPU execution.
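A minimal sketch of the conversion step, following the usage pattern shown in the project's README: `torch2trt` takes a module and a list of example inputs and returns a module backed by a TensorRT engine. The helper name `convert_module` is hypothetical, and the import sits inside the function on the assumption that PyTorch, TensorRT, and torch2trt are only available on a CUDA-capable machine.

```python
def convert_module(model, example_input):
    """Convert a PyTorch module into a TensorRT-backed module (hypothetical helper)."""
    # Imported lazily: torch2trt needs PyTorch, TensorRT and a CUDA-capable GPU.
    from torch2trt import torch2trt

    # Conversion runs the model on example data, so both must live on the GPU
    # and the model should be in inference mode.
    model = model.eval().cuda()
    example_input = example_input.cuda()

    # The example inputs are passed as a list, one entry per model input.
    return torch2trt(model, [example_input])
```

On a suitable machine, calling `convert_module(torchvision.models.alexnet(), torch.ones(1, 3, 224, 224))` should return a module that can be called like the original model.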

The project supports custom layer conversion: users write conversion logic in Python and register it to handle specialized operations that are not supported by default. A registry maps each layer type to its conversion function, including user-defined ones.
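The registry idea can be illustrated with a small, dependency-free sketch. This is not torch2trt's own code: `CONVERTERS`, `register_converter`, `convert`, and the dict-based layer descriptions are hypothetical names chosen for illustration. In torch2trt itself, the README registers converters with a `tensorrt_converter` decorator keyed by a method path such as `'torch.nn.ReLU.forward'`.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a layer-type name to its conversion function.
CONVERTERS: Dict[str, Callable[[dict], str]] = {}


def register_converter(layer_type: str):
    """Decorator that records a conversion function for one layer type."""
    def decorator(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        CONVERTERS[layer_type] = fn
        return fn
    return decorator


@register_converter("ReLU")
def convert_relu(layer: dict) -> str:
    # A real converter would add an activation layer to the TensorRT network.
    return "activation(type=RELU)"


@register_converter("Linear")
def convert_linear(layer: dict) -> str:
    return f"fully_connected(out_features={layer['out_features']})"


def convert(layers: List[dict]) -> List[str]:
    """Look up each layer's converter in the registry and apply it."""
    converted = []
    for layer in layers:
        fn = CONVERTERS.get(layer["type"])
        if fn is None:
            raise KeyError(f"no converter registered for {layer['type']!r}")
        converted.append(fn(layer))
    return converted
```

Supporting a new operation then only requires decorating one more function; the `convert` loop itself does not change.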

torch2trt accelerates GPU inference through model quantization and quantization-aware training, which reduce memory usage and increase throughput. It also supports model persistence, so the state of an optimized engine can be saved and reloaded.
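A hedged sketch of reduced-precision conversion and persistence, based on the README's documented pattern: `fp16_mode=True` at conversion time, `torch.save` on the converted module's `state_dict()`, and `TRTModule.load_state_dict` to restore it. The helper names `convert_fp16_and_save` and `load_engine` are hypothetical. FP16 stands in here as the simplest reduced-precision option; INT8 calibration and quantization-aware training need additional setup that is not shown.

```python
def convert_fp16_and_save(model, example_input, path):
    """Build an FP16 TensorRT-backed module and save its state (hypothetical helper)."""
    # Imported lazily: these packages require a CUDA-capable environment.
    import torch
    from torch2trt import torch2trt

    model_trt = torch2trt(model.eval().cuda(), [example_input.cuda()], fp16_mode=True)
    # The optimized engine is stored through the usual state_dict mechanism.
    torch.save(model_trt.state_dict(), path)
    return model_trt


def load_engine(path):
    """Reload a previously saved engine into an empty TRTModule (hypothetical helper)."""
    import torch
    from torch2trt import TRTModule

    model_trt = TRTModule()
    model_trt.load_state_dict(torch.load(path))
    return model_trt
```

Reloading skips the conversion step, which is typically the slow part. Note that TensorRT engines are generally tied to the GPU and TensorRT version they were built with.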

## Tags

### Artificial Intelligence & ML

- [TensorRT Framework Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/ml-library-integrations/tensorrt-framework-integrations.md) — Integrates TensorRT optimization into PyTorch by translating deep learning operations into optimized graphs.
- [Custom Neural Network Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-neural-network-layers.md) — Supports the implementation of specialized conversion logic for non-standard neural network operations.
- [GPU-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-accelerated-inference.md) — Accelerates GPU inference by transforming deep learning models into highly optimized hardware-specific formats.
- [Deep Learning Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/training-algorithms/deep-learning-optimization.md) — Reduces model latency and memory usage through precision optimization and computational graph refinement.
- [ONNX-to-TensorRT Conversions](https://awesome-repositories.com/f/artificial-intelligence-ml/ml-library-integrations/tensorrt-framework-integrations/onnx-to-tensorrt-conversions.md) — Transforms model modules into optimized TensorRT engines to improve inference performance on GPUs. ([source](https://github.com/nvidia-ai-iot/torch2trt#readme))
- [PyTorch-to-TensorRT Converters](https://awesome-repositories.com/f/artificial-intelligence-ml/ml-library-integrations/tensorrt-framework-integrations/onnx-to-tensorrt-conversions/pytorch-to-tensorrt-converters.md) — Transforms PyTorch model modules into optimized TensorRT engines to improve GPU inference performance.
- [TensorRT Engine Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/ml-library-integrations/tensorrt-framework-integrations/onnx-to-tensorrt-conversions/tensorrt-engine-generators.md) — Converts neural network layers into high-performance runtime engines for hardware-accelerated graphics processors.
- [Model Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization.md) — Reduces the precision of model weights to decrease memory footprint and increase GPU throughput.
- [Precision Quantization](https://awesome-repositories.com/f/artificial-intelligence-ml/precision-quantization.md) — Implements precision quantization to reduce memory usage and accelerate GPU inference.
- [PyTorch Model Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-model-optimizations.md) — Optimizes PyTorch models by converting them into TensorRT engines for lower latency on NVIDIA GPUs.
- [Model Conversion Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/model-conversion-utilities.md) — Provides a utility for defining and registering Python-based conversion logic for model transformation.
- [Quantization-Aware Training](https://awesome-repositories.com/f/artificial-intelligence-ml/model-quantization/quantization-aware-training.md) — Implements training techniques that simulate quantization noise to optimize model precision. ([source](https://github.com/nvidia-ai-iot/torch2trt#readme))
- [Recursive Module Operations](https://awesome-repositories.com/f/artificial-intelligence-ml/recursive-module-operations.md) — Provides utilities for recursively traversing the PyTorch model hierarchy to transform modules into TensorRT representations.

### Data & Databases

- [Layer Conversion Registries](https://awesome-repositories.com/f/data-databases/data-type-managers/dynamic-type-managers/custom-type-serializers/custom-type-converters/layer-conversion-registries.md) — Features a registry-based system to map specific layer types to user-defined conversion functions.

### Software Engineering & Architecture

- [Scripting Extension Layers](https://awesome-repositories.com/f/software-engineering-architecture/scripting-extension-layers.md) — Exposes a scripting extension layer using Python to extend the functionality of the conversion process. ([source](https://github.com/nvidia-ai-iot/torch2trt#readme))