7 repositories
Tools for building hardware-specific inference engines from model definitions.
Distinguishing note: Focuses on compilation for inference rather than general software compilation.
7 GitHub repositories in DevOps & Infrastructure · Inference Engine Compilers.
InsightFace is a comprehensive deep learning framework designed for face recognition, biometric identity verification, and feature extraction. It provides a specialized engine for one-to-one verification and one-to-many identification tasks, utilizing convolutional neural networks to transform raw image pixels into high-dimensional vector embeddings. The project includes a complete toolkit for detecting, aligning, and processing facial data to ensure consistent identity discrimination. Beyond core recognition, the platform distinguishes itself through an extensive model management and optimiz
Builds hardware-specific inference engines from simplified models for production.
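The one-to-one verification and one-to-many identification tasks described above can be sketched with plain cosine similarity over embedding vectors. This is a minimal illustration, not InsightFace's API: the function names, the tiny 3-dimensional embeddings, and the 0.5 threshold are all assumptions chosen for readability.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(emb_a, emb_b, threshold=0.5):
    """One-to-one check: do two embeddings belong to the same identity?
    The threshold is a hypothetical value; real systems tune it on data."""
    return cosine_similarity(emb_a, emb_b) >= threshold

def identify(probe, gallery, threshold=0.5):
    """One-to-many search: return the best-matching gallery name,
    or None when no entry reaches the threshold."""
    best_name, best_score = None, threshold
    for name, emb in gallery.items():
        score = cosine_similarity(probe, emb)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical gallery of enrolled identities.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], gallery))
```

In practice the embeddings would come from a trained network and have hundreds of dimensions; the comparison logic stays the same.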
This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management. The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computation
Supports building custom, hardware-specific inference engines from source to optimize performance for target environments.
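The execution provider abstraction mentioned above can be illustrated with a small sketch: each graph node goes to the first provider, in priority order, that supports its operator. The class, the provider names `gpu` and `cpu`, and the operator sets are hypothetical and do not reflect the project's actual interfaces.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutionProvider:
    """Hypothetical stand-in for a hardware backend and the ops it implements."""
    name: str
    supported_ops: frozenset

def assign_nodes(ops, providers):
    """Map each operator to the first provider (in priority order) that
    supports it, falling through to later providers when needed."""
    assignment = []
    for op in ops:
        for provider in providers:
            if op in provider.supported_ops:
                assignment.append((op, provider.name))
                break
        else:
            # No provider in the list can run this operator.
            raise ValueError(f"no provider supports op {op!r}")
    return assignment

gpu = ExecutionProvider("gpu", frozenset({"Conv", "MatMul"}))
cpu = ExecutionProvider("cpu", frozenset({"Conv", "MatMul", "Softmax"}))
print(assign_nodes(["Conv", "Softmax"], [gpu, cpu]))
```

Keeping the model graph separate from the provider list is what lets the same model run on different hardware by changing only the priority order.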
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
Transforms trained models into optimized hardware-specific formats to increase throughput and reduce latency.
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on automotive-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Creates lightweight, cross-OS and cross-GPU portable inference engines directly on target hardware.
tensorrtx is a computer vision inference engine and model implementation library designed for graphics processor acceleration. It provides a framework for optimizing deep learning models through a GPU inference optimizer, a deep learning model converter for transforming weights from frameworks like TensorFlow and PyTorch, and a custom plugin library to implement operations not natively supported by the TensorRT API. The project distinguishes itself through a comprehensive collection of pre-defined network implementations, ranging from various YOLO versions and DETR transformers for object det
Compiles neural network architectures into optimized hardware-specific engines for high-performance execution.
Transforms trained models into optimized engines using quantization, layer fusion, and kernel tuning.
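Of the three optimizations named above, quantization is the easiest to show in isolation. The sketch below applies symmetric per-tensor INT8 quantization to a list of weights; it is one common scheme chosen for illustration, and nothing here is taken from the listed project's code.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].
    Returns the integer values and the scale needed to recover the floats."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        # All-zero (or empty) tensor: any scale works, so use 1.0.
        return [0 for _ in weights], 1.0
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Approximate reconstruction of the original float weights."""
    return [q * scale for q in quantized]

q, scale = quantize_int8([1.27, -0.64, 0.0])
print(q, dequantize(q, scale))
```

Each reconstructed weight differs from the original by at most half a quantization step, which is the accuracy traded for smaller weights and faster integer arithmetic.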
This project is a deep learning model compiler and parser that converts ONNX models into optimized TensorRT engines. It functions as a bridge that maps standardized ONNX operators to vendor-specific kernels to enable high-performance inference on NVIDIA GPUs. The system operates as a GPU inference optimizer, selecting hardware-specific kernels and tuning memory allocation to maximize throughput. It transforms neural network graphs into serialized binary execution plans to reduce runtime overhead. The toolset covers deep learning model deployment and edge AI performance tuning. It includes ca
Builds hardware-specific inference engines from model definitions to eliminate runtime parsing overhead.
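The idea of building a serialized execution plan once, so that the runtime never re-parses the model, can be sketched as a toy build and load pair. The kernel table, the function names, and the JSON encoding are assumptions made for illustration; a real TensorRT engine is an opaque binary produced by profiling kernels on the target GPU.

```python
import json

# Hypothetical operator-to-kernel table standing in for kernel selection.
KERNELS = {"Conv": "conv_kernel_v2", "Relu": "relu_kernel_v1"}

def build_engine(model_ops):
    """Build step (run once, offline): resolve every operator to a concrete
    kernel and serialize the resulting plan to bytes."""
    plan = [{"op": op, "kernel": KERNELS[op]} for op in model_ops]
    return json.dumps(plan).encode("utf-8")

def load_engine(blob):
    """Runtime step: deserialize the ready-made plan. No model parsing or
    kernel selection happens here."""
    return json.loads(blob.decode("utf-8"))

blob = build_engine(["Conv", "Relu"])
print(load_engine(blob))
```

The separation matters because the expensive decisions happen at build time, while the deployed application only pays for deserialization.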