7 repos

Awesome GitHub Repositories: Inference Optimization

Techniques and configurations that enhance model execution speed, reduce memory usage, and improve computational efficiency during inference.

Seven GitHub repositories tagged Artificial Intelligence & ML · Inference Optimization.

tensorflow/tensorflow
193,864 · GitHub
TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The syst
Optimizes execution performance by setting specific model weights to zero through target-aware authoring and specialized kernels.
C++ · deep-learning · deep-neural-networks · distributed
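
Why zeroed weights can speed up execution: a kernel that stores only the non-zero entries can skip the rest of the arithmetic. The sketch below is a plain-Python illustration of that idea, not TensorFlow's implementation; `to_sparse` and `sparse_dot` are hypothetical helper names introduced here.

```python
def to_sparse(row):
    """Keep only (index, value) pairs for the non-zero weights."""
    return [(i, w) for i, w in enumerate(row) if w != 0.0]


def sparse_dot(sparse_row, x):
    """Dot product that touches only the stored non-zero weights."""
    return sum(w * x[i] for i, w in sparse_row)


def dense_dot(row, x):
    """Reference dot product over every weight, zero or not."""
    return sum(w * xi for w, xi in zip(row, x))


# Illustrative data: four of the six weights have been set to zero.
row = [0.0, 2.0, 0.0, 0.0, -1.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

sparse_row = to_sparse(row)
sparse_result = sparse_dot(sparse_row, x)
dense_result = dense_dot(row, x)
```

Both results agree, but the sparse version performs two multiplications instead of six. Real speedups depend on hardware and on kernels written for the sparsity pattern.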
PaddlePaddle/PaddleOCR
70,931 · GitHub
PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into indepen
Activates optimized execution paths through specific configuration parameters to boost performance in production environments.
Python · ai4science · chineseocr · document-parsing
vllm-project/vllm
70,745 · GitHub
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen
Dynamically inserts new sequences into active inference batches to maximize hardware utilization.
Python · amd · blackwell · cuda
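
Inserting new sequences into a running batch can be illustrated with a toy step counter. This is a simplified simulation, not vLLM's scheduler: each request is reduced to the number of tokens it still has to generate, and the function names and the `max_batch` parameter are assumptions made for the example.

```python
from collections import deque


def continuous_batching_steps(lengths, max_batch):
    """Count decode steps when a finished sequence's slot is refilled at once."""
    waiting = deque(lengths)
    active = []  # tokens still to generate for each running sequence
    steps = 0
    while waiting or active:
        # Fill any free slots before the next decode step.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decode step: every active sequence emits one token.
        active = [remaining - 1 for remaining in active]
        # Finished sequences leave the batch and free their slots.
        active = [remaining for remaining in active if remaining > 0]
        steps += 1
    return steps


def static_batching_steps(lengths, max_batch):
    """Count decode steps when each batch runs until its longest member ends."""
    steps = 0
    for start in range(0, len(lengths), max_batch):
        steps += max(lengths[start:start + max_batch])
    return steps


# Illustrative workload: one long request and five short ones (all positive).
lengths = [8, 2, 2, 2, 2, 2]
continuous = continuous_batching_steps(lengths, max_batch=2)
static = static_batching_steps(lengths, max_batch=2)
```

With this workload the continuous scheduler finishes in fewer steps than the static one, because short requests no longer wait for the long request in their batch to complete.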
dair-ai/Prompt-Engineering-Guide
70,526 · GitHub
This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task
Reviews high-performance infrastructure solutions designed to minimize latency and maximize throughput for model inference.
MDX · agent · agents · ai-agents
meta-llama/llama
59,157 · GitHub
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on
Reduces numerical precision in model weights to lower memory footprint and accelerate inference on local devices.
Python
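
Reducing weight precision can be sketched as symmetric int8 quantization with a single shared scale. The listing does not say which scheme this repository uses, so treat the following as a generic illustration; `quantize_int8` and `dequantize` are hypothetical names.

```python
def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Illustrative weights; each is stored in 8 bits instead of 32.
weights = [0.4, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The rounding error per weight is at most half of `scale`, which is the trade made for the smaller memory footprint.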
ultralytics/yolov5
56,830 · GitHub
YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning
Decreases model size and improves execution speed by setting a specific percentage of weights to zero.
Python · coreml · deep-learning · ios
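
Setting a given percentage of weights to zero is often done by magnitude: the smallest weights in absolute value are pruned first. The listing does not state which criterion is used here, so the magnitude rule, the `prune_by_magnitude` name, and the `sparsity` parameter below are assumptions for illustration.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy with the smallest-magnitude fraction of weights zeroed."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Illustrative weights; prune half of them.
weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2]
pruned = prune_by_magnitude(weights, sparsity=0.5)
```

Here the three weights closest to zero are removed and the three largest keep their original values. Pruned models usually need an accuracy check afterwards, since removing weights changes the outputs.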
unslothai/unsloth
52,461 · GitHub
Unsloth is a high-performance training and inference platform designed to optimize the lifecycle of large language and multimodal models. It provides a comprehensive engine for fine-tuning, executing, and managing models locally, with a focus on reducing memory consumption and increasing compute speed on consumer-grade
Predicts multiple future tokens in parallel to accelerate the generation process and reduce total processing steps.
Python · agent · deepseek · deepseek-r1
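
One way to benefit from guessing several future tokens at once is a draft-and-verify loop, commonly called speculative decoding. The listing does not say this is the mechanism Unsloth uses, so the sketch below is a swapped-in illustration with toy stand-in models; `generate_with_draft`, `target_next`, and `draft_next` are hypothetical names.

```python
def generate_with_draft(target_next, draft_next, prompt, n_tokens, k):
    """Greedy draft-and-verify decoding.

    target_next(seq) returns the token the full model would emit next.
    draft_next(seq) returns a cheap guess at that token.
    Returns (tokens, rounds), where rounds counts verification passes.
    """
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < n_tokens:
        # Draft k tokens cheaply, each conditioned on the previous guesses.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # Verify the draft. A real system would score all k positions in one
        # batched pass of the full model; this toy version loops instead.
        rounds += 1
        accepted = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the full model's token and stop.
                accepted.append(expected)
                break
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + n_tokens], rounds


# Toy models: the target counts upward modulo 10; the draft agrees except
# after the token 4, where it guesses wrong.
def target_next(seq):
    return (seq[-1] + 1) % 10


def draft_next(seq):
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


tokens, rounds = generate_with_draft(target_next, draft_next, [0], n_tokens=8, k=4)
```

The output matches what the target model would produce one token at a time, but it is reached in fewer verification rounds than there are tokens whenever the draft guesses well.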

