awesome-repositories.com
Inference Engines · Awesome GitHub Repositories

7 repos

Runtime environments designed to execute pre-trained neural network models with optimized performance and efficiency.

Explore 7 GitHub repositories tagged Artificial Intelligence & ML · Inference Engines.

Awesome Inference Engines GitHub Repositories

  • nomic-ai/gpt4all

    77,146 · GitHub

    GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a compreh

    C++ · ai-chat · llm-inference
  • PaddlePaddle/PaddleOCR

    70,931 · GitHub

    PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into indepen

    Python · ai4science · chineseocr · document-parsing
  • vllm-project/vllm

    70,745 · GitHub

    vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen

    Python · amd · blackwell · cuda
  • ultralytics/yolov5

    56,830 · GitHub

    YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning

    Python · coreml · deep-learning · ios
  • ultralytics/ultralytics

    53,426 · GitHub

    Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification

    Python · cli · computer-vision · deep-learning
  • facebookresearch/segment-anything

    53,431 · GitHub

    This project provides a deep learning architecture designed to identify and isolate distinct objects within images by generating precise pixel-level masks. It functions as a browser-based inference engine, enabling the execution of complex machine learning models directly within web environments without requiring serve

    Jupyter Notebook
  • unslothai/unsloth

    52,461 · GitHub

    Unsloth is a high-performance training and inference platform designed to optimize the lifecycle of large language and multimodal models. It provides a comprehensive engine for fine-tuning, executing, and managing models locally, with a focus on reducing memory consumption and increasing compute speed on consumer-grade

    Python · agent · deepseek · deepseek-r1

Explore sub-tags

  • C++ Inference Backends: High-performance tensor computation engines written in C++.
  • Computer Vision Inference: Execution of vision-based models using standard libraries for real-time object detection.
  • Deep Learning: High-performance runtimes that execute neural network models across CPUs, GPUs, and specialized accelerators.
  • Hardware-Agnostic Inference Layers: Abstraction layers that decouple model execution logic from specific hardware backends.
  • Local Inference Runtimes: Deployment environments that run quantized models on local hardware with API support.
  • ONNX Runtime Inference: Executing models using the cross-platform ONNX runtime for consistent performance.
  • Request Schedulers: Components that manage and prioritize incoming inference requests to optimize throughput and latency.
  • Streaming Inference Processors: Execution engines designed to process continuous streams of data using memory-efficient generators.
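
The "Streaming Inference Processors" sub-tag mentions memory-efficient generators. As a rough illustration of that idea only, here is a minimal Python sketch that pulls inputs lazily, groups them into small batches, and yields outputs one at a time. It is not taken from any repository listed above: `batched`, `stream_inference`, `toy_model`, and `sensor_readings` are hypothetical names, and the "model" is a stand-in that simply doubles its inputs.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Lazily group items into lists of at most batch_size elements."""
    batch: List[float] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def stream_inference(
    inputs: Iterable[float],
    model: Callable[[List[float]], List[float]],
    batch_size: int = 4,
) -> Iterator[float]:
    """Run the model batch by batch, yielding each output as it is produced.

    Only one batch is held in memory at a time, so the input stream
    can be arbitrarily long.
    """
    for batch in batched(inputs, batch_size):
        for output in model(batch):
            yield output


def toy_model(batch: List[float]) -> List[float]:
    # Hypothetical stand-in for a real model: doubles every input.
    return [2.0 * x for x in batch]


def sensor_readings(n: int) -> Iterator[float]:
    # Hypothetical data source that produces values on demand.
    for i in range(n):
        yield float(i)


results = list(stream_inference(sensor_readings(10), toy_model, batch_size=4))
```

In a real engine the call to `model` would be a forward pass through a loaded network; the generator structure around it would stay the same, which keeps peak memory tied to the batch size rather than to the length of the stream.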