awesome-repositories.com
Inference Runtimes · Awesome GitHub Repositories


Execution environments designed to load and run machine learning models for real-time or high-performance inference tasks.

Explore 6 awesome GitHub repositories matching Artificial Intelligence & ML · Inference Runtimes.


Awesome Inference Runtimes GitHub Repositories

  • huggingface/transformers

    156,730 · GitHub

    Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering

    Exports models into a portable format with ahead-of-time memory planning and hardware-specific operation dispatch for edge device inference.

    Python · audio · deep-learning · deepseek
  • hacksider/Deep-Live-Cam

    79,568 · GitHub

    Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to ensure complete data privacy without external network dependencies. By utilizing a hi

    Optimizes generative models for low-latency, real-time inference on consumer-grade hardware.

    Python · ai · ai-deep-fake · ai-face
  • nomic-ai/gpt4all

    77,146 · GitHub

    GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a compreh

    Delivers a cross-platform execution environment for running large language models locally on consumer hardware.

    C++ · ai-chat · llm-inference
  • PaddlePaddle/PaddleOCR

    70,931 · GitHub

    PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into indepen

    Facilitates the deployment of text extraction models as scalable services across various hardware environments.

    Python · ai4science · chineseocr · document-parsing
  • meta-llama/llama

    59,157 · GitHub

    Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on

    Executes model checkpoints locally with configurable parameters like sequence length and batch size to optimize performance.

    Python
  • ultralytics/yolov5

    56,830 · GitHub

    YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning

    Executes high-speed visual inference using hardware-accelerated processing and test-time augmentation.

    Python · coreml · deep-learning · ios

Explore sub-tags

  • Edge Model: Lightweight runtimes optimized for edge device deployment.
  • High-Performance AI Inference: Optimized model execution for low-latency, real-time video manipulation on consumer hardware.
  • Inference Deployment Engines: Systems that facilitate the execution of trained models across diverse hardware backends including CPUs, GPUs, and mobile processors.
  • Local Inference Runners: Tools that execute model checkpoints on local hardware with configurable parameters.
  • Local-First AI Runtimes: Execution environments that enable machine learning models to run locally on consumer hardware.
  • Real-Time: High-speed execution environments designed for low-latency model inference and immediate data processing.