awesome-repositories.com
© 2026 Bringes Technology SRL · VAT RO45896025 · hello@bringes.io
Model Inference and Serving · Awesome GitHub Repositories

32 repos

Platforms and techniques for deploying, optimizing, and serving machine learning models for production use.

Explore 32 awesome GitHub repositories tagged Artificial Intelligence & ML · Model Inference and Serving.

Awesome Model Inference and Serving GitHub Repositories

  • keras-team/keras

    63,858 · GitHub

    Keras is a high-level deep learning framework designed for constructing and training neural networks through the composition of modular, functional layers. It serves as a comprehensive modeling toolkit that provides standardized procedures for defining, evaluating, and deploying complex architectures. By utilizing a di

    Exposes unified interfaces to switch between various computational backends for consistent model execution.

    Python · data-science · deep-learning · jax
  • traefik/traefik

    61,814 · GitHub

    Traefik is a cloud-native edge router and API gateway designed to manage service communication and traffic flow across distributed infrastructure. It functions as a dynamic service proxy that automatically discovers backend services and configures routing rules in real time, eliminating the need for manual restarts or

    Caches model responses based on query semantics to minimize redundant computation and lower inference latency.

    Go · consul · docker · etcd
  • meta-llama/llama

    59,157 · GitHub

    Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, enabling the processing of input sequences through pre-trained model weights to produce text completions and structured data outputs directly on

    Reduces numerical precision in model weights to lower memory footprint and accelerate inference on local devices.

    Python
  • cline/cline

    58,164 · GitHub

    Cline is an extensible agent runtime and multi-agent orchestration engine designed to automate complex software engineering workflows. It functions as an integrated development environment extension that bridges strategic task planning with autonomous execution, allowing users to manage multi-step projects through huma

    Connects various local and cloud-based language models to facilitate automated software engineering workflows.

    TypeScript
  • ultralytics/yolov5

    56,830 · GitHub

    YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning

    Decreases model size and improves execution speed by setting a specific percentage of weights to zero.

    Python · coreml · deep-learning · ios
  • AntonOsika/gpt-engineer

    55,201 · GitHub

    GPT-Engineer is an autonomous agent and framework designed for AI-assisted software development. It functions as a generative codebase architect that translates natural language requirements into complete, functional software projects by reading and writing files directly to the local file system. The platform disting

    Supports the deployment and integration of various local and cloud-based language models for generative tasks.

    Python · ai · autonomous-agent · code-generation
  • Mintplex-Labs/anything-llm

    54,751 · GitHub

    This platform serves as a comprehensive environment for managing private language models, document knowledge bases, and automated agent workflows within secure local infrastructure. It functions as a document-aware workspace that enables users to ingest diverse file formats into searchable repositories, ensuring that a

    Deploys language model interfaces and data processing engines directly onto local hardware for private, self-hosted operations.

    JavaScript · ai-agents · custom-ai-agents · deepseek
  • karpathy/nanoGPT

    53,461 · GitHub

    nanoGPT is a lightweight engine for training and fine-tuning transformer-based language models from scratch. It provides a minimalist codebase designed for educational exploration and rapid experimentation with neural network architectures, utilizing self-attention and feed-forward layers to process sequences and predi

    Exposes a command-line interface for sampling text sequences with adjustable generation settings.

    Python
  • ultralytics/ultralytics

    53,426 · GitHub

    Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification

    Parses and structures raw model outputs into usable formats like bounding boxes, masks, and keypoint coordinates.

    Python · cli · computer-vision · deep-learning
  • facebookresearch/segment-anything

    53,431 · GitHub

    This project provides a deep learning architecture designed to identify and isolate distinct objects within images by generating precise pixel-level masks. It functions as a browser-based inference engine, enabling the execution of complex machine learning models directly within web environments without requiring serve

    Enables the execution of sophisticated deep learning models directly within the browser environment using hardware-accelerated runtimes.

    Jupyter Notebook
  • unslothai/unsloth

    52,461 · GitHub

    Unsloth is a high-performance training and inference platform designed to optimize the lifecycle of large language and multimodal models. It provides a comprehensive engine for fine-tuning, executing, and managing models locally, with a focus on reducing memory consumption and increasing compute speed on consumer-grade

    Reduces memory usage and increases processing speed during the fine-tuning of large models for specific applications.

    Python · agent · deepseek · deepseek-r1
  • tensorflow/tfjs-examples

    6,783 · GitHub

    This repository provides a collection of practical demonstrations and implementation guides for machine learning tasks using TensorFlow.js. It serves as a resource for developers to explore model architectures, training workflows, and data manipulation techniques across domains such as computer vision, natural language

    Backend-specific kernels register optimized logic for operations, enabling efficient memory access and dispatch during execution.

    JavaScript
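
One entry in the list above describes shrinking a model "by setting a specific percentage of weights to zero". A minimal sketch of that idea (magnitude pruning) in plain Python follows. It is an illustration only, not code from any listed repository; the function name `prune_by_magnitude` and the sample weights are hypothetical.

```python
def prune_by_magnitude(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude `fraction` zeroed.

    Hypothetical helper for illustration; real frameworks prune tensors
    layer by layer rather than flat Python lists.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    n_prune = int(len(weights) * fraction)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.12]
pruned = prune_by_magnitude(weights, 0.5)
print(pruned)
```

Zeroed weights reduce size or latency only when the storage format or runtime exploits sparsity, so a sketch like this shows the selection step rather than the end-to-end gain.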

Explore sub-tags

  • Engines, Runtimes & Servers (7 sub-tags)
  • Inference Engines (8 sub-tags): Runtime environments designed to execute pre-trained neural network models with optimized performance and efficiency.
  • Inference Optimization (6 sub-tags): Techniques and configurations that enhance model execution speed, reduce memory usage, and improve computational efficiency during inference.
  • Local AI Deployment Platforms (2 sub-tags): Platforms for deploying and managing language model interfaces and data processing tasks on local hardware.
  • Model Integration & Pipelines (4 sub-tags)
  • Request Routing & Gateways (6 sub-tags)
  • Runtime Interfaces & Orchestration (4 sub-tags)
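
The Inference Optimization sub-tag covers techniques that reduce memory usage, and one listed entry mentions reducing numerical precision in model weights. As a rough sketch of what that can mean, the snippet below applies symmetric 8-bit quantization to a short list of weights. The helper names `quantize_int8` and `dequantize` and the sample values are assumptions made for illustration; they do not come from any repository on this page.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale, restored)
```

Each weight is stored in one byte instead of four (for float32), at the cost of a rounding error of at most half the scale per weight.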