awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Model Inference and Serving · Awesome GitHub Repositories

32 repos


Platforms and techniques for deploying, optimizing, and serving machine learning models for production use.

Explore 32 awesome GitHub repositories matching Artificial Intelligence & ML · Model Inference and Serving.


Awesome Model Inference and Serving GitHub Repositories

  • tensorflow/tensorflow

    193,864 · GitHub

    TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The syst

    Optimizes execution performance by setting specific model weights to zero through target-aware authoring and specialized kernels.

    C++ · deep-learning · deep-neural-networks · distributed
  • huggingface/transformers

    156,730 · GitHub

    Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering

    Extends standard library functionality with specialized loaders for device mapping, quantization, and custom attention backends.

    Python · audio · deep-learning · deepseek
  • immich-app/immich

    92,953 · GitHub

    Immich is a self-hosted media management platform designed to provide a centralized, private repository for photos and videos. It functions as a comprehensive system for organizing, backing up, and viewing personal media collections across mobile devices, web browsers, and external storage locations. By maintaining ful

    Processes machine learning tasks using externalized models and thread pools to optimize performance for image and text analysis.

    TypeScript · backup-tool · flutter · google-photos
  • opencv/opencv

    86,238 · GitHub

    OpenCV is a comprehensive computer vision library designed for real-time performance and cross-platform deployment. It provides a native execution environment that leverages multi-threaded operations and automated memory management to handle intensive computational tasks, including image processing and machine learning

    Executes pre-trained neural networks to perform classification, detection, and segmentation tasks on visual data.

    C++ · c-plus-plus · computer-vision · deep-learning
  • punkpeye/awesome-mcp-servers

    81,101 · GitHub

    This project serves as a centralized directory and interoperability hub for the Model Context Protocol, providing a curated collection of standardized service connectors that bridge artificial intelligence models with external software, databases, and APIs. It facilitates the integration of AI agents with diverse ecosy

    Delivers standardized interfaces for agents to control desktop environments, manage windows, and simulate user input.

    ai · mcp
  • hacksider/Deep-Live-Cam

    79,568 · GitHub

    Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to ensure complete data privacy without external network dependencies. By utilizing a hi

    Executes deep learning models directly on hardware-specific providers to minimize latency.

    Python · ai · ai-deep-fake · ai-face
  • modelcontextprotocol/servers

    79,000 · GitHub

    The Model Context Protocol is a standardized communication framework designed to connect language models to external data sources, functional tools, and interactive user interfaces. It provides a vendor-neutral interface layer that enables AI hosts to discover and execute capabilities across heterogeneous service envir

    Creates a unified interface layer that enables seamless interaction between diverse AI clients and backend service providers.

    TypeScript
  • browser-use/browser-use

    78,576 · GitHub

    Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows

    Manages settings and parameters for integrating specific generative AI models into browser-based automation workflows.

    Python · ai-agents · ai-tools · browser-automation
  • nomic-ai/gpt4all

    77,146 · GitHub

    GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a compreh

    Executes quantized language models using optimized C++ tensor computation libraries for local CPU and GPU hardware.

    C++ · ai-chat · llm-inference
  • zed-industries/zed

    75,634 · GitHub

    Zed is an AI-native, high-performance code editor designed for extreme responsiveness and keyboard-centric workflows. It functions as an extensible text processing workspace that integrates autonomous agents and predictive models directly into the development environment to automate complex engineering tasks, refactori

    Bridges local development sessions with remote cloud-based intelligence providers to access sophisticated code completion and analysis capabilities.

    Rust · gpui · rust-lang · text-editor
  • twitter/the-algorithm

    72,764 · GitHub

    The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver

    Deploys predictive models to score content relevance and user engagement probabilities in real-time.

    Scala
  • CompVis/stable-diffusion

    72,380 · GitHub

    Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution visual content by performing iterative denoising within a compressed latent space. By mapping natural language embeddings into pixel outputs through conditioned probabilistic processes, the framework enables the generation of im

    Coordinates model loading, hardware acceleration, and output processing to streamline production-ready inference.

    Jupyter Notebook
  • josephmisiti/awesome-machine-learning

    71,702 · GitHub

    This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify disco

    Facilitates collaborative model training and analytics across decentralized data sources using unified architectural frameworks.

    Python
  • PaddlePaddle/PaddleOCR

    70,931 · GitHub

    PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into indepen

    Abstracts execution logic to allow seamless model operation across diverse CPU, GPU, and mobile hardware backends.

    Python · ai4science · chineseocr · document-parsing
  • vllm-project/vllm

    70,745 · GitHub

    vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen

    Maximizes token generation speed and memory efficiency when serving large language models to multiple concurrent users.

    Python · amd · blackwell · cuda
  • dair-ai/Prompt-Engineering-Guide

    70,526 · GitHub

    This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task

    Reviews high-performance infrastructure solutions designed to minimize latency and maximize throughput for model inference.

    MDX · agent · agents · ai-agents
  • binary-husky/gpt_academic

    70,112 · GitHub

    This project provides a self-hosted, web-based interface designed to integrate large language models into academic and research workflows. It functions as a modular platform for document analysis, literature processing, and data handling, allowing users to maintain full control over their data and model connectivity th

    Manages and serves large language models on local hardware through a unified, web-based interface.

    Python · academic · chatglm-6b · chatgpt
  • OpenHands/OpenHands

    67,974 · GitHub

    OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system

    Routes requests to different language models based on performance, cost, or capability requirements to optimize agent execution.

    Python · agent · artificial-intelligence · chatgpt
  • hiyouga/LlamaFactory

    67,386 · GitHub

    LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The pro

    Exposes trained models via standardized network protocols to facilitate scalable and reliable prediction services.

    Python · agent · ai · deepseek
  • xtekky/gpt4free

    65,720 · GitHub

    This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, co

    Routes incoming requests through a unified interface that dynamically selects and cycles between multiple third-party service providers.

    Python · chatbot · chatbots · chatgpt

Explore sub-tags

  • Engines, Runtimes & Servers (7 sub-tags)
  • Inference Engines (8 sub-tags): Runtime environments designed to execute pre-trained neural network models with optimized performance and efficiency.
  • Inference Optimization (6 sub-tags): Techniques and configurations that enhance model execution speed, reduce memory usage, and improve computational efficiency during inference.
  • Local AI Deployment Platforms (2 sub-tags): Platforms for deploying and managing language model interfaces and data processing tasks on local hardware.
  • Model Integration & Pipelines (4 sub-tags)
  • Request Routing & Gateways (6 sub-tags)
  • Runtime Interfaces & Orchestration (4 sub-tags)