32 repos

Awesome GitHub Repositories: Model Inference and Serving

Platforms and techniques for deploying, optimizing, and serving machine learning models for production use.

Explore 32 awesome GitHub repositories matching Artificial Intelligence & ML · Model Inference and Serving.


tensorflow/tensorflow
193,864 · GitHub
TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The syst
Optimizes execution performance by setting specific model weights to zero through target-aware authoring and specialized kernels.
C++ · deep-learning · deep-neural-networks · distributed
huggingface/transformers
156,730 · GitHub
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering
Extends standard library functionality with specialized loaders for device mapping, quantization, and custom attention backends.
Python · audio · deep-learning · deepseek
immich-app/immich
92,953 · GitHub
Immich is a self-hosted media management platform designed to provide a centralized, private repository for photos and videos. It functions as a comprehensive system for organizing, backing up, and viewing personal media collections across mobile devices, web browsers, and external storage locations. By maintaining ful
Processes machine learning tasks using externalized models and thread pools to optimize performance for image and text analysis.
TypeScript · backup-tool · flutter · google-photos
opencv/opencv
86,238 · GitHub
OpenCV is a comprehensive computer vision library designed for real-time performance and cross-platform deployment. It provides a native execution environment that leverages multi-threaded operations and automated memory management to handle intensive computational tasks, including image processing and machine learning
Executes pre-trained neural networks to perform classification, detection, and segmentation tasks on visual data.
C++ · c-plus-plus · computer-vision · deep-learning
punkpeye/awesome-mcp-servers
81,101 · GitHub
This project serves as a centralized directory and interoperability hub for the Model Context Protocol, providing a curated collection of standardized service connectors that bridge artificial intelligence models with external software, databases, and APIs. It facilitates the integration of AI agents with diverse ecosy
Delivers standardized interfaces for agents to control desktop environments, manage windows, and simulate user input.
ai · mcp
hacksider/Deep-Live-Cam
79,568 · GitHub
Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to ensure complete data privacy without external network dependencies. By utilizing a hi
Executes deep learning models directly on hardware-specific providers to minimize latency.
Python · ai · ai-deep-fake · ai-face
modelcontextprotocol/servers
79,000 · GitHub
The Model Context Protocol is a standardized communication framework designed to connect language models to external data sources, functional tools, and interactive user interfaces. It provides a vendor-neutral interface layer that enables AI hosts to discover and execute capabilities across heterogeneous service envir
Creates a unified interface layer that enables seamless interaction between diverse AI clients and backend service providers.
TypeScript
browser-use/browser-use
78,576 · GitHub
Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows
Manages settings and parameters for integrating specific generative AI models into browser-based automation workflows.
Python · ai-agents · ai-tools · browser-automation
nomic-ai/gpt4all
77,146 · GitHub
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a compreh
Executes quantized language models using optimized C++ tensor computation libraries for local CPU and GPU hardware.
C++ · ai-chat · llm-inference
zed-industries/zed
75,634 · GitHub
Zed is an AI-native, high-performance code editor designed for extreme responsiveness and keyboard-centric workflows. It functions as an extensible text processing workspace that integrates autonomous agents and predictive models directly into the development environment to automate complex engineering tasks, refactori
Bridges local development sessions with remote cloud-based intelligence providers to access sophisticated code completion and analysis capabilities.
Rust · gpui · rust-lang · text-editor
twitter/the-algorithm
72,764 · GitHub
The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver
Deploys predictive models to score content relevance and user engagement probabilities in real-time.
Scala
CompVis/stable-diffusion
72,380 · GitHub
Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution visual content by performing iterative denoising within a compressed latent space. By mapping natural language embeddings into pixel outputs through conditioned probabilistic processes, the framework enables the generation of im
Coordinates model loading, hardware acceleration, and output processing to streamline production-ready inference.
Jupyter Notebook
josephmisiti/awesome-machine-learning
71,702 · GitHub
This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify disco
Facilitates collaborative model training and analytics across decentralized data sources using unified architectural frameworks.
Python
PaddlePaddle/PaddleOCR
70,931 · GitHub
PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into indepen
Abstracts execution logic to allow seamless model operation across diverse CPU, GPU, and mobile hardware backends.
Python · ai4science · chineseocr · document-parsing
vllm-project/vllm
70,745 · GitHub
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen
Maximizes token generation speed and memory efficiency when serving large language models to multiple concurrent users.
Python · amd · blackwell · cuda
dair-ai/Prompt-Engineering-Guide
70,526 · GitHub
This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task
Reviews high-performance infrastructure solutions designed to minimize latency and maximize throughput for model inference.
MDX · agent · agents · ai-agents
binary-husky/gpt_academic
70,112 · GitHub
This project provides a self-hosted, web-based interface designed to integrate large language models into academic and research workflows. It functions as a modular platform for document analysis, literature processing, and data handling, allowing users to maintain full control over their data and model connectivity th
Manages and serves large language models on local hardware through a unified, web-based interface.
Python · academic · chatglm-6b · chatgpt
OpenHands/OpenHands
67,974 · GitHub
OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system
Routes requests to different language models based on performance, cost, or capability requirements to optimize agent execution.
Python · agent · artificial-intelligence · chatgpt
hiyouga/LlamaFactory
67,386 · GitHub
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The pro
Exposes trained models via standardized network protocols to facilitate scalable and reliable prediction services.
Python · agent · ai · deepseek
xtekky/gpt4free
65,720 · GitHub
This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, co
Routes incoming requests through a unified interface that dynamically selects and cycles between multiple third-party service providers.
Python · chatbot · chatbots · chatgpt

