14 repos

Awesome GitHub Repositories: Inference Servers and Runtimes

Explore 14 awesome GitHub repositories in the Artificial Intelligence & ML › Inference Servers and Runtimes category.

tensorflow/tensorflow
193,864 stars
TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing.
Deploys models into production environments, serving requests at scale while maintaining consistent inference latency.
C++ · deep-learning · deep-neural-networks · distributed
huggingface/transformers
156,730 stars
Transformers is a comprehensive machine learning library that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering…
Exports models into a portable format with ahead-of-time memory planning and hardware-specific operation dispatch for edge-device inference.
Python · audio · deep-learning · deepseek
Comfy-Org/ComfyUI
103,654 stars
ComfyUI is a node-based generative AI orchestration engine designed for constructing, testing, and executing complex image and video synthesis pipelines. By utilizing a directed acyclic graph execution model, the platform allows users to build reproducible workflows through modular, interconnected processing blocks…
Serves visual, node-based generative pipelines as programmable API endpoints for integration into external software.
Python · ai · comfy · comfyui
deepseek-ai/DeepSeek-V3
101,631 stars
DeepSeek-V3 is a large language model whose repository provides comprehensive resources for using the model, including technical specifications, pre-trained weights, and evaluation benchmarks. The project details the core transformer architecture, including parameter counts and multi-token prediction modules, while supporting…
Handles high-performance serving through multi-machine tensor parallelism and mixed-precision execution for large-scale language models.
Python
ggml-org/llama.cpp
95,400 stars
llama.cpp is an inference engine designed for the local execution of text-only and multimodal language models on consumer hardware. It provides a core environment for running models that process both text and image inputs, using hardware-accelerated backends to optimize performance across diverse CPU and GPU architectures…
Executes large language models locally on standard consumer hardware with high performance.
C++ · ggml
hacksider/Deep-Live-Cam
79,568 stars
Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to preserve data privacy without depending on external network services.
Optimizes generative models for low-latency, real-time inference on consumer-grade hardware.
Python · ai · ai-deep-fake · ai-face
nomic-ai/gpt4all
77,146 stars
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without relying on external cloud services. The project provides a comprehensive…
Delivers a cross-platform execution environment for running large language models locally on consumer hardware.
C++ · ai-chat · llm-inference
mlabonne/llm-course
75,340 stars
This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as well as…
Covers architectural patterns for scaling model inference, from simple local setups to complex multi-GPU cluster configurations.
course · large-language-models · llm
PaddlePaddle/PaddleOCR
70,931 stars
PaddleOCR is a comprehensive optical character recognition (OCR) framework for detecting text in images and documents and transcribing it into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into independent…
Facilitates the deployment of text extraction models as scalable services across various hardware environments.
Python · ai4science · chineseocr · document-parsing
vllm-project/vllm
70,745 stars
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols (such as an OpenAI-compatible HTTP server) for online serving while also supporting offline batch processing. The system is built to maximize token generation…
Scales large language model inference to handle high volumes of concurrent requests with minimal latency.
Python · amd · blackwell · cuda
hiyouga/LlamaFactory
67,386 stars
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse model architectures, letting users run both full-tuning and parameter-efficient methods through a single interface.
Wraps model execution in a web-accessible interface to provide consistent endpoints for client-side requests.
Python · agent · ai · deepseek
meta-llama/llama
59,157 stars
Llama is a computational framework and runtime environment designed for executing transformer-based neural networks locally. It functions as a generative AI inference engine, processing input sequences through pre-trained model weights to produce text completions and structured data outputs directly on…
Executes model checkpoints locally with configurable parameters, such as sequence length and batch size, to optimize performance.
Python
ultralytics/yolov5
56,830 stars
YOLOv5 is a comprehensive computer vision framework for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning…
Executes high-speed visual inference using hardware-accelerated processing and test-time augmentation.
Python · coreml · deep-learning · ios
tensorflow/tfjs-examples
6,783 stars
This repository provides a collection of practical demonstrations and implementation guides for machine learning tasks using TensorFlow.js. It serves as a resource for developers exploring model architectures, training workflows, and data manipulation techniques across domains such as computer vision, natural language…
Demonstrates low-level interfaces for precise weight initialization and for building custom model architectures from granular tensor operations.
JavaScript

Awesome Inference Servers and Runtimes GitHub Repositories

tensorflow/tensorflow

huggingface/transformers

Comfy-Org/ComfyUI

deepseek-ai/DeepSeek-V3

ggml-org/llama.cpp

hacksider/Deep-Live-Cam

nomic-ai/gpt4all

mlabonne/llm-course

PaddlePaddle/PaddleOCR

vllm-project/vllm

hiyouga/LlamaFactory

meta-llama/llama

ultralytics/yolov5

tensorflow/tfjs-examples
