Why is dusty-nv/jetson-inference a recommended Multi-Framework Model Serving GitHub Repositories repository?

Serves models from multiple frameworks across diverse hardware accelerators and CPUs using optimized configurations.

Why is kserve/kserve a recommended Multi-Framework Model Serving GitHub Repositories repository?

Supports serving models from TensorFlow, PyTorch, Scikit-Learn, XGBoost, ONNX, and Hugging Face with standardized inference protocols.

Why is kubeflow/kfserving a recommended Multi-Framework Model Serving GitHub Repositories repository?

Runs exported models from TensorFlow, PyTorch, Scikit-learn, XGBoost, and others behind a unified inference endpoint.

Why is sakurallm/sakurallm a recommended Multi-Framework Model Serving GitHub Repositories repository?

Loads full-precision models using the vLLM backend with PagedAttention and tensor parallel multi-GPU acceleration.

Why is snowkylin/tensorflow-handbook a recommended Multi-Framework Model Serving GitHub Repositories repository?

Explains how to load specific model versions and automatically update to the latest deployment version.

Why is vllm-project/vllm-omni a recommended Multi-Framework Model Serving GitHub Repositories repository?

Serves as a high-throughput runtime for omni-modal models using vLLM's PagedAttention and tensor parallelism.

6 Repos

Awesome GitHub RepositoriesMulti-Framework Model Serving

Hosting models from diverse deep learning frameworks across varied hardware accelerators.

Distinct from Model Serving Frameworks: Specifically addresses the ability to serve models from multiple different frameworks simultaneously.

Explore 6 awesome GitHub repositories matching artificial intelligence & ml · Multi-Framework Model Serving. Refine with filters or upvote what's useful.

Finde die besten Repos mit KI.Wir suchen mit KI nach den am besten passenden Repositories.

dusty-nv/jetson-inference
dusty-nv/jetson-inference
8,734Auf GitHub ansehen
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Serves models from multiple frameworks across diverse hardware accelerators and CPUs using optimized configurations.
C++caffecomputer-visiondeep-learning
Auf GitHub ansehen8,734
kserve/kserve
kserve/kserve
5,576Auf GitHub ansehen
KServe is a Kubernetes-native platform for deploying and serving machine learning models as scalable inference services. It supports both generative AI models, including large language models, and traditional predictive models from frameworks such as TensorFlow, PyTorch, Scikit-Learn, XGBoost, and ONNX. The platform manages the full lifecycle of model deployments, including revision tracking, canary rollouts, A/B testing, and automatic rollbacks, and provides serverless scale-to-zero capabilities for cost-efficient resource management. KServe distinguishes itself through a standardized infere
Supports serving models from TensorFlow, PyTorch, Scikit-Learn, XGBoost, ONNX, and Hugging Face with standardized inference protocols.
Go
Auf GitHub ansehen5,576
kubeflow/kfserving
kubeflow/kfserving
5,576Auf GitHub ansehen
KServe is an open platform for deploying and serving generative and predictive AI models on Kubernetes. It defines inference services as custom resources with declarative YAML specifications, enabling a Kubernetes-native approach to model deployment and lifecycle management. The platform leverages Knative-based serverless scaling for automatic scale-to-zero and revision management, and supports a pluggable serving runtime architecture that maps model formats to containerized execution environments. KServe distinguishes itself through model-aware autoscaling that scales replicas based on token
Runs exported models from TensorFlow, PyTorch, Scikit-learn, XGBoost, and others behind a unified inference endpoint.
Go
Auf GitHub ansehen5,576
sakurallm/sakurallm
SakuraLLM/SakuraLLM
4,618Auf GitHub ansehen
SakuraLLM is a multi-format document translation system that hosts large language models for translating Japanese text into other languages. It functions as an inference server that exposes translation models through an OpenAI-compatible API, allowing any tool supporting the OpenAI client format to send translation requests. The system is designed as a glossary-aware translation engine that applies user-defined term dictionaries to ensure consistent translation of proper nouns and names across outputs. The project distinguishes itself by supporting multiple high-performance inference backends
Loads full-precision models using the vLLM backend with PagedAttention and tensor parallel multi-GPU acceleration.
Python
Auf GitHub ansehen4,618
snowkylin/tensorflow-handbook
snowkylin/tensorflow-handbook
3,927Auf GitHub ansehen
Dieses Projekt ist eine umfassende Bildungsressource und ein Tutorial-Handbuch für das Erstellen, Trainieren und Bereitstellen von Machine-Learning-Modellen mit TensorFlow 2. Es dient als strukturierter Lernleitfaden für grundlegende Deep-Learning-Konzepte, einschließlich neuronaler Netzwerkarchitekturen, automatischer Differenzierung und Tensor-Operationen. Das Handbuch bietet technische Anleitungen zur Optimierung der Ausführungseffizienz durch GPU-Speicherverwaltung, verteiltes Training und Modellquantisierung. Es enthält zudem detaillierte Anleitungen für den Aufbau leistungsfähiger Datenpipelines und den Export von Modellen für Produktionsserver, mobile Geräte und Webbrowser. Das Material deckt ein breites Spektrum an Funktionen ab, darunter die Modellentwicklung mit konvolutionellen und rekurrenten Netzwerken, die Implementierung benutzerdefinierter Verlustfunktionen und Layer sowie die Nutzung vortrainierter Modelle für Transfer Learning. Zudem werden Bereitstellungsstrategien für Edge-Geräte und die Nutzung cloudbasierter Runtimes zur Hardwarebeschleunigung behandelt. Die Ressource ist als Sammlung von Jupyter Notebooks implementiert.
Explains how to load specific model versions and automatically update to the latest deployment version.
Jupyter Notebook
Auf GitHub ansehen3,927
vllm-project/vllm-omni
vllm-project/vllm-omni
2,776Auf GitHub ansehen
vllm-omni is a high-throughput serving engine and distributed inference framework designed for omni-modal models. It serves as a multi-modal model API server capable of generating text, image, video, and audio data, providing a standardized interface for remote client access. The system features a non-autoregressive generation engine for parallel media production and a robot policy inference server that acts as a real-time communication bridge to robotic hardware using specialized protocols. It supports hybrid execution models that combine sequential token generation with parallelized media g
Serves as a high-throughput runtime for omni-modal models using vLLM's PagedAttention and tensor parallelism.
Pythonaudio-generationdiffusionimage-generation
Auf GitHub ansehen2,776

Awesome Multi-Framework Model Serving GitHub Repositories

dusty-nv/jetson-inference

kserve/kserve

kubeflow/kfserving

SakuraLLM/SakuraLLM

snowkylin/tensorflow-handbook

vllm-project/vllm-omni

Unter-Tags erkunden