Why is openvinotoolkit/openvino a recommended repository for Remote Model Loading?

Retrieves AI models directly from cloud storage using URI paths and authentication credentials.

Why is setzer22/llama-rs a recommended repository for Remote Model Loading?

Retrieves model-specific vocabulary and merge rules from external hubs for consistent encoding.

Why is kubeflow/kfserving a recommended repository for Remote Model Loading?

Fetches model artifacts from S3, GCS, Azure Blob, or Hugging Face Hub for deployment.

Why is pytorch/serve a recommended repository for Remote Model Loading?

Supports downloading and registering model archives directly from public HTTP links or cloud storage URLs.

Why is zml/zml a recommended repository for Remote Model Loading?

Downloads model weights and configurations from cloud buckets and HTTPS endpoints.

5 Repos

Awesome GitHub Repositories: Remote Model Loading

Capabilities for loading AI models directly from cloud-native object storage or remote repositories.

Distinct from Cloud Storage: Focuses on the loading of ML models for inference, not general cloud storage management.

Explore 5 awesome GitHub repositories matching devops & infrastructure · Remote Model Loading.


openvinotoolkit/openvino
10,414 GitHub stars
OpenVINO is an AI inference engine and model serving platform designed to execute optimized deep learning models across CPUs, GPUs, and NPUs through a unified API. It includes a model optimization toolkit for converting, quantizing, and compressing models from various frameworks, alongside a specialized generative AI runtime for large language models. The project distinguishes itself through a plugin-based hardware acceleration layer that maps neural network operations to vendor-specific drivers. It features advanced execution mechanisms such as continuous batching, speculative decoding, and …
Retrieves AI models directly from cloud storage using URI paths and authentication credentials.
C++ · ai · computer-vision · deep-learning
setzer22/llama-rs
6,150 GitHub stars
llama-rs is an inference engine for local large language models, implemented in Rust. It runs model computations on local hardware to generate text responses from user prompts. The project uses Rust-based tensor operations and direct memory mapping of model files to handle high-performance linear algebra and efficient weight loading. It integrates weight quantization, which reduces a model's memory footprint by converting high-precision weights into smaller formats. The system includes a command-line interface for interactive chat sessions and one-shot prompts, along with file-based session persistence for saving and restoring conversation histories. It also provides utilities for fetching tokenizer configurations from remote hubs, as well as tools for computing perplexity scores to evaluate model performance.
Retrieves model-specific vocabulary and merge rules from external hubs for consistent encoding.
Rust
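As a rough illustration of retrieving vocabulary and merge rules from an external hub, the sketch below builds the download URL for a single tokenizer file. It assumes the hub is Hugging Face Hub and uses that hub's `/{repo_id}/resolve/{revision}/{filename}` URL pattern; the helper `tokenizer_url` and its defaults are hypothetical and not taken from llama-rs, which may locate these files differently.

```python
from urllib.parse import quote

# Assumption: the remote hub is Hugging Face Hub at its public endpoint.
HUB_ENDPOINT = "https://huggingface.co"


def tokenizer_url(repo_id: str, filename: str = "tokenizer.json",
                  revision: str = "main") -> str:
    """Build the download URL for one tokenizer file in a Hub model repo.

    Hypothetical helper (not llama-rs code) following the
    /{repo_id}/resolve/{revision}/{filename} URL pattern.
    """
    # Keep "/" in repo_id ("owner/name") and in nested filenames,
    # but escape it inside the revision (e.g. "refs/pr/1").
    return (
        f"{HUB_ENDPOINT}/{quote(repo_id, safe='/')}"
        f"/resolve/{quote(revision, safe='')}/{quote(filename, safe='/')}"
    )


# GPT-2-style BPE tokenizers keep the vocabulary and merge rules in separate files.
for name in ("vocab.json", "merges.txt"):
    print(tokenizer_url("gpt2", name))
```

Pinning `revision` to a tag or commit rather than `main` is one way to keep the encoding consistent across runs.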
kubeflow/kfserving
5,576 GitHub stars
KServe is an open platform for deploying and serving generative and predictive AI models on Kubernetes. It defines inference services as custom resources with declarative YAML specifications, enabling a Kubernetes-native approach to model deployment and lifecycle management. The platform leverages Knative-based serverless scaling for automatic scale-to-zero and revision management, and supports a pluggable serving runtime architecture that maps model formats to containerized execution environments. KServe distinguishes itself through model-aware autoscaling that scales replicas based on token …
Fetches model artifacts from S3, GCS, Azure Blob, or Hugging Face Hub for deployment.
Go
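Fetching artifacts from several stores usually starts by deciding which backend a model's storage URI points at (KServe's InferenceService takes such a URI in its `storageUri` field). The following is a minimal sketch of scheme-based dispatch, not KServe's actual code: the helper `resolve_backend`, the `SCHEMES` table, the `hf://` scheme, and detecting Azure Blob by the `.blob.core.windows.net` host suffix are all assumptions made for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table; a real server may support more stores.
SCHEMES = {
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "hf": "Hugging Face Hub",  # assumed scheme for Hub repositories
    "http": "HTTP",
    "https": "HTTP",
}

# Assumption: Azure Blob URIs are plain https URLs on this host suffix.
AZURE_BLOB_SUFFIX = ".blob.core.windows.net"


def resolve_backend(storage_uri: str) -> Tuple[str, str, str]:
    """Split a model storage URI into (backend, bucket_or_host, object_path)."""
    parsed = urlparse(storage_uri)  # scheme is lower-cased by urlparse
    backend: Optional[str]
    if parsed.scheme == "https" and parsed.netloc.endswith(AZURE_BLOB_SUFFIX):
        backend = "Azure Blob Storage"
    else:
        backend = SCHEMES.get(parsed.scheme)
    if backend is None:
        raise ValueError(f"unsupported storage URI scheme: {parsed.scheme!r}")
    # The netloc is the bucket (or host); the rest of the path is the object key.
    return backend, parsed.netloc, parsed.path.lstrip("/")
```

For example, `resolve_backend("s3://models/bert/v1")` returns `("Amazon S3", "models", "bert/v1")`; a downloader would then hand the bucket and key to the matching storage client.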
pytorch/serve
4,354 GitHub stars
This project is a PyTorch model-serving framework designed to deploy machine learning models in production through scalable network endpoints. It acts as a high-performance inference server, optimizer, and model lifecycle manager that handles model loading, request batching, and hardware acceleration. The system stands out for advanced orchestration and optimization features, such as chaining multiple models into sequential workflows using execution graphs and applying dynamic batching to improve throughput and latency. It offers specialized support for generative AI and large language models through continuous batching and tensor parallelism. Its broader feature areas include GPU resource management for diverse hardware such as NVIDIA, AMD, and Apple Silicon, as well as comprehensive lifecycle management covering registration, versioning, and worker scaling. It also integrates observability tools that monitor system health and model performance through Prometheus-compatible metrics. The server is managed through a command-line interface used to control its lifecycle and configure runtime parameters.
Supports downloading and registering model archives directly from public HTTP links or cloud storage URLs.
Java
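Registering a model archive from a public URL goes through TorchServe's management API, which accepts a `POST /models` request whose `url` query parameter points at the `.mar` file. The sketch below builds and sends that request with the standard library only. It assumes the management API listens on the default `localhost:8081` and uses only the `url` and `model_name` parameters; the helper names `register_request_url` and `register` are hypothetical.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Assumption: TorchServe management API at its default address.
MANAGEMENT_API = "http://localhost:8081"


def register_request_url(archive_url: str, model_name: Optional[str] = None,
                         base: str = MANAGEMENT_API) -> str:
    """Build the POST /models URL that registers a .mar archive from archive_url."""
    params = {"url": archive_url}
    if model_name is not None:
        params["model_name"] = model_name
    # urlencode percent-escapes the archive URL so it survives as a query value.
    return f"{base}/models?{urlencode(params)}"


def register(archive_url: str, model_name: Optional[str] = None) -> str:
    """Send the registration request and return the server's response body."""
    req = urllib.request.Request(
        register_request_url(archive_url, model_name), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


print(register_request_url("https://example.com/squeezenet1_1.mar"))
# http://localhost:8081/models?url=https%3A%2F%2Fexample.com%2Fsqueezenet1_1.mar
```

Keeping URL construction separate from the network call makes the request easy to inspect before anything is sent to a running server.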
zml/zml
3,171 GitHub stars
zml is a machine learning model compiler and cross-platform inference engine that transforms model descriptions into optimized executable binaries for specific hardware accelerators. It functions as a model deployment toolkit and hardware-agnostic orchestrator, using a tensor-based architecture definition to provide strong type checking during compilation. The project distinguishes itself through the ability to shard tensors and distribute large-scale AI workloads across a logical mesh of multiple devices. It further supports the remote model lifecycle by authenticating and do…
Downloads model weights and configurations from cloud buckets and HTTPS endpoints.
Zig · ai · bazel · hpc
