LLMKube

Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server — multi-GPU NVIDIA + Apple Silicon Metal, autoscaling, air-gapped, production-ready

LLMKube | Awesome Repository

defilantechLLMKube

Features