# defilantech/llmkube

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/defilantech-llmkube).**

130 stars · 19 forks · Go · Apache-2.0

## Links

- GitHub: https://github.com/defilantech/LLMKube
- Homepage: https://llmkube.com
- awesome-repositories: https://awesome-repositories.com/repository/defilantech-llmkube.md

## Description

Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server. It supports multi-GPU NVIDIA and Apple Silicon (Metal) hardware, autoscaling, and air-gapped deployment, and is billed as production-ready.

## Tags

### Part of an Awesome List

- [Model Serving Engines](https://awesome-repositories.com/f/awesome-lists/ai/model-serving-engines.md) — Kubernetes operator for managing multi-runtime LLM inference.
