brontoguana/krasis

Krasis

Krasis is an LLM runtime for running large MoE models on NVIDIA consumer GPUs. It is built around fast GPU prompt processing, GPU-executed decode, and HCS expert residency management so models much larger than VRAM can still run locally.
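To make the idea of expert residency concrete, here is a minimal sketch of a fixed-budget cache that decides which MoE experts stay on the GPU. It is not Krasis's HCS implementation, which the text above does not describe. The class name `ExpertResidency`, its `request` method, and the least-recently-used eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class ExpertResidency:
    """Hypothetical tracker for which experts fit in a fixed VRAM budget.

    Assumption: eviction is least-recently-used. A real runtime may use a
    different policy (for example, one based on routing frequency).
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity          # number of experts that fit in VRAM
        self.resident = OrderedDict()     # expert_id -> True, oldest first
        self.hits = 0
        self.misses = 0

    def request(self, expert_id):
        """Return True if the expert was already resident, else load it."""
        if expert_id in self.resident:
            # Mark as most recently used.
            self.resident.move_to_end(expert_id)
            self.hits += 1
            return True
        self.misses += 1
        if len(self.resident) >= self.capacity:
            # Evict the least recently used expert to make room.
            self.resident.popitem(last=False)
        # A real runtime would copy the expert weights to the GPU here.
        self.resident[expert_id] = True
        return False


cache = ExpertResidency(capacity=2)
for expert_id in [1, 2, 1, 3]:
    cache.request(expert_id)
print(list(cache.resident), cache.hits, cache.misses)
```

In this sketch a miss stands in for a host-to-GPU weight transfer, so the hit rate is a rough proxy for how often decode can proceed without waiting on a transfer.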

Features

Inference Engines - Hybrid runtime for running large models on limited VRAM.
