Krasis is an LLM runtime for running large mixture-of-experts (MoE) models on consumer NVIDIA GPUs. It is built around fast GPU prompt processing, GPU-executed decode, and HCS expert residency management, so models much larger than VRAM can still run locally.
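To give a feel for what expert residency management means, here is a minimal sketch in Python. It is not Krasis's HCS policy, which is not described here; it assumes a plain least-recently-used (LRU) eviction rule as a stand-in. The names `ExpertCache`, `simulate`, and `load_fn` are hypothetical and exist only for this illustration. The idea is that only a fixed number of experts fit in VRAM at once, so an expert that is not resident has to be loaded when the router picks it, and some other expert has to be evicted.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy LRU cache of MoE experts with a fixed slot budget (hypothetical)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.resident = OrderedDict()  # expert_id -> weights, least recent first
        self.hits = 0
        self.misses = 0

    def get(self, expert_id, load_fn):
        """Return weights for expert_id, loading and evicting as needed."""
        if expert_id in self.resident:
            self.hits += 1
            self.resident.move_to_end(expert_id)  # mark as most recently used
            return self.resident[expert_id]
        self.misses += 1
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used
        weights = load_fn(expert_id)  # stand-in for a host-to-GPU copy
        self.resident[expert_id] = weights
        return weights


def simulate(routing, capacity):
    """Replay a sequence of routed expert ids; return (hits, misses, resident ids)."""
    cache = ExpertCache(capacity)
    for expert_id in routing:
        cache.get(expert_id, load_fn=lambda i: f"weights-{i}")
    return cache.hits, cache.misses, list(cache.resident)


if __name__ == "__main__":
    # Seven routing decisions over four experts, with room for only two.
    print(simulate([0, 1, 0, 2, 0, 3, 1], capacity=2))
```

With room for two experts, the frequently routed expert 0 stays resident for most of the run while the others cycle through the second slot. A real runtime would likely weigh more than recency, but the hit/miss trade-off is the same: every miss costs a transfer into VRAM.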
Features
Inference Engines - Hybrid runtime for running large models on limited VRAM.