turboderp/exllama
2,924 stars · 222 forks · Python · MIT

Exllama

Features
Inference and Serving - Memory-efficient implementation for running quantized Llama models.
Inference Engines - Memory-efficient inference implementation optimized for GPU execution.