


Exllama | Awesome Repository

turboderp/exllama


2,924 stars · 222 forks · Python · MIT


Features

Inference and Serving - Memory-efficient implementation for running quantized Llama models.
Inference Engines - Memory-efficient inference implementation optimized for GPU execution.


A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
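
To give a rough sense of why quantized weights matter for memory use, the sketch below estimates the storage needed for model weights alone at two precisions. It is not code from the Exllama repository: the helper `weight_memory_gib`, the 7-billion parameter count, and the 4-bit width are illustrative assumptions, and the estimate ignores quantization metadata (scales, zero points), activations, and the key/value cache.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB.

    Hypothetical helper for illustration: bytes = parameters * bits / 8.
    """
    return num_params * bits_per_weight / 8 / 2**30


# Assumed, illustrative parameter count (not taken from the listing above).
NUM_PARAMS = 7_000_000_000

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # 16-bit floating-point weights
int4_gib = weight_memory_gib(NUM_PARAMS, 4)   # 4-bit quantized weights (assumed width)

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f" 4-bit weights: {int4_gib:.2f} GiB")
print(f"reduction:      {fp16_gib / int4_gib:.1f}x")
```

Under these assumptions the weights shrink by the ratio of the bit widths. Real savings will be somewhat smaller once per-group quantization metadata and runtime buffers are counted.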