20 Repos
Techniques for pruning, quantizing, and optimizing transformer models for efficiency.
Explore 20 GitHub repositories that match part of an awesome list: Model Compression.
A collection of our neural architecture search (NAS) and Vision Transformer work.
Compresses vision transformers using weight multiplexing.
The official implementation of the EMNLP 2023 paper LLM-FP4.
4-bit floating-point quantization for large language models.
Data-independent pruning for hierarchical vision transformers.
Pushes binary vision transformers toward the performance of convolutional models.
Combines token pruning and squeezing for aggressive compression.