Awesome GitHub Repositories · Inference Optimization Kernels

Specialized computational kernels designed to accelerate the token generation and decoding phases of large language models.

Distinguishing note: Focuses specifically on low-level kernel optimization for inference speed, distinct from general model training or high-level API wrappers.

microsoft/BitNet
28,521 · View on GitHub
BitNet is a quantized inference engine designed to execute highly compressed language models by performing arithmetic on low-precision, bit-level weight data. It functions as a model optimization toolkit and a high-performance kernel library, enabling the execution of large language models on consumer hardware by reducing memory footprints and increasing processing speeds. The project distinguishes itself through hardware-specific kernel optimizations that leverage native processor instructions to accelerate matrix multiplication. By utilizing packed integer arithmetic and memory-aligned weig
Decode tokens using optimized kernels that reduce processing delays during the autoregressive generation phase of highly compressed language models.
Python
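
The description above mentions packed integer arithmetic on low-precision weights. As a rough illustration of the general idea only, the sketch below assumes ternary weights (-1, 0, 1) stored two bits each, four per byte. The encoding table and the function names `pack_ternary` and `packed_dot` are hypothetical and are not taken from the BitNet codebase; real kernels would rely on vectorized processor instructions rather than a Python loop.

```python
# Illustrative sketch only: a hypothetical 2-bit packing scheme for ternary
# weights. This is NOT BitNet's actual storage layout or kernel code.

# Hypothetical encoding: 0 maps to 0b00 so that unused padding bits mean "zero".
CODES = {0: 0b00, 1: 0b01, -1: 0b10}
VALUES = {code: weight for weight, code in CODES.items()}


def pack_ternary(weights):
    """Pack ternary weights (-1, 0, 1) four per byte, lowest bits first."""
    packed = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        packed[i // 4] |= CODES[w] << (2 * (i % 4))
    return bytes(packed)


def packed_dot(packed, activations):
    """Dot product of packed ternary weights with integer activations.

    With ternary weights no multiplication is needed: each activation is
    added, subtracted, or skipped.
    """
    total = 0
    for i, a in enumerate(activations):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        w = VALUES[code]
        if w == 1:
            total += a
        elif w == -1:
            total -= a
    return total


weights = [1, -1, 0, 1, 0, 0, -1, 1]
activations = [3, 5, -2, 7, 4, -6, 1, 2]

packed = pack_ternary(weights)          # 8 weights fit in 2 bytes
result = packed_dot(packed, activations)
```

In this sketch eight weights occupy two bytes instead of eight or more, which is the kind of memory-footprint reduction the description refers to; the speedups attributed to hardware-specific instructions are outside what this example demonstrates.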