Methods and layers for reducing model weight precision to optimize memory usage and inference performance.
Distinguishing note: Focuses specifically on weight quantization for model optimization, distinct from general model training or architecture design.
Explore 1 awesome GitHub repository matching Artificial Intelligence & ML · Quantization Techniques.
Whisper.cpp is a high-performance, local-first speech recognition engine designed to run large-scale machine learning models on consumer hardware. It functions as a portable library that converts audio into text, supporting both static file transcription and real-time stream processing. By using a lightweight inference engine and weight quantization, the project minimizes memory and compute overhead, allowing efficient execution without reliance on external cloud APIs or internet connectivity. The project distinguishes itself through a hardware-agnostic compute abstraction that offloads…
Reduces model memory footprint and increases inference speed by converting high-precision floating-point weights into lower-bit integer representations.
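To make the idea concrete, here is a minimal Python sketch of block-wise symmetric (absmax) quantization: each block of floats shares one scale, and each weight is stored as a small signed integer. The block size of 32, the 8-bit default, the 2-byte per-block scale, and all function names are illustrative assumptions. This is a generic scheme, not a description of whisper.cpp's actual weight formats.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating generic block-wise absmax quantization.
# Assumption: each block stores one float scale plus signed `bits`-bit integers.

Block = Tuple[float, List[int]]


def quantize_block(weights: List[float], bits: int = 8) -> Block:
    """Map one block of floats to signed integers in [-qmax, qmax] with a shared scale."""
    qmax = (1 << (bits - 1)) - 1  # 127 for 8 bits, 7 for 4 bits
    amax = max(abs(w) for w in weights)
    scale = amax / qmax if amax > 0 else 1.0  # largest magnitude maps to +/- qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return scale, q


def quantize(weights: List[float], block_size: int = 32, bits: int = 8) -> List[Block]:
    """Split the weights into fixed-size blocks and quantize each block independently."""
    return [
        quantize_block(weights[i:i + block_size], bits)
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks: List[Block]) -> List[float]:
    """Recover approximate floats: w ~= scale * q."""
    out: List[float] = []
    for scale, q in blocks:
        out.extend(scale * v for v in q)
    return out


def storage_bytes(n_weights: int, block_size: int, bits: int, scale_bytes: int = 2) -> int:
    """Bytes for packed integers plus one scale per block (assumed 2-byte, fp16-sized scale)."""
    n_blocks = -(-n_weights // block_size)  # ceiling division
    return (n_weights * bits + 7) // 8 + n_blocks * scale_bytes


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 0.0]
    blocks = quantize(w, block_size=4, bits=8)
    print(blocks)
    print(dequantize(blocks))
    print(storage_bytes(1_000_000, 32, 32, scale_bytes=0))  # fp32 baseline
    print(storage_bytes(1_000_000, 32, 8))
    print(storage_bytes(1_000_000, 32, 4))
```

For 1,000,000 weights, fp32 storage is 4,000,000 bytes, while this sketch needs 1,062,500 bytes at 8 bits and 562,500 bytes at 4 bits. The per-block scales are the overhead paid for keeping each weight's rounding error within half a scale step.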