1 repository
Initializes a device buffer to negative zero to set up a Lamport synchronization primitive used in multi-GPU inference.
Distinct from Buffer Initialization Operations: this category specifically sets up a Lamport synchronization primitive rather than performing general buffer filling.
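The idea behind a negative-zero fill can be sketched on the host in plain Python. This is only an illustration under stated assumptions, not FlashInfer's code: the helper names (`lamport_init`, `publish`, `is_ready`) are hypothetical, and the buffer is an ordinary CPU array standing in for device memory. It assumes the common scheme in which `-0.0` acts as a "not yet written" sentinel, so readers need no separate flag and can poll the data slots themselves.

```python
import struct
from array import array

# IEEE 754 binary32 bit pattern of -0.0 (sign bit set, all other bits clear).
NEG_ZERO_BITS = 0x80000000


def f32_bits(x):
    """Return the 32-bit pattern of x when stored as a float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def lamport_init(n):
    """Hypothetical initializer: fill n float32 slots with the -0.0 sentinel."""
    return array("f", [-0.0] * n)


def publish(buf, i, value):
    """Hypothetical writer: store a payload, never storing the sentinel itself."""
    # A payload that happens to be -0.0 is remapped to +0.0 so it cannot
    # be mistaken for an unwritten slot. The two compare equal numerically.
    if f32_bits(value) == NEG_ZERO_BITS:
        value = 0.0
    buf[i] = value


def is_ready(buf):
    """Hypothetical reader check: true once no slot still holds the sentinel."""
    # The comparison must be on bits, because -0.0 == 0.0 as floats.
    return all(f32_bits(x) != NEG_ZERO_BITS for x in buf)


buf = lamport_init(4)
print(is_ready(buf))      # no slot has been written yet
for i, v in enumerate([1.5, -0.0, 0.0, 2.0]):
    publish(buf, i, v)
print(is_ready(buf))      # every slot now holds a payload
```

The bitwise comparison is the important detail: a numeric test would treat a freshly initialized slot and a legitimate zero payload as the same value.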
Category: Data & Databases · Lamport Synchronization Primitive Initializers (1 GitHub repository).
FlashInfer is a library of high-performance GPU kernels purpose-built for accelerating large language model inference. It provides optimized implementations for attention operations (including flash attention, page attention, multi-head latent attention, and cascade attention) using paged key-value caches, fused kernel composition, and just-in-time compilation. The library also includes specialized kernels for mixture-of-experts layers, block-scaled low-precision quantization (FP8, FP4), and distributed collective communication. What distinguishes FlashInfer is its fused all-reduce communicat
Initializes device buffers for Lamport synchronization primitives in multi-GPU inference.