4 repositories
Operations to fill GPU buffers with constant values or zero them out after allocation.
Distinct from GPU Buffer Allocators: focuses on populating buffers with constant data after allocation, not the allocation itself.
Explore 4 GitHub repositories matching data & databases · Buffer Initialization Operations.
This project is a learning guide and collection of study notes designed to teach Node.js backend development. It provides a comprehensive core API reference and practical demonstrations for implementing server-side logic, network programming, and system APIs. The guide specifically covers advanced technical domains including process management for scaling applications via clusters and child processes, as well as network programming for building TCP, UDP, and HTTP services. It also includes detailed instructional material on security implementation, focusing on cryptographic hashing and encryp
Provides a utility to populate a buffer with a repeating value for initialization.
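Filling a buffer with a repeating value can be sketched in a few lines. The example below is a language-neutral illustration in Python, not code from the repository above; the helper name `fill_repeating` is hypothetical. It tiles a byte pattern across the whole buffer and truncates the final copy when the lengths do not divide evenly.

```python
def fill_repeating(buf: bytearray, pattern: bytes) -> bytearray:
    """Overwrite buf in place with `pattern` repeated end to end.

    Hypothetical helper for illustration only. The last copy of the
    pattern is cut short if len(buf) is not a multiple of len(pattern).
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    repeats, remainder = divmod(len(buf), len(pattern))
    # Build exactly len(buf) bytes, then assign in place so the
    # buffer keeps its identity and its length.
    buf[:] = pattern * repeats + pattern[:remainder]
    return buf


if __name__ == "__main__":
    print(fill_repeating(bytearray(5), b"ab"))
```

Writing in place rather than returning a new object mirrors how buffer fill utilities are usually used: the caller already owns the storage and only wants its contents reset.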
gfx is a hardware-agnostic graphics API abstraction that translates a unified set of graphics and compute commands into native instructions for multiple GPU drivers. It provides a common interface for cross-platform rendering and general-purpose GPU compute programming. The project features an intermediate-representation shader translation system that converts source code and SPIR-V into target-specific languages. It uses a data-driven reference testing framework to verify that graphics output stays consistent across different hardware platforms. Capabilities include parallel command buffer encoding across multiple threads and encapsulating pipeline states in single objects to reduce redundant state changes. The system manages low-level GPU resources, including memory allocation, asynchronous buffer mapping, and explicit frame presentation via swapchains. The implementation targets native environments and web browsers through WebAssembly, providing translation layers for WebGL and WebGL2.
Automatically clears GPU buffer memory upon allocation to ensure consistent state across different hardware platforms.
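Why clearing on allocation matters is easiest to see with recycled memory. The sketch below is a CPU-side stand-in written in Python, not gfx's implementation; the `ClearingPool` class and its methods are hypothetical. It reuses released blocks and zeroes them before handing them out, so a caller never observes data left behind by a previous owner.

```python
class ClearingPool:
    """Hypothetical pool that recycles blocks and zeroes them on allocation."""

    def __init__(self) -> None:
        # Released blocks waiting for reuse; they may still hold stale data.
        self._free: list[bytearray] = []

    def allocate(self, size: int) -> bytearray:
        for i, block in enumerate(self._free):
            if len(block) == size:
                del self._free[i]
                # Clear stale contents so every allocation starts from zero.
                block[:] = bytes(size)
                return block
        # No reusable block of this size: a fresh bytearray is already zeroed.
        return bytearray(size)

    def release(self, block: bytearray) -> None:
        self._free.append(block)


if __name__ == "__main__":
    pool = ClearingPool()
    first = pool.allocate(4)
    first[:] = b"\xff" * 4      # simulate a previous user writing data
    pool.release(first)
    second = pool.allocate(4)   # same storage, handed out again
    print(second is first, second)
```

Without the clearing step, the second allocation would expose whatever the first user wrote, and results could differ between platforms whose allocators happen to return zeroed or non-zeroed memory.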
TileLang is a Python-embedded domain-specific language compiler that JIT-compiles and autotunes GPU kernels. It uses a tile-based DSL, automatic software pipelining, and parallel autotuning to generate optimized GPU kernels at runtime. It supports tensor core operations with Pythonic syntax, automatic memory management, and thread mapping. The compiler searches over tile sizes, thread counts, and scheduling policies, compiling and benchmarking candidates in parallel to find the fastest kernel. It also caches compiled binaries and tuning results to disk for reuse across sessions. TileLang inc
TVM's feature to fill every element of a buffer with a specified constant value, including a dedicated operation to zero it out.
FlashInfer is a library of high-performance GPU kernels purpose-built for accelerating large language model inference. It provides optimized implementations for attention operations (including flash attention, page attention, multi-head latent attention, and cascade attention) using paged key-value caches, fused kernel composition, and just-in-time compilation. The library also includes specialized kernels for mixture-of-experts layers, block-scaled low-precision quantization (FP8, FP4), and distributed collective communication. What distinguishes FlashInfer is its fused all-reduce communicat
Initializes device buffers for distributed synchronization protocols in multi-GPU inference.