TileLang | Awesome Repository

TileLang is a Python-embedded domain-specific language (DSL) and compiler that JIT-compiles and autotunes GPU kernels. It combines a tile-based programming model, automatic software pipelining, and parallel autotuning to generate optimized GPU kernels at runtime.

It exposes tensor core operations through Pythonic syntax and handles memory management and thread mapping automatically. The compiler searches over tile sizes, thread counts, and scheduling policies, compiling and benchmarking candidates in parallel to find the fastest kernel. It also caches compiled binaries and tuning results to disk for reuse across sessions.
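The search-benchmark-cache loop described above can be illustrated with a minimal, library-free sketch. It does not use TileLang's API: `blocked_sum`, `benchmark`, `autotune`, and the JSON cache layout are all hypothetical names invented for this example, and a CPU summation loop stands in for a GPU kernel. Threads stand in for the parallel workers; a real tuner would more likely use separate processes or devices so that timings do not interfere.

```python
import itertools
import json
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def blocked_sum(data, tile, threads):
    """Stand-in workload (hypothetical): sum `data` in chunks of tile * threads."""
    step = tile * threads
    total = 0
    for start in range(0, len(data), step):
        total += sum(data[start:start + step])
    return total


def benchmark(config, data, repeats=3):
    """Return the best wall-clock time over `repeats` runs of one config."""
    tile, threads = config
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        blocked_sum(data, tile, threads)
        best = min(best, time.perf_counter() - t0)
    return best


def autotune(data, tiles, thread_counts, cache_path):
    """Pick the fastest (tile, threads) pair; reuse a cached result if present.

    Returns (config, cache_hit).
    """
    key = f"n={len(data)}"
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if key in cache:
        return tuple(cache[key]), True

    # Enumerate the search space and time every candidate concurrently.
    candidates = list(itertools.product(tiles, thread_counts))
    with ThreadPoolExecutor() as pool:
        timings = list(pool.map(lambda c: benchmark(c, data), candidates))
    best = min(zip(timings, candidates))[1]

    # Persist the winner so later sessions can skip the search.
    cache[key] = list(best)
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return best, False


data = list(range(10_000))
cache_path = os.path.join(tempfile.mkdtemp(), "tuning_cache.json")
config, hit = autotune(data, [16, 64, 256], [32, 128], cache_path)
config_again, hit_again = autotune(data, [16, 64, 256], [32, 128], cache_path)
print(config, hit, config_again, hit_again)
```

The first call runs the full search and writes the result to disk; the second call finds the entry in the cache file and returns it without benchmarking anything. Which configuration wins depends on the machine, so the sketch makes no claim about the selected values.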

TileLang includes optimizations for attention, convolution, and reduction operators, with multi-level tiling, software pipelining, and warp specialization. It manages memory across global, shared, and register levels, supports synchronization barriers, and provides debugging and diagnostic tools.
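As a rough illustration of tiling combined with software pipelining, the sketch below multiplies two matrices tile by tile in plain Python. It is not TileLang code: `load_tile` and `tiled_matmul` are hypothetical helpers, nested lists stand in for global memory, the copied tiles stand in for shared memory, and the `acc` buffer stands in for a register accumulator. The prefetch of the next tile before computing on the current one mimics the structure of a two-stage pipeline, although in this sequential sketch nothing actually overlaps. Non-empty input matrices are assumed.

```python
def load_tile(M, row0, col0, rows, cols):
    """Copy a rows x cols tile out of M (the "global" to "shared" step)."""
    return [r[col0:col0 + cols] for r in M[row0:row0 + rows]]


def tiled_matmul(A, B, block_m=2, block_n=2, block_k=2):
    """Compute A @ B one output tile at a time, prefetching along K."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    k_starts = list(range(0, k, block_k))

    for i0 in range(0, m, block_m):        # each (i0, j0) pair plays the role
        for j0 in range(0, n, block_n):    # of one thread block
            bm = min(block_m, m - i0)
            bn = min(block_n, n - j0)
            acc = [[0] * bn for _ in range(bm)]  # "register" accumulator

            # Pipeline prologue: fetch the first pair of tiles.
            nxt = (load_tile(A, i0, k_starts[0], bm, block_k),
                   load_tile(B, k_starts[0], j0, block_k, bn))

            for step in range(len(k_starts)):
                a_tile, b_tile = nxt
                # Issue the load for the next K step before computing.
                if step + 1 < len(k_starts):
                    k1 = k_starts[step + 1]
                    nxt = (load_tile(A, i0, k1, bm, block_k),
                           load_tile(B, k1, j0, block_k, bn))
                # Multiply-accumulate on the tiles already in "shared" memory.
                for i in range(bm):
                    for j in range(bn):
                        acc[i][j] += sum(a_tile[i][p] * b_tile[p][j]
                                         for p in range(len(b_tile)))

            # Write the finished tile back to "global" memory.
            for i in range(bm):
                for j in range(bn):
                    C[i0 + i][j0 + j] = acc[i][j]
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0], [0, 1], [2, 3]]
C = tiled_matmul(A, B)
print(C)
```

The block sizes here are the kind of parameters an autotuner would vary; slicing handles the ragged edge tiles when the matrix dimensions are not multiples of the block size.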

Features

Python GPU Kernels - Provides a Python-embedded DSL for writing high-performance GPU kernels with automatic thread mapping and shared memory management.
Autotuning GPU Kernel Compilers - Automatically searches over tile sizes and thread counts to optimize GPU kernel performance.
GPU Kernel DSL Compilers - Provides the core Pythonic DSL for structuring high-performance kernels with automatic memory management and thread mapping.
