nvidia/cuda-samples is a collection of reference implementations and programming examples for the CUDA Toolkit. It serves as a GPGPU implementation guide and a parallel computing reference, providing code for using graphics hardware to perform general-purpose calculations and high-performance parallel processing. The project provides specific samples for GPU kernel development and resource management. These include demonstrations of multi-GPU communication, peer-to-peer memory access, and system hardware inspection to coordinate distributed GPU resources. The codebase covers a wide range of capa
rust-cuda is a GPU programming framework and device compiler that allows for the development and execution of high-performance kernels on NVIDIA hardware using Rust. It provides a driver wrapper to manage device memory allocation and kernel launching, effectively serving as a system for writing GPU compute logic without relying on C++. The project includes a compute library with hardware-optimized primitives for neural network acceleration and hardware-accelerated raytracing. It utilizes a compilation toolchain that translates source code into a low-level intermediate representation for execu
TileLang is a Python-embedded domain-specific language compiler that JIT-compiles and autotunes GPU kernels. It uses a tile-based DSL, automatic software pipelining, and parallel autotuning to generate optimized GPU kernels at runtime. It supports tensor core operations with Pythonic syntax, automatic memory management, and thread mapping. The compiler searches over tile sizes, thread counts, and scheduling policies, compiling and benchmarking candidates in parallel to find the fastest kernel. It also caches compiled binaries and tuning results to disk for reuse across sessions. TileLang inc
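The autotuning loop described above (try several tile sizes, benchmark each candidate, cache the winner to disk) can be sketched in plain Python. This is an illustration of the general technique only, not TileLang's API: `blocked_matmul`, `benchmark`, `autotune`, the JSON cache layout, and the cache key are all hypothetical names chosen for this example, and a CPU matrix multiply stands in for a GPU kernel.

```python
import json
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def blocked_matmul(a, b, tile):
    """Multiply two square matrices (lists of lists) in tile x tile blocks."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + tile, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(j0, min(j0 + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def benchmark(tile, a, b):
    """Time one candidate tile size; returns (tile, seconds)."""
    start = time.perf_counter()
    blocked_matmul(a, b, tile)
    return tile, time.perf_counter() - start


def autotune(n, candidates, cache_path):
    """Return (best_tile, cache_hit), reusing an on-disk result when present."""
    key = f"matmul_n{n}"  # hypothetical cache key: one entry per problem size
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if key in cache:
        return cache[key], True

    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    # Threads only mirror the "benchmark candidates in parallel" structure;
    # in CPython they share the interpreter, so the timings are noisy.
    with ThreadPoolExecutor() as pool:
        timings = list(pool.map(lambda t: benchmark(t, a, b), candidates))

    best_tile = min(timings, key=lambda r: r[1])[0]
    cache[key] = best_tile
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return best_tile, False


cache_file = os.path.join(tempfile.mkdtemp(), "tuning_cache.json")
first = autotune(32, [4, 8, 16], cache_file)   # runs the search
second = autotune(32, [4, 8, 16], cache_file)  # served from the disk cache
```

Which tile size wins depends on the machine and on timing noise, so the sketch records only the choice itself. A real tuner would also search thread counts and scheduling policies, and would typically repeat each measurement before picking a winner.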
cuda-python provides low-level Python bindings for the CUDA Driver and Runtime APIs. It serves as a programmatic wrapper for controlling device memory, managing hardware toolchains, and orchestrating execution graphs on NVIDIA GPUs, allowing for the compilation and launching of parallel kernels directly from Python.
The main features of nvidia/cuda-python are: Python GPU Development, CUDA Driver Wrappers, Hardware Driver API Mappings, CUDA Driver API Integrations, Driver Runtime Integrations, Hardware Abstraction Layers, Device-Local Memory Layouts, Device Buffer Managers.
Open-source alternatives to nvidia/cuda-python include:
nvidia/isaac-gr00t
nvidia/cuda-samples: This repository is a collection of reference implementations and programming examples for the CUDA Toolkit. It serves…
rust-gpu/rust-cuda: rust-cuda is a GPU programming framework and device compiler that allows for the development and execution of…
tile-ai/tilelang: TileLang is a Python-embedded domain-specific language compiler that JIT-compiles and autotunes GPU kernels. It uses a…
pyo3/pyo3: This project provides a framework for binding Rust and Python, enabling the creation of native extension modules and…
answerdotai/gpu.cpp: gpu.cpp is a lightweight C++ library for executing low-level general-purpose GPU computation across different hardware…