# nvidia/warp

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/nvidia-warp).**

6,233 stars · 443 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/NVIDIA/warp
- Homepage: https://nvidia.github.io/warp/
- awesome-repositories: https://awesome-repositories.com/repository/nvidia-warp.md

## Topics

`cuda` `differentiable-programming` `gpu` `gpu-acceleration` `nvidia` `nvidia-warp` `python`

## Description

Warp is a Python framework that JIT-compiles Python functions into CUDA kernels for GPU-accelerated parallel computation, with built-in automatic differentiation and multi-framework array interoperability. At its core, it provides a GPU kernel compilation system that enables writing and executing custom GPU kernels directly from Python, while supporting automatic gradient computation through those kernels for integration with machine learning pipelines. The framework also includes tile-based cooperative computing, where thread blocks partition into tiles for shared-memory and tensor-core operations, and a block-sparse matrix engine for GPU-accelerated linear algebra.

What distinguishes Warp is its interoperability with other array and deep learning frameworks. It exchanges arrays with PyTorch, JAX, Paddle, and NumPy via DLPack and the CUDA array interface without copying data, and can wrap Warp kernels as callable primitives inside JAX JIT-compiled functions or PyTorch autograd graphs. The framework supports CUDA graph capture and replay for low-overhead repeated execution, orchestration across multiple GPUs, and spatial acceleration structures such as BVHs and hash grids for collision detection and ray casting. It also provides debugging and diagnostics tools, including kernel code stepping, assertion checking, and integration with CUDA Compute Sanitizer.

The framework covers GPU-accelerated physics simulation, finite element method on GPU, geometry querying on meshes and bounding volume hierarchies, and hardware-accelerated texture sampling. It includes mathematical operations on scalars, vectors, matrices, quaternions, and spatial transforms, as well as random number generation and volume data handling. Warp can be installed via conda, built from source, or integrated as an Omniverse extension, and supports Docker container execution with the NVIDIA Container Toolkit.
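
The automatic differentiation mentioned in the description can be pictured as a tape that records each kernel launch and replays the matching adjoint code in reverse order. The following is a minimal pure-Python sketch of that idea, not Warp's API: `Tape`, `square_kernel`, `sum_kernel`, and `grad_of_sum_of_squares` are hypothetical names chosen for illustration, and the "kernels" are ordinary loops rather than GPU code.

```python
class Tape:
    """Records backward closures in launch order and replays them in reverse."""

    def __init__(self):
        self._backward_steps = []

    def record(self, step):
        self._backward_steps.append(step)

    def backward(self):
        for step in reversed(self._backward_steps):
            step()


def square_kernel(tape, x, x_grad, y, y_grad):
    """Forward: y[i] = x[i]^2. Adjoint: x_grad[i] += 2 * x[i] * y_grad[i]."""
    for i in range(len(x)):
        y[i] = x[i] * x[i]

    def backward():
        for i in range(len(x)):
            x_grad[i] += 2.0 * x[i] * y_grad[i]

    tape.record(backward)


def sum_kernel(tape, y, y_grad, loss, loss_grad):
    """Forward: loss[0] = sum(y). Adjoint: y_grad[i] += loss_grad[0]."""
    loss[0] = sum(y)

    def backward():
        for i in range(len(y)):
            y_grad[i] += loss_grad[0]

    tape.record(backward)


def grad_of_sum_of_squares(values):
    """Return (loss, d loss / d x) for loss = sum(x_i^2)."""
    x = list(values)
    x_grad = [0.0] * len(x)
    y = [0.0] * len(x)
    y_grad = [0.0] * len(x)
    loss, loss_grad = [0.0], [1.0]  # seed the output adjoint with 1

    tape = Tape()
    square_kernel(tape, x, x_grad, y, y_grad)
    sum_kernel(tape, y, y_grad, loss, loss_grad)
    tape.backward()  # runs the sum adjoint first, then the square adjoint
    return loss[0], x_grad
```

Each array carries a companion gradient array, and adjoints accumulate with `+=` so that a value read by several kernels collects contributions from all of them.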

## Tags

### Programming Languages & Runtimes

- [Python GPU Kernels](https://awesome-repositories.com/f/programming-languages-runtimes/compiler-interpreter-internals/compiler-infrastructure/jit-kernel-compilers/python-gpu-kernels.md) — JIT-compiles Python functions into CUDA kernels for GPU-accelerated parallel computation.
- [Kernel Debugging Tools](https://awesome-repositories.com/f/programming-languages-runtimes/compiler-interpreter-internals/compiler-infrastructure/jit-kernel-compilers/python-gpu-kernels/kernel-debugging-tools.md) — Provides runtime checks, kernel code stepping, and CUDA Compute Sanitizer integration for debugging GPU kernels.
- [Tile Pass-by-Reference Semantics](https://awesome-repositories.com/f/programming-languages-runtimes/function-argument-passing/tile-pass-by-reference-semantics.md) — Allows tile parameters in user-defined functions to be modified in place, matching Python's mutable-object semantics. ([source](https://nvidia.github.io/warp/user_guide/tiles.html))

### Artificial Intelligence & ML

- [Automatic Differentiation](https://awesome-repositories.com/f/artificial-intelligence-ml/automatic-differentiation.md) — Wires custom vector-Jacobian products around Warp kernels for automatic differentiation in JAX. ([source](https://nvidia.github.io/warp/user_guide/interoperability_jax.html))
- [Asynchronous Kernel Launchers](https://awesome-repositories.com/f/artificial-intelligence-ml/kernel-schedulers/asynchronous-kernel-launchers.md) — Schedules GPU kernel execution without blocking the host thread, enabling concurrent CPU and GPU work. ([source](https://nvidia.github.io/warp/deep_dive/concurrency.html))
- [Custom Autograd Functions](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/model-construction/automatic-differentiation-systems/functional-autograd/custom-autograd-functions.md) — Wraps Warp kernel launches inside custom autograd functions for gradient flow through both frameworks. ([source](https://nvidia.github.io/warp/user_guide/interoperability_pytorch.html))
- [PyTorch Tensor Interoperabilities](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-backends/pytorch-tensor-interoperabilities.md) — Converts Warp arrays to and from PyTorch tensors without copying data. ([source](https://nvidia.github.io/warp/api_reference/warp.html))
- [DLPack Array Exchanges](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-type-conversion/framework-array-conversions/dlpack-array-exchanges.md) — Imports and exports arrays with any DLPack-compatible framework without copying data. ([source](https://nvidia.github.io/warp/user_guide/interoperability.html))
- [Cooperative Tile Operations](https://awesome-repositories.com/f/artificial-intelligence-ml/tiled-processing/cooperative-tile-operations.md) — Performs cooperative parallel operations on GPU thread-block tiles including sorting, scanning, and matrix multiplication. ([source](https://nvidia.github.io/warp/language_reference/builtins.html))
- [Cooperative Tile Processors](https://awesome-repositories.com/f/artificial-intelligence-ml/tiled-processing/cooperative-tile-processors.md) — Performs cooperative parallel operations on thread-block tiles including matrix multiply, FFT, sorting, and reductions.
- [Compute Graph Captures](https://awesome-repositories.com/f/artificial-intelligence-ml/compute-graph-builders/compute-graph-captures.md) — Records sequences of Warp and PyTorch operations into CUDA graphs for low-overhead repeated execution.
- [Cross-Framework Graph Captures](https://awesome-repositories.com/f/artificial-intelligence-ml/compute-graph-builders/compute-graph-captures/cross-framework-graph-captures.md) — Records a sequence of Warp and PyTorch kernel launches into a single CUDA graph for repeated replay. ([source](https://nvidia.github.io/warp/user_guide/interoperability_pytorch.html))
- [CUDA Graph Array Allocations](https://awesome-repositories.com/f/artificial-intelligence-ml/compute-graph-builders/compute-graph-captures/cuda-graph-array-allocations.md) — Records array creation as part of a CUDA graph capture so the graph allocates temporary storage when replayed. ([source](https://nvidia.github.io/warp/deep_dive/allocators.html))
- [Multi-Device Kernel Launches](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-kernel-implementations/multi-device-kernel-launches.md) — Launches independent kernels on different CUDA devices simultaneously to parallelize sub-tasks across GPUs. ([source](https://nvidia.github.io/warp/deep_dive/concurrency.html))
- [Kernel Caching Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/kernel-optimizations/kernel-caching-systems.md) — Stores compiled GPU kernels between application runs to skip recompilation on subsequent launches. ([source](https://nvidia.github.io/warp/api_reference/warp_config.html))
- [Multi-GPU Distribution](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/inference-deployment/model-deployment-toolkits/distributed-deployment-utilities/multi-gpu-distribution.md) — Uses JAX's shard_map to run Warp kernels on sharded arrays across multiple GPUs. ([source](https://nvidia.github.io/warp/user_guide/interoperability_jax.html))
- [Paddle Tensor Conversions](https://awesome-repositories.com/f/artificial-intelligence-ml/tensor-type-conversion/framework-array-conversions/paddle-tensor-conversions.md) — Converts Warp arrays to and from Paddle tensors without copying data, including gradient arrays. ([source](https://nvidia.github.io/warp/user_guide/interoperability.html))
- [Tile Reductions](https://awesome-repositories.com/f/artificial-intelligence-ml/tiled-processing/cooperative-tile-operations/tile-reductions.md) — Computes the sum, minimum, maximum, or other aggregate of all elements in a tile cooperatively across a block. ([source](https://nvidia.github.io/warp/user_guide/tiles.html))
- [Tile Triangular System Solvers](https://awesome-repositories.com/f/artificial-intelligence-ml/tiled-processing/cooperative-tile-operations/tile-triangular-system-solvers.md) — Solves lower or upper triangular matrix equations on tile data using cooperative block-wide operations. ([source](https://nvidia.github.io/warp/user_guide/tiles.html))
- [Tile Reductions](https://awesome-repositories.com/f/artificial-intelligence-ml/tiled-processing/cooperative-tile-processors/tile-reductions.md) — Replaces per-thread atomic accumulation with a cooperative block-wide reduction to improve memory bandwidth utilization. ([source](https://nvidia.github.io/warp/user_guide/tiles.html))
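
The tile reduction entries above describe replacing one atomic add per thread with one cooperative reduction per block. A rough pure-Python simulation of that pattern is sketched below; `tile_sum` and `block_reduce_sum` are hypothetical helper names, and the inner `for lane` loop stands in for lanes that would run in parallel on a GPU.

```python
def tile_sum(tile):
    """Tree reduction over one tile: halve the active range at each step."""
    values = list(tile)
    size = 1
    while size < len(values):
        size *= 2
    # Pad to a power of two with the additive identity.
    values += [0.0] * (size - len(values))

    stride = size // 2
    while stride > 0:
        for lane in range(stride):  # these lanes would run concurrently on a GPU
            values[lane] += values[lane + stride]
        stride //= 2
    return values[0]


def block_reduce_sum(data, tile_size):
    """Sum `data` tile by tile; returns (total, number of global accumulations)."""
    total = 0.0
    accumulations = 0
    for start in range(0, len(data), tile_size):
        total += tile_sum(data[start:start + tile_size])
        accumulations += 1  # one global update per tile instead of one per element
    return total, accumulations
```

With 100 elements and a tile size of 32, the global accumulator is touched 4 times rather than 100, which is the bandwidth saving the entry refers to.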

### Part of an Awesome List

- [GPU Kernel Differentiators](https://awesome-repositories.com/f/awesome-lists/ai/differentiable-programming/gpu-kernel-differentiators.md) — Computes gradients through GPU kernel code, enabling gradient-based optimization for simulation and machine learning. ([source](https://nvidia.github.io/warp/stable/user_guide/faq.html))
- [Tile-Based Kernel Differentiators](https://awesome-repositories.com/f/awesome-lists/ai/differentiable-programming/gpu-kernel-differentiators/tile-based-kernel-differentiators.md) — Generates backward-mode gradient computations for tile-based kernels, supporting in-place addition and subtraction. ([source](https://nvidia.github.io/warp/user_guide/tiles.html))
- [Array Converters](https://awesome-repositories.com/f/awesome-lists/ai/jax-frameworks/array-converters.md) — Provides zero-copy conversion of GPU arrays between Warp and JAX for seamless multi-framework workflows. ([source](https://nvidia.github.io/warp/user_guide/interoperability_jax.html))
- [JAX Array Interoperabilities](https://awesome-repositories.com/f/awesome-lists/ai/jax-frameworks/jax-array-interoperabilities.md) — Converts Warp arrays to and from JAX arrays and creates JAX callbacks from Warp kernels. ([source](https://nvidia.github.io/warp/api_reference/warp.html))
- [Kernel Primitives](https://awesome-repositories.com/f/awesome-lists/ai/jax-frameworks/kernel-primitives.md) — Enables Warp GPU kernels to be called directly inside JAX JIT-compiled functions for accelerated computation. ([source](https://nvidia.github.io/warp/user_guide/interoperability_jax.html))
- [Finite Element Assemblers](https://awesome-repositories.com/f/awesome-lists/ai/physics-and-pde-solvers/finite-element-assemblers.md) — Defines a geometry, function space, integration domain, and linear/bilinear forms to assemble and solve a system of equations. ([source](https://nvidia.github.io/warp/domain_modules/fem.html))
- [Physics Simulation](https://awesome-repositories.com/f/awesome-lists/ai/physics-simulation.md) — Runs large-scale physics simulations directly on GPU with automatic differentiation for gradient-based optimization.
- [PyTorch Data Sharings](https://awesome-repositories.com/f/awesome-lists/ai/pytorch-frameworks/pytorch-data-sharings.md) — Converts arrays between Warp and PyTorch without copying data, preserving gradient information. ([source](https://nvidia.github.io/warp/user_guide/interoperability_pytorch.html))
- [JAX Array Conversions](https://awesome-repositories.com/f/awesome-lists/ai/jax-frameworks/jax-array-conversions.md) — Converts JAX arrays into Warp arrays using the DLPack protocol for zero-copy data exchange. ([source](https://nvidia.github.io/warp/user_guide/interoperability_jax.html))
- [JAX Callable Wrappers](https://awesome-repositories.com/f/awesome-lists/ai/jax-frameworks/jax-callable-wrappers.md) — Wraps Python functions that launch multiple Warp kernels as JAX callables for JIT-compiled code. ([source](https://nvidia.github.io/warp/user_guide/interoperability_jax.html))
- [Vectorized Mapping Behavior Controls](https://awesome-repositories.com/f/awesome-lists/ai/jax-frameworks/vectorized-mapping-behavior-controls.md) — Specifies how Warp kernel callbacks transform under JAX's vmap transformation with broadcast or sequential options. ([source](https://nvidia.github.io/warp/user_guide/interoperability_jax.html))
- [Batched Environment Simulators](https://awesome-repositories.com/f/awesome-lists/ai/simulation-environments/batched-environment-simulators.md) — Represents topologically disconnected simulation environments within a single geometry for batched or reinforcement-learning workloads. ([source](https://nvidia.github.io/warp/domain_modules/fem.html))
- [GPU Kernel Runtime Checks](https://awesome-repositories.com/f/awesome-lists/devtools/development-and-debugging/gpu-kernel-runtime-checks.md) — Activates runtime checks for array access, CUDA errors, floating-point values, and gradient computation to catch correctness issues during development. ([source](https://nvidia.github.io/warp/api_reference/warp_config.html))
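
The finite element workflow listed above (geometry, function space, bilinear and linear forms, then assemble and solve) can be illustrated on the smallest possible case. This is a CPU-only sketch in plain Python, assuming a 1D Poisson problem with linear elements and homogeneous Dirichlet boundaries; it does not use or mirror the `warp.fem` API, and both function names are hypothetical.

```python
def assemble_poisson_1d(num_elements, source=1.0):
    """Assemble K u = f for -u'' = source on [0, 1] with linear elements."""
    h = 1.0 / num_elements
    num_nodes = num_elements + 1
    K = [[0.0] * num_nodes for _ in range(num_nodes)]
    f = [0.0] * num_nodes
    # Bilinear form: integral of phi_a' * phi_b' over one element.
    k_local = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    # Linear form: integral of source * phi_a over one element.
    f_local = [source * h / 2.0, source * h / 2.0]
    for e in range(num_elements):
        nodes = (e, e + 1)
        for a in range(2):
            f[nodes[a]] += f_local[a]
            for b in range(2):
                K[nodes[a]][nodes[b]] += k_local[a][b]
    return K, f


def solve_dirichlet(K, f):
    """Impose u = 0 at both ends and solve the interior tridiagonal system."""
    interior = range(1, len(f) - 1)
    diag = [K[i][i] for i in interior]
    upper = [K[i][i + 1] for i in interior]  # last entry couples to the boundary
    lower = [K[i][i - 1] for i in interior]  # first entry couples to the boundary
    rhs = [f[i] for i in interior]
    n = len(diag)
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        factor = lower[i] / diag[i - 1]
        diag[i] -= factor * upper[i - 1]
        rhs[i] -= factor * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]
```

For a constant source of 1 the exact solution is u(x) = x(1 - x)/2, and linear elements reproduce it at the nodes, which makes the sketch easy to check by hand.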

### Data & Databases

- [GPU Device Synchronization](https://awesome-repositories.com/f/data-databases/data-synchronization/cross-device-synchronization-engines/cross-device-operation-execution/gpu-device-synchronization.md) — Synchronizes kernel launches and memory copies automatically, requiring manual coordination only for asynchronous CPU transfers. ([source](https://nvidia.github.io/warp/stable/user_guide/faq.html))

- [Zero-Copy Array Slicing](https://awesome-repositories.com/f/data-databases/immutable-array-updates/zero-copy-array-slicing.md) — Converts Warp arrays to and from any DLPack-compatible array without copying data. ([source](https://nvidia.github.io/warp/api_reference/warp.html))
- [DLPack Protocols](https://awesome-repositories.com/f/data-databases/serialization-frameworks/zero-copy/dlpack-protocols.md) — Exchanges GPU arrays between frameworks via DLPack without copying memory for seamless interop.
- [GPU Framework Data Exchanges](https://awesome-repositories.com/f/data-databases/shared-memory-data-exchange/gpu-framework-data-exchanges.md) — Exchanges GPU array data with PyTorch and JAX through the CUDA array interface without copying. ([source](https://nvidia.github.io/warp/stable/user_guide/faq.html))
- [Mesh and BVH Queries](https://awesome-repositories.com/f/data-databases/data-querying/spatial-querying/mesh-and-bvh-queries.md) — Finds closest points, intersections, and ray hits on meshes and bounding volume hierarchies. ([source](https://nvidia.github.io/warp/language_reference/builtins.html))
- [Tile-Based Spatial Queries](https://awesome-repositories.com/f/data-databases/data-querying/spatial-querying/mesh-and-bvh-queries/tile-based-spatial-queries.md) — Performs axis-aligned bounding box or ray queries against a BVH or mesh within a tile, returning results cooperatively. ([source](https://nvidia.github.io/warp/user_guide/tiles.html))
- [GPU Array Atomics](https://awesome-repositories.com/f/data-databases/key-value-stores/atomic-key-updaters/atomic-list-updaters/gpu-array-atomics.md) — Atomically reads, modifies, and writes values in GPU arrays for thread-safe concurrent updates. ([source](https://nvidia.github.io/warp/language_reference/builtins.html))

### DevOps & Infrastructure

- [GPU Kernel Function Wrappers](https://awesome-repositories.com/f/devops-infrastructure/function-as-a-service-platforms/gpu-kernel-function-wrappers.md) — Wraps Python functions as Warp-compatible callables for use in GPU kernels. ([source](https://nvidia.github.io/warp/api_reference/warp_utils.html))
- [Math Library Accelerators](https://awesome-repositories.com/f/devops-infrastructure/gpu-acceleration-libraries/math-library-accelerators.md) — Switches on GPU-accelerated implementations for FFT, matrix multiply, and solver operations using cuFFTDx, cuBLASDx, and cuSolverDx. ([source](https://nvidia.github.io/warp/api_reference/warp_config.html))
- [Optimizer Integrations](https://awesome-repositories.com/f/devops-infrastructure/model-conversion/pytorch/optimizer-integrations.md) — Feeds Warp arrays to PyTorch optimizers like Adam by converting them to tensors for gradient updates. ([source](https://nvidia.github.io/warp/user_guide/interoperability_pytorch.html))

### Graphics & Multimedia

- [GPU-Accelerated Physics Simulations](https://awesome-repositories.com/f/graphics-multimedia/particle-physics-simulations/gpu-accelerated-physics-simulations.md) — Runs large-scale physics simulations such as particle systems with gravitational attraction directly on the GPU using custom kernel functions. ([source](https://cdn.jsdelivr.net/gh/nvidia/warp@main/README.md))
- [Spatial Field Evaluators](https://awesome-repositories.com/f/graphics-multimedia/arbitrary-domain-evaluators/spatial-field-evaluators.md) — Looks up field values at arbitrary sample points using spatial queries and acceleration structures. ([source](https://nvidia.github.io/warp/domain_modules/fem.html))
- [Displacement Field Deformations](https://awesome-repositories.com/f/graphics-multimedia/curve-flattening/displacement-field-deformations.md) — Deforms a base geometry with a displacement field to create a curved variant for higher-order finite element analysis. ([source](https://nvidia.github.io/warp/domain_modules/fem.html))
- [Field Couplings](https://awesome-repositories.com/f/graphics-multimedia/graphics-engines-rendering/3d-math-and-geometry-toolkits/geometry-primitives/field-couplings.md) — Wraps a field defined on one geometry for integration over a different geometry using position-based mapping. ([source](https://nvidia.github.io/warp/domain_modules/fem.html))
- [3D Matrix Builders](https://awesome-repositories.com/f/graphics-multimedia/graphics-engines-rendering/rendering/coordinate-viewport-transformations/affine-transformation-matrices/3d-matrix-builders.md) — Builds, decomposes, and applies 4x4 transformation matrices from position, rotation, and scale components. ([source](https://nvidia.github.io/warp/language_reference/builtins.html))
- [Texture Sampling](https://awesome-repositories.com/f/graphics-multimedia/layer-based-animations/texture-sampling.md) — Loads and samples 1D, 2D, and 3D textures with hardware acceleration on the GPU. ([source](https://nvidia.github.io/warp/api_reference/warp.html))
- [Volumetric Mesh Extraction](https://awesome-repositories.com/f/graphics-multimedia/mesh-processing-tools/volumetric-mesh-extraction.md) — Generates triangle meshes from volumetric data using a reusable marching cubes context on the GPU. ([source](https://nvidia.github.io/warp/api_reference/warp.html))
- [Spatial Hierarchy Accelerators](https://awesome-repositories.com/f/graphics-multimedia/spatial-hierarchy-accelerators.md) — Uses BVH and hash grids to accelerate collision detection and ray casting queries on the GPU.
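
The hash grid idea behind the spatial accelerators above is to bin points into uniform cells so that a radius query only visits nearby cells instead of every point. The sketch below shows this in plain Python on the CPU; `build_hash_grid` and `query_neighbors` are hypothetical names and are not Warp functions.

```python
import math
from collections import defaultdict


def build_hash_grid(points, cell_size):
    """Map each integer cell coordinate to the indices of the points inside it."""
    grid = defaultdict(list)
    for index, p in enumerate(points):
        cell = tuple(math.floor(c / cell_size) for c in p)
        grid[cell].append(index)
    return grid


def query_neighbors(grid, points, cell_size, center, radius):
    """Return sorted indices of points within `radius` of `center`."""
    lo = [math.floor((c - radius) / cell_size) for c in center]
    hi = [math.floor((c + radius) / cell_size) for c in center]
    found = []
    # Visit only the cells overlapped by the query sphere's bounding box.
    for ix in range(lo[0], hi[0] + 1):
        for iy in range(lo[1], hi[1] + 1):
            for iz in range(lo[2], hi[2] + 1):
                for index in grid.get((ix, iy, iz), ()):
                    if math.dist(points[index], center) <= radius:
                        found.append(index)
    return sorted(found)
```

Choosing a cell size close to the typical query radius keeps the number of visited cells small while still limiting the points tested per cell.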

### Operating Systems & Systems Programming

- [GPU Stream Scheduling](https://awesome-repositories.com/f/operating-systems-systems-programming/gpu-stream-scheduling.md) — Groups GPU operations into ordered sequences that execute concurrently on the same device to overlap compute and data transfers. ([source](https://nvidia.github.io/warp/deep_dive/concurrency.html))
- [Remote GPU Memory Access](https://awesome-repositories.com/f/operating-systems-systems-programming/remote-gpu-memory-access.md) — Permits one GPU to directly read or write memory allocated in another GPU's pool for accelerated cross-device transfers. ([source](https://nvidia.github.io/warp/deep_dive/allocators.html))

### Scientific & Mathematical Computing

- [Tile-Based Matrix Multiplications](https://awesome-repositories.com/f/scientific-mathematical-computing/generalized-matrix-multiplications/tile-based-matrix-multiplications.md) — Performs a general matrix multiply (GEMM) on tile operands, leveraging shared memory and Tensor Cores when available. ([source](https://nvidia.github.io/warp/user_guide/tiles.html))
- [Multi-Dimensional Arrays](https://awesome-repositories.com/f/scientific-mathematical-computing/multi-dimensional-arrays.md) — Creates, clones, and manipulates fixed-size multi-dimensional arrays on CPU or CUDA devices. ([source](https://nvidia.github.io/warp/api_reference/warp.html))
- [GPU Kernel Differentiators](https://awesome-repositories.com/f/scientific-mathematical-computing/numerical-mathematical-foundations/automatic-differentiation/gpu-kernel-differentiators.md) — Provides automatic differentiation through GPU kernel code for integration with PyTorch, JAX, and Paddle pipelines. ([source](https://cdn.jsdelivr.net/gh/nvidia/warp@main/README.md))
- [Adaptive Sparse Grids](https://awesome-repositories.com/f/scientific-mathematical-computing/data-modeling-processing/computational-graphs/graph-based-computational-execution/deep-learning-execution/sparse-voxel-operations/adaptive-sparse-grids.md) — Constructs a sparse grid with power-of-two voxel scales for adaptive resolution in finite element simulations. ([source](https://nvidia.github.io/warp/domain_modules/fem.html))
- [Finite Element Field Exporters](https://awesome-repositories.com/f/scientific-mathematical-computing/finite-element-field-exporters.md) — Generates VTK-compatible cell types and node indices from a function space for viewing in external tools. ([source](https://nvidia.github.io/warp/domain_modules/fem.html))
- [Sparse Matrix Multiplications](https://awesome-repositories.com/f/scientific-mathematical-computing/generalized-matrix-multiplications/sparse-matrix-multiplications.md) — Performs GPU-accelerated matrix-matrix multiplication on BSR matrices with configurable scaling and accumulation. ([source](https://nvidia.github.io/warp/api_reference/warp_sparse.html))
- [GPU Finite Element Solvers](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/scientific-computing-platforms/physics-simulations/structural-finite-element-analysis/gpu-finite-element-solvers.md) — Ships a finite element method solver that assembles and solves PDE systems entirely on GPU.
- [Spatial Vector and Matrix Operations](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/scientific-computing-platforms/scientific-computing/matrix-operations/matrix-vector-products/spatial-vector-and-matrix-operations.md) — Performs operations on 6D screw vectors and spatial inertia matrices for rigid-body dynamics. ([source](https://nvidia.github.io/warp/language_reference/builtins.html))
- [Matrix Transposition Kernels](https://awesome-repositories.com/f/scientific-mathematical-computing/matrix-transposition-kernels.md) — Provides GPU-accelerated transposition of block-sparse matrices as part of its linear algebra engine. ([source](https://nvidia.github.io/warp/api_reference/warp_sparse.html))
- [Block-Sparse Engines](https://awesome-repositories.com/f/scientific-mathematical-computing/numerical-mathematical-foundations/linear-algebra/sparse-linear-algebra-routines/block-sparse-engines.md) — Ships a block-sparse matrix engine for efficient GPU-accelerated linear algebra operations.
- [NumPy Array Integration](https://awesome-repositories.com/f/scientific-mathematical-computing/numpy-array-integration.md) — Converts Warp arrays to and from NumPy arrays without copying data. ([source](https://nvidia.github.io/warp/api_reference/warp.html))
- [Scalar Mathematical Utilities](https://awesome-repositories.com/f/scientific-mathematical-computing/scalar-mathematical-utilities.md) — Computes common mathematical functions such as trigonometry, exponentiation, and rounding on scalar values. ([source](https://nvidia.github.io/warp/language_reference/builtins.html))
- [Block-Sparse Matrix Arithmetic](https://awesome-repositories.com/f/scientific-mathematical-computing/sparse-matrix-storage/block-sparse-matrix-arithmetic.md) — Accumulates block-sparse matrices with scaling, reusing persistent GPU work buffers across calls. ([source](https://nvidia.github.io/warp/api_reference/warp_sparse.html))
- [Sparse Matrix-Vector Multiplications](https://awesome-repositories.com/f/scientific-mathematical-computing/sparse-matrix-storage/block-sparse-matrix-arithmetic/sparse-matrix-vector-multiplications.md) — Computes a sparse matrix-vector product on the GPU, scaling and accumulating the result. ([source](https://nvidia.github.io/warp/api_reference/warp_sparse.html))
- [Block-Sparse Matrix Builders](https://awesome-repositories.com/f/scientific-mathematical-computing/sparse-matrix-storage/block-sparse-matrix-builders.md) — Builds block-sparse matrices from coordinate triplets with arbitrary block sizes on the GPU. ([source](https://nvidia.github.io/warp/api_reference/warp_sparse.html))
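
The block-sparse entries above mention building matrices from coordinate triplets and computing scaled, accumulated matrix-vector products. A small CPU reference for those two steps might look like the following, assuming a block compressed sparse row (BSR) layout with row offsets, block column indices, and dense blocks. The function names echo the operations described but are hypothetical plain-Python helpers, not calls into `warp.sparse`.

```python
def bsr_from_triplets(num_block_rows, rows, cols, blocks):
    """Build BSR (offsets, columns, values) from block triplets, summing duplicates."""
    merged = {}
    for r, c, block in zip(rows, cols, blocks):
        if (r, c) in merged:
            merged[(r, c)] = [
                [a + b for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(merged[(r, c)], block)
            ]
        else:
            merged[(r, c)] = [list(row) for row in block]

    offsets = [0] * (num_block_rows + 1)
    columns, values = [], []
    for r, c in sorted(merged):  # row-major order, columns ascending within a row
        offsets[r + 1] += 1
        columns.append(c)
        values.append(merged[(r, c)])
    for r in range(num_block_rows):  # running sum turns counts into offsets
        offsets[r + 1] += offsets[r]
    return offsets, columns, values


def bsr_mv(offsets, columns, values, block_shape, x, y, alpha=1.0, beta=0.0):
    """Compute y = alpha * A @ x + beta * y in place and return y."""
    block_rows, block_cols = block_shape
    for block_row in range(len(offsets) - 1):
        acc = [0.0] * block_rows
        for k in range(offsets[block_row], offsets[block_row + 1]):
            x_start = columns[k] * block_cols
            for i in range(block_rows):
                for j in range(block_cols):
                    acc[i] += values[k][i][j] * x[x_start + j]
        for i in range(block_rows):
            idx = block_row * block_rows + i
            y[idx] = alpha * acc[i] + beta * y[idx]
    return y
```

On a GPU, each block row (or each block) would typically be handled by its own thread or tile; the loop nest here only fixes the arithmetic that such a kernel has to reproduce.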

### System Administration & Monitoring

- [GPU Orchestrators](https://awesome-repositories.com/f/system-administration-monitoring/multi-device-management/gpu-orchestrators.md) — Orchestrates array allocation, kernel launches, and data copies across multiple GPUs within a single process using device aliases. ([source](https://nvidia.github.io/warp/stable/user_guide/faq.html))
- [GPU Kernel Profilers](https://awesome-repositories.com/f/system-administration-monitoring/execution-time-profilers/gpu-kernel-profilers.md) — Records and reports the duration of individual CUDA operations such as kernel launches, memory copies, and memset calls. ([source](https://nvidia.github.io/warp/deep_dive/profiling.html))

### Development Tools & Productivity

- [Compilation Strategy Selectors](https://awesome-repositories.com/f/development-tools-productivity/build-tooling/build-orchestration-logic/build-orchestration-configuration/build-configuration-systems/compiler-configurations/compilation-setting-configuration/gpu-kernel-compilation-settings/compilation-strategy-selectors.md) — Sets the compilation strategy for GPU kernels, choosing between eager, lazy, or other modes to balance startup time and runtime performance. ([source](https://nvidia.github.io/warp/api_reference/warp_config.html))
- [GPU Math Type Constructors](https://awesome-repositories.com/f/development-tools-productivity/type-definition-generators/type-customization-extensions/gpu-math-type-constructors.md) — Provides constructors for GPU math types like vectors, matrices, and quaternions used in physics and geometry kernels. ([source](https://nvidia.github.io/warp/api_reference/warp_types.html))

### Security & Cryptography

- [Construction and Decomposition](https://awesome-repositories.com/f/security-cryptography/security/operations-and-incident-response/quaternion-rotation-utilities/construction-and-decomposition.md) — Constructs, decomposes, and interpolates quaternions for representing 3D rotations in GPU kernels. ([source](https://nvidia.github.io/warp/language_reference/builtins.html))
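
The quaternion operations named above (construction, decomposition, interpolation) follow standard formulas. The sketch below writes them out in plain Python, assuming unit quaternions stored as `(x, y, z, w)` with the scalar part last; the function names are illustrative and should not be read as Warp built-ins.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (x, y, z, w) for a rotation of `angle` radians about `axis`."""
    length = math.sqrt(sum(a * a for a in axis))
    s = math.sin(angle / 2.0) / length
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(angle / 2.0))


def quat_to_axis_angle(q):
    """Decompose a unit quaternion into (axis, angle)."""
    x, y, z, w = q
    s = math.sqrt(x * x + y * y + z * z)
    if s < 1e-12:
        return (1.0, 0.0, 0.0), 0.0  # identity rotation: the axis is arbitrary
    return (x / s, y / s, z / s), 2.0 * math.atan2(s, w)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q: v + w*t + u x t, with t = 2 (u x v)."""
    u, w = q[:3], q[3]
    t = tuple(2.0 * c for c in _cross(u, v))
    ut = _cross(u, t)
    return tuple(v[i] + w * t[i] + ut[i] for i in range(3))


def quat_slerp(q0, q1, t):
    """Spherical linear interpolation along the shorter arc between q0 and q1."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q encode the same rotation; pick the nearer one
        q1 = tuple(-c for c in q1)
        dot = -dot
    theta = math.acos(min(dot, 1.0))
    if theta < 1e-9:
        return q0
    w0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
```

Interpolating halfway between the identity and a 90 degree rotation about z should decompose to a 45 degree rotation about the same axis, which is a convenient sanity check.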

### Software Engineering & Architecture

- [GPU Radix Sorts](https://awesome-repositories.com/f/software-engineering-architecture/sorting-algorithms/radix-sorts/gpu.md) — Sorts key-value pairs in parallel using GPU-accelerated radix sort. ([source](https://nvidia.github.io/warp/api_reference/warp_utils.html))
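
A key-value radix sort of the kind listed above processes keys one digit at a time, using a histogram and an exclusive prefix sum to scatter each pair to its output slot. The sequential Python sketch below shows the least-significant-digit variant for non-negative integer keys; `radix_sort_pairs` is a hypothetical name, and a GPU implementation would parallelize the histogram, scan, and scatter phases.

```python
def radix_sort_pairs(keys, values, key_bits=32, radix_bits=8):
    """Stable LSD radix sort of non-negative integer keys, carrying values along."""
    keys, values = list(keys), list(values)
    buckets = 1 << radix_bits
    mask = buckets - 1
    for shift in range(0, key_bits, radix_bits):
        # Histogram of the current digit.
        counts = [0] * buckets
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum gives each bucket's first output slot.
        offsets = [0] * buckets
        total = 0
        for d in range(buckets):
            offsets[d] = total
            total += counts[d]
        # Stable scatter: equal digits keep their relative order.
        out_keys = [0] * len(keys)
        out_values = [None] * len(keys)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_keys[offsets[d]] = k
            out_values[offsets[d]] = v
            offsets[d] += 1
        keys, values = out_keys, out_values
    return keys, values
```

Stability of each pass is what makes the digit-by-digit approach correct: ties on the current digit preserve the order established by earlier, less significant digits.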