# nvlabs/tiny-cuda-nn

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/nvlabs-tiny-cuda-nn).**

4,418 stars · 547 forks · C++ · other

## Links

- GitHub: https://github.com/NVlabs/tiny-cuda-nn
- awesome-repositories: https://awesome-repositories.com/repository/nvlabs-tiny-cuda-nn.md

## Topics

`cuda` `deep-learning` `gpu` `mlp` `nerf` `neural-network` `pytorch` `real-time` `rendering`

## Description

tiny-cuda-nn is a C++ and CUDA library for fast training and inference of small neural networks on NVIDIA GPUs. It serves as a backend for neural radiance fields and other coordinate-based networks, providing fused GPU kernels and a multiresolution hash grid encoder that maps low-dimensional input coordinates to high-dimensional feature vectors.

The library relies on C++ template metaprogramming and fused-kernel execution, merging neural network layers into single GPU device functions to reduce memory traffic. It uses tensor-core accelerated GEMM for high-throughput linear algebra and implements multiresolution hash encoding and spherical harmonic encoding to capture fine spatial and angular detail.

Its applications include 3D scene reconstruction, learning signed distance functions, and radiance caching for path tracing. It ships optimizers and loss functions for training, and downstream projects such as nvdiffrec build on it for environment lighting approximation and material decomposition.

The fast multilayer perceptrons and input encodings are also available from Python through a PyTorch C++ extension.
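
To make the description above concrete, here is a minimal configuration sketch for a hash grid encoding feeding a fully fused MLP. The key names follow the JSON examples in the tiny-cuda-nn README; the specific values are illustrative, and `encoded_width` and `build_model` are hypothetical helpers introduced only for this example. Building the model assumes the `tinycudann` PyTorch extension is installed and a CUDA-capable GPU is present.

```python
# Illustrative values; key names follow the JSON examples in the tiny-cuda-nn README.
ENCODING_CONFIG = {
    "otype": "HashGrid",
    "n_levels": 16,
    "n_features_per_level": 2,
    "log2_hashmap_size": 19,
    "base_resolution": 16,
    "per_level_scale": 2.0,
}

NETWORK_CONFIG = {
    "otype": "FullyFusedMLP",
    "activation": "ReLU",
    "output_activation": "None",
    "n_neurons": 64,        # fully fused MLPs accept 16, 32, 64, or 128
    "n_hidden_layers": 2,
}


def encoded_width(encoding):
    """Hypothetical helper: feature count produced by the hash grid encoding."""
    return encoding["n_levels"] * encoding["n_features_per_level"]


def build_model(n_input_dims, n_output_dims):
    """Hypothetical helper: needs the tinycudann extension and an NVIDIA GPU."""
    import tinycudann as tcnn  # imported lazily so the configs stay usable without a GPU

    return tcnn.NetworkWithInputEncoding(
        n_input_dims, n_output_dims, ENCODING_CONFIG, NETWORK_CONFIG
    )
```

With these values the encoder turns each 3D coordinate into `encoded_width(ENCODING_CONFIG)` features (16 levels times 2 features per level) before the MLP sees it.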

## Tags

### Artificial Intelligence & ML

- [CUDA Accelerated Neural Networks](https://awesome-repositories.com/f/artificial-intelligence-ml/cuda-accelerated-neural-networks.md) — Ships a high-performance C++ and CUDA library for training and inference of small neural networks on NVIDIA GPUs.
- [Coordinate-Based Neural Mappings](https://awesome-repositories.com/f/artificial-intelligence-ml/coordinate-based-neural-mappings.md) — Trains neural networks that map spatial coordinates to output values for 3D scene reconstruction and signed distance functions.
- [3D Reconstruction](https://awesome-repositories.com/f/artificial-intelligence-ml/foundation-models/3d-reconstruction.md) — Used by nvdiffrec, which optimizes topology, materials, and lighting from multi-view images to produce triangle meshes with PBR textures. ([source](https://nvlabs.github.io/nvdiffrec/))
- [Small MLP Inference Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-accelerated-inference/small-mlp-inference-accelerators.md) — Provides fused GPU kernels optimized for low-latency inference of compact multi-layer perceptrons.
- [Neural Network Layer Fusers](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-kernel-implementations/kernel-composition-frameworks/fused-gpu-kernel-composition/neural-network-layer-fusers.md) — Implements fused GPU kernels that merge neural network layers into single device functions for reduced memory traffic.
- [Neural Network Layer Fusions](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-kernel-implementations/kernel-composition-frameworks/fused-gpu-kernel-composition/neural-network-layer-fusions.md) — Provides fused GPU kernels that merge neural network layers into single device functions for high throughput.
- [Just-In-Time Kernel Compilers](https://awesome-repositories.com/f/artificial-intelligence-ml/just-in-time-kernel-compilers.md) — Compiles fully fused forward and backward kernels just-in-time to adapt to specific network architectures. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/CMakeLists.txt))
- [Neural Network Kernel Compilers](https://awesome-repositories.com/f/artificial-intelligence-ml/just-in-time-kernel-compilers/neural-network-kernel-compilers.md) — Compiles fully fused forward and backward kernels at runtime for adaptive network architectures.
- [JIT Kernel Fusion Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/training-acceleration-engines/jit-kernel-fusion-accelerators.md) — Compiles and fuses network operations at runtime to achieve 1.5x to 2.5x performance gains on compatible GPUs. ([source](https://github.com/NVlabs/tiny-cuda-nn#readme))
- [Fully Fused MLP Training](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training/layer-specific-training/fully-fused-mlp-training.md) — Trains small multi-layer perceptrons with hidden layers restricted to sizes 16, 32, 64, or 128 for maximum GPU throughput. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [Network Inference Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/network-inference-execution.md) — Runs inference on trained small neural networks with low latency using optimized GPU kernels. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/CITATION.cff))
- [Neural Network Training Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-training-frameworks.md) — Trains fully fused multi-layer perceptrons on GPU with configurable input encodings, loss functions, and optimizers. ([source](https://github.com/NVlabs/tiny-cuda-nn#readme))
- [CUTLASS MLP Training](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-network-training-frameworks/cutlass-mlp-training.md) — Trains multi-layer perceptrons with arbitrary hidden and output neuron counts using CUTLASS GEMM routines. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [Spatial Coordinate Encodings](https://awesome-repositories.com/f/artificial-intelligence-ml/positional-encodings/spatial-coordinate-encodings.md) — Maps input coordinates to a multi-resolution hash table for efficient feature encoding in neural fields. ([source](https://nvlabs.github.io/nvdiffrec/assets/bib.txt))
- [PyTorch Kernel Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-backends/pytorch-tensor-interoperabilities/pytorch-kernel-integrations.md) — Exposes the fast MLPs and input encodings as PyTorch modules callable from Python, with automatic FP16 precision. ([source](https://github.com/NVlabs/tiny-cuda-nn#readme))
- [L1 Pixel Loss](https://awesome-repositories.com/f/artificial-intelligence-ml/adversarial-loss-functions/l1-pixel-loss.md) — Computes the standard L1 loss between network predictions and targets. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [Input Encoding Combinations](https://awesome-repositories.com/f/artificial-intelligence-ml/encoder-decoder-architectures/encoder-combiner-architectures/input-encoding-combinations.md) — Combines multiple input encodings by applying a different encoding to each subset of input dimensions. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [Frequency Encodings](https://awesome-repositories.com/f/artificial-intelligence-ml/encoder-decoder-architectures/encoder-combiner-architectures/input-encoding-combinations/frequency-encodings.md) — Transforms each input dimension into sine and cosine pairs at logarithmically spaced frequencies. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [OneBlob Encodings](https://awesome-repositories.com/f/artificial-intelligence-ml/encoder-decoder-architectures/encoder-combiner-architectures/input-encoding-combinations/oneblob-encodings.md) — Encodes each input dimension into a set of bins using a quartic kernel, which fits accurately when the input's dynamic range is limited. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [TriangleWave Encodings](https://awesome-repositories.com/f/artificial-intelligence-ml/encoder-decoder-architectures/encoder-combiner-architectures/input-encoding-combinations/trianglewave-encodings.md) — Encodes each input dimension using a cheap-to-compute triangle wave at multiple frequencies. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [Loss Function Implementations](https://awesome-repositories.com/f/artificial-intelligence-ml/loss-function-implementations.md) — Computes the standard L2 loss between network predictions and targets. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [Adam Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/optimization-algorithms/adam-optimizers.md) — Trains network parameters using the Adam optimizer with optional AdaBound generalization. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [Neural Network Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/optimization-algorithms/neural-network-optimizers.md) — Trains networks using Adam, SGD, Novograd, or Shampoo optimizers with configurable loss functions. ([source](https://github.com/NVlabs/tiny-cuda-nn#readme))
- [Cross-Entropy Loss Functions](https://awesome-repositories.com/f/artificial-intelligence-ml/prediction-visualization/loss-function-calculators/binary-cross-entropy-calculators/cross-entropy-loss-functions.md) — Computes the standard cross-entropy loss, which applies only when the network prediction is a probability density function. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
- [Stochastic Gradient Descent](https://awesome-repositories.com/f/artificial-intelligence-ml/stochastic-gradient-descent.md) — Trains network parameters using standard stochastic gradient descent optimization. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
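
The frequency encoding listed above can be illustrated with a small CPU sketch. This is not the library's CUDA kernel: the function name `frequency_encode`, the factor of pi, and the interleaved sine and cosine output order are assumptions made for this example. It only shows the idea of expanding each input dimension into sine and cosine pairs at frequencies that double at every step.

```python
import math


def frequency_encode(inputs, n_frequencies):
    """Sketch of a NeRF-style frequency encoding (output layout is an assumption).

    Each input value x becomes n_frequencies (sin, cos) pairs evaluated at
    2**k * pi * x, so the frequencies are logarithmically spaced.
    """
    features = []
    for x in inputs:
        for k in range(n_frequencies):
            angle = (2.0 ** k) * math.pi * x
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


# A 2D coordinate with 4 frequencies yields 2 * 4 * 2 = 16 features.
encoded = frequency_encode([0.25, 0.5], n_frequencies=4)
```

The output width grows as `n_dims * n_frequencies * 2`, which is how a low-dimensional coordinate becomes a wider input for the MLP.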

### Development Tools & Productivity

- [PyTorch Bindings](https://awesome-repositories.com/f/development-tools-productivity/compilers-toolchains/c-extension-interfaces/c-extension-development/pytorch-bindings.md) — Exposes the low-level CUDA implementations to Python as a PyTorch extension.
- [Template Metaprogramming](https://awesome-repositories.com/f/development-tools-productivity/build-tooling/build-time-tooling/template-metaprogramming.md) — Uses C++ template metaprogramming to generate specialized kernel code at compile time for optimal performance.

### Software Engineering & Architecture

- [High-Dimensional Spatial Encodings](https://awesome-repositories.com/f/software-engineering-architecture/hash-tables/high-dimensional-spatial-encodings.md) — Transforms low-dimensional inputs into high-dimensional feature vectors using multiresolution hash grids. ([source](https://github.com/NVlabs/tiny-cuda-nn#readme))
- [Multiresolution Hash Encoders](https://awesome-repositories.com/f/software-engineering-architecture/hash-tables/multiresolution-hash-encoders.md) — Provides a multiresolution hash table encoder that transforms raw input coordinates into high-dimensional feature vectors.
- [Grid Backends](https://awesome-repositories.com/f/software-engineering-architecture/hash-tables/multiresolution-hash-encoders/grid-backends.md) — Supports multiple grid backends including hash tables, dense storage, and tiled storage for flexible input encoding.
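
As a rough illustration of the multiresolution hash grid entries above, the sketch below follows the formulation in the Instant Neural Graphics Primitives paper rather than the library's exact kernel code. The function names are hypothetical, and the rounding of the per-level resolution is an assumption; the library's own computation may differ in detail.

```python
import math

# Per-dimension primes from the Instant NGP paper's spatial hash.
PRIMES = (1, 2654435761, 805459861)


def level_resolution(level, base_resolution=16, per_level_scale=2.0):
    """Grid resolution at a level: grows geometrically from the base resolution."""
    return math.floor(base_resolution * per_level_scale ** level)


def spatial_hash(cell, log2_hashmap_size):
    """XOR the integer cell coordinates, each multiplied by a prime, then wrap."""
    h = 0
    for coord, prime in zip(cell, PRIMES):
        h ^= (coord * prime) & 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic
    return h % (1 << log2_hashmap_size)


def needs_hashing(level, n_dims=3, log2_hashmap_size=19):
    """Coarse levels fit densely in the table; finer levels fall back to hashing."""
    vertices = (level_resolution(level) + 1) ** n_dims
    return vertices > (1 << log2_hashmap_size)
```

With a table of 2^19 entries and a base resolution of 16 in 3D, levels 0 to 2 can be stored one-to-one, while level 3 (resolution 128) already exceeds the table size and relies on the hash, accepting collisions.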

### Part of an Awesome List

- [GPU-Accelerated Backends](https://awesome-repositories.com/f/awesome-lists/ai/neural-radiance-field-implementations/gpu-accelerated-backends.md) — Serves as a specialized GPU-accelerated backend for neural radiance fields with hash grid encoding.
- [Material Decompositions](https://awesome-repositories.com/f/awesome-lists/ai/scene-reconstruction/material-decompositions.md) — Used by nvdiffrec, which separates reconstructed 3D models into spatially-varying PBR materials and HDR environment lighting. ([source](https://nvlabs.github.io/nvdiffrec/))
- [Multi-View Scene Reconstructions](https://awesome-repositories.com/f/awesome-lists/ai/scene-reconstruction/multi-view-scene-reconstructions.md) — Used by nvdiffrec, which optimizes topology, materials, and lighting from multi-view images to produce a triangle mesh with PBR textures and environment lighting.
- [Tensor Core Matrix Multiplications](https://awesome-repositories.com/f/awesome-lists/ai/tensor-core-optimization/tensor-core-programming-frameworks/tensor-core-matrix-multiplications.md) — Leverages NVIDIA tensor cores for high-throughput matrix multiplication in fully connected layers.

### User Interface & Experience

- [Spherical Harmonic Encoders](https://awesome-repositories.com/f/user-interface-experience/color-spaces/spherical-harmonic-encoders.md) — Encodes 3D direction vectors into spherical harmonic coefficients for frequency-space representation. ([source](https://github.com/NVlabs/tiny-cuda-nn/blob/master/DOCUMENTATION.md))
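
For intuition about the spherical harmonic encoding above, here is a minimal sketch of the first four real spherical harmonic basis values (bands l = 0 and l = 1) for a direction vector. The function name is hypothetical and the sign convention shown is one common choice, stated here as an assumption; the library also supports higher bands that this sketch omits.

```python
import math

SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi))
SH_C1 = 0.48860251190291987  # sqrt(3 / (4 * pi))


def sh_encode_low_bands(direction):
    """Evaluate real spherical harmonics for bands 0 and 1 on a 3D direction."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm  # project onto the unit sphere
    return [
        SH_C0,       # band 0: constant term
        -SH_C1 * y,  # band 1: linear in y
        SH_C1 * z,   # band 1: linear in z
        -SH_C1 * x,  # band 1: linear in x
    ]
```

The band 0 coefficient is independent of direction, and each higher band adds more angular detail, which is why this encoding suits view-dependent quantities.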
