# nvidia/cuda-samples

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/nvidia-cuda-samples).**

9,319 stars · 2,366 forks · C++ · NOASSERTION

## Links

- GitHub: https://github.com/NVIDIA/cuda-samples
- awesome-repositories: https://awesome-repositories.com/repository/nvidia-cuda-samples.md

## Topics

`cuda` `cuda-driver-api` `cuda-kernels` `cuda-opengl`

## Description

This repository is a collection of reference implementations and programming examples for the CUDA Toolkit. It serves as a GPGPU implementation guide and a parallel computing reference, providing code for using graphics hardware to perform general-purpose calculations and high-performance parallel processing.

The project provides specific samples for GPU kernel development and resource management. These include demonstrations of multi-GPU communication, peer-to-peer memory access, and system hardware inspection to coordinate distributed GPU resources.

The codebase covers a wide range of capabilities, including GPU memory management, performance optimization through execution graphs, and the integration of domain-specific libraries for linear algebra and image processing. It also demonstrates interoperability between compute contexts and graphics APIs to combine rendering and processing tasks.

## Tags

### Operating Systems & Systems Programming

- [GPU Kernel Offloading](https://awesome-repositories.com/f/operating-systems-systems-programming/gpu-kernel-offloading.md) — Demonstrates the compilation and launch of specialized C-style functions from the host CPU onto the GPU device.
- [Direct Memory Access](https://awesome-repositories.com/f/operating-systems-systems-programming/direct-memory-access.md) — Provides examples of moving data between system RAM and GPU VRAM using high-speed buses to minimize processing latency.
- [GPU Memory Allocators](https://awesome-repositories.com/f/operating-systems-systems-programming/kernel-core-internals/process-and-memory-management/memory-management/allocation-strategies/dynamic-memory-allocation/gpu-memory-allocators.md) — Provides utilities for allocating and moving data between CPU and GPU buffers using direct memory access. ([source](https://nvidia.github.io/cuda-python/cuda-core/latest/))
- [SIMT Execution Models](https://awesome-repositories.com/f/operating-systems-systems-programming/simt-execution-models.md) — Implements the SIMT model to run the same instruction across multiple threads for parallel processing of large datasets.

### Programming Languages & Runtimes

- [GPU Kernel Programming](https://awesome-repositories.com/f/programming-languages-runtimes/gpu-kernel-programming.md) — Serves as a primary reference for writing and executing parallel compute kernels on GPU hardware.
- [GPU Kernel Launch Configurations](https://awesome-repositories.com/f/programming-languages-runtimes/kernel-launch-context-setups/gpu-kernel-launch-configurations.md) — Provides examples for executing source code on the graphics processor using high-level interfaces and custom launch configurations. ([source](https://nvidia.github.io/cuda-python/cuda-core/latest/))
- [GPU Parallelism Strategies](https://awesome-repositories.com/f/programming-languages-runtimes/parallel-computing-implementation/gpu-parallelism-strategies.md) — Implements advanced parallelism using cooperative groups and execution graphs to optimize GPU workload distribution. ([source](https://github.com/nvidia/cuda-samples#readme))

### Artificial Intelligence & ML

- [CUDA-Accelerated Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/deep-learning-libraries/cuda-accelerated-libraries.md) — Demonstrates the integration of specialized CUDA libraries for linear algebra, Fourier transforms, and image processing.
- [Cooperative Thread Groups](https://awesome-repositories.com/f/artificial-intelligence-ml/tiled-processing/cooperative-tile-processors/cooperative-thread-groups.md) — Provides implementations for coordinating sets of threads to synchronize memory access and manage collective workloads.

### Education & Learning Resources

- [CUDA Programming Examples](https://awesome-repositories.com/f/education-learning-resources/cuda-programming-examples.md) — Provides a comprehensive collection of reference implementations for GPU-accelerated computing using the CUDA Toolkit.
- [GPGPU Implementation Guides](https://awesome-repositories.com/f/education-learning-resources/gpgpu-implementation-guides.md) — Acts as a practical guide for implementing general-purpose calculations on graphics hardware via CUDA.

### Networking & Communication

- [GPU Peer-to-Peer Memory Access](https://awesome-repositories.com/f/networking-communication/peer-to-peer-tunneling/gpu-peer-to-peer-memory-access.md) — Provides implementations allowing multiple GPUs to read and write to each others memory without routing data through the CPU.
- [Multi-GPU Resource Coordination](https://awesome-repositories.com/f/networking-communication/peer-to-peer-tunneling/multi-gpu-resource-coordination.md) — Demonstrates how to coordinate multi-GPU communication and distribute workloads across multiple devices.

### Scientific & Mathematical Computing

- [GPU-Accelerated Computation](https://awesome-repositories.com/f/scientific-mathematical-computing/gpu-accelerated-computation.md) — Provides reference implementations for writing and executing parallel programs that accelerate complex calculations on GPU hardware. ([source](https://github.com/nvidia/cuda-samples#readme))
- [High-Performance and Parallel Computing](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/high-performance-and-parallel-computing.md) — Offers a reference for implementing high-performance parallel processing, kernel execution, and thread coordination.
- [Unified Memory Systems](https://awesome-repositories.com/f/scientific-mathematical-computing/unified-memory-systems.md) — Creates a single virtual memory space shared between the CPU and GPU to simplify data movement.
- [CUDA Domain Library Examples](https://awesome-repositories.com/f/scientific-mathematical-computing/cuda-domain-library-examples.md) — Provides reference code for applying domain-specific libraries for linear algebra and image processing.
- [GPU Linear Algebra Libraries](https://awesome-repositories.com/f/scientific-mathematical-computing/gpu-linear-algebra-libraries.md) — Utilizes specialized GPU-optimized libraries for linear algebra and image processing to solve complex mathematical problems. ([source](https://github.com/nvidia/cuda-samples#readme))

### Graphics & Multimedia

- [Graphics Resource Sharing](https://awesome-repositories.com/f/graphics-multimedia/external-data-integrators/graphics-resource-integrators/graphics-resource-sharing.md) — Provides examples of sharing memory and texture data across compute contexts and graphics APIs. ([source](https://nvidia.github.io/cuda-python/cuda-core/latest/))
- [GPU Resource Management](https://awesome-repositories.com/f/graphics-multimedia/gpu-resource-management.md) — Provides samples for managing the lifecycle and allocation of GPU resources and hardware inspection.

### Hardware & IoT

- [GPU & Performance](https://awesome-repositories.com/f/hardware-iot/integration-performance/gpu-performance.md) — Implements techniques to measure memory bandwidth and apply execution strategies to increase graphics hardware processing speed. ([source](https://github.com/nvidia/cuda-samples#readme))
- [Performance Optimization Samples](https://awesome-repositories.com/f/hardware-iot/integration-performance/gpu-performance/performance-optimization-samples.md) — Includes demonstrations for measuring memory bandwidth and using execution graphs to optimize performance.

### Software Engineering & Architecture

- [Static Task Graphs](https://awesome-repositories.com/f/software-engineering-architecture/execution-graphs/graph-evaluation-scheduling/static-task-graphs.md) — Implements sequences of GPU tasks as static graphs to reduce the overhead of repeated kernel launches.
- [Graph Execution Compilers](https://awesome-repositories.com/f/software-engineering-architecture/execution-graphs/graph-execution-compilers.md) — Compiles and executes captured operation graphs as executable objects to optimize GPU task scheduling. ([source](https://nvidia.github.io/cuda-python/cuda-core/latest/))

### System Administration & Monitoring

- [Hardware Inspection](https://awesome-repositories.com/f/system-administration-monitoring/graphics-hardware-monitors/hardware-inspection.md) — Provides tools to retrieve technical specifications about available hardware devices and the current execution environment. ([source](https://nvidia.github.io/cuda-python/cuda-core/latest/))

### Part of an Awesome List

- [Development Utilities](https://awesome-repositories.com/f/awesome-lists/devtools/development-utilities.md) — Sample code for testing CUDA integration within the subsystem.
