AMD Hardware Acceleration - Optimizes machine learning and AI computations by leveraging AMD Instinct and Radeon GPUs through an open-source platform.
GPU Acceleration - Runs machine learning, engineering, and scientific workloads on GFX9 and CDNA GPUs using the ROCm software stack.
Deep Learning Frameworks - Integrates with popular deep learning frameworks to enable accelerated training and inference on AMD GPUs.
Computation Abstraction Layers - Provides a unified runtime and compiler infrastructure mapping multiple programming models to AMD GPU hardware.
Matrix Fused Multiply-Add Engines - Performs matrix fused multiply-add on KxN matrices using mixed-precision inputs to accelerate ML and HPC workloads.
GDB-Based GPU Debuggers - Inspects and controls GPU-accelerated programs with a GDB-based debugger for AMD hardware.
GPU Health Monitors - Tracks and manages data center GPU health metrics through a command-line interface wrapping device driver calls.
GPU Monitoring and Management CLI - Provides a command-line tool for monitoring and managing AMD GPU health, reliability, and performance metrics.
Kernel Dispatchers - Launches compute kernels directly on GPU hardware via a command processor for minimal latency.
Mixed-Precision Compute Engines - Accelerates matrix multiply-add operations by dispatching FP32, FP16, BF16, and Int8 tiles to dedicated hardware units.
Mixed-Precision Computing - Performs matrix operations with FP32, FP16, BF16, and Int8 inputs to speed up ML and HPC workloads.
GPU Health Monitors - Monitors and manages data center GPU health metrics through a command-line interface.
Config-File Package Integrations - Exports pre-built CMake config files that automatically locate and link ROCm components without custom Find modules.
Inter-GPU Bandwidth Probes - Reports the throughput of the xGMI interconnect between GPU nodes via a dedicated library API.
HPC System Tuners - Offers guidance on system optimization and performance validation for HPC workloads on AMD GPUs.
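
The mixed-precision matrix multiply-add entries above (low-precision inputs, higher-precision accumulation) can be illustrated with a small CPU-only sketch. It is a numerical emulation in plain Python, not ROCm code and not a description of how the hardware units are implemented. The helper names to_fp16, to_fp32, and mixed_precision_mma are hypothetical and chosen only for this example. The sketch assumes FP16 inputs with FP32 accumulation, which is one of several input formats the list mentions.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The 'e' format is binary16; values beyond the FP16 range raise OverflowError.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_precision_mma(a, b, c):
    """Emulate D = A * B + C with FP16 inputs and an FP32 accumulator.

    a is M x K, b is K x N, c is M x N, all given as lists of lists.
    Hypothetical helper for illustration; real hardware rounding may differ.
    """
    m, k, n = len(a), len(b), len(b[0])
    d = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = to_fp32(c[i][j])  # accumulator starts from C in FP32
            for p in range(k):
                # Inputs are quantized to FP16 before each multiply.
                prod = to_fp16(a[i][p]) * to_fp16(b[p][j])
                # The running sum is rounded back to FP32 after each step.
                acc = to_fp32(acc + prod)
            d[i][j] = acc
    return d


if __name__ == "__main__":
    a = [[0.1, 0.2], [0.3, 0.4]]
    b = [[1.0, 0.0], [0.0, 1.0]]
    c = [[0.0, 0.0], [0.0, 0.0]]
    # Each result shows the FP16 rounding of the corresponding input value.
    print(mixed_precision_mma(a, b, c))
```

Multiplying by the identity matrix, as in the usage lines, makes the input quantization visible: each output element is the FP16-rounded version of the matching element of a, rather than the original double-precision value.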