30 open-source projects similar to nvidia/nvidia-container-toolkit, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best NVIDIA Container Toolkit alternative.
NVIDIA Docker is a container runtime wrapper that enables the use of host-level graphics processing units within isolated container environments. It functions as a containerized GPU orchestrator, mapping physical hardware resources into virtualized environments to support high-performance computing and machine learning workloads. The project provides a toolkit that facilitates integration between containerized applications and host-level graphics hardware. By utilizing a pre-start hook to intercept container creation, the runtime injects necessary device drivers and libraries into the isolate
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on automotive-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Chainer is an open-source deep learning framework built around define-by-run automatic differentiation, where computation graphs are constructed dynamically during forward execution. This imperative approach allows networks to be built using standard Python control flow, with gradients computed automatically through reverse-mode differentiation on the dynamically recorded graph. The framework supports GPU acceleration through a NumPy-compatible array backend with CUDA and cuDNN support, and provides a pluggable device abstraction that lets users switch between CPU and GPU computation without c
This project provides a containerized environment for running Steam games and applications without a physical monitor. It consists of a Docker image designed for headless game server hosting, utilizing a virtual display server to enable remote game streaming of video and audio to a web browser. The system integrates NVIDIA GPU virtualization to provide hardware acceleration for high-performance 3D graphics rendering within the container. A remote desktop gateway allows users to access and manage the virtualized desktop environment and game client remotely. The software includes capabilities
This project is a technical curriculum and set of educational resources focused on parallel programming, high-performance computing, and systems programming. It provides a structured course covering the implementation of parallel algorithms and multithreading techniques for processing large datasets. The project includes a systems programming guide for modern language features, a framework for lock-free concurrency patterns, and a manual for optimizing CPU and GPU performance through assembly analysis and cache management. The material covers hardware performance tuning, the implementation o
Spack is a source-based build system and package manager designed for high-performance computing. It serves as a multi-version software manager and a logic-based dependency solver that handles complex software stacks across various platforms and hardware architectures. The project distinguishes itself by managing multiple compilers and toolchains to target specific hardware. It allows the coexistence of multiple versions and configurations of the same software package on a single system by utilizing prefix-based isolation and unprivileged deployment. The system provides comprehensive capabil
Slurm is a cluster workload manager and job scheduler designed for high-performance computing environments. It functions as a distributed compute orchestrator that queues and executes large-scale computational tasks across multiple compute nodes in a cluster. The system acts as a resource arbitrator, distributing hardware nodes and processors among concurrent users to prevent resource conflicts and maximize efficiency. It coordinates the simultaneous launch of multiple processes across different physical servers to execute parallel jobs and scientific workloads. The platform covers broad cap
This project is a distributed computing platform designed to orchestrate containerized workloads across heterogeneous hardware clusters. It functions as a centralized control plane that manages resource allocation, scheduling, and execution environments, enabling organizations to share high-performance computing infrastructure securely among multiple users and projects. The platform distinguishes itself through advanced hardware virtualization and multi-tenant management capabilities. It supports the partitioning of physical graphics processing units into fractional slices, allowing multiple
This project is a parallel simulation engine and molecular dynamics simulator designed to model the physical movements of atoms and molecules. It functions as an interatomic potential framework for calculating forces between particles and a materials analysis tool for computing thermodynamic, structural, and transport properties of solids and fluids. The engine is distinguished by its high-performance computing capabilities, using spatial-domain decomposition and Message Passing Interface (MPI) communication to distribute workloads across processors. It supports multi-backend GPU acceleration v
Azure Docs is the official technical documentation repository for Microsoft Azure, the cloud computing platform. It provides comprehensive guidance on the full spectrum of Azure services, covering everything from core infrastructure components like virtual machines, Kubernetes clusters, and serverless computing to platform services for AI, machine learning, data analytics, and storage. The documentation details how to provision, manage, and govern cloud resources at scale, including policy enforcement, identity management, and cost optimization. The documentation distinguishes Azure through i
Cpp-taskflow is a C++ task-parallelism framework and task graph scheduler designed to manage and execute complex dependency graphs of parallel tasks across CPU and GPU hardware. It provides a parallel algorithm library for high-performance implementations of reductions, sorts, pipelines, and iterations. The framework distinguishes itself through its ability to offload heavy computational workloads from a task graph to graphics processors for acceleration. It also includes a task profiling tool and a performance analysis interface for visualizing task execution flow and dependency structures t
MindSpore is a deep learning framework designed for building and training neural networks across cloud, edge, and mobile environments. It functions as a distributed training system and a hardware-accelerated AI toolkit capable of executing workloads on CPUs, GPUs, and specialized AI processors. The project includes an automatic differentiation engine that computes gradients through source transformation and static compilation. It enables distributed model training by splitting workloads across hardware using data and model parallelism. The framework covers cross-platform AI deployment and mo
xtensor is a C++ multidimensional array library for numerical computing that provides N-dimensional containers with an interface mirroring the NumPy API. It utilizes a lazy evaluation expression engine to defer numerical computations until assignment, which minimizes memory allocations and intermediate copies. The library features a foreign memory array adaptor that allows it to wrap external buffers, such as NumPy arrays, to perform numerical operations in-place without duplicating data. It further optimizes performance through lazy broadcasting and a system that manages the lifetime of temp
cuml is a GPU-accelerated machine learning library and framework that uses CUDA to accelerate tabular data preprocessing and model execution. It provides a suite of tools for training and deploying classification, regression, and clustering models on NVIDIA GPUs and GPU clusters. The library is designed for scalability, offering a distributed GPU machine learning environment that can spread computation and data across multiple hardware accelerators and nodes to handle datasets exceeding single-device memory. It mirrors standard estimator interfaces to allow the replacement of CPU-based models
SegFormer is a semantic segmentation framework and transformer-based model designed for pixel-level image classification. It assigns a class label to each pixel using a hierarchical transformer encoder and an all-MLP decoder. The encoder processes multi-scale features through a pyramid of blocks, and the decoder aggregates these features without complex attention mechanisms. It incorporates overlap patch embedding to preserve local continuity and sequential self-attention reduction to ma
ISPC is a vectorizing compiler and SIMD parallel programming language that implements a single program, multiple data (SPMD) model. It serves as a toolchain for translating C-based code with parallel extensions into optimized machine code for various CPU and GPU architectures using an LLVM backend. The compiler is designed for cross-platform SIMD toolchain support, generating specialized code for x86 SSE/AVX, ARM NEON, and Intel GPUs from a single source. It features a runtime dispatch mechanism that selects the most efficient hardware-specific implementation for the current system during
Serve is a multimodal AI orchestrator and inference server designed for deploying and scaling machine learning models as cloud-native services. It functions as a containerized workflow engine and distributed service mesh that routes multimodal data through connected execution units. The framework provides specialized capabilities for large language models, including a token streaming gateway that delivers generated text incrementally to reduce perceived latency. It distinguishes itself by enabling the chaining of executors into complex data processing pipelines and the orchestration of these
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
This project is a comprehensive engineering framework and technical reference for managing, scaling, and optimizing distributed machine learning infrastructure. It provides a suite of methodologies and diagnostic tools designed to support large-scale model training and inference on high-performance computing clusters. The project distinguishes itself through a specialized diagnostic toolkit and infrastructure optimization suite that addresses the complexities of multi-node environments. It enables precise control over cluster resources, including hardware maintenance, network topology configu
This project is a comprehensive library of reusable React hooks designed to simplify browser API integration, state management, and component lifecycle tracking. It provides a declarative interface for managing complex browser interactions, allowing developers to encapsulate imperative logic into modular, composable functions that integrate directly with the component lifecycle. The library distinguishes itself by offering specialized utilities for asynchronous data orchestration, including built-in caching, retry logic, and loading state management. It also features advanced performance opti
This project is a natural language processing framework focused on a generalized autoregressive pretrainer designed for unsupervised language representation. It implements a language model that combines permutation-based training with a Transformer-XL backbone to function as a long-context text processor. The system is distinguished by its ability to handle text sequences that exceed standard length limits through the use of segment-level recurrence and relative positional encoding. It scales high-performance pretraining across multiple GPUs and TPU clusters using distributed training impleme
klib is a comprehensive C standard library extension and data structure toolkit. It provides a set of fundamental tools for memory management, data organization, and general-purpose utility functions for standalone C applications. The project features specialized capabilities for bioinformatics sequence analysis, including the parsing of FASTA, FASTQ, and Newick formats and the implementation of Smith-Waterman sequence alignment and Hidden Markov Models. It also includes a mathematical computation library for numerical routines and expression evaluation, as well as a lightweight HTTP and FTP
Leaf is a machine learning framework and neural network architecture toolkit used for building, training, and deploying models. It functions as a hardware abstraction layer, mapping high-level computational graphs to low-level instructions across various CPU and GPU backends and operating systems. The system enables the design of flexible model structures through a modular architecture where reusable container layers encapsulate weights and mathematical operations. This allows for the composition of complex neural networks via nested components. The framework includes a data engineering pipe
This project is a deep learning framework designed for end-to-end speech-to-text transcription. It utilizes the WaveNet neural network architecture to process spoken audio input and generate written text transcripts, leveraging connectionist temporal classification to map variable-length audio sequences to character-level outputs. The system distinguishes itself through a comprehensive training pipeline that supports distributed execution across multiple graphics processing units. It includes specialized utilities for audio data augmentation and the transformation of raw audio files into opti
Jetson Containers is a container management system that builds and runs GPU-accelerated Docker images for machine learning workloads on ARM64 edge hardware. It functions as a CUDA container orchestrator, automatically detecting the host's CUDA toolkit version and GPU capabilities to ensure container compatibility at runtime, while selecting the correct container image by matching the host's JetPack or L4T version at launch time. The project delivers pre-configured containers for executing quantized large language models and retrieval-augmented generation pipelines optimized for edge devices,
AITemplate is an ahead-of-time deep learning compiler that translates PyTorch neural networks into standalone C++ source code. It functions as a PyTorch-to-C++ compiler and a GPU kernel fusion engine, producing self-contained executable binaries that run inference without requiring a Python interpreter or deep learning framework runtime. The project generates optimized CUDA and HIP C++ code specifically for NVIDIA TensorCores and AMD MatrixCores. It focuses on maximizing throughput for half-precision floating-point operations through a system that combines multiple neural network operators in
Lightning is a PyTorch training framework and distributed AI training orchestrator designed to decouple core research logic from the engineering boilerplate required for model training. It functions as a deep learning workflow manager that automates the process of pretraining and finetuning models across diverse compute environments. The project distinguishes itself by providing a hardware-agnostic training wrapper, allowing the same model code to execute on CPUs, GPUs, or TPUs without modification. It further manages the scaling of workloads from single devices to multi-node clusters and ser
This repository contains the lab materials and Jupyter notebooks for MIT's introductory deep learning course, using TensorFlow and Keras for hands-on exercises. The courseware is delivered as pre-configured notebooks that run on Google Colaboratory's cloud infrastructure, eliminating the need for local software installation. Learners can toggle the Colab runtime to a GPU-backed hardware accelerator for faster neural network training during lab exercises. A shared Python package provides helper functions that standardize common operations across all notebooks. The course guides students throug