5 repositories
Techniques for moving tensors directly to accelerator memory to avoid CPU bottlenecks.
Distinct from Tensor Mappings: no existing category accurately covers the specific act of mapping input tensors directly to CUDA memory for inference.
Explore 5 GitHub repositories matching Artificial Intelligence & ML · GPU Tensor Mapping.
This project is a high-throughput transcription engine and PyTorch inference wrapper designed to convert spoken audio files into text using the OpenAI Whisper model. It functions as a hardware-accelerated speech-to-text transcriber that runs locally on a user's machine. The system focuses on AI model performance tuning to maximize hardware throughput. It utilizes GPU acceleration, half-precision floating point tensors, and Flash-Attention to reduce processing time and memory overhead during transcription. The implementation covers large-scale transcription workflows and local speech-to-text
Maps audio input tensors directly to GPU memory to eliminate CPU bottlenecks in the transcription pipeline.
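As a rough illustration of the idea (not this project's actual code), the sketch below stages a batch in pinned host memory and copies it to the accelerator asynchronously. The helper name `to_accelerator` and the batch shape are hypothetical. It assumes PyTorch is installed and falls back to the CPU when CUDA is unavailable.

```python
import torch


def to_accelerator(batch: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy a host tensor to `device`, using pinned memory when on CUDA."""
    if device.type == "cuda":
        # Page-locked (pinned) host memory allows an asynchronous copy.
        batch = batch.pin_memory()
        return batch.to(device, non_blocking=True)
    # On a CPU-only machine there is nothing to transfer.
    return batch.to(device)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical batch: 4 clips of 16,000 audio samples each.
audio = torch.randn(4, 16000)
audio_on_device = to_accelerator(audio, device)
```

Pinning only helps when the destination is a CUDA device, which is why the helper checks the device type first.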
Deeplearnjs is a JavaScript deep learning framework and automatic differentiation engine designed for building and training artificial intelligence models within a web browser environment. It functions as a machine learning library that leverages WebGL to provide hardware acceleration for neural networks. The project serves as a high-performance linear algebra library, using the GPU to execute operations on multi-dimensional arrays. This enables the implementation of deep learning models and the execution of client-side machine learning inference. The framework covers the complete automatic
Implements tensor storage directly in graphics memory to minimize CPU-to-GPU data transfer overhead.
This project is a comprehensive collection of educational examples and reference implementations for building vision and language models using PyTorch. It serves as a deep learning tutorial covering the end-to-end process of developing neural networks, from initial architecture definition to final production deployment. The repository provides detailed guides on implementing a wide range of domain-specific models, including convolutional neural networks for object detection and segmentation, as well as transformer and recurrent architectures for natural language processing. It emphasizes gene
Transfers input data and network modules to GPU memory to accelerate computations.
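A minimal sketch of this pattern in PyTorch, assuming a toy `nn.Linear` model and random inputs (both hypothetical, not taken from the repository): the module and its inputs are moved to the same device before the forward pass.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical tiny model; .to(device) moves its parameters and buffers.
model = nn.Linear(8, 2).to(device)
model.eval()

# Inputs must live on the same device as the model's parameters.
inputs = torch.randn(3, 8).to(device)

with torch.no_grad():  # inference only, no autograd bookkeeping
    outputs = model(inputs)
```

If the inputs were left on the CPU while the model sat on the GPU, the forward pass would raise a device-mismatch error, so both transfers are needed.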
This repository provides a curated collection of self-contained Python code examples that demonstrate the core capabilities of the PyTorch deep learning framework. The examples cover automatic differentiation, dynamic computational graphs, GPU‑accelerated tensor operations, and training of neural network models using gradient‑based optimization. The code samples illustrate PyTorch’s dynamic graph construction, where models can change structure with native control flow, and its automatic gradient computation through reverse‑mode differentiation. Additional examples show how to work with tensor
Demonstrates GPU tensor mapping and computation with code samples.
IsaacGymEnvs is a GPU-accelerated physics sandbox and robotics policy training suite designed for reinforcement learning. It serves as a vectorized robotic simulator that runs thousands of parallel environments on GPUs to accelerate the training of neural networks. The project provides a sim-to-real transfer framework that utilizes domain randomization and physics variations to ensure policies trained in simulation are robust enough for deployment on real hardware. It distinguishes itself through a high-performance architecture that uses tensor-based state management to handle observations an
Manages observations and rewards as contiguous GPU memory buffers to eliminate expensive CPU-to-GPU data transfers.
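The buffer-based approach can be sketched roughly as follows. This is an illustrative toy, not IsaacGymEnvs code: the names `obs_buf`, `rew_buf`, and `step`, the sizes, and the update rule are all assumptions. The point is that the buffers are allocated once on the device and then updated in place.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_envs, obs_dim = 1024, 12  # hypothetical sizes

# Allocate once, directly on the device; reused on every step.
obs_buf = torch.zeros(num_envs, obs_dim, device=device)
rew_buf = torch.zeros(num_envs, device=device)


def step(actions: torch.Tensor) -> None:
    """Toy update that writes results into the persistent buffers."""
    obs_buf.add_(actions)  # in-place update of the observation buffer
    # Toy reward: negative distance from the origin, written over rew_buf.
    rew_buf.copy_(-obs_buf.norm(dim=1))


step(torch.ones(num_envs, obs_dim, device=device))
```

Because the actions are also created on the device, no host-to-device copy happens inside `step`.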