6 repos
Optimized kernels and execution strategies for mapping operations onto GPUs and specialized silicon.
Explore 6 awesome GitHub repositories matching devops & infrastructure · Hardware-Accelerated Compute Backends. Refine with filters or upvote what's useful.
PyTorch is a machine learning framework centered on a GPU-ready tensor library that supports multi-dimensional array operations across both CPU and accelerator hardware. It provides a foundational infrastructure for mathematical computation and dynamic neural network construction, utilizing a tape-based automatic diffe
Enables direct sharing of accelerator-resident memory buffers between separate processes.
The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver
Powers high-performance infrastructure for deploying and serving machine learning models within a recommendation pipeline.
This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify disco
Optimizes compute performance by mapping intensive mathematical operations onto GPU and specialized hardware backends.
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen
Maps complex mathematical operations onto diverse graphics processing units and specialized silicon using optimized kernels.
YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning
Minimizes memory footprint on resource-constrained hardware by disabling non-essential services and graphical interfaces.
This repository provides a collection of practical demonstrations and implementation guides for machine learning tasks using TensorFlow.js. It serves as a resource for developers to explore model architectures, training workflows, and data manipulation techniques across domains such as computer vision, natural language
Native binary acceleration optimizes linear algebra computations on the CPU across multiple operating systems.