30 open-source projects similar to gpujs/gpu.js, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best GPU.js alternative.
ogl is a WebGL graphics library and 3D scene graph engine designed for rendering three-dimensional scenes. It provides a lightweight framework for managing geometries and coordinating spatial transformations within a hierarchical system. The project includes a PBR shader system for creating realistic materials and a GPGPU computation framework for performing large-scale general-purpose calculations and particle simulations on the graphics processor. It also features a post-processing suite for applying visual filters to rendered scenes via frame buffers. The library covers broader capabiliti
Numba is a just-in-time compiler that translates high-level Python functions into optimized machine code at runtime. By leveraging the LLVM compiler infrastructure, it provides a framework for accelerating numerical data processing and mathematical computations, enabling performance levels comparable to statically compiled languages. The project distinguishes itself through its ability to perform type-inference-based specialization, which generates machine instructions tailored to the specific data types used during execution. It employs a lazy compilation pipeline that defers translation unt
This project is a high-performance numerical computing library designed for large-scale scientific and machine learning workloads. It functions as an automatic differentiation framework and a just-in-time compilation engine, transforming high-level Python code into optimized machine instructions. By enforcing pure functional programming patterns and immutable array semantics, the library ensures that mathematical functions remain compatible with automated graph transformations and symbolic differentiation. The platform distinguishes itself through its distributed array computing capabilities,
RustPython is a Python 3 compatible interpreter implemented in Rust. It functions as a scripting engine that can be embedded directly into host applications, allowing for the execution of dynamic scripts and the customization of software behavior within a memory-safe environment. The project distinguishes itself through its ability to bridge Python and JavaScript runtimes, enabling data exchange and function invocation across language boundaries. It also provides a portable execution environment by compiling Python code into WebAssembly, which allows for the execution of logic directly within
This project serves as a comprehensive educational resource for learning parallel programming and high-performance computing using graphics processing units. It provides technical guidance on the fundamental paradigms required to offload computationally intensive tasks from a host system to specialized hardware accelerators. The materials cover the core methodologies for managing data-parallel operations, including the orchestration of memory between host and device spaces and the organization of threads into structured grids and blocks. It details the execution models necessary to distribute
Boost is a collection of portable, high-performance source libraries that extend the C++ standard library. It provides a wide range of reusable components, data structures, and algorithms designed to add capabilities to the base language across different platforms. The project is distinguished by its extensive focus on compile-time template metaprogramming and generic programming. It implements advanced architectural patterns such as policy-based design, concept-based type validation, and the use of SFINAE for conditional template resolution to minimize runtime overhead. The library covers a
Napajs is an embeddable JavaScript engine and multi-threaded runtime designed to be integrated directly into other software applications as a component. It serves as a parallel computation framework that allows JavaScript code to execute across multiple threads, bypassing the standard single-threaded event loop limitation to handle CPU-intensive tasks. The runtime is distinguished by its ability to load and execute modules from the NPM ecosystem and its pluggable execution environment. This architecture allows for custom implementations of memory allocation, system logging, and performance me
h2o-3 is a distributed machine learning platform and automated machine learning framework designed for training and deploying predictive models using distributed in-memory computing. It functions as a deep learning framework and a distributed model scoring engine, capable of operating as a Kubernetes ML cluster to process large datasets in parallel. The platform distinguishes itself through automated machine learning capabilities that select algorithms and hyperparameters to optimize model performance. It provides specialized deep learning toolkits for tasks including i
Surge is a Swift library for high-performance numerical analysis, linear algebra, digital signal processing, and accelerated image manipulation. It utilizes the Accelerate framework to provide hardware-accelerated tools for matrix mathematics and signal processing. The library provides specialized capabilities for digital signal processing, including convolution, signal similarity analysis through cross-correlation, and domain transformations using fast Fourier transforms. It also includes a suite of tools for the rapid transformation and analysis of pixel buffers and image data. Beyond sign
mctx is a framework for executing high-performance tree search and state simulations to generate policy targets for neural networks. It functions as a compiled search engine and neural dynamics simulator that predicts state transitions and rewards using learned representations. The project implements a vectorised tree search capable of running parallel search operations across input batches. It utilizes a policy target generator to convert search results into action weights used for training and refining neural network policies. The system covers reinforcement learning workflows by integrati
This repository is a collection of reference implementations and programming examples for the CUDA Toolkit. It serves as a GPGPU implementation guide and a parallel computing reference, providing code for using graphics hardware to perform general-purpose calculations and high-performance parallel processing. The project provides specific samples for GPU kernel development and resource management. These include demonstrations of multi-GPU communication, peer-to-peer memory access, and system hardware inspection to coordinate distributed GPU resources. The codebase covers a wide range of capa
HVM2 is a high-performance execution environment for pure functional programs, implemented as a systems-level runtime in Rust. It functions as a massively parallel functional runtime that uses interaction combinators to achieve automatic parallelism across multi-core CPUs and GPUs. The project distinguishes itself by using a graph-rewriting computational model to execute programs via local reduction rules, which eliminates the need for manual locks or atomic operations. It employs beta-optimal reduction and lazy evaluation to optimize higher-order functions and eliminate redundant computation
Bend is a high-level parallel programming language and compiler designed to execute code across multi-core CPUs and GPUs automatically. By translating functional source code into a graph-based intermediate representation, it enables massive parallel execution without requiring manual management of threads, locks, or atomic operations. The runtime operates as an interaction net engine, where computations are represented as networks of nodes that reduce through local rewriting rules. This model utilizes a work-stealing scheduler to distribute tasks across thousands of hardware threads, ensuring
Taskflow is a C++ task-parallel framework designed to build high-performance parallel workflows and complex dependency graphs. It provides a programming model that organizes computational work into directed acyclic graphs, enabling developers to manage concurrency, resource scheduling, and task dependencies across multi-core CPUs and GPU accelerators. The framework distinguishes itself through its ability to orchestrate heterogeneous systems, allowing for the integration of hardware-accelerated kernels and memory operations into unified execution pipelines. It supports dynamic runtime subflow
This project serves as an educational resource for learning and implementing low-level assembly language optimizations. It provides a structured guide for developers to master hardware-specific instructions and manual performance tuning, focusing on the translation of high-level code into efficient machine-level operations for resource-constrained environments. The materials emphasize techniques for maximizing computational throughput in multimedia processing. By covering instruction-level parallelism, register management, and data parallelism, the project enables the development of software
Triton is a parallel computing framework and high-level programming language designed for writing custom compute kernels. It functions as a deep learning compiler, translating complex mathematical operations into high-throughput instructions that maximize hardware utilization and memory efficiency on graphics processing units. The framework distinguishes itself through a hardware-agnostic compute abstraction that allows developers to define kernels without manual low-level tuning. It employs just-in-time compilation to generate optimized binary instructions at runtime, utilizing static data f
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
GluonTS is a framework for probabilistic time series forecasting, designed to predict future values as probability distributions with confidence intervals. It supports both traditional model training and zero-shot forecasting, where pretrained models generate predictions for new series without additional training. The project distinguishes itself by integrating a wide variety of forecasting approaches into a unified workflow. This includes deep learning architectures such as recurrent neural networks and causal convolutions, as well as the integration of external statistical models, the Proph
DataFrame is a C++ tabular data library and manipulation engine designed for managing heterogeneous data in contiguous memory. It functions as a statistical analysis framework and time series analysis toolkit, providing the means to store, index, and transform multidimensional datasets. The project distinguishes itself through a high-performance execution model that utilizes column-major storage, SIMD-aligned memory allocation, and a thread-pool for parallel computations. It employs a visitor-based algorithm dispatch system and policy-driven transformations to decouple data processing logic f
xtensor is a C++ multidimensional array library for numerical computing that provides N-dimensional containers with an interface mirroring the NumPy API. It utilizes a lazy evaluation expression engine to defer numerical computations until assignment, which minimizes memory allocations and intermediate copies. The library features a foreign memory array adaptor that allows it to wrap external buffers, such as NumPy arrays, to perform numerical operations in-place without duplicating data. It further optimizes performance through lazy broadcasting and a system that manages the lifetime of temp
Shapely is a library for the manipulation and analysis of planar geometric objects, serving as a Python wrapper for the GEOS C++ engine. It provides a framework for calculating geometric properties, evaluating spatial relationships, and testing topological predicates within a Cartesian plane. The project distinguishes itself through a vectorized geometry processor capable of executing spatial operations across large arrays of shapes to increase throughput. It also includes a spatial indexing system based on R-trees to accelerate the retrieval of intersecting geometries and nearest neighbor
TypeGPU is a tool for type-safe WebGPU development that enables writing shaders in TypeScript. It translates high-level TypeScript function definitions and structures into WebGPU Shading Language source code to automate shader generation and validate logic using a type system. The project provides a mechanism for cross-library GPU interoperability by sharing typed buffers without copying data to system memory. It also integrates the Model Context Protocol to allow AI agents to inspect generated shader code and diagnose runtime errors. The system manages WebGPU resource mapping through typed
This project is an interactive data science environment that combines code execution, rich media visualization, and narrative documentation into a persistent, browser-based platform. It serves as a comprehensive educational resource for scientific computing, providing a framework for iterative data analysis and machine learning prototyping. The environment is distinguished by its focus on high-performance numerical computing, utilizing vectorized array operations and memory-mapped data structures to handle large-scale computations efficiently. It features a unified estimator interface that st
Highway is a portable C++ library and hardware abstraction layer designed for writing single instruction multiple data (SIMD) code. It provides a unified interface that maps data-parallel logic to various CPU instruction sets, enabling the development of high-performance software that runs across different processor architectures without requiring architecture-specific assembly. The project features a dynamic instruction dispatcher that selects the most efficient CPU instruction set at runtime based on detected hardware. It also supports static target specialization and extensible mechanisms
gfx is a hardware-agnostic graphics API abstraction that translates a unified set of graphics and compute commands into native instructions for multiple GPU drivers. It provides a common interface for cross-platform rendering and general-purpose GPU compute programming. The project features an intermediate-representation shader translation system that converts source code and SPIR-V into target-specific languages. It employs a data-driven reference test framework to verify that graphics output remains consistent across different hardware platforms. Capabilities include parallel command buffe
SentencePiece is a text segmentation engine and tokenization library designed for machine learning workflows. It provides a comprehensive toolkit for transforming raw text into subword units or numerical identifiers, enabling consistent data representation for neural network training and inference. The library supports the training of segmentation models from raw text, allowing for the creation of custom vocabularies tailored to specific domain requirements. The project distinguishes itself through its byte-level encoding and fallback mechanisms, which ensure that every input can be represent
TinyGo is a specialized compiler and development toolkit designed to bring the Go programming language to resource-constrained microcontrollers and WebAssembly environments. It provides a bare-metal runtime environment that enables high-level code execution without the need for a traditional operating system, utilizing an LLVM-based backend to generate efficient machine instructions. The project distinguishes itself through aggressive optimization techniques tailored for small hardware, including a static memory allocation strategy and whole-program dead code elimination that significantly re
This project is a cross-platform graphics and compute framework that provides a unified, hardware-agnostic abstraction layer for rendering and parallel processing. It enables developers to build high-performance applications that execute consistently across diverse operating systems and hardware backends, including Vulkan, Metal, and DirectX. By mapping high-level graphics commands to native APIs, it serves as a portable foundation for both real-time 3D rendering and general-purpose GPU computing. The framework distinguishes itself through a robust architecture that supports both native deskt
HHVM is a high-performance execution engine and runtime environment designed for the Hack language. It functions as a persistent web application server that processes incoming network traffic, while also providing command-line utilities for executing standalone scripts and performing automated tasks. The project distinguishes itself through a sophisticated execution model that utilizes just-in-time compilation to translate bytecode into optimized machine code. This process is supported by a static type analysis engine that enforces strict data constraints and identifies type inconsistencies b
OpenRGB is a centralized software suite for controlling colors and lighting effects across various brands of RGB hardware. It functions as a cross-platform controller and hardware control system that provides a unified interface for managing lighting profiles and effects. The project features an extensible plugin framework and a dedicated plugin interface that allow for the addition of new hardware support and integration features. It includes a network gateway that exposes an API for third-party applications to send lighting commands to connected devices. The system supports multi-computer