30 open-source projects similar to thrust/thrust, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Thrust alternative.
Thrust is a C++ parallel algorithms library that provides a suite of standard-library-inspired interfaces for execution on multi-core and accelerator hardware. It serves as a CUDA-accelerated data library and a generic parallel programming interface designed to enable high-performance data processing across GPUs and CPUs. The project implements a portable abstraction layer that allows for heterogeneous computing workflows, enabling the same core algorithm logic to run on different hardware accelerators. This is achieved through a generic programming policy design and a backend-agnostic execut
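The policy-based design mentioned in the Thrust entry can be illustrated with a minimal sketch. This is not Thrust's API: `sequential_policy`, `blocked_policy`, and `sum` are hypothetical names invented here, and the blocked backend runs on the calling thread as a stand-in for a real parallel or accelerator backend. It only shows how a policy tag passed as the first argument selects an implementation while the call site stays the same.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical policy tags, invented for this sketch (not Thrust's types).
struct sequential_policy {};
struct blocked_policy { std::size_t block; };  // number of elements per block

// Backend 1: a single pass over the whole range.
inline long long sum(sequential_policy, const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Backend 2: reduce fixed-size blocks, then combine the partial sums.
// A real parallel backend could hand each block to a different worker;
// here every block is processed on the calling thread.
inline long long sum(blocked_policy p, const std::vector<int>& v) {
    assert(p.block > 0 && "block size must be positive");
    long long total = 0;
    for (std::size_t begin = 0; begin < v.size(); begin += p.block) {
        const std::size_t end = std::min(begin + p.block, v.size());
        total += std::accumulate(v.begin() + begin, v.begin() + end, 0LL);
    }
    return total;
}
```

Calling `sum(sequential_policy{}, v)` and `sum(blocked_policy{3}, v)` on the same vector returns the same total; only the policy argument changes.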
oneTBB is a C++ parallelism library and framework designed to add multi-core parallelism to applications. It provides a task-based parallelism model that maps logical computational tasks to available hardware cores to eliminate the need for manual thread management. The library functions as a multi-core scaling tool, utilizing generic templates to scale data-parallel operations across processors for portable performance. It employs a task-based framework to ensure computational workloads are distributed across hardware resources. The project covers shared memory parallelism, multi-core task
This project is a C++ Standard Library implementation that provides the foundational classes and functions required by the ISO C++ standard. It serves as a template-based generic programming library, providing the Standard Template Library's set of containers, algorithms, and iterators for data manipulation. The library is a core component of the MSVC toolchain, designed specifically for integration with the Microsoft Visual C++ compiler and build tools. The implementation covers memory management through optimized allocators and buffer strategies, as well as tools for performance benchmarki
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
TensorRT is a deep learning inference engine and software development kit designed to optimize and deploy neural networks for high-performance execution on NVIDIA GPUs. It functions as a GPU acceleration framework that reduces latency and increases throughput for trained models during production deployment. The toolkit imports models from the Open Neural Network Exchange format and transforms them into optimized engines. It utilizes graph-based model optimization, layer-fusion kernel generation, and precision-based quantization to convert floating point weights into lower precision formats.
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers data operations until they are explicitly requested, enabl
Vulkan-Hpp is a header-only C++ binding library for the Vulkan graphics and compute API. It provides a type-safe wrapper around the Vulkan C API, allowing developers to interface with GPU hardware through a C++ interface that introduces no runtime CPU overhead. The library utilizes Resource Acquisition Is Initialization patterns to manage the lifecycle of Vulkan handles and objects, automating the release of GPU resources. It replaces C-style enumerations and bit-fields with strong typing and static type checking to catch invalid API parameter assignments during compilation. The project cove
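The two ideas named in the Vulkan-Hpp entry, RAII ownership of handles and strongly typed flags, can be sketched without any Vulkan code. Everything below is hypothetical: `raw_create` and `raw_destroy` stand in for a C-style API, and `unique_handle`, `usage`, and `has` are illustrative names rather than Vulkan-Hpp types.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Stand-in for a C-style API; these functions are hypothetical, not Vulkan calls.
using raw_handle = std::uint32_t;
inline int& live_handles() { static int count = 0; return count; }
inline raw_handle raw_create() {
    static raw_handle next_id = 0;
    ++live_handles();
    return ++next_id;  // never returns 0, which this sketch treats as "no handle"
}
inline void raw_destroy(raw_handle) { --live_handles(); }

// RAII owner: acquires the handle in the constructor, releases it in the destructor.
class unique_handle {
public:
    unique_handle() : h_(raw_create()) { assert(h_ != 0); }
    ~unique_handle() { reset(); }
    unique_handle(const unique_handle&) = delete;
    unique_handle& operator=(const unique_handle&) = delete;
    unique_handle(unique_handle&& other) noexcept : h_(std::exchange(other.h_, 0)) {}
    unique_handle& operator=(unique_handle&& other) noexcept {
        if (this != &other) { reset(); h_ = std::exchange(other.h_, 0); }
        return *this;
    }
    raw_handle get() const { return h_; }
private:
    void reset() { if (h_ != 0) { raw_destroy(h_); h_ = 0; } }
    raw_handle h_;
};

// Scoped enum used as a bit-flag set: mixing it with plain integers does not compile.
enum class usage : std::uint32_t { none = 0, read = 1, write = 2 };
constexpr usage operator|(usage a, usage b) {
    return static_cast<usage>(static_cast<std::uint32_t>(a) | static_cast<std::uint32_t>(b));
}
constexpr bool has(usage set, usage flag) {
    return (static_cast<std::uint32_t>(set) & static_cast<std::uint32_t>(flag)) != 0;
}
```

Because `usage` is a scoped enum, an expression such as `usage::read | 4` is rejected at compile time, which is the kind of invalid parameter assignment the entry describes.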
This project is a comprehensive collection of reference materials, including a language cheatsheet, a standard library reference, and a concurrency reference. It serves as a guide to modern C++ development, focusing on language syntax, standard library utilities, and template metaprogramming patterns. The repository provides specific guidance on template metaprogramming through a dedicated guide covering compile-time evaluation, type deduction, and variadic template execution. The materials cover a broad range of capabilities, including asynchronous programming, memory management, and system
Cpp-taskflow is a C++ task-parallelism framework and task graph scheduler designed to manage and execute complex dependency graphs of parallel tasks across CPU and GPU hardware. It provides a parallel algorithm library for high-performance implementations of reductions, sorts, pipelines, and iterations. The framework distinguishes itself through its ability to offload heavy computational workloads from a task graph to graphics processors for acceleration. It also includes a task profiling tool and a performance analysis interface for visualizing task execution flow and dependency structures t
This project is a technical curriculum and set of educational resources focused on parallel programming, high-performance computing, and systems programming. It provides a structured course covering the implementation of parallel algorithms and multithreading techniques for processing large datasets. The project includes a systems programming guide for modern language features, a framework for lock-free concurrency patterns, and a manual for optimizing CPU and GPU performance through assembly analysis and cache management. The material covers hardware performance tuning, the implementation o
oneAPI Threading Building Blocks (oneTBB)
Taskflow is a C++ task-parallel framework designed to build high-performance parallel workflows and complex dependency graphs. It provides a programming model that organizes computational work into directed acyclic graphs, enabling developers to manage concurrency, resource scheduling, and task dependencies across multi-core CPUs and GPU accelerators. The framework distinguishes itself through its ability to orchestrate heterogeneous systems, allowing for the integration of hardware-accelerated kernels and memory operations into unified execution pipelines. It supports dynamic runtime subflow
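The directed-acyclic-graph model in the Taskflow entry can be sketched as a small dependency-ordered runner. This is not Taskflow's API: `task_graph`, `add`, `precede`, and `run` are hypothetical names, and the sketch executes tasks on the calling thread using Kahn's algorithm, whereas a real scheduler would dispatch ready tasks to worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical task graph for illustration only.
class task_graph {
public:
    // Registers a task and returns its index.
    std::size_t add(std::function<void()> work) {
        work_.push_back(std::move(work));
        successors_.emplace_back();
        pending_.push_back(0);
        return work_.size() - 1;
    }

    // Declares that task `before` must finish before task `after` starts.
    void precede(std::size_t before, std::size_t after) {
        assert(before < work_.size() && after < work_.size());
        successors_[before].push_back(after);
        ++pending_[after];
    }

    // Runs every task once in dependency order (Kahn's algorithm).
    void run() const {
        std::vector<std::size_t> pending = pending_;  // unmet dependencies per task
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < pending.size(); ++i) {
            if (pending[i] == 0) ready.push(i);
        }
        std::size_t done = 0;
        while (!ready.empty()) {
            const std::size_t task = ready.front();
            ready.pop();
            work_[task]();
            ++done;
            for (std::size_t next : successors_[task]) {
                if (--pending[next] == 0) ready.push(next);
            }
        }
        // Tasks left unrun mean the graph contains a cycle.
        if (done != work_.size()) throw std::logic_error("task graph has a cycle");
    }

private:
    std::vector<std::function<void()>> work_;
    std::vector<std::vector<std::size_t>> successors_;
    std::vector<std::size_t> pending_;
};
```

With a diamond-shaped graph (A before B and C, both before D), `run()` executes A first and D last; B and C have no ordering between them, which is where a parallel scheduler would find concurrency.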
golang101 is a comprehensive Go programming knowledge base and technical reference library. It provides structured guides and documentation covering Go syntax, runtime behavior, and idiomatic coding patterns. The project serves as a dedicated guide for performance optimization, offering technical strategies to reduce memory allocations, improve garbage collection, and increase execution speed. It also focuses on the Go type system, including generic programming and concurrent synchronization techniques. The library encompasses broader capabilities for language learning, including the study o
This is a comprehensive tutorial for learning TypeScript, designed for JavaScript programmers who want to understand the language's type system and modern features. The resource covers TypeScript's core identity, including its structural type compatibility, compile-time type erasure, declaration file merging, and the discriminated union pattern for precise type narrowing. The tutorial distinguishes itself by providing a progressive learning path from basic JavaScript concepts to advanced TypeScript patterns. It covers generic type parameter constraints, tuple types with fixed-length positions
This project is a C++ learning resource and study guide consisting of structured notes and programming examples. It provides practical implementations and exercise solutions covering core language syntax, data types, and control flow. The repository features specialized samples for object-oriented design, including class inheritance, polymorphism, and abstract classes. It includes demonstrations of memory management techniques such as dynamic allocation, move semantics, and placement new, as well as template programming examples for creating generic functions and data structures. The codebas
c3c is the compiler for the C3 programming language, transforming source code into executable binaries, static libraries, or dynamic libraries using an LLVM backend. It implements a system based on result-based error handling, scoped memory pooling, and a semantic macro system. The compiler provides first-class support for hardware-backed SIMD vectors that map directly to processor instructions and enables runtime polymorphism through interface-based dynamic dispatch. The project covers a broad set of low-level capabilities, including manual and pooled memory management, inline assembly inte
Crossbeam is a concurrency toolkit for Rust providing low-level primitives for writing multi-threaded programs. It focuses on lock-free data structures and memory management primitives designed for shared-memory concurrent environments. The project includes work-stealing double-ended queues, the building blocks for schedulers that distribute tasks across multiple processor cores and prevent bottlenecks. The toolkit covers broader capabilities for parallel algorithm development, multi-threaded task scheduling, and general co
NCCL is a high-performance communication library and distributed GPU computing framework designed for executing collective and point-to-point data exchanges across multiple GPUs in single or multi-node systems. It serves as an RDMA GPU transport layer and memory orchestrator, facilitating high-bandwidth synchronization of data and model gradients for distributed GPU training and inference. The library is distinguished by its ability to execute communication primitives directly from GPU kernels, removing the host CPU from the critical path. It utilizes topology-aware path selection to optimize
CppGuide is a curated collection of educational resources and practical guides focused on C++ server development, Linux kernel internals, concurrent programming, network protocols, and security exploitation. It provides structured learning paths for backend developers, covering everything from interview preparation to building high-performance network servers and understanding operating system fundamentals. The guide distinguishes itself by offering in-depth, hands-on tutorials that walk through real-world implementations, including building a Redis-like server from scratch, designing custom
Catch2 is a comprehensive framework for C++ software validation, providing an environment for unit testing, integration verification, and performance analysis. It enables developers to define and execute automated test suites and micro-benchmarks directly within their applications. The framework is distinguished by its header-only distribution, which allows for integration into existing build systems without requiring complex external dependencies. It utilizes a hierarchical section-based execution model that supports behavior-driven testing, allowing for shared setup and teardown logic acros
This repository provides a curated collection of self-contained Python code examples that demonstrate the core capabilities of the PyTorch deep learning framework. The examples cover automatic differentiation, dynamic computational graphs, GPU‑accelerated tensor operations, and training of neural network models using gradient‑based optimization. The code samples illustrate PyTorch’s dynamic graph construction, where models can change structure with native control flow, and its automatic gradient computation through reverse‑mode differentiation. Additional examples show how to work with tensor
HIP is a C++ GPU kernel language and cross-platform runtime designed for writing portable high-performance compute applications. It provides a programming interface that allows a single source codebase to execute on both AMD and NVIDIA GPU architectures. The project functions as a compatibility layer that enables the conversion and migration of existing CUDA source code to run on AMD hardware. This is achieved through a syntax mapping that mirrors CUDA and a source-to-source translation process during compilation. The toolkit covers the broader surface of cross-platform GPGPU development, in
Goja is a JavaScript engine and ECMAScript compliant interpreter implemented entirely in Go. It serves as an embedded scripting engine that allows Go applications to execute JavaScript code and integrate a programmable scripting layer without relying on Cgo or external native dependencies. The project functions as a bridge between Go and JavaScript, enabling bidirectional data exchange and function invocation. It allows Go hosts to expose native structs, slices, and maps as JavaScript objects and arrays, while providing mechanisms to export script values and functions back into native Go type
This repository serves as the programming language design repository for C#, containing the official language specification and the technical standards governing its grammar, type safety, and memory management. It functions as a collaborative space for the formal design and evolution of the language. The project manages a community-driven evolution process, utilizing a public proposal backlog to debate and adopt new features. This involves formal syntax prototyping and the engineering of the type system to refine the language's behavior and implementation. The scope of the specification cove
doctest is a lightweight C++ unit testing framework and assertion library. It provides a single-header implementation that eliminates complex build dependencies, allowing developers to write and execute test cases directly within their source code. The framework is distinguished by its focus on compile-time performance and binary overhead. It uses conditional compilation guards to strip all testing logic and metadata from production binaries. Additionally, it features hierarchical subcases that re-execute parent setup code to isolate different execution paths within a single test case. Its c
Anko is an Android Kotlin library designed to simplify application development through a set of domain-specific languages and extensions. It functions as a programmatic UI DSL, an SQLite wrapper, an SDK utility, and an asynchronous framework. The project provides a declarative layout system that allows developers to build user interfaces through code instead of static XML markup. It distinguishes itself by offering a fluent database layer that eliminates manual cursor management and a concurrency system that uses weak references to prevent memory leaks in activities. The library covers broad
PureScript is a statically typed, purely functional programming language that compiles to JavaScript. It is designed as a cross-platform frontend language for building safe web applications, utilizing a static type system and a JavaScript compiler to ensure program correctness across browser and server environments. The language is distinguished by its emphasis on mathematical purity, featuring a robust type system with first-class support for monads. It provides a sophisticated toolset for static verification, including algebraic data types, type classes, and automatic type inference to reje
cuml is a GPU-accelerated machine learning library and framework that uses CUDA to accelerate tabular data preprocessing and model execution. It provides a suite of tools for training and deploying classification, regression, and clustering models on NVIDIA GPUs and GPU clusters. The library is designed for scalability, offering a distributed GPU machine learning environment that can spread computation and data across multiple hardware accelerators and nodes to handle datasets exceeding single-device memory. It mirrors standard estimator interfaces to allow the replacement of CPU-based models