7 Repos
Strategies for scaling computational throughput across multiple CPU cores.
Distinct from Computational Parallelization: candidates there are web parallelization tools, simulators, or awesome lists; this category covers C++ language implementation.
Explore 7 awesome GitHub repositories matching programming languages & runtimes · Parallel Computing Implementation.
This project is a comprehensive educational resource and programming course covering C++ language semantics and features from C++03 through C++26. It provides structured tutorials and technical guides focused on modern C++ development. The material offers specialized instruction on template metaprogramming, including the use of type traits and compile-time computations. It features detailed guides on concurrency and parallelism for multi-core execution, as well as a reference for software design applying SOLID principles and RAII. Additionally, it covers build performance optimization to redu
Instructs on distributing computational workloads across multiple CPU cores for increased throughput.
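As a rough illustration of what distributing a workload across cores can look like in standard C++, here is a minimal sketch. It is not taken from the course above; the helper name `parallel_sum` and the chunking scheme are assumptions made for this example. Each worker thread sums one contiguous chunk into its own slot, and the partial results are combined after all threads have joined.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not from the listed repository): sums `data` by
// handing each worker thread one contiguous chunk of the input.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    workers = std::max(1u, workers);             // hardware_concurrency() may report 0
    std::vector<long long> partial(workers, 0);  // one result slot per worker
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        // Clamp both bounds so trailing workers get an empty range if needed.
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& t : pool) t.join();  // wait for every chunk to finish
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A caller might pass `std::thread::hardware_concurrency()` as the worker count. Writing to separate slots avoids locks, at the cost of a final sequential combine step.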
This repository is a comprehensive collection of instructional guides and practical examples for Python development, focusing on machine learning, data science, and web scraping. It provides implementations for neural networks, reinforcement learning algorithms, and deep learning architectures using PyTorch, alongside detailed manuals for scientific computing and data visualization. The project distinguishes itself by offering specialized tutorials on concurrent programming to optimize CPU performance and guides for setting up Linux development environments. It covers the implementation of ad
Demonstrates strategies for scaling computational throughput across multiple CPU cores using multi-processing.
HVM2 is a high-performance execution environment for pure functional programs, implemented as a systems-level runtime in Rust. It functions as a massively parallel functional runtime that uses interaction combinators to achieve automatic parallelism across multi-core CPUs and GPUs. The project distinguishes itself by using a graph-rewriting computational model to execute programs via local reduction rules, which eliminates the need for manual locks or atomic operations. It employs beta-optimal reduction and lazy evaluation to optimize higher-order functions and eliminate redundant computation
Distributes independent sub-expressions across CPU cores using a work-stealing queue to maximize throughput.
This repository is a collection of reference implementations and programming examples for the CUDA Toolkit. It serves as a GPGPU implementation guide and a parallel computing reference, providing code for using graphics hardware to perform general-purpose calculations and high-performance parallel processing. The project provides specific samples for GPU kernel development and resource management. These include demonstrations of multi-GPU communication, peer-to-peer memory access, and system hardware inspection to coordinate distributed GPU resources. The codebase covers a wide range of capa
Implements advanced parallelism using cooperative groups and execution graphs to optimize GPU workload distribution.
oneTBB is a C++ parallelism library and framework designed to add multi-core parallelism to applications. It offers a task-based parallelism model that maps logical computational tasks onto available hardware cores, eliminating manual thread management. The library acts as a multi-core scaling tool and uses generic templates to scale data-parallel operations for portable performance across processors. It uses a task-based framework to ensure that computational loads are distributed across hardware resources. The project covers shared-memory parallelism, multi-core task scheduling, and the scaling of data parallelism. It uses a work-stealing task scheduler, recursive range splitting, and dynamic load balancing to manage the distribution of work across cores at runtime.
Provides strategies for scaling computational throughput across multiple CPU cores in C++ applications.
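The recursive range splitting mentioned in the oneTBB description can be sketched in plain standard C++. The code below does not use oneTBB's API; `split_sum` and its `grain` parameter are hypothetical names for this example, and `std::async` stands in for a real task scheduler. The range is halved until a piece is small enough, one half is offered to another thread, and the results are added on the way back up.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of oneTBB: halves [begin, end) until a piece
// holds at most `grain` elements, then sums that piece sequentially.
long long split_sum(const std::vector<int>& v, std::size_t begin,
                    std::size_t end, std::size_t grain) {
    if (grain == 0) grain = 1;  // a zero grain could never terminate the split
    if (end - begin <= grain) {
        return std::accumulate(v.begin() + begin, v.begin() + end, 0LL);
    }
    const std::size_t mid = begin + (end - begin) / 2;
    // The left half runs on a new thread; the right half stays on this one.
    auto left = std::async(std::launch::async, [&v, begin, mid, grain] {
        return split_sum(v, begin, mid, grain);
    });
    const long long right = split_sum(v, mid, end, grain);
    return left.get() + right;
}
```

Unlike a work-stealing scheduler, this sketch starts a new thread at every split, so a very small `grain` on a large range creates many threads. It is meant only to show the splitting pattern.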
OCaml is a strongly typed functional language featuring a sophisticated type system and a focus on safety and expressiveness. It provides a comprehensive compiling toolchain that transforms source code into either portable bytecode or high-performance native binaries. The project is distinguished by a shared memory parallel runtime that executes computations across multiple processor cores using domains, and an algebraic effect system for managing side effects and control flow through execution context handlers. It also includes a dedicated parser generator to automatically create lexers and
Implements parallel computing through a shared-memory runtime that executes computations across multiple processor cores using domains.
This project serves as a comprehensive educational resource for learning parallel programming and high-performance computing using graphics processing units. It provides technical guidance on the fundamental paradigms required to offload computationally intensive tasks from a host system to specialized hardware accelerators. The materials cover the core methodologies for managing data-parallel operations, including the orchestration of memory between host and device spaces and the organization of threads into structured grids and blocks. It details the execution models necessary to distribute
Offers educational materials focused on managing device memory and optimizing kernel execution for accelerated hardware.