7 repositories
Strategies for scaling computational throughput across multiple CPU cores.
Distinct from Computational Parallelization, whose entries cover web parallelization, simulators, and awesome lists; this category focuses on language-level implementations such as C++.
Explore 7 GitHub repositories matching Programming Languages & Runtimes · Parallel Computing Implementation.
This project is a comprehensive educational resource and programming course covering C++ language semantics and features from C++03 through C++26. It provides structured tutorials and technical guides focused on modern C++ development. The material offers specialized instruction on template metaprogramming, including the use of type traits and compile-time computations. It features detailed guides on concurrency and parallelism for multi-core execution, as well as a reference for software design applying SOLID principles and RAII. Additionally, it covers build performance optimization to reduce…
Teaches how to distribute computational workloads across multiple CPU cores to increase throughput.
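As a rough illustration of splitting a workload across cores with only the standard library (not code taken from the course), the sketch below partitions a vector into contiguous chunks and sums each chunk on its own `std::thread`. The function name `parallel_sum` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sum `data` by giving each of `num_threads` threads one
// contiguous chunk. Each thread writes only its own element of `partial`, so
// there is no data race and no lock is needed.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    const std::size_t n = data.size();
    for (unsigned t = 0; t < num_threads; ++t) {
        // Chunk t covers [n*t/num_threads, n*(t+1)/num_threads).
        const auto begin = static_cast<std::ptrdiff_t>(n * t / num_threads);
        const auto end = static_cast<std::ptrdiff_t>(n * (t + 1) / num_threads);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();  // wait for every chunk before combining
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

`std::thread::hardware_concurrency()` is a common starting value for `num_threads`, but it may return 0 when the core count is unknown, so check it before passing it in.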
This repository is a comprehensive collection of instructional guides and practical examples for Python development, focusing on machine learning, data science, and web scraping. It provides implementations for neural networks, reinforcement learning algorithms, and deep learning architectures using PyTorch, alongside detailed manuals for scientific computing and data visualization. The project distinguishes itself by offering specialized tutorials on concurrent programming to optimize CPU performance and guides for setting up Linux development environments. It covers the implementation of ad…
Demonstrates strategies for scaling computational throughput across multiple CPU cores using multiprocessing.
HVM2 is a high-performance execution environment for pure functional programs, implemented as a systems-level runtime in Rust. It functions as a massively parallel functional runtime that uses interaction combinators to achieve automatic parallelism across multi-core CPUs and GPUs. The project distinguishes itself by using a graph-rewriting computational model to execute programs via local reduction rules, which eliminates the need for manual locks or atomic operations. It employs beta-optimal reduction and lazy evaluation to optimize higher-order functions and eliminate redundant computation…
Distributes independent sub-expressions across CPU cores using a work-stealing queue to maximize throughput.
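HVM2 itself is written in Rust and its scheduler is not reproduced here. As a generic illustration of the work-stealing idea named above, the C++ sketch below gives each worker its own deque: the owner pops from the back, and an idle worker steals from the front of another worker's deque. The class name `WorkStealingPool` is hypothetical, each deque is guarded by a plain mutex for simplicity (production schedulers typically use lock-free deques such as Chase-Lev), and tasks are assumed not to spawn new tasks while running.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: one mutex-guarded deque per worker.
class WorkStealingPool {
public:
    using Task = std::function<void()>;

    explicit WorkStealingPool(std::size_t num_workers) : queues_(num_workers) {
        assert(num_workers > 0);
    }

    // Call from a single thread before run(); tasks are dealt out round-robin.
    void submit(Task task) {
        Queue& q = queues_[next_++ % queues_.size()];
        std::lock_guard<std::mutex> lock(q.mutex);
        q.tasks.push_back(std::move(task));
    }

    // Start one thread per deque and return once every task has executed.
    void run() {
        std::vector<std::thread> threads;
        threads.reserve(queues_.size());
        for (std::size_t id = 0; id < queues_.size(); ++id) {
            threads.emplace_back([this, id] { worker_loop(id); });
        }
        for (auto& t : threads) t.join();
    }

private:
    struct Queue {
        std::mutex mutex;
        std::deque<Task> tasks;
    };

    // Owner side: take the most recently queued task (LIFO).
    std::optional<Task> pop_local(std::size_t id) {
        Queue& q = queues_[id];
        std::lock_guard<std::mutex> lock(q.mutex);
        if (q.tasks.empty()) return std::nullopt;
        Task task = std::move(q.tasks.back());
        q.tasks.pop_back();
        return task;
    }

    // Thief side: scan the other deques and take the oldest task found (FIFO).
    std::optional<Task> steal(std::size_t thief) {
        for (std::size_t offset = 1; offset < queues_.size(); ++offset) {
            Queue& victim = queues_[(thief + offset) % queues_.size()];
            std::lock_guard<std::mutex> lock(victim.mutex);
            if (!victim.tasks.empty()) {
                Task task = std::move(victim.tasks.front());
                victim.tasks.pop_front();
                return task;
            }
        }
        return std::nullopt;
    }

    // No tasks are added during run(), so once the local deque and every
    // victim deque are empty, this worker can exit.
    void worker_loop(std::size_t id) {
        while (true) {
            std::optional<Task> task = pop_local(id);
            if (!task) task = steal(id);
            if (!task) return;
            (*task)();
        }
    }

    std::vector<Queue> queues_;
    std::size_t next_ = 0;
};
```

Popping LIFO locally keeps recently queued work hot in the owner's cache, while stealing FIFO tends to take older tasks, which in recursive workloads are often the larger ones.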
This repository is a collection of reference implementations and programming examples for the CUDA Toolkit. It serves as a GPGPU implementation guide and a parallel computing reference, providing code for using graphics hardware to perform general-purpose calculations and high-performance parallel processing. The project provides specific samples for GPU kernel development and resource management. These include demonstrations of multi-GPU communication, peer-to-peer memory access, and system hardware inspection to coordinate distributed GPU resources. The codebase covers a wide range of capabilities…
Implements advanced parallelism using cooperative groups and execution graphs to optimize GPU workload distribution.
oneTBB is a C++ parallelism library and framework designed to add multi-core parallelism to applications. It provides a task-based parallelism model that maps logical computational tasks onto the available hardware cores, removing the need to manage threads manually. The library acts as a multi-core scaling tool, using generic templates to scale data-parallel operations across processors for portable performance. The project covers shared-memory parallelism, multi-core task scheduling, and data-parallel scaling. It uses a work-stealing task scheduler, recursive range splitting, and dynamic load balancing to manage how work is distributed across cores at runtime.
Provides strategies for scaling computational throughput across multiple CPU cores in C++ applications.
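oneTBB's own interface (templates such as `tbb::parallel_reduce` over a `tbb::blocked_range`) is not reproduced here. As a rough, standard-library-only analogy to the recursive range splitting described for oneTBB, the sketch below halves an index range until a piece is no larger than a hypothetical `grain` size, then sums that piece serially, running the left half as a `std::async` task. The function name `split_sum` is also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical divide-and-conquer reduction over data[begin, end).
long long split_sum(const std::vector<int>& data, std::size_t begin,
                    std::size_t end, std::size_t grain) {
    assert(grain > 0 && begin <= end && end <= data.size());
    if (end - begin <= grain) {
        // Base case: the piece is small enough to process serially.
        return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                               data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
    }
    const std::size_t mid = begin + (end - begin) / 2;
    // Left half runs on another thread; std::cref avoids copying the vector.
    auto left = std::async(std::launch::async, split_sum, std::cref(data),
                           begin, mid, grain);
    const long long right = split_sum(data, mid, end, grain);
    return left.get() + right;
}
```

Unlike oneTBB's pooled, work-stealing scheduler, `std::launch::async` starts a new thread for every split, so a coarse grain is what keeps the thread count small in this sketch.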
OCaml is a strongly typed functional language featuring a sophisticated type system and a focus on safety and expressiveness. It provides a comprehensive compiling toolchain that transforms source code into either portable bytecode or high-performance native binaries. The project is distinguished by a shared memory parallel runtime that executes computations across multiple processor cores using domains, and an algebraic effect system for managing side effects and control flow through execution context handlers. It also includes a dedicated parser generator to automatically create lexers and…
Implements parallel computing through a shared-memory runtime that executes computations across multiple processor cores using domains.
This project serves as a comprehensive educational resource for learning parallel programming and high-performance computing using graphics processing units. It provides technical guidance on the fundamental paradigms required to offload computationally intensive tasks from a host system to specialized hardware accelerators. The materials cover the core methodologies for managing data-parallel operations, including the orchestration of memory between host and device spaces and the organization of threads into structured grids and blocks. It details the execution models necessary to distribute…
Offers educational materials focused on managing device memory and optimizing kernel execution for accelerated hardware.
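To make the grid-and-block organization concrete without a GPU, the sketch below emulates a one-dimensional launch serially on the CPU. It mirrors the usual CUDA indexing idiom `blockIdx.x * blockDim.x + threadIdx.x` and the `if (i < n)` bounds guard; the function names `blocks_for` and `emulate_launch` are hypothetical, and nothing here runs on a device.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Number of blocks needed so that grid_dim * block_dim >= n (ceiling division).
std::size_t blocks_for(std::size_t n, std::size_t block_dim) {
    assert(block_dim > 0);
    return (n + block_dim - 1) / block_dim;
}

// Serially emulate a 1-D launch: each (block, thread) pair computes the same
// global index a kernel would, and indices past the end are skipped, like the
// bounds check that guards the last, partially filled block.
void emulate_launch(std::size_t n, std::size_t block_dim,
                    const std::function<void(std::size_t)>& kernel_body) {
    const std::size_t grid_dim = blocks_for(n, block_dim);
    for (std::size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            const std::size_t i = block_idx * block_dim + thread_idx;
            if (i < n) kernel_body(i);
        }
    }
}
```

For example, with n = 1000 and 256 threads per block, four blocks are launched (1024 slots) and the last 24 indices are skipped by the guard.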