30 repositories
Mathematical operations that utilize processor-specific vector instruction sets for parallel throughput.
Distinct from general vectorized arithmetic: focuses specifically on using SIMD-level hardware instructions for performance.
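As a rough illustration of the data-parallel shape this category refers to, here is a minimal Rust sketch of a dot product organized into fixed-width lanes. It is not taken from any repository listed here. The `dot` function and the `LANES` constant are hypothetical names, and the code uses no explicit intrinsics: it only arranges the loop so that each step performs the same multiply-add on eight independent elements, which an optimizing compiler may (or may not) lower to vector instructions.

```rust
/// Hypothetical lane count: eight f32 values would fill one 256-bit register.
const LANES: usize = 8;

/// Dot product with one independent accumulator per lane.
/// Sketch only: whether this compiles to SIMD instructions depends on the
/// compiler, optimization level, and target CPU.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");

    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are handled separately below.
    let a_tail = a_chunks.remainder();
    let b_tail = b_chunks.remainder();

    let mut acc = [0.0f32; LANES];
    for (ca, cb) in a_chunks.zip(b_chunks) {
        // The same operation is applied to every lane of the chunk.
        for i in 0..LANES {
            acc[i] += ca[i] * cb[i];
        }
    }

    // Horizontal reduction of the lane accumulators, then the scalar tail.
    let mut total: f32 = acc.iter().sum();
    for (x, y) in a_tail.iter().zip(b_tail) {
        total += x * y;
    }
    total
}
```

Keeping one accumulator per lane removes the dependency between consecutive iterations, which is what lets hardware vector units process several elements per instruction.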
AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs. The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer mo
Applies a single instruction across multiple data elements simultaneously to accelerate vector operations.
GGML is a machine learning tensor library and neural network engine written in C. It functions as a compute-focused runtime designed to execute transformer-based models and perform complex mathematical operations on multi-dimensional arrays directly on local consumer hardware. The library distinguishes itself by enabling local inference for large language models and edge machine learning deployment without reliance on external cloud infrastructure. It achieves this through a tensor-based computation graph that organizes operations for efficient execution and memory management, alongside stati
Utilizes processor-specific instruction sets to perform parallel arithmetic operations on data arrays for significantly faster mathematical throughput.
John is a command-line security utility designed for password strength auditing and cryptographic hash recovery. It functions as a professional tool for identifying weak user credentials and recovering access to protected files, archives, and private keys across various operating systems, databases, and applications. The software distinguishes itself through a high-performance architecture that utilizes processor-level vector instructions to perform parallel cryptographic operations. It incorporates a rule-based mutation engine that transforms dictionary words into complex candidates based on
Utilizes processor-level vector instructions to perform multiple cryptographic operations in parallel for significantly increased throughput.
libfacedetection is a C++ face detection library and computer vision tool. It utilizes a neural network face detector to identify human faces in images and return bounding box coordinates. The library is designed for low latency and high throughput processing, enabling real-time face detection in image and video streams. It supports automated image analysis for identifying coordinates of human faces across large batches of photos and high-performance video processing.
Utilizes processor-specific SIMD vector instructions to accelerate neural network mathematical computations.
This project serves as an educational resource for learning and implementing low-level assembly language optimizations. It provides a structured guide for developers to master hardware-specific instructions and manual performance tuning, focusing on the translation of high-level code into efficient machine-level operations for resource-constrained environments. The materials emphasize techniques for maximizing computational throughput in multimedia processing. By covering instruction-level parallelism, register management, and data parallelism, the project enables the development of software
Applies a single instruction to multiple data elements at once to maximize throughput for high-bandwidth multimedia processing tasks.
TurboVec is a high-performance Rust vector database and quantized search index designed for storing and retrieving high-dimensional embeddings. It functions as a pluggable vector store for large language model orchestration frameworks, providing a memory-efficient alternative to standard in-memory storage. The project distinguishes itself through a high-dimensional vector compressor that utilizes random rotation and data-oblivious scalar quantization to reduce memory footprints. Retrieval is accelerated via SIMD kernels that process distance calculations and search operations for increased th
Accelerates nearest neighbor retrieval using SIMD-accelerated kernels for maximum throughput.
xxHash is a high-performance, non-cryptographic hash library designed for rapid checksum generation and data integrity verification. It functions as an incremental hashing engine, allowing for the processing of large or streaming data inputs by maintaining a persistent internal state across sequential chunks. The library is engineered as a computational framework that maximizes throughput by utilizing wide CPU registers and branchless instruction pipelining. It achieves high-speed performance by aligning data access with CPU cache lines and employing multi-stage mixing functions that ensure c
Processes data in parallel using wide CPU registers to maximize throughput during large memory block hashing operations.
This project is a header-only C++ library designed for graphics mathematics, providing a comprehensive suite of vector, matrix, and quaternion types. It is built using template metaprogramming to generate mathematical primitives at compile time, eliminating the need for precompiled binary libraries and allowing for direct integration into existing build systems. The library is distinguished by its strict adherence to the OpenGL Shading Language specification, ensuring that mathematical results remain consistent across both CPU and GPU code. It provides specialized utilities for managing float
Structures mathematical primitives to align with hardware memory requirements for efficient data access.
This project is an open-source 3D game engine designed for building high-fidelity games, simulations, and cinematic environments. It functions as a robotics simulation platform with native integration for ROS 2 to model robot controllers and sensors. The engine features a multi-threaded Forward+ physically based renderer that supports hardware-accelerated ray tracing and global illumination. The system is built on a modular extension architecture using Gems to add or replace features without modifying core binaries. It includes a native SDK for AWS cloud integration, enabling IAM authenticati
Executes precise mathematical calculations using SIMD-accelerated libraries optimized for x64 and ARM architectures.
JUCE is a comprehensive C++ audio framework and digital signal processing library used to build cross-platform audio applications, audio plug-ins, and high-performance user interfaces. It serves as a development kit for creating audio processors compatible with industry-standard plugin formats for digital audio workstations, as well as a tool for MIDI and Open Sound Control communication between musical hardware and software. The framework is distinguished by its ability to maintain a single codebase for native desktop and mobile applications across multiple operating systems. It provides a f
Employs SIMD vector instructions to perform parallel calculations on audio buffers for higher efficiency.
Thorium is a web browser built from the Chromium project, designed for high performance and expanded compatibility. It utilizes aggressive compiler optimizations and CPU-specific SIMD instruction sets, such as AVX2, to increase page rendering and JavaScript execution speeds. The project distinguishes itself by providing custom builds that enable modern web browsing on legacy versions of Windows and Linux. It further diverges from standard browser implementations by integrating Widevine DRM and native support for high-efficiency media formats, including HEVC and JPEG XL. Broad capabilitie
Leverages hardware-level instruction set extensions to accelerate the encryption and decryption of secure web pages.
BLAKE3 is a high-performance implementation of the BLAKE3 cryptographic hash algorithm, used to compute secure data digests and fingerprints. It functions as a parallel cryptographic hashing tool that distributes workloads across multiple processor threads to process large datasets quickly. The project provides specialized tools for keyed hashing and message authentication code generation. It also includes cryptographic key derivation functionality, allowing unique secret subkeys to be created from a master key and context strings. The implementation supports data integrity verification through parallel hash computation and verified data streaming. These capabilities are provided as a cross-language library for Rust and C environments and include a command-line interface for computing digests of files or standard input.
Maximizes CPU throughput by processing multiple data blocks simultaneously using SIMD lanes.
GNU Radio is an open-source software-defined radio framework that provides a digital signal processing toolkit for building wireless communication systems. At its core, it uses a block-based flow graph architecture where pre-built signal processing blocks are connected into directed graphs to define and execute custom radio signal processing pipelines. The system operates as a flow graph signal processor that enables low-latency streaming radio signal processing, supporting both real-time operation and wireless communication simulation entirely in software. The framework distinguishes itself
Uses a vector-optimized library to automatically select CPU-specific SIMD instructions for signal processing.
Gorgonia is a Go library that provides an automatic differentiation engine and a computation graph framework for building and training neural networks. It functions as a CUDA-accelerated tensor library and a SIMD-optimized math library, enabling machine learning workflows entirely within the Go ecosystem. The library distinguishes itself through a dual-backend architecture that dispatches neural network operations to either a GPU or CPU depending on CUDA availability at runtime. It constructs differentiable directed acyclic graphs of tensor operations, supports reverse-mode automatic gradient
Uses platform-specific SIMD instructions to accelerate mathematical operations for neural network training.
Highway is a portable C++ library and hardware abstraction layer designed for writing SIMD (Single Instruction Multiple Data) code. It provides a unified interface that maps data-parallel logic to various CPU instruction sets, enabling the development of high-performance software that runs on different processor architectures without requiring architecture-specific assembly. The project features a dynamic instruction dispatcher that selects the most efficient CPU instruction set at runtime based on the detected hardware. It also supports static target specialization and extensible mechanisms for adding new hardware targets or custom SIMD operations. The library covers a wide range of vector operations, including element-wise arithmetic, lane reduction, shuffling, and masked conditional execution. It includes a vectorized math library, a memory manager for aligned allocation and masked load-store operations, and primitives for hardware-accelerated cryptography. Tools are provided for automated compilation and validation of hardware-accelerated instructions across multiple processor architectures.
Provides a portable interface to write data-parallel code that maps to hardware-accelerated SIMD instructions.
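The runtime-dispatch idea described for Highway can be sketched in a few lines. The example below is not Highway's API (Highway is a C++ library); it is a minimal Rust illustration using the standard library's `std::arch` intrinsics and the `is_x86_feature_detected!` macro. The function names `sum`, `sum_avx`, and `sum_scalar` are hypothetical. On non-x86_64 targets the vector path is compiled out and only the scalar fallback remains.

```rust
/// Scalar fallback used when no wider instruction set is detected.
fn sum_scalar(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// AVX path: adds eight f32 values per instruction.
/// Only compiled on x86_64; callers must confirm AVX support first.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sum_avx(data: &[f32]) -> f32 {
    use std::arch::x86_64::*;

    let chunks = data.chunks_exact(8);
    let tail = chunks.remainder();

    let mut acc = _mm256_setzero_ps();
    for c in chunks {
        // Unaligned load of 8 floats, then a lane-wise add.
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(c.as_ptr()));
    }

    // Spill the 8 lanes to memory and reduce them, then add the leftovers.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

/// Picks an implementation at runtime based on the detected CPU features.
pub fn sum(data: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was confirmed at runtime just above.
            return unsafe { sum_avx(data) };
        }
    }
    sum_scalar(data)
}
```

Because the vector path accumulates lane by lane, floating-point results can differ from the scalar sum in the last bits for general inputs. A production library would typically cache the detection result and cover more targets than this sketch does.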
c3c is the compiler for the C3 programming language, transforming source code into executable binaries, static libraries, or dynamic libraries using an LLVM backend. It implements a system based on result-based error handling, scoped memory pooling, and a semantic macro system. The compiler provides first-class support for hardware-backed SIMD vectors that map directly to processor instructions and enables runtime polymorphism through interface-based dynamic dispatch. The project covers a broad set of low-level capabilities, including manual and pooled memory management, inline assembly inte
Executes parallel arithmetic and logical operations on hardware-backed vectors to maximize computational throughput.
ZLinq is a zero-allocation LINQ library and memory-efficient collection toolkit for C#. It provides a high-performance replacement for standard query operations by using value-type enumerators and pooled memory to eliminate heap allocations and reduce garbage collection overhead. The library features a C# source generator that automatically routes standard query method calls to these zero-allocation implementations. It further accelerates data processing through a SIMD accelerated data library, using hardware vectorization for numeric aggregations and bulk operations on primitive arrays and s
Utilizes processor-specific SIMD vector instructions to accelerate numeric aggregations and primitive processing.
Rack is a virtual Eurorack modular synthesizer emulator and a modular synthesis SDK. It provides a digital environment for creating and routing electronic music signals using virtual modules, oscillators, and filters, simulating the behavior of analog hardware through voltage-based signal routing. The system functions as a MIDI and CV converter, translating signals between software and external hardware, and can run as a VST plugin or industry-standard instrument within digital audio workstations. It also acts as a VST plugin host, integrating external virtual instruments and effects to extend the available sound processing tools. The platform includes a full range of audio processing capabilities, including physical modeling synthesis, spectral processing, and time-based effects. It provides tools for control voltage generation, note sequencing, and polyphonic signal processing, alongside a development kit for building custom audio modules with SVG-driven interface generation. A command-line interface is available for launching applications, bootstrapping projects, and automating the production of source files from vector graphics.
Utilizes SIMD vector instructions to process multiple audio channels in parallel for CPU efficiency.
MiniOB is an open-source educational relational database kernel designed for learning the internals of database systems. It implements a dual-engine storage architecture combining B+ Tree and LSM-Tree, supports SQL parsing and query execution, and provides transactional processing with multi-version concurrency control. The system communicates with clients using the MySQL wire protocol and includes a vector database extension for storing and querying high-dimensional vectors. The project distinguishes itself through its comprehensive coverage of core database concepts in a single, learnable c
Uses single-instruction-multiple-data instructions to speed up arithmetic, aggregation, and hash-table operations on vectorized data chunks.
This project is a technical curriculum and set of educational resources focused on parallel programming, high-performance computing, and systems programming. It provides a structured course covering the implementation of parallel algorithms and multithreading techniques for processing large datasets. The project includes a systems programming guide for modern language features, a framework for lock-free concurrency patterns, and a manual for optimizing CPU and GPU performance through assembly analysis and cache management. The material covers hardware performance tuning, the implementation o
Utilizes processor-specific vector instruction sets to accelerate mathematical operations like matrix multiplication.