43 repositories
Optimized operations for multiplying batches of matrices to process minibatches simultaneously.
Distinct from Batch Processing: focuses on the specific linear algebra operation of batch matrix multiplication rather than general batch processing.
43 GitHub repositories matching Data & Databases · Batch Matrix Multiplication Utilities.
This project is a comprehensive Chinese translation of a technical deep learning textbook, providing an educational resource on the theory and implementation of neural networks. It functions as a collaborative technical translation project designed to make complex academic AI literature accessible to non-English speakers. The project utilizes a community-driven translation model that integrates external suggestions and pull requests to refine linguistic accuracy and reduce bias. It employs standardized terminology mapping to ensure a uniform vocabulary throughout the translated content. To i
Explains standard matrix multiplication, dot products, and element-wise products.
This is a persistent data structure library for JavaScript that provides collections which prevent the direct mutation of objects and arrays. It serves as an immutable state management tool and functional programming utility, ensuring that data remains unchanged after creation to simplify change detection and state tracking. The library enables the maintenance of application state by producing new versions of data structures during updates. It focuses on efficient data comparison by checking actual content instead of memory references and supports a functional programming workflow to prevent
Enables batching multiple mutations into a temporary state to reduce the overhead of creating intermediate objects.
Immutable.js is a library of persistent data structures and a functional state management toolkit. It provides a collection of immutable objects and arrays that prevent direct mutation to ensure predictable state management in JavaScript applications. The library utilizes structural sharing to efficiently create new versions of data without full copying and implements lazy sequence processing to chain data transformations that execute only when values are requested. It also supports batch mutation processing, allowing multiple changes to be applied to a temporary mutable copy before returning
Supports batch mutation processing using temporary mutable copies to optimize performance during multiple rapid updates.
This project is a comprehensive collection of common computer science algorithms and data structures implemented in Swift. It serves as an educational reference and library for studying computational complexity, algorithmic logic, and data structure engineering through practical code examples. The repository provides a wide suite of data structure implementations, including various types of linked lists, heaps, hash tables, and an extensive range of hierarchical trees such as Red-Black, B-Tree, and Splay trees. It also covers diverse sorting and searching techniques, from basic bubble sort to
Implements matrix multiplication using both iterative and recursive divide-and-conquer strategies.
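The two strategies named above can be sketched as follows. This is a minimal illustration in Python, not code from the repository (which is written in Swift); the function names are hypothetical, and the recursive version assumes square matrices whose size is a power of two.

```python
def matmul_iterative(a, b):
    """Triple-loop product of an n x m and an m x p matrix (lists of rows)."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                out[i][j] += aik * b[k][j]
    return out


def _add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def _split(m):
    """Split a square matrix into its four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def matmul_recursive(a, b):
    """Divide-and-conquer product (assumes size is a power of two)."""
    if len(a) == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = _split(a)
    b11, b12, b21, b22 = _split(b)
    # Each output quadrant is the sum of two half-size products.
    c11 = _add(matmul_recursive(a11, b11), matmul_recursive(a12, b21))
    c12 = _add(matmul_recursive(a11, b12), matmul_recursive(a12, b22))
    c21 = _add(matmul_recursive(a21, b11), matmul_recursive(a22, b21))
    c22 = _add(matmul_recursive(a21, b12), matmul_recursive(a22, b22))
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

Both functions return the same result for valid inputs; the recursive form performs eight half-size multiplications per level, so it has the same cubic cost as the loops and mainly illustrates the decomposition.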
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Multiplies batches of query and key matrices efficiently to support parallel processing.
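As a rough illustration of what a batched query-key product computes, the sketch below multiplies each query matrix by the transpose of the matching key matrix. It is a plain-Python stand-in, not the project's code: the function name and the nested-list layout are assumptions, and tensor libraries perform the same computation in a single vectorized call.

```python
def batched_qk_scores(queries, keys):
    """For each batch item, return Q @ K^T.

    queries: list of B matrices, each n x d (list of rows)
    keys:    list of B matrices, each m x d
    result:  list of B matrices, each n x m
    """
    if len(queries) != len(keys):
        raise ValueError("batch sizes differ")
    return [
        # Entry (i, j) is the dot product of query row i and key row j.
        [[sum(qv * kv for qv, kv in zip(q_row, k_row)) for k_row in k]
         for q_row in q]
        for q, k in zip(queries, keys)
    ]
```

Each batch item is independent of the others, which is what lets a real implementation process all of them in parallel.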
TanStack Table is a headless, framework-agnostic engine designed for building complex data grids and managing tabular state. By decoupling data processing logic from the visual rendering layer, it allows developers to implement custom user interfaces while offloading sophisticated operations like sorting, filtering, grouping, and pagination to a unified, performant core. The library distinguishes itself through its commitment to type safety and environment flexibility. It leverages strict type definitions to ensure data integrity across the entire application and utilizes an adapter pattern t
Groups individual operations into batches to improve system efficiency and reduce processing overhead.
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
Calculates the product of two arrays, supporting batched operations and automatic broadcasting.
Hystrix is a latency and fault tolerance library designed to prevent cascading failures in distributed systems. It functions as a circuit breaker implementation that monitors failure thresholds and opens circuits to isolate remote calls when downstream services degrade. The project distinguishes itself by providing multiple isolation mechanisms, utilizing dedicated thread pools and semaphores to ensure that latency in one dependency does not saturate the entire system. It also features a request collapsing and batching engine that groups concurrent calls into single executions to reduce the t
Runs multiple remote calls concurrently while caching identical requests and collapsing them into a single batch.
FlashAttention is an attention mechanism optimization library and machine learning acceleration framework designed to increase training speed and reduce memory footprint for large-scale neural network models. It functions as a collection of low-level CUDA kernels that optimize memory-bound operations to improve hardware utilization on graphics processing units. The library distinguishes itself through an input-output-aware algorithm design that minimizes data movement between different levels of memory. By employing kernel fusion and tiled matrix multiplication, it combines sequential operati
Divides large attention matrices into smaller blocks that fit into fast on-chip memory to minimize global memory access.
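The blocking idea can be shown with an ordinary tiled matrix product. The sketch below is not FlashAttention's kernel, which is written in CUDA and also fuses the softmax; it only shows the loop structure in which each small tile of the operands is reused before moving on. The function name and the `tile` parameter are hypothetical.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply an n x m matrix by an m x p matrix one tile at a time."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0] * p for _ in range(n)]
    # Outer loops pick a tile of the output and a slice of the shared dimension.
    for i0 in range(0, n, tile):
        for j0 in range(0, p, tile):
            for k0 in range(0, m, tile):
                # Inner loops touch only data inside the current tiles; on a GPU
                # these tiles would be staged in fast on-chip memory.
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            out[i][j] += aik * b[k][j]
    return out
```

The result is identical for any positive tile size; only the order of memory accesses changes.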
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement
Performs preprocessing tasks such as data normalization while data is loaded into memory, then passes the result directly to matrix multiplication kernels.
Draft-js is a framework for building customizable rich text editors within React applications. It functions as a content editable framework that separates the underlying data model from the visual rendering layer, acting as a rich text content engine to manage complex text data and formatting. The project utilizes an immutable state management system to ensure consistent updates and predictable undo history. It manages editor state through persistent data structures, providing an immutable data state manager to prevent accidental mutation. The framework includes capabilities for high perform
Implements a batch-mutation layer that temporarily uses mutable data for performance before returning an immutable state.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Splits single incoming events into multiple discrete outputs by transforming data structures into arrays or individual records.
This project is a cross-platform development framework and managed runtime environment designed for building high-performance applications. It provides a comprehensive toolkit for constructing web services, cloud-native microservices, and desktop applications, utilizing a unified runtime that handles memory management and execution across diverse operating systems. The framework distinguishes itself through a native ahead-of-time compilation toolchain that transforms source code into optimized, self-contained machine code binaries. This capability enables fast startup times and reduced memory
Supports targeting multiple frameworks to ensure compatibility across different runtime environments.
This library provides a framework for parameter-efficient fine-tuning, enabling the adaptation of large pretrained models by training only a small subset of parameters. It functions as a distributed model training system and optimization toolkit, designed to reduce the computational and memory requirements typically associated with full model fine-tuning. The project distinguishes itself through a suite of methods for modular adapter composition, including low-rank matrix decomposition and activation-based scaling. It supports the integration of multiple task-specific adapter modules, allowin
Routes and merges multiple trained low-rank modules to solve new tasks.
Apollo Client is a GraphQL client library and data fetching framework used to request data from a GraphQL server and synchronize that state within a frontend application. It functions as a remote state manager and a local state management tool, allowing developers to define client-side schemas and resolvers for data that does not reside on a remote server. The project features a normalized GraphQL cache that identifies objects by ID to ensure referential equality and consistent data updates across different queries. It also includes a GraphQL API mocking tool to simulate server responses and
Combines multiple GraphQL operations into a single network request to reduce overhead and round trips.
Horovod is a distributed deep learning framework designed to scale machine learning training across multiple GPUs and nodes. It functions as an orchestrator for multi-GPU scaling and a tool for distributed gradient averaging, allowing users to increase compute capacity without rewriting core model logic. The project provides a consistent communication interface that supports multi-framework model distribution across TensorFlow, PyTorch, Keras, and MXNet. It leverages an MPI distributed training library to synchronize gradients across processes using collective communication operations. The s
Groups multiple small gradient updates into a single large buffer to reduce network communication frequency.
Horovod is a distributed deep learning framework and gradient synchronizer designed to scale model training across multiple GPUs and compute nodes. It functions as a distributed training orchestrator and an elastic training engine, utilizing an MPI collective communication library to synchronize weights and gradients across TensorFlow, PyTorch, Keras, and MXNet models. The system distinguishes itself through dynamic elastic scaling, which allows it to adjust the number of active workers at runtime and recover from node failures. It optimizes communication efficiency using tensor fusion batchi
Groups multiple small tensors into larger buffers to reduce network overhead during gradient synchronization.
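The fusion step described for Horovod can be approximated in a few lines. This is a single-process sketch under stated assumptions, not Horovod's API: the helper names are hypothetical, tensors are flat Python lists, and the averaging loop stands in for one collective operation over the fused buffer.

```python
def fuse(tensors):
    """Concatenate flat tensors into one buffer, remembering each length."""
    buffer, sizes = [], []
    for t in tensors:
        buffer.extend(t)
        sizes.append(len(t))
    return buffer, sizes


def unfuse(buffer, sizes):
    """Split a fused buffer back into tensors of the recorded lengths."""
    out, start = [], 0
    for size in sizes:
        out.append(buffer[start:start + size])
        start += size
    return out


def average_fused(worker_grads):
    """Average gradients across workers using one fused buffer per worker.

    worker_grads: one list of flat gradient tensors per worker; all workers
    are assumed to hold tensors of the same lengths in the same order.
    """
    fused = [fuse(grads) for grads in worker_grads]
    sizes = fused[0][1]
    count = len(fused)
    # One pass over the fused buffers replaces one exchange per small tensor.
    averaged = [sum(vals) / count for vals in zip(*(buf for buf, _ in fused))]
    return unfuse(averaged, sizes)
```

With two workers holding `[[1.0, 2.0], [3.0]]` and `[[3.0, 4.0], [5.0]]`, the call returns `[[2.0, 3.0], [4.0]]` after a single pass over three fused values instead of two separate exchanges.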
Bytebase is a database DevSecOps platform and management console designed to orchestrate schema migrations, deployments, and security audits across multiple database engines. It serves as a SQL GitOps tool that synchronizes database states with configurations stored in Git repositories to manage infrastructure as code. The platform distinguishes itself through a multi-database management console that provides a single interface for relational and NoSQL databases. It includes a security layer for role-based access control, database activity auditing, and column-level data masking to protect se
Pushes schema or data updates to multiple databases and tenants simultaneously using a centralized interface.
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture allows the system to automate the distribution of workloads across available hardware while managing complex execution requirements. The project distinguishes itself through a lazy evaluation engine that defers operations on data until they are explicitly requested, enabling global graph optimization and efficient resource allocation. It integrates memory-aware data spilling to avoid system crashes when processing datasets that exceed available memory, and it uses task graph fusion to combine sequences of operations into single execution steps, minimizing scheduling overhead and inter-node communication. The platform provides a comprehensive capability surface for large-scale data analysis, including support for distributed machine learning, high-performance computing integration, and parallel data processing. It offers extensive tools for cluster lifecycle management, performance profiling, and real-time monitoring of task execution. Users can deploy these environments on a variety of infrastructure, including local hardware, cloud providers, containerized systems, and high-performance computing clusters.
Groups multiple queries into a single execution call to enable parallel processing and reuse of shared intermediate results.
This project is a comprehensive library for numerical linear algebra and scientific computing, designed to provide optimized routines for matrix decomposition, statistical modeling, and high-performance data analysis. It serves as both a toolkit for solving complex linear systems and an educational resource for understanding the fundamental algorithms behind matrix factorizations and numerical solvers. The library distinguishes itself through a focus on randomized numerical linear algebra, utilizing probabilistic algorithms and approximate methods to perform dimensionality reduction and matri
Improves performance and numerical stability using block matrix multiplication and advanced solvers.