95 مستودعات
Utilities for performing bulk data operations efficiently.
Distinguishing note: No existing candidates for batch data operations.
Explore 95 awesome GitHub repositories matching data & databases · Batch Processing. Refine with filters or upvote what's useful.
This project is a comprehensive Chinese translation of a technical deep learning textbook, providing an educational resource on the theory and implementation of neural networks. It functions as a collaborative technical translation project designed to make complex academic AI literature accessible to non-English speakers. The project utilizes a community-driven translation model that integrates external suggestions and pull requests to refine linguistic accuracy and reduce bias. It employs standardized terminology mapping to ensure a uniform vocabulary throughout the translated content. To i
Explains standard matrix multiplication, dot products, and element-wise products.
Marker is a comprehensive document processing platform designed to automate the conversion, extraction, and structuring of data from complex files. It functions as an orchestration engine that chains modular processing steps into versioned, reusable pipelines, allowing organizations to standardize document handling and automate repetitive business tasks at scale. The platform distinguishes itself through its support for secure, private infrastructure deployment, enabling users to run containerized services within their own environments to maintain strict data privacy. It features specialized
Handles multiple documents concurrently to increase throughput and improve efficiency.
This is a persistent data structure library for JavaScript that provides collections which prevent the direct mutation of objects and arrays. It serves as an immutable state management tool and functional programming utility, ensuring that data remains unchanged after creation to simplify change detection and state tracking. The library enables the maintenance of application state by producing new versions of data structures during updates. It focuses on efficient data comparison by checking actual content instead of memory references and supports a functional programming workflow to prevent
Enables batching multiple mutations into a temporary state to reduce the overhead of creating intermediate objects.
Immutable.js is a library of persistent data structures and a functional state management toolkit. It provides a collection of immutable objects and arrays that prevent direct mutation to ensure predictable state management in JavaScript applications. The library utilizes structural sharing to efficiently create new versions of data without full copying and implements lazy sequence processing to chain data transformations that execute only when values are requested. It also supports batch mutation processing, allowing multiple changes to be applied to a temporary mutable copy before returning
Supports batch mutation processing using temporary mutable copies to optimize performance during multiple rapid updates.
Qdrant is a high-performance vector similarity database designed to store, index, and search high-dimensional vectors alongside structured metadata. It functions as a distributed search engine that manages large-scale data clusters, providing low-latency retrieval and complex filtering capabilities. The system is built to serve as a specialized middleware layer, connecting machine learning pipelines and AI agents to persistent storage for intelligent information retrieval and recommendation tasks. The platform distinguishes itself through advanced retrieval techniques, including support for h
Executes multiple insertions, updates, or deletions within a single atomic request to maintain data consistency.
This project is a comprehensive collection of common computer science algorithms and data structures implemented in Swift. It serves as an educational reference and library for studying computational complexity, algorithmic logic, and data structure engineering through practical code examples. The repository provides a wide suite of data structure implementations, including various types of linked lists, heaps, hash tables, and an extensive range of hierarchical trees such as Red-Black, B-Tree, and Splay trees. It also covers diverse sorting and searching techniques, from basic bubble sort to
Implements matrix multiplication using both iterative and recursive divide-and-conquer strategies.
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Multiplies batches of query and key matrices efficiently to support parallel processing.
TanStack Table is a headless, framework-agnostic engine designed for building complex data grids and managing tabular state. By decoupling data processing logic from the visual rendering layer, it allows developers to implement custom user interfaces while offloading sophisticated operations like sorting, filtering, grouping, and pagination to a unified, performant core. The library distinguishes itself through its commitment to type safety and environment flexibility. It leverages strict type definitions to ensure data integrity across the entire application and utilizes an adapter pattern t
Groups individual operations into batches to improve system efficiency and reduce processing overhead.
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
Calculates the product of two arrays, supporting batched operations and automatic broadcasting.
Hystrix is a latency and fault tolerance library designed to prevent cascading failures in distributed systems. It functions as a circuit breaker implementation that monitors failure thresholds and opens circuits to isolate remote calls when downstream services degrade. The project distinguishes itself by providing multiple isolation mechanisms, utilizing dedicated thread pools and semaphores to ensure that latency in one dependency does not saturate the entire system. It also features a request collapsing and batching engine that groups concurrent calls into single executions to reduce the t
Runs multiple remote calls concurrently while caching identical requests and collapsing them into a single batch.
FlashAttention is an attention mechanism optimization library and machine learning acceleration framework designed to increase training speed and reduce memory footprint for large-scale neural network models. It functions as a collection of low-level CUDA kernels that optimize memory-bound operations to improve hardware utilization on graphics processing units. The library distinguishes itself through an input-output-aware algorithm design that minimizes data movement between different levels of memory. By employing kernel fusion and tiled matrix multiplication, it combines sequential operati
Divides large attention matrices into smaller blocks that fit into fast on-chip memory to minimize global memory access.
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement
Perform preprocessing tasks like data normalization as information is loaded into memory before passing it directly to matrix multiplication kernels.
Draft-js is a framework for building customizable rich text editors within React applications. It functions as a content editable framework that separates the underlying data model from the visual rendering layer, acting as a rich text content engine to manage complex text data and formatting. The project utilizes an immutable state management system to ensure consistent updates and predictable undo history. It manages editor state through persistent data structures, providing an immutable data state manager to prevent accidental mutation. The framework includes capabilities for high perform
Implements a batch-mutation layer that temporarily uses mutable data for performance before returning an immutable state.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Splits single incoming events into multiple discrete outputs by transforming data structures into arrays or individual records.
This project is a cross-platform development framework and managed runtime environment designed for building high-performance applications. It provides a comprehensive toolkit for constructing web services, cloud-native microservices, and desktop applications, utilizing a unified runtime that handles memory management and execution across diverse operating systems. The framework distinguishes itself through a native ahead-of-time compilation toolchain that transforms source code into optimized, self-contained machine code binaries. This capability enables fast startup times and reduced memory
Supports targeting multiple frameworks to ensure compatibility across different runtime environments.
This library provides a framework for parameter-efficient fine-tuning, enabling the adaptation of large pretrained models by training only a small subset of parameters. It functions as a distributed model training system and optimization toolkit, designed to reduce the computational and memory requirements typically associated with full model fine-tuning. The project distinguishes itself through a suite of methods for modular adapter composition, including low-rank matrix decomposition and activation-based scaling. It supports the integration of multiple task-specific adapter modules, allowin
Routes and merges multiple trained low-rank modules to solve new tasks.
Surya is a document processing platform designed to transform unstructured files into structured, machine-readable data. It provides a comprehensive suite of tools for text recognition, layout analysis, and reading order detection, enabling the conversion of PDFs and images into formats such as JSON, HTML, or markdown. The platform is built to handle complex document workflows, offering capabilities for data extraction, document segmentation, and automated form completion. The platform distinguishes itself through a robust pipeline-based architecture that allows users to chain analysis tasks
Executes analysis tasks across large document collections simultaneously to improve throughput for high-volume workloads.
Apollo Client is a GraphQL client library and data fetching framework used to request data from a GraphQL server and synchronize that state within a frontend application. It functions as a remote state manager and a local state management tool, allowing developers to define client-side schemas and resolvers for data that does not reside on a remote server. The project features a normalized GraphQL cache that identifies objects by ID to ensure referential equality and consistent data updates across different queries. It also includes a GraphQL API mocking tool to simulate server responses and
Combines multiple GraphQL operations into a single network request to reduce overhead and round trips.
The Gemini Cookbook is a comprehensive collection of implementation patterns, code samples, and development guides designed for building applications with Google Gemini models. It serves as a central resource for developers to integrate multimodal generative artificial intelligence into their software, providing the necessary frameworks to manage model interactions, stateful workflows, and structured data extraction. The repository distinguishes itself by offering specialized toolkits for autonomous agent orchestration, enabling the construction of agents that can execute code, browse the web
Handles high-volume asynchronous requests through dedicated queues to optimize throughput and bypass synchronous rate limits.
Horovod is a distributed deep learning framework designed to scale machine learning training across multiple GPUs and nodes. It functions as an orchestrator for multi-GPU scaling and a tool for distributed gradient averaging, allowing users to increase compute capacity without rewriting core model logic. The project provides a consistent communication interface that supports multi-framework model distribution across TensorFlow, PyTorch, Keras, and MXNet. It leverages an MPI distributed training library to synchronize gradients across processes using collective communication operations. The s
Groups multiple small gradient updates into a single large buffer to reduce network communication frequency.