What are the best Awesome Batch Processing GitHub Repositories?

Utilities for performing bulk data operations efficiently. **Distinguishing note:** No existing candidates for batch data operations. Explore 95 awesome GitHub repositories matching data & databases · Batch Processing. Refine with filters or upvote what's useful. Top picks: exacity/deeplearningbook-chinese, datalab-to/marker, facebook/immutable-js, immutable-js/immutable-js, qdrant/qdrant, kodecocodes/swift-algorithm-club, d2l-ai/d2l-en, tanstack/table, ml-explore/mlx, netflix/hystrix.

Why is exacity/deeplearningbook-chinese a recommended batch processing repository?

Explains standard matrix multiplication, dot products, and element-wise products.

Why is datalab-to/marker a recommended batch processing repository?

Handles multiple documents concurrently to increase throughput and improve efficiency.

Why is facebook/immutable-js a recommended batch processing repository?

Enables batching multiple mutations into a temporary state to reduce the overhead of creating intermediate objects.

Why is immutable-js/immutable-js a recommended batch processing repository?

Supports batch mutation processing using temporary mutable copies to optimize performance during multiple rapid updates.

Why is qdrant/qdrant a recommended batch processing repository?

Executes multiple insertions, updates, or deletions within a single atomic request to maintain data consistency.

Why is kodecocodes/swift-algorithm-club a recommended batch processing repository?

Implements matrix multiplication using both iterative and recursive divide-and-conquer strategies.

Why is d2l-ai/d2l-en a recommended batch processing repository?

Multiplies batches of query and key matrices efficiently to support parallel processing.

Why is tanstack/table a recommended batch processing repository?

Groups individual operations into batches to improve system efficiency and reduce processing overhead.

Why is ml-explore/mlx a recommended batch processing repository?

Calculates the product of two arrays, supporting batched operations and automatic broadcasting.

Why is netflix/hystrix a recommended batch processing repository?

Runs multiple remote calls concurrently while caching identical requests and collapsing them into a single batch.


exacity/deeplearningbook-chinese
37,285 stars
This project is a comprehensive Chinese translation of a technical deep learning textbook, providing an educational resource on the theory and implementation of neural networks. It functions as a collaborative technical translation project designed to make complex academic AI literature accessible to Chinese-speaking readers. The project uses a community-driven translation model that integrates external suggestions and pull requests to refine linguistic accuracy and reduce bias. It employs standardized terminology mapping to ensure a uniform vocabulary throughout the translated content.
Explains standard matrix multiplication, dot products, and element-wise products.
TeX
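The three products named in this entry can be illustrated in plain Python over nested lists. This is a hedged sketch, not code from the book; the function names `matmul`, `dot`, and `hadamard` are hypothetical.

```python
def matmul(a, b):
    """Standard matrix product: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def dot(x, y):
    """Dot product of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(xi * yi for xi, yi in zip(x, y))


def hadamard(a, b):
    """Element-wise (Hadamard) product of two matrices with the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))               # [[19, 22], [43, 50]]
print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(hadamard(A, B))             # [[5, 12], [21, 32]]
```

Each entry of the matrix product is the dot product of a row of A with a column of B, which is why the two are usually introduced together; the Hadamard product instead requires both operands to have identical shapes.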
datalab-to/marker
36,137 stars
Marker is a comprehensive document processing platform designed to automate the conversion, extraction, and structuring of data from complex files. It functions as an orchestration engine that chains modular processing steps into versioned, reusable pipelines, allowing organizations to standardize document handling and automate repetitive business tasks at scale. The platform distinguishes itself through its support for secure, private infrastructure deployment, enabling users to run containerized services within their own environments to maintain strict data privacy.
Handles multiple documents concurrently to increase throughput and improve efficiency.
Python
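Concurrent processing of many documents can be sketched with Python's standard `concurrent.futures` module. `convert_document` is a hypothetical stand-in for a per-file conversion step; this is not Marker's API.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_document(path):
    """Hypothetical per-document conversion step.

    A real pipeline would parse the file here; this stub returns a fake
    Markdown string so the example runs on its own.
    """
    return f"# converted {path}"


def convert_all(paths, max_workers=4):
    """Convert many documents concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map schedules every task and yields results in input order.
        return list(pool.map(convert_document, paths))


print(convert_all(["a.pdf", "b.pdf", "c.pdf"]))
```

Threads suit I/O-bound steps; for CPU-heavy conversion, `ProcessPoolExecutor` offers the same interface while running work in separate processes.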
facebook/immutable-js
33,060 stars
This is a persistent data structure library for JavaScript that provides collections which prevent the direct mutation of objects and arrays. It serves as an immutable state management tool and functional programming utility, ensuring that data remains unchanged after creation to simplify change detection and state tracking. The library maintains application state by producing new versions of data structures during updates. It focuses on efficient data comparison by checking actual content instead of memory references, and it supports a functional programming workflow.
Enables batching multiple mutations into a temporary state to reduce the overhead of creating intermediate objects.
TypeScript
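Immutable.js exposes this idea through `withMutations`. The Python sketch below only illustrates the principle: `PersistentList` and `with_mutations` are hypothetical, and the tuple copying here is far simpler than the structural sharing a real persistent collection uses.

```python
class PersistentList:
    """Toy immutable list illustrating batched mutation (hypothetical class).

    Every update returns a new PersistentList and leaves the original
    untouched. Storage is a plain tuple, so each single update copies
    all elements; real persistent structures share structure instead.
    """

    def __init__(self, items=()):
        self._items = tuple(items)

    def push(self, value):
        # One full copy per call: k pushes cost k copies.
        return PersistentList(self._items + (value,))

    def with_mutations(self, fn):
        # Copy once into a mutable scratch list, let fn change it freely,
        # then freeze the result into a single new version.
        scratch = list(self._items)
        fn(scratch)
        return PersistentList(scratch)

    def to_list(self):
        return list(self._items)


def add_three_and_four(xs):
    xs.append(3)
    xs.append(4)


base = PersistentList([1, 2])
updated = base.with_mutations(add_three_and_four)
print(base.to_list())     # [1, 2]
print(updated.to_list())  # [1, 2, 3, 4]
```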
immutable-js/immutable-js
33,060 stars
Immutable.js is a library of persistent data structures and a functional state management toolkit. It provides a collection of immutable objects and arrays that prevent direct mutation to ensure predictable state management in JavaScript applications. The library uses structural sharing to efficiently create new versions of data without full copying, and it implements lazy sequence processing to chain data transformations that execute only when values are requested. It also supports batch mutation processing, allowing multiple changes to be applied to a temporary mutable copy.
Supports batch mutation processing using temporary mutable copies to optimize performance during multiple rapid updates.
TypeScript
qdrant/qdrant
32,372 stars
Qdrant is a high-performance vector similarity database designed to store, index, and search high-dimensional vectors alongside structured metadata. It functions as a distributed search engine that manages large-scale data clusters, providing low-latency retrieval and complex filtering capabilities. The system is built to serve as a specialized middleware layer, connecting machine learning pipelines and AI agents to persistent storage for intelligent information retrieval and recommendation tasks. The platform distinguishes itself through advanced retrieval techniques.
Executes multiple insertions, updates, or deletions within a single atomic request to maintain data consistency.
Rust · ai-search, ai-search-engine, embeddings-similarity
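All-or-nothing application of a batch of point operations can be sketched with an in-memory store that stages changes on a copy and commits only if every operation succeeds. `PointStore` and its operation tuples are hypothetical; this is not Qdrant's client API or its actual consistency mechanism.

```python
import copy


class PointStore:
    """Toy in-memory store mapping point id -> payload (hypothetical)."""

    def __init__(self):
        self.points = {}

    def apply_batch(self, operations):
        """Apply ("upsert", id, payload) / ("delete", id, None) ops all-or-nothing."""
        staged = copy.deepcopy(self.points)  # work on a private copy first
        for op, point_id, payload in operations:
            if op == "upsert":
                staged[point_id] = payload
            elif op == "delete":
                if point_id not in staged:
                    raise KeyError(f"cannot delete missing point {point_id}")
                del staged[point_id]
            else:
                raise ValueError(f"unknown operation {op!r}")
        # Reached only if every operation succeeded: commit in one step.
        self.points = staged


store = PointStore()
store.apply_batch([("upsert", 1, {"color": "red"}), ("upsert", 2, {"color": "blue"})])
try:
    # The second delete fails, so the first delete is discarded as well.
    store.apply_batch([("delete", 1, None), ("delete", 99, None)])
except KeyError:
    pass
print(store.points)  # {1: {'color': 'red'}, 2: {'color': 'blue'}}
```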
kodecocodes/swift-algorithm-club
29,099 stars
This project is a comprehensive collection of common computer science algorithms and data structures implemented in Swift. It serves as an educational reference and library for studying computational complexity, algorithmic logic, and data structure engineering through practical code examples. The repository provides a wide suite of data structure implementations, including various types of linked lists, heaps, hash tables, and an extensive range of hierarchical trees such as Red-Black, B-Tree, and Splay trees. It also covers diverse sorting and searching techniques, beginning with basic bubble sort.
Implements matrix multiplication using both iterative and recursive divide-and-conquer strategies.
Swift · algorithms, data-structures, swift
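The two strategies can be compared in a Python port. This is a hedged sketch, not the Swift code from the repository, and the function names are hypothetical. The recursive version uses the straightforward eight-product split, which does the same O(n³) work as the triple loop; Strassen's method reduces the split to seven products.

```python
def add(a, b):
    """Element-wise sum of two same-shape matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Split an n x n matrix (n even) into four n/2 x n/2 quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def join(c11, c12, c21, c22):
    """Reassemble four quadrants into one matrix."""
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


def matmul_iterative(a, b):
    """Triple-loop product; works for any compatible shapes."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_recursive(a, b):
    """Divide-and-conquer product for n x n matrices, n a power of two."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("size must be a power of two")
    if n == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Eight half-size products combined blockwise, e.g. C11 = A11*B11 + A12*B21.
    c11 = add(matmul_recursive(a11, b11), matmul_recursive(a12, b21))
    c12 = add(matmul_recursive(a11, b12), matmul_recursive(a12, b22))
    c21 = add(matmul_recursive(a21, b11), matmul_recursive(a22, b21))
    c22 = add(matmul_recursive(a21, b12), matmul_recursive(a22, b22))
    return join(c11, c12, c21, c22)


print(matmul_recursive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```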
d2l-ai/d2l-en
29,001 stars
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations (including linear algebra, calculus, and probability) with practical implementations across multiple industry-standard libraries.
Multiplies batches of query and key matrices efficiently to support parallel processing.
Python · book, computer-vision, data-science
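For a batch of query and key matrices, the attention scores are Q Kᵀ / √d, computed independently for each batch element; in PyTorch this is typically a single `torch.bmm` call. The plain-Python sketch below spells out the same loop structure; `batched_scores` is a hypothetical name.

```python
import math


def batched_scores(queries, keys):
    """Scaled dot-product attention scores for a batch.

    queries: batch x n_q x d, keys: batch x n_k x d (nested lists).
    Returns batch x n_q x n_k, where out[b][i][j] = (q_i . k_j) / sqrt(d).
    """
    d = len(queries[0][0])
    scale = 1.0 / math.sqrt(d)
    return [
        [[scale * sum(qv * kv for qv, kv in zip(q, k)) for k in K] for q in Q]
        for Q, K in zip(queries, keys)  # one independent product per batch element
    ]


print(batched_scores([[[1.0, 0.0], [0.0, 1.0]]], [[[1.0, 1.0], [2.0, 0.0]]]))
```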
TanStack/table
28,119 stars
TanStack Table is a headless, framework-agnostic engine designed for building complex data grids and managing tabular state. By decoupling data processing logic from the visual rendering layer, it allows developers to implement custom user interfaces while offloading sophisticated operations like sorting, filtering, grouping, and pagination to a unified, performant core. The library distinguishes itself through its commitment to type safety and environment flexibility. It leverages strict type definitions to ensure data integrity across the entire application and uses an adapter pattern to integrate with different UI frameworks.
Groups individual operations into batches to improve system efficiency and reduce processing overhead.
TypeScript · datagrid, datagrids, datatable
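Grouping several updates into one batch, so that subscribers are notified once instead of after every change, can be sketched as a small store with a `batch()` context manager. `Store`, `set_state`, and `batch` here are hypothetical names, not TanStack's API.

```python
from contextlib import contextmanager


class Store:
    """Hypothetical state store: listeners run after every update, except
    inside batch(), where they run once when the outermost batch exits."""

    def __init__(self, state):
        self.state = state
        self.listeners = []
        self._depth = 0      # nesting level of active batch() blocks
        self._dirty = False  # whether state changed inside the current batch

    def subscribe(self, fn):
        self.listeners.append(fn)

    def set_state(self, updater):
        self.state = updater(self.state)
        if self._depth:
            self._dirty = True  # defer notification until the batch ends
        else:
            self._notify()

    def _notify(self):
        for fn in self.listeners:
            fn(self.state)

    @contextmanager
    def batch(self):
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            if self._depth == 0 and self._dirty:
                self._dirty = False
                self._notify()


seen = []
store = Store({"sort": None, "page": 0})
store.subscribe(lambda s: seen.append(dict(s)))
with store.batch():
    store.set_state(lambda s: {**s, "sort": "name"})
    store.set_state(lambda s: {**s, "page": 1})
print(seen)  # [{'sort': 'name', 'page': 1}]
```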
ml-explore/mlx
27,047 stars
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on lazy evaluation.
Calculates the product of two arrays, supporting batched operations and automatic broadcasting.
C++ · mlx
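MLX exposes this operation as `mlx.core.matmul`. The plain-Python sketch below is not MLX code: it reproduces only the broadcasting rule for a single leading batch dimension, under the assumption that a missing or size-1 batch dimension is repeated to match the other operand. The function names are hypothetical.

```python
def matmul2d(a, b):
    """Plain 2-D matrix product over nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def as_batch(x):
    """Treat a 2-D matrix as a batch of one; leave a 3-D batch unchanged."""
    return x if isinstance(x[0][0], list) else [x]


def batched_matmul(a, b):
    """Batched product with broadcasting over one leading batch dimension.

    A single matrix, or a batch of size 1, is repeated to match the other
    operand's batch size; otherwise both batch sizes must be equal.
    """
    if not isinstance(a[0][0], list) and not isinstance(b[0][0], list):
        return matmul2d(a, b)
    a_seq, b_seq = as_batch(a), as_batch(b)
    if len(a_seq) == 1:
        a_seq = a_seq * len(b_seq)
    if len(b_seq) == 1:
        b_seq = b_seq * len(a_seq)
    if len(a_seq) != len(b_seq):
        raise ValueError("batch sizes cannot be broadcast together")
    return [matmul2d(x, y) for x, y in zip(a_seq, b_seq)]


batch = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
single = [[1, 2], [3, 4]]
print(batched_matmul(batch, single))  # [[[1, 2], [3, 4]], [[2, 4], [6, 8]]]
```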
Netflix/Hystrix
24,461 stars
Hystrix is a latency and fault tolerance library designed to prevent cascading failures in distributed systems. It functions as a circuit breaker implementation that monitors failure thresholds and opens circuits to isolate remote calls when downstream services degrade. The project distinguishes itself by providing multiple isolation mechanisms, using dedicated thread pools and semaphores to ensure that latency in one dependency does not saturate the entire system. It also features a request collapsing and batching engine that groups concurrent calls into single executions.
Runs multiple remote calls concurrently while caching identical requests and collapsing them into a single batch.
Java
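Hystrix implements request collapsing through its `HystrixCollapser` class, which gathers requests over a short time window. The Python sketch below is not that API: `RequestCollapser` and `flush()` are hypothetical, and an explicit `flush()` stands in for the time window. It shows the two ideas from the summary: identical keys are deduplicated and served from a request cache, and the remaining keys go out in a single batch call.

```python
class RequestCollapser:
    """Hypothetical collapser: queue keys, reuse cached results, and
    fetch the remaining unique keys with one batch call per flush."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes a list of unique keys, returns {key: value}
        self.pending = []
        self.cache = {}           # request cache: key -> previously fetched value

    def submit(self, key):
        self.pending.append(key)

    def flush(self):
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        missing = [k for k in dict.fromkeys(self.pending) if k not in self.cache]
        if missing:
            self.cache.update(self.batch_fn(missing))
        results = [self.cache[k] for k in self.pending]
        self.pending = []
        return results


calls = []


def fetch_users(ids):
    calls.append(list(ids))
    return {i: f"user-{i}" for i in ids}


collapser = RequestCollapser(fetch_users)
for key in [1, 2, 1, 3]:
    collapser.submit(key)
print(collapser.flush())  # ['user-1', 'user-2', 'user-1', 'user-3']
print(calls)              # [[1, 2, 3]]
```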
Dao-AILab/flash-attention
24,220 stars
FlashAttention is an attention mechanism optimization library and machine learning acceleration framework designed to increase training speed and reduce the memory footprint of large-scale neural network models. It functions as a collection of low-level CUDA kernels that optimize memory-bound operations to improve hardware utilization on graphics processing units. The library distinguishes itself through an IO-aware algorithm design that minimizes data movement between different levels of memory. By employing kernel fusion and tiled matrix multiplication, it fuses sequential operations into a single kernel.
Divides large attention matrices into smaller blocks that fit into fast on-chip memory to minimize global memory access.
Python
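Tiling is exact only because softmax can be computed incrementally: each block updates a running maximum, a running normalizer, and a running weighted sum, rescaling earlier partial results whenever the maximum grows. The single-query, pure-Python sketch below demonstrates that online-softmax recurrence and compares it with a direct computation. It omits the 1/√d scaling, batching, and every GPU memory detail; the function names are hypothetical.

```python
import math


def attention_naive(q, keys, values):
    """Reference: softmax over all scores at once, then a weighted sum of values."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def attention_tiled(q, keys, values, block_size=2):
    """Same result, visiting keys/values one block at a time (online softmax)."""
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier accumulators to the new maximum (exp(-inf) == 0.0).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [x * correction for x in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [x + w * vj for x, vj in zip(acc, v)]
        running_max = new_max
    return [x / running_sum for x in acc]


q = [0.5, -1.0]
keys = [[1, 2], [0, 1], [3, -1], [2, 2], [-1, 0]]
values = [[1, 0], [0, 1], [2, 2], [1, 3], [4, 0]]
print(attention_naive(q, keys, values))
print(attention_tiled(q, keys, values, block_size=2))
```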
pytorch/examples
23,752 stars
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models.
Performs preprocessing such as data normalization as data is loaded into memory, before passing it directly to matrix multiplication kernels.
Python
facebook/draft-js
22,641 stars
Draft.js is a framework for building customizable rich text editors within React applications. It functions as a content-editable framework that separates the underlying data model from the visual rendering layer, acting as a rich text content engine to manage complex text data and formatting. The project uses an immutable state management system to ensure consistent updates and a predictable undo history. It manages editor state through persistent data structures, which prevent accidental mutation.
Implements a batch-mutation layer that temporarily uses mutable data for performance before returning an immutable state.
JavaScript
vectordotdev/vector
22,071 stars
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, using a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and uses disk-backed event buffering to ensure durability.
Splits single incoming events into multiple discrete outputs by transforming data structures into arrays or individual records.
Rust · events, forwarder, hacktoberfest
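Splitting one event into several can be sketched as a function that fans out a list-valued field into one event per element while copying the other fields. `split_event` is hypothetical; this illustrates the idea only and is not Vector's configuration or VRL syntax.

```python
def split_event(event, field):
    """Turn one event whose `field` holds a list into one event per element.

    Other fields are copied into every output event. If the field is missing
    or not a list, the event passes through unchanged; an empty list yields
    no events.
    """
    items = event.get(field)
    if not isinstance(items, list):
        return [event]
    return [{**event, field: item} for item in items]


event = {"host": "web-1", "messages": ["started", "ready"]}
print(split_event(event, "messages"))
# [{'host': 'web-1', 'messages': 'started'}, {'host': 'web-1', 'messages': 'ready'}]
```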
dotnet/core
21,897 stars
This project is a cross-platform development framework and managed runtime environment designed for building high-performance applications. It provides a comprehensive toolkit for constructing web services, cloud-native microservices, and desktop applications, using a unified runtime that handles memory management and execution across diverse operating systems. The framework distinguishes itself through a native ahead-of-time compilation toolchain that transforms source code into optimized, self-contained machine code binaries. This capability enables fast startup times and reduced memory usage.
Supports targeting multiple frameworks to ensure compatibility across different runtime environments.
PowerShell · dotnet, dotnet-core
huggingface/peft
21,274 stars
This library provides a framework for parameter-efficient fine-tuning, enabling the adaptation of large pretrained models by training only a small subset of parameters. It functions as a distributed model training system and optimization toolkit, designed to reduce the computational and memory requirements typically associated with full model fine-tuning. The project distinguishes itself through a suite of methods for modular adapter composition, including low-rank matrix decomposition and activation-based scaling. It supports the integration of multiple task-specific adapter modules.
Routes and merges multiple trained low-rank modules to solve new tasks.
Python · adapter, diffusion, fine-tuning
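A LoRA module stores a weight update as the product of two thin matrices, B (d_out × r) and A (r × d_in). Combining several trained modules can then be sketched as adding a weighted sum of their B·A products to the frozen base weight. This is a minimal illustration with hypothetical names: it folds LoRA's scaling factor into the per-module weights and does not show how PEFT routes between modules or chooses those weights.

```python
def matmul(a, b):
    """Plain 2-D matrix product over nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base, adapters, weights):
    """Return base + sum_i weights[i] * (B_i @ A_i) without modifying base.

    base: d_out x d_in; each adapter is a pair (B, A) with
    B: d_out x r and A: r x d_in, so B @ A has rank at most r.
    """
    merged = [row[:] for row in base]
    for (b, a), w in zip(adapters, weights):
        delta = matmul(b, a)  # low-rank update for this module
        for i, row in enumerate(delta):
            for j, val in enumerate(row):
                merged[i][j] += w * val
    return merged


base = [[0, 0], [0, 0]]
adapter_1 = ([[1], [0]], [[1, 2]])  # rank-1 update [[1, 2], [0, 0]]
adapter_2 = ([[0], [1]], [[3, 4]])  # rank-1 update [[0, 0], [3, 4]]
print(merge_lora(base, [adapter_1, adapter_2], [0.5, 1.0]))  # [[0.5, 1.0], [3.0, 4.0]]
```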
datalab-to/surya
20,889 stars
Surya is a document processing platform designed to transform unstructured files into structured, machine-readable data. It provides a comprehensive suite of tools for text recognition, layout analysis, and reading order detection, enabling the conversion of PDFs and images into formats such as JSON, HTML, or Markdown. The platform is built to handle complex document workflows, offering capabilities for data extraction, document segmentation, and automated form completion. The platform distinguishes itself through a robust pipeline-based architecture that allows users to chain analysis tasks.
Executes analysis tasks across large document collections simultaneously to improve throughput for high-volume workloads.
Python
apollographql/apollo-client
19,798 stars
Apollo Client is a GraphQL client library and data fetching framework used to request data from a GraphQL server and synchronize that state within a frontend application. It functions as a remote state manager and a local state management tool, allowing developers to define client-side schemas and resolvers for data that does not reside on a remote server. The project features a normalized GraphQL cache that identifies objects by ID to ensure referential equality and consistent data updates across different queries. It also includes a GraphQL API mocking tool to simulate server responses.
Combines multiple GraphQL operations into a single network request to reduce overhead and round trips.
TypeScript · apollo-client, apollographql, graphql
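Apollo Client's `BatchHttpLink` sends several operations in one HTTP request. The Python sketch below only shows the payload shape that such batching commonly relies on: a JSON array of operation objects, answered by an array of results in the same order. That array convention is an assumption about the server, and `build_batch_body` and `split_batch_response` are hypothetical.

```python
import json


def build_batch_body(operations):
    """Serialize several GraphQL operations as one JSON array payload.

    Each operation is a dict with "query" and optional "variables" and
    "operationName"; empty optional fields are left out.
    """
    payload = []
    for op in operations:
        entry = {"query": op["query"]}
        if op.get("variables"):
            entry["variables"] = op["variables"]
        if op.get("operationName"):
            entry["operationName"] = op["operationName"]
        payload.append(entry)
    return json.dumps(payload)


def split_batch_response(body, count):
    """Parse a batched response and check there is one result per operation."""
    results = json.loads(body)
    if not isinstance(results, list) or len(results) != count:
        raise ValueError("expected one result per batched operation")
    return results


ops = [
    {"query": "{ a }"},
    {"query": "query B($id: ID!) { b(id: $id) }", "variables": {"id": 1}, "operationName": "B"},
]
print(build_batch_body(ops))
```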
google-gemini/cookbook
17,418 stars
The Gemini Cookbook is a comprehensive collection of implementation patterns, code samples, and development guides designed for building applications with Google Gemini models. It serves as a central resource for developers to integrate multimodal generative artificial intelligence into their software, providing the necessary frameworks to manage model interactions, stateful workflows, and structured data extraction. The repository distinguishes itself by offering specialized toolkits for autonomous agent orchestration, enabling the construction of agents that can execute code and browse the web.
Handles high-volume asynchronous requests through dedicated queues to optimize throughput and bypass synchronous rate limits.
Jupyter Notebook · gemini, gemini-api
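A client-side analogue of queue-based request handling can be sketched with `asyncio.Queue` and a fixed pool of workers. `call_model` is a hypothetical stand-in for a real model request; the sketch uses no Gemini SDK call, and capping concurrency this way does not by itself change any service-side rate limit.

```python
import asyncio


async def call_model(prompt):
    """Hypothetical async model call; a real client request would go here."""
    await asyncio.sleep(0)  # yield to the event loop, simulating network I/O
    return prompt.upper()


async def worker(queue, results):
    # Each worker pulls jobs until it is cancelled.
    while True:
        idx, prompt = await queue.get()
        try:
            results[idx] = await call_model(prompt)
        except Exception as exc:  # record the failure but keep the worker alive
            results[idx] = exc
        finally:
            queue.task_done()


async def run_all(prompts, concurrency=3):
    """Process prompts through a shared queue with a fixed number of workers."""
    queue = asyncio.Queue()
    results = [None] * len(prompts)
    for item in enumerate(prompts):
        queue.put_nowait(item)
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # wait until every queued prompt has been handled
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


print(asyncio.run(run_all(["a", "b", "c", "d"])))  # ['A', 'B', 'C', 'D']
```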
uber/horovod
14,686 stars
Horovod is a distributed deep learning framework designed to scale machine learning training across multiple GPUs and nodes. It functions as an orchestrator for multi-GPU scaling and a tool for distributed gradient averaging, allowing users to increase compute capacity without rewriting core model logic. The project provides a consistent communication interface that supports multi-framework model distribution across TensorFlow, PyTorch, Keras, and MXNet. It leverages an MPI distributed training library to synchronize gradients across processes using collective communication operations.
Groups multiple small gradient updates into a single large buffer to reduce network communication frequency.
Python
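Horovod calls this technique Tensor Fusion. The sketch below shows the idea in plain Python: each worker flattens its small gradient tensors into one buffer, a single simulated allreduce averages the buffers, and the result is split back into the original sizes. `fuse`, `unfuse`, and `allreduce_average` are hypothetical; the averaging function stands in for a real collective operation.

```python
def fuse(tensors):
    """Flatten several small gradient tensors into one buffer; remember their sizes."""
    buffer, sizes = [], []
    for t in tensors:
        buffer.extend(t)
        sizes.append(len(t))
    return buffer, sizes


def unfuse(buffer, sizes):
    """Split a fused buffer back into tensors of the recorded sizes."""
    out, start = [], 0
    for n in sizes:
        out.append(buffer[start:start + n])
        start += n
    return out


def allreduce_average(buffers):
    """Stand-in for a collective allreduce: element-wise mean across workers."""
    return [sum(vals) / len(buffers) for vals in zip(*buffers)]


# Two simulated workers, each holding three small 1-D gradient tensors.
worker_grads = [
    [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]],
    [[3.0, 4.0], [5.0], [6.0, 7.0, 8.0]],
]
fused = [fuse(grads) for grads in worker_grads]
# One exchange of a single buffer instead of three separate exchanges.
averaged = allreduce_average([buf for buf, _ in fused])
print(unfuse(averaged, fused[0][1]))  # [[2.0, 3.0], [4.0], [5.0, 6.0, 7.0]]
```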
