3 repositories
Methods for processing large documents in segments to maintain accuracy.
Distinguishing note: Focuses on the chunking strategy for large-scale document extraction.
Three GitHub repositories listed under Data & Databases · Chunked Processing Utilities.
Langextract is a framework designed to transform unstructured text into structured, machine-readable data using language model orchestration. It provides a high-performance pipeline that processes large volumes of narrative text by utilizing parallel execution and sequential extraction passes. The library is built to handle complex data extraction tasks, including specialized support for clinical information and medical entity relationship recognition. The project distinguishes itself through a plugin-based architecture that supports both local hardware execution and cloud-hosted model endpoi
Extracts structured information from large documents by processing text in manageable chunks.
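To make the chunking idea concrete, here is a minimal Python sketch of splitting a long text into overlapping windows while keeping each window's start offset, so that results extracted from a chunk can be mapped back to the full document. It is not Langextract's API: the function name `chunk_text` and the parameters `max_chars` and `overlap` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> List[Tuple[int, str]]:
    """Split text into overlapping windows (hypothetical helper, not a library API).

    Returns (start_offset, chunk) pairs so that positions found inside a
    chunk can be translated back to positions in the original text.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    chunks: List[Tuple[int, str]] = []
    step = max_chars - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append((start, text[start:end]))
        if end == len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


# Example: a 10-character text, windows of 4 characters, 1 character shared.
pieces = chunk_text("abcdefghij", max_chars=4, overlap=1)
```

The overlap is one possible way to avoid losing an entity that straddles a chunk boundary; a real pipeline might instead split on sentence or paragraph boundaries.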
Laravel-Excel is an integration library for importing and exporting spreadsheet data between Laravel applications and Excel or CSV files. It provides a suite of tools for bidirectional spreadsheet integration, including a system for reading workbooks and mapping data into database models. The library distinguishes itself through a background processing system that handles large imports and exports using chunking and job queues. It supports template-driven exports by converting HTML tables from view templates into spreadsheet cells. The toolset covers broad capabilities for large dataset proc
Implements chunked processing for large spreadsheets to maintain a low memory footprint during import.
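The memory benefit of chunked imports can be illustrated with a short Python sketch. Laravel-Excel itself is a PHP library, so this is only an analogue of the general technique, not its API: the helper `iter_row_chunks` and its `chunk_size` parameter are hypothetical. Rows are read lazily and handed over in fixed-size batches, so only one batch is held in memory at a time.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def iter_row_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield CSV rows in batches of at most chunk_size (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    reader = csv.DictReader(lines)  # reads lazily, one row per iteration
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return  # input exhausted
        yield chunk


# Example with an in-memory file; a real import would pass an open file handle
# and write each batch to the database before reading the next one.
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
batches = list(iter_row_chunks(sample, chunk_size=2))
```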
Execa is a promise-based process execution library that serves as a wrapper for the Node.js child process module. It functions as a shell command runner and subprocess management tool, simplifying the execution of external commands and binaries. The library distinguishes itself through automatic argument escaping to prevent shell injection and the use of abort signals for graceful process termination. It also provides an inter-process communication wrapper for exchanging structured JSON data and messages between parent and child processes. Its capabilities cover a broad range of process I/O
Iterates over subprocess output as arbitrary binary chunks to prevent data corruption from character encoding.
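The point about binary chunks can be shown with a small Python analogue. Execa is a Node.js library, so this does not use or reproduce its API; the function `iter_output_chunks` is a hypothetical name. The sketch reads a child process's standard output as raw `bytes` in fixed-size pieces. Because nothing is decoded along the way, bytes that are not valid text, or a multi-byte character split across two chunks, pass through unchanged.

```python
import subprocess
import sys
from typing import Iterator, List


def iter_output_chunks(cmd: List[str], chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield a subprocess's stdout as raw byte chunks (hypothetical helper)."""
    # No text mode and no encoding: the pipe delivers bytes exactly as written.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        assert proc.stdout is not None
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break  # end of stream
            yield chunk


# Example: the child writes every byte value 0..255, most of which are not
# valid UTF-8 on their own, and the parent reassembles them unchanged.
child = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(bytes(range(256)))"]
received = list(iter_output_chunks(child, chunk_size=64))
payload = b"".join(received)
```

If the output is known to be text, decoding should happen once over the reassembled bytes, or with an incremental decoder, rather than chunk by chunk.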