3 repositories
Tools and frameworks designed for the automated, high-volume processing of data files and document streams.
Distinguishing note: This category specifically addresses automated document handling and archival workflows.
3 GitHub repositories matching Data & Databases · Batch Processing Systems.
OCRmyPDF is a command-line tool designed to transform scanned documents into searchable, selectable PDF files. It functions as a document processing pipeline that adds a hidden text layer to image-based files while simultaneously optimizing the document's file size and image quality. By preserving the original visual fidelity of the input, it ensures that digitized documents remain accessible to screen readers and search engines. The project distinguishes itself through a modular architecture that supports custom plugins and the integration of external recognition engines, allowing users to t
Automates the standardization and text extraction of large document volumes for archival and indexing workflows.
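As a rough illustration of the batch workflow described above, the following sketch plans one `ocrmypdf` command per PDF in a folder and runs them in sequence. The helper names (`build_ocr_command`, `plan_batch`, `run_batch`) and the folder layout are assumptions made for this example, not part of OCRmyPDF itself. It only relies on the documented `--skip-text`, `--optimize`, and `-l` command-line options.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build an ocrmypdf invocation for one file (hypothetical helper)."""
    return [
        "ocrmypdf",
        "--skip-text",      # do not OCR pages that already contain text
        "--optimize", "1",  # lossless file-size optimization
        "-l", language,     # OCR language passed to the recognition engine
        str(src),
        str(dst),
    ]


def plan_batch(input_dir: Path, output_dir: Path) -> list[list[str]]:
    """Return one command per PDF in input_dir, sorted for a stable order."""
    return [
        build_ocr_command(pdf, output_dir / pdf.name)
        for pdf in sorted(input_dir.glob("*.pdf"))
    ]


def run_batch(input_dir: Path, output_dir: Path) -> None:
    """Run the planned commands one by one; assumes ocrmypdf is on PATH."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    output_dir.mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(input_dir, output_dir):
        subprocess.run(cmd, check=True)  # raises if a file fails to convert
```

Separating planning from execution makes it possible to inspect or log the commands before any file is touched. A call such as `run_batch(Path("scans"), Path("searchable"))` would then process a hypothetical `scans` folder.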
RAG-Anything is a retrieval-augmented generation framework designed to index diverse document formats and perform semantic search using local machine learning models. It functions as a local multimodal data processor, extracting and organizing information from various file types into a unified knowledge base to facilitate private document analysis. The system distinguishes itself through its high-throughput ingestion engine, which processes large batches of documents into searchable vector embeddings. By executing machine learning models directly on local hardware, the framework ensures that
Automates high-volume document ingestion through simultaneous processing tasks.
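The idea of ingesting many documents through simultaneous tasks can be sketched generically with a thread pool. This is not RAG-Anything's API: `ingest_batch` and `count_chunks` are hypothetical names, and the per-file step here only counts non-empty lines as a stand-in for real parsing and embedding.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def ingest_batch(
    paths: Iterable[Path],
    ingest_one: Callable[[Path], int],
    max_workers: int = 4,
) -> dict[Path, int]:
    """Run ingest_one over all paths concurrently and collect a result per path."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result
        results = list(pool.map(ingest_one, path_list))
    return dict(zip(path_list, results))


def count_chunks(path: Path) -> int:
    """Placeholder ingestion step (assumption): count non-empty lines as chunks."""
    text = path.read_text(encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.strip())
```

A real pipeline would replace `count_chunks` with a function that parses the file and writes embeddings to a vector store. For CPU-bound model inference, a process pool or a batched model call may be a better fit than threads.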
Waifu2x-Extension-GUI is a desktop application designed for high-fidelity media restoration and enhancement. It functions as a graphical interface that orchestrates specialized deep learning engines to upscale, denoise, and interpolate images and videos, improving visual clarity and motion smoothness. The software distinguishes itself through its ability to manage complex, automated media processing pipelines. Users can chain multiple tasks—such as format conversion, scene detection, and frame rate interpolation—into sequential workflows that execute without manual intervention. It provides g
Automates high-volume media processing tasks including scene detection and post-processing workflows.