1 repo
Tools and frameworks designed for the automated, high-volume processing of data files and document streams.
Distinguishing note: This category specifically addresses automated document handling and archival workflows.
Explore 1 GitHub repository matching Data & Databases · Batch Processing Systems.
OCRmyPDF is a command-line tool designed to transform scanned documents into searchable, selectable PDF files. It functions as a document processing pipeline that adds a hidden text layer to image-based files while simultaneously optimizing the document's file size and image quality. By preserving the original visual fidelity of the input, it ensures that digitized documents remain accessible to screen readers and search engines. The project distinguishes itself through a modular architecture that supports custom plugins and the integration of external recognition engines, allowing users to t
Automates the standardization and text extraction of large document volumes for archival and indexing workflows.
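As a rough illustration of the batch workflow described above, the following Python sketch walks a folder of scanned PDFs and runs the `ocrmypdf` command-line tool on each one. It assumes `ocrmypdf` is installed and on the PATH. The helper names (`build_command`, `plan_batch`, `run_batch`) and the chosen flags are illustrative assumptions, not part of the project itself, and the output folder is assumed to sit outside the input folder.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, dst: Path) -> list[str]:
    """Build one ocrmypdf invocation (flag selection is an assumption)."""
    return [
        "ocrmypdf",
        "--skip-text",      # skip pages that already contain a text layer
        "--deskew",         # straighten crooked scans before recognition
        "--optimize", "1",  # lossless file-size optimization
        str(src),
        str(dst),
    ]


def plan_batch(input_dir: Path, output_dir: Path) -> list[tuple[Path, Path]]:
    """Pair every PDF under input_dir with a mirrored path under output_dir."""
    pairs = []
    for src in sorted(input_dir.rglob("*.pdf")):
        dst = output_dir / src.relative_to(input_dir)
        pairs.append((src, dst))
    return pairs


def run_batch(input_dir: Path, output_dir: Path) -> list[Path]:
    """Process the whole batch and return the inputs that failed."""
    failed = []
    for src, dst in plan_batch(input_dir, output_dir):
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(build_command(src, dst))
        if result.returncode != 0:
            failed.append(src)  # keep going; report failures at the end
    return failed
```

Calling `run_batch(Path("scans"), Path("searchable"))` would mirror the folder layout of `scans` into `searchable`; both folder names are placeholders. Collecting failures instead of stopping at the first error is a design choice that suits large archival batches, where a single damaged file should not block the rest.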