Tools for converting physical or static media into digital, searchable formats.
Distinguishing note: focuses on the end-to-end digitization process rather than on the recognition engine alone.
Explore 1 awesome GitHub repository matching content management & publishing · Digitization Systems.
Umi-OCR is an optical character recognition engine that converts text in images and documents into machine-readable character data. It works as a local-first toolkit: all image data is processed directly on the host machine with embedded neural network models, which preserves privacy and allows offline use. The project distinguishes itself through its focus on automated document digitization and its integrated barcode and QR code decoding. Through a modular, Python-based orchestration layer, it lets users transform static image files and multi-page documents into searchable text.
Converts large volumes of scanned documents or images into searchable text files automatically.
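The batch workflow described above can be sketched generically as: walk a folder of scans, run OCR on each image, write one text file per image, then search the results. This is a minimal sketch under stated assumptions, not Umi-OCR's own code or API. `batch_digitize`, `search`, and the `recognize` callable are hypothetical names; `recognize` stands in for whatever local OCR backend is used, and the set of image extensions is an assumption.

```python
from pathlib import Path
from typing import Callable

# Image extensions treated as scanned pages (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def batch_digitize(
    source_dir: Path,
    recognize: Callable[[Path], str],
    output_dir: Path,
) -> list[Path]:
    """Run `recognize` on every image in source_dir and write one .txt per image.

    `recognize` is a hypothetical stand-in for a local OCR backend: it takes
    an image path and returns the recognized text.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    # Sort for a deterministic processing order.
    for image_path in sorted(source_dir.iterdir()):
        # Skip subdirectories and files that are not recognized image types.
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text = recognize(image_path)
        # Name the output after the image, e.g. scan01.png -> scan01.txt.
        target = output_dir / (image_path.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


def search(output_dir: Path, term: str) -> list[str]:
    """Return the names of text files containing `term` (case-insensitive)."""
    needle = term.lower()
    return sorted(
        p.name
        for p in output_dir.glob("*.txt")
        if needle in p.read_text(encoding="utf-8").lower()
    )
```

Keeping the recognizer as a plain callable separates the orchestration (file discovery, output naming, search) from the recognition engine, so the same loop works with any local OCR backend. Note that two images sharing a stem (for example `a.png` and `a.jpg`) would write to the same output file in this sketch.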