Umi OCR

Umi-OCR is an optical character recognition engine designed to convert visual text from images and documents into machine-readable character data. It functions as a local-first toolkit, processing all visual data directly on the host machine using embedded neural network models to maintain privacy and offline availability.

The project distinguishes itself through its focus on automated document digitization and integrated barcode and QR code decoding. By utilizing a modular, Python-based orchestration layer, it enables users to transform static image files and multi-page documents into searchable text formats. The system is built to handle high-volume tasks, employing asynchronous task queueing to maintain throughput during batch processing operations.
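
The asynchronous task queueing mentioned above can be illustrated with a minimal Python sketch. This is not Umi-OCR's actual implementation: `recognize`, `worker`, and `run_batch` are hypothetical names, and `recognize` is a placeholder that returns a fake string instead of calling a real OCR engine. The sketch only shows the general pattern of feeding a batch of files through a queue that a fixed number of workers drain concurrently.

```python
import asyncio
from pathlib import Path


async def recognize(path: Path) -> str:
    """Hypothetical stand-in for an OCR call; returns a fake result string."""
    await asyncio.sleep(0)  # yield control, as a real engine call might
    return f"text from {path.name}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    """Pull paths off the queue and store results until cancelled."""
    while True:
        path = await queue.get()
        try:
            results[path.name] = await recognize(path)
        finally:
            # Mark the item as handled even if recognition raised.
            queue.task_done()


async def run_batch(paths, concurrency: int = 2) -> dict:
    """Queue every path, process them with `concurrency` workers, return results."""
    queue: asyncio.Queue = asyncio.Queue()
    results: dict = {}
    for p in paths:
        queue.put_nowait(Path(p))
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # wait until every queued item has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(run_batch(["a.png", "b.png", "c.png"]))` returns a dictionary keyed by file name. Bounding the number of workers keeps a large batch from overwhelming the machine while the queue keeps every worker busy.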

Beyond its core recognition capabilities, the software provides a command-line interface that allows for the automation of repetitive extraction workflows. This interface exposes internal processing functions to external scripts, enabling the execution of batch recognition tasks without manual intervention. The project maintains consistent functionality across different operating system environments through its cross-platform native integration.
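
As a rough illustration of scripted batch extraction, the Python sketch below runs a command once per image and collects its standard output. This page does not document the actual Umi-OCR executable name or its arguments, so the command is passed in by the caller; `run_batch` and `ECHO_COMMAND` are hypothetical names, and the stand-in command merely echoes the path it receives so the sketch runs on any machine with Python. Consult the project's own documentation for the real invocation.

```python
import subprocess
import sys


def run_batch(command_prefix, image_paths):
    """Run `command_prefix + [path]` once per image and collect stdout.

    `command_prefix` is supplied by the caller; the real OCR command line
    is an assumption to be checked against the tool's documentation.
    """
    outputs = {}
    for path in image_paths:
        completed = subprocess.run(
            [*command_prefix, str(path)],
            capture_output=True,  # capture stdout and stderr
            text=True,            # decode output as str
            check=True,           # raise CalledProcessError on non-zero exit
        )
        outputs[str(path)] = completed.stdout.strip()
    return outputs


# Stand-in command (not a real OCR tool): it simply prints its first argument.
ECHO_COMMAND = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
```

For example, `run_batch(ECHO_COMMAND, ["a.png", "b.png"])` returns each path mapped to whatever the command printed for it. Replacing `ECHO_COMMAND` with a real command-line OCR call turns the same loop into an unattended extraction job.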

Features

Optical Character Recognition - Performs optical character recognition on image files to extract text and associated metadata.
Local Inference Engines - Processes visual data entirely on the host machine using embedded neural network models for privacy and offline use.
Document Analysis Tools - Converts document pages into readable text by analyzing page layouts and returning identified character strings.
Barcode Decoders - Extracts the text encoded in barcodes and QR codes by identifying and decoding the visual patterns found in image files.

Name: hiroi-sora/umi-ocr
Author: hiroi-sora

45,273 stars, 4,454 forks, Python, MIT license, 21 views

Open-source alternatives to Umi OCR

Similar open-source projects, ranked by how many features they share with Umi OCR.

awesome-selfhosted/awesome-selfhosted
299,516 stars
This project is a community-curated directory of open-source software designed for deployment in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, enabling users to maintain full data ownership and control over their digital infrastructure. The directory is structured through a hierarchical taxonomy that organizes a vast collection of applications into logical categories, ranging from media management and data analytics to private communication and team productivity tools.
Tags: awesome, awesome-list, cloud

react-native-camera/react-native-camera
9,638 stars
This project provides cross-platform programmatic interfaces and UI components for integrating camera hardware into mobile applications. It serves as a tool for implementing image and video capture, as well as specialized scanning and recognition tasks. The library includes specialized capabilities for computer vision, including a barcode scanner for decoding various barcode types, a face detection tool to identify human faces in a live feed, and an optical character recognition engine for extracting written text from the camera stream. The system covers hardware configuration and control…
Language: Java
Tags: camera, face-detection, react-native

xushengfeng/eSearch
6,275 stars
eSearch is a desktop tool that combines screen capture, image annotation, screen recording, optical character recognition (OCR), and text search and translation into a single application. It is built around a modular architecture that coordinates these tasks through an event-driven capture pipeline, allowing users to capture screen regions, annotate them with drawing and shape tools, and then extract text using a local-first OCR engine or optional cloud services. The project distinguishes itself by integrating a command-line interface for triggering capture and recognition tasks…
Language: TypeScript
Tags: clipboard, color-picker, cross-platform

oobabooga/text-generation-webui
47,323 stars
This project is a comprehensive platform for hosting and interacting with large language models directly on local hardware. It provides a web-based graphical interface that allows users to manage model loading, configure generation parameters, and execute text or chat interactions entirely offline. By running models locally, the software ensures complete data privacy and eliminates reliance on external cloud services for generative tasks. Beyond basic inference, the platform functions as a versatile workbench for generative AI development. It includes an integrated pipeline for fine-tuning…
Language: Python

Frequently asked questions

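What does hiroi-sora/umi-ocr do?

hiroi-sora/umi-ocr converts text in images and documents into machine-readable character data, running entirely on the host machine so that it works offline and keeps data private.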

What are the main features of hiroi-sora/umi-ocr?

The main features of hiroi-sora/umi-ocr are: Optical Character Recognition, Local Inference Engines, Document Analysis Tools, Barcode Decoders, Digital Preservation Tools, Orchestration Frameworks, Workflow Automation Tools, Automation Scripts.

What are some open-source alternatives to hiroi-sora/umi-ocr?

Open-source alternatives to hiroi-sora/umi-ocr include:
awesome-selfhosted/awesome-selfhosted - This project is a community-curated directory of open-source software designed for deployment in private server…
react-native-camera/react-native-camera - This project provides cross-platform programmatic interfaces and UI components for integrating camera hardware into…
xushengfeng/esearch - eSearch is a desktop tool that combines screen capture, image annotation, screen recording, optical character…
oobabooga/text-generation-webui - This project is a comprehensive platform for hosting and interacting with large language models directly on local…
vysheng/tg - This project is a Telegram command line interface and MTProto client. It functions as a userbot framework, providing a…
ggml-org/whisper.cpp - Whisper.cpp is a high-performance, local-first speech recognition engine designed to run large-scale machine learning…
