# OCR Text Extraction Tools

> Search results for `extracting text from images and scans with OCR` on awesome-repositories.com. 111 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/extracting-text-from-images-and-scans-with-ocr

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/extracting-text-from-images-and-scans-with-ocr).**

## Results

- [hiroi-sora/umi-ocr](https://awesome-repositories.com/repository/hiroi-sora-umi-ocr.md) (45,273 ⭐) — Umi-OCR is an optical character recognition engine designed to convert visual text from images and documents into machine-readable character data. It functions as a local-first toolkit, processing all visual data directly on the host machine using embedded neural network models to maintain privacy and offline availability.

The project distinguishes itself through its focus on automated document digitization and integrated barcode and QR code decoding. By utilizing a modular, Python-based orchestration layer, it enables users to transform static image files and multi-page documents into searchable text formats. The system is built to handle high-volume tasks, employing asynchronous task queueing to maintain throughput during batch processing operations.

Beyond its core recognition capabilities, the software provides a command-line interface that allows for the automation of repetitive extraction workflows. This interface exposes internal processing functions to external scripts, enabling the execution of batch recognition tasks without manual intervention. The project maintains consistent functionality across different operating system environments through its cross-platform native integration.
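
The asynchronous task queueing mentioned for Umi-OCR's batch processing can be illustrated with a minimal sketch. This is not Umi-OCR's code: `recognize`, `worker`, and `run_batch` are hypothetical names, and the recognizer is a stand-in that returns a placeholder string instead of calling a real OCR engine.

```python
import asyncio


async def recognize(path):
    """Stand-in for a real OCR call (hypothetical); returns placeholder text."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return f"text from {path}"


async def worker(queue, results):
    """Pull paths off the queue until cancelled, storing one result per path."""
    while True:
        path = await queue.get()
        try:
            results[path] = await recognize(path)
        finally:
            queue.task_done()  # always mark the item done so join() can return


async def run_batch(paths, concurrency=2):
    """Process all paths with a fixed number of concurrent workers."""
    queue = asyncio.Queue()
    results = {}
    for p in paths:
        queue.put_nowait(p)
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # wait until every queued path has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(run_batch(["a.png", "b.png"]))` returns a dictionary keyed by path. Bounding the worker count is one common way to keep throughput steady without loading every image at once; whether Umi-OCR does this is an assumption.
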
- [kha-white/manga-ocr](https://awesome-repositories.com/repository/kha-white-manga-ocr.md) (2,537 ⭐) — manga-ocr is a Japanese OCR engine and text extraction tool designed to recognize vertical and horizontal Japanese text from manga images. It operates as a vision encoder-decoder model that converts visual text into digital characters.

The project includes an OCR training pipeline and a synthetic data generator. These tools create artificial image-text pairs by overlaying diverse Japanese text fonts onto background images to refine recognition models.

The system provides automation for extracting text by monitoring the system clipboard or directories. This allows for the conversion of manga content into editable text to facilitate image translation and digitization.
- [thejoefin/text-grab](https://awesome-repositories.com/repository/thejoefin-text-grab.md) (4,610 ⭐) — Text-Grab is a desktop utility that captures text from screen regions, images, PDFs, and native user interface elements using on-device optical character recognition (OCR) and Windows UI Automation. It processes text entirely locally without sending data to external services. For native UI controls it reads the accessibility tree directly, so the text is returned exactly as the application exposes it rather than being recognized from pixels. The application also includes a persistent snippet dictionary for instant retrieval of frequently used text via a configurable system-wide hotkey.

The tool supports building reusable extraction workflows by saving capture regions alongside pattern-based transformation rules that apply regex cleaning and structuring to OCR results. It can batch-process entire folders of images or PDFs through a single-threaded pipeline, applying the same saved configurations to each file. An integrated editor cleans and restores captured text using line removal, pattern extraction, and a spreadsheet mode for tabular data.

Text-Grab runs as a Windows-native application written in C#, with no additional services or internet connection required for its core text capture and extraction features.
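
The pattern-based transformation rules described for Text-Grab can be pictured as an ordered list of regex substitutions applied to raw OCR output. The sketch below is an illustration in Python, not Text-Grab's C# implementation; the `Rule` type, the rule list, and the sample invoice pattern are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical cleanup rule: a regex and its replacement."""
    pattern: str
    replacement: str


# Illustrative rules only; these are not Text-Grab's built-in patterns.
RULES = [
    Rule(r"[ \t]+", " "),     # collapse runs of spaces and tabs
    Rule(r"(?m)^ | $", ""),   # drop a leading or trailing space on each line
    Rule(r"\n{3,}", "\n\n"),  # squeeze runs of blank lines down to one
]


def apply_rules(text, rules=RULES):
    """Apply each rule in order and trim the final result."""
    for rule in rules:
        text = re.sub(rule.pattern, rule.replacement, text)
    return text.strip()


def extract(text, pattern):
    """Return every match of a pattern, e.g. invoice numbers or dates."""
    return re.findall(pattern, text)
```

For example, `apply_rules("Invoice   No:  INV-0042 \n\n\n\nTotal:\t 19.99  ")` yields two tidy lines separated by one blank line, and `extract(cleaned, r"INV-\d+")` pulls out the invoice number. Rule order matters here: whitespace is collapsed first so that the trimming rule only has to handle a single space.
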
- [microsoft/markitdown](https://awesome-repositories.com/repository/microsoft-markitdown.md) (154,485 ⭐) — This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine-readable content.

The toolkit distinguishes itself through a modular, plugin-based architecture that orchestrates multi-stage extraction pipelines. Users can steer the parsing behavior by injecting custom instructions, enabling the system to adapt to domain-specific document structures and formatting requirements. This flexibility is supported by an integrated optical character recognition capability that ensures text recovery from embedded images during the conversion process.

The system provides both a command-line interface and a programmatic library, facilitating automated batch processing and custom integration into data pipelines. To ensure consistent performance across different environments, the project supports deployment within containerized architectures that encapsulate all necessary system-level dependencies and binaries.
- [camel-ai/camel](https://awesome-repositories.com/repository/camel-ai-camel.md) (17,253 ⭐) — This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer.

The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-evaluate reasoning traces, ensuring high-quality results. To maintain operational integrity, the system enforces schema-based output parsing for reliable workflow integration and utilizes sandboxed environments for secure, isolated code execution.

Beyond its core orchestration capabilities, the project includes a suite of utilities for retrieval-augmented generation and synthetic data production. It supports persistent memory management via vector-based context retrieval and provides extensive tooling for web automation, API integration, and human-in-the-loop oversight. The platform is designed to be model-agnostic, offering a consistent interface for interacting with a wide range of proprietary and open-source language models.
- [tesseract-ocr/tesseract](https://awesome-repositories.com/repository/tesseract-ocr-tesseract.md) (74,751 ⭐) — Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into desktop, mobile, or server-side applications. By utilizing long short-term memory networks, the engine provides robust text extraction across more than one hundred languages and dozens of scripts.

The project distinguishes itself through a sophisticated document layout analysis framework that employs a hybrid approach to resolve complex structures like multi-column text and tables. It offers extensive configurability, allowing users to refine recognition accuracy through custom linguistic models, user-defined dictionaries, and specialized training pipelines. The engine supports the generation of various structured outputs, including searchable PDFs with hidden text layers, and provides hardware-accelerated math kernels to optimize inference performance.

Beyond core recognition, the system includes comprehensive tooling for image pre-processing, page segmentation, and the management of modular language data. It provides C and C++ APIs alongside various language-specific wrappers, enabling integration into diverse software environments. The engine is available as pre-built binary packages or can be compiled from source using standard system compilers.
- [jbarlow83/ocrmypdf](https://awesome-repositories.com/repository/jbarlow83-ocrmypdf.md) (33,901 ⭐) — OCRmyPDF is a tool for converting image-based PDF files into machine-readable documents by adding a searchable text layer via optical character recognition. It functions as a multi-language processor capable of detecting and extracting text in over 100 different languages using linguistic data packs.

The software includes a PDF image optimizer to remove image artifacts and correct page skew to improve recognition accuracy. It also provides a converter to transform scanned documents into the PDF/A standard for long-term digital archiving.

The system manages PDF optimization by compressing embedded raster images to reduce overall file size. It further supports extensibility through an interface that allows the integration of custom text recognition engines.
- [deepseek-ai/deepseek-ocr](https://awesome-repositories.com/repository/deepseek-ai-deepseek-ocr.md) (22,498 ⭐) — DeepSeek-OCR is a vision processing framework designed to convert image-based text into machine-readable tokens for large language models. It functions as a document inference pipeline that encodes visual data into compact representations, enabling automated optical character recognition and document analysis workflows.

The system distinguishes itself through a high-throughput architecture that utilizes hardware-accelerated batch inference to process large volumes of visual data. It incorporates dynamic resolution scaling to manage the balance between visual detail and token consumption, ensuring that image content is compressed into optimized formats for efficient model ingestion.

The framework includes comprehensive capabilities for scaling inference throughput across distributed backends to maintain consistent performance under heavy traffic. It also integrates automated benchmarking tools to evaluate the accuracy and speed of text extraction across diverse datasets, ensuring reliable output quality during system operations.
- [nopechallc/nopecha-extension](https://awesome-repositories.com/repository/nopechallc-nopecha-extension.md) (10,013 ⭐) — This project is a CAPTCHA solver browser extension that automatically detects and resolves image, text, and behavioral challenges using an AI inference engine. It functions as a bot detection bypass tool designed to overcome interactive web barriers and session timeouts to maintain access to protected websites.

The extension provides a bridge between automated solving capabilities and external programming languages or browser automation frameworks via an API integration. It utilizes an AI-powered optical character recognition system to transcribe text from images and auditory challenges into machine-readable strings.

The tool covers a broad range of capabilities including media recognition, form submission automation, and the generation of authentication tokens to simulate successful verification. It also manages web session maintenance to prevent automatic logouts by solving periodic verification challenges.
- [google/osv-scanner](https://awesome-repositories.com/repository/google-osv-scanner.md) (10,565 ⭐) — osv-scanner is a software composition analysis tool and vulnerability scanner that checks project dependencies and container images against the Open Source Vulnerabilities database. It functions as a dependency remediation tool and can be integrated into custom Go applications as a programmable security library.

The project distinguishes itself through a remediation workflow that includes an interactive terminal user interface and automated scripting for upgrading vulnerable packages in lockfiles and manifests. It employs call-graph reachability analysis to determine if vulnerable code is actually invoked and utilizes layer-aware scanning to attribute vulnerabilities to specific stages of a container image.

Broad capabilities cover the identification of known security vulnerabilities, open source license compliance auditing, and the resolution of transitive dependencies. The system supports offline scanning via local database synchronization and integrates into development pipelines through pre-commit hooks and CI/CD security checks.

The scanner can be executed as a standalone command line interface or run from a Docker container.
- [catchthetornado/text-extract-api](https://awesome-repositories.com/repository/catchthetornado-text-extract-api.md) (3,106 ⭐) — Document (PDF, Word, PPTX ...) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.
- [ckorzen/pdf-text-extraction-benchmark](https://awesome-repositories.com/repository/ckorzen-pdf-text-extraction-benchmark.md) (0 ⭐) — This project is about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles. It provides (1) a benchmark generator, (2) a ready-to-use benchmark and (3) an extensive evaluation, with…
- [naptha/tesseract.js](https://awesome-repositories.com/repository/naptha-tesseract-js.md) (38,141 ⭐) — Tesseract.js is a JavaScript library that provides optical character recognition capabilities directly within web browsers and Node.js environments. It functions as a client-side engine, enabling the conversion of images containing printed text into machine-readable strings without the need for external APIs or server-side infrastructure.

The library distinguishes itself by running the original C++ optical character recognition engine within the browser through WebAssembly modules. To maintain interface responsiveness during intensive computation, it utilizes background threads for parallel processing and employs shared memory buffers to exchange image data efficiently between the main thread and workers.

This tool supports automated data extraction from scanned documents and photographs, facilitating offline processing that preserves user privacy. The library manages complex recognition pipelines through asynchronous, promise-based orchestration and handles large language data files using local binary objects to optimize loading performance.
- [tisfeng/easydict](https://awesome-repositories.com/repository/tisfeng-easydict.md) (13,545 ⭐) — Easydict is a macOS dictionary and translator application that integrates system dictionaries, external translation services, and large language model services such as OpenAI and Gemini. It functions as an OCR text extractor and a text-to-speech reader, allowing users to look up words and translate text directly on the desktop.

The application features a local OCR engine that captures screen areas to recognize and translate text that cannot be highlighted or copied. It utilizes a provider-agnostic translation pipeline and adapter-based service integration to standardize responses from various cloud and local services.

The tool covers broad translation capabilities including automated multi-language text translation, dictionary word lookups, and voice synthesis for reading translated text aloud.
- [paarthneekhara/text-to-image](https://awesome-repositories.com/repository/paarthneekhara-text-to-image.md) (2,160 ⭐) — Text to image synthesis using thought vectors
- [mediar-ai/screenpipe](https://awesome-repositories.com/repository/mediar-ai-screenpipe.md) (19,337 ⭐) — Screenpipe is a local screen and audio recorder that captures and indexes digital activity to create a searchable archive of computer usage. It functions as an AI context engine, providing a local database of visual and auditory history to ground large language models.

The system serves as a Model Context Protocol server, delivering screen history and meeting transcriptions to external AI assistants. It uses an OCR screen search tool to extract text from visual data and a speech-to-text transcription tool that also identifies speakers in system and microphone audio.

The software includes capabilities for natural language activity search, chronological activity indexing, and local vector storage for semantic retrieval. It also provides OS-level permission filtering to restrict AI agent access to sensitive content and a local REST API for programmatic activity analysis.
- [zsdonghao/text-to-image](https://awesome-repositories.com/repository/zsdonghao-text-to-image.md) (599 ⭐) — Generative Adversarial Text to Image Synthesis
- [lmeszinc/azurlaneautoscript](https://awesome-repositories.com/repository/lmeszinc-azurlaneautoscript.md) (9,292 ⭐) — AzurLaneAutoScript is a mobile game automation system designed to perform repetitive gameplay tasks unattended. It functions as a screenshot-driven bot that controls Android devices, emulators, and cloud phones via ADB and uiautomator2, using computer vision to make interaction decisions instead of fixed timers.

The project distinguishes itself through an advanced computer vision suite that includes local optical character recognition and perspective-aware grid detection. These tools allow the bot to parse 3D game maps, compute vanishing points, and normalize grid-centered objects for precise entity identification.

The system covers a broad range of operational capabilities, including the automation of combat missions, daily routines, resource harvesting, and fleet management. It features a centralized task scheduler to coordinate independent jobs and can be deployed as a cross-platform Electron desktop application, a web-based remote controller, or a headless Docker container.

The software supports a variety of environments, including ARM-based hardware, single-board computers, and multiple Android emulator distributions.
- [humansignal/label-studio](https://awesome-repositories.com/repository/humansignal-label-studio.md) (27,619 ⭐) — Label Studio is a multi-modal data annotation platform designed to create and manage high-quality training datasets for machine learning. It functions as a self-hosted, containerized environment that supports secure, private deployments, including air-gapped configurations. The platform provides a centralized workspace for labeling diverse media types, such as images, text, audio, and time-series data, to support supervised and reinforcement learning workflows.

The platform distinguishes itself through deep integration with machine learning backends, enabling active learning loops, automated pre-labeling, and real-time model-assisted annotation. It features a declarative interface configuration system that uses markup to define custom labeling tools, alongside plugin-based extensibility that allows for the injection of custom logic. To support enterprise-scale operations, it includes granular role-based access control, collaborative feedback tools, and automated task distribution management.

The system covers a broad capability surface, including automated data ingestion from cloud storage, programmatic pipeline management via REST APIs, and comprehensive data export options. It also provides built-in observability tools to monitor annotator performance, inter-annotator agreement, and model quality.

The application is packaged as a portable, container-ready microservice designed for deployment in scalable, cloud-native environments.
- [copytranslator/copytranslator](https://awesome-repositories.com/repository/copytranslator-copytranslator.md) (17,749 ⭐) — CopyTranslator is a clipboard-based translation tool and multi-engine translation client that monitors the system clipboard to provide automatic language conversion. It functions as an assistant that integrates large language models, cloud translation APIs, and digital dictionaries to produce context-aware translations and side-by-side reading views.

The application includes a specialized PDF text cleaner to remove formatting artifacts and line breaks from copied content. It also features an optical character recognition extractor to convert images or screen captures into editable text for immediate translation.

The tool supports a variety of content processing workflows, including the aggregation of multiple clipboard segments to translate long paragraphs and the use of local language detection to identify source languages without a network connection. Users can customize the interface layout, visual styles, and translation display modes.

The software supports a portable execution mode that loads configuration files from the local directory to allow for installation-free use.
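
The PDF text cleaner mentioned for CopyTranslator addresses a common problem: text copied from a PDF carries a hard line break at the end of every visual line. A minimal sketch of that kind of cleanup follows. It is an assumption about the general technique, not CopyTranslator's actual logic, and `clean_pdf_text` is a hypothetical name.

```python
import re


def clean_pdf_text(text):
    """Join hard-wrapped lines while keeping blank-line paragraph breaks."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    cleaned = []
    for para in re.split(r"\n\s*\n", text):
        # Rejoin words split across a line break: "recog-\nnition" -> "recognition".
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Every remaining line break inside a paragraph becomes a single space.
        para = re.sub(r"\s*\n\s*", " ", para)
        para = re.sub(r"[ \t]{2,}", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)
```

One trade-off in this sketch: the de-hyphenation step cannot tell a line-break hyphen from a genuine compound, so "well-" followed by "known" on the next line becomes "wellknown". A dictionary check would be needed to handle that case properly.
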
- [tony-xlh/chat-with-scanned-documents](https://awesome-repositories.com/repository/tony-xlh-chat-with-scanned-documents.md) (6 ⭐) — A demo of chatting with documents scanned using Dynamic Web TWAIN
- [microsoft/powertoys](https://awesome-repositories.com/repository/microsoft-powertoys.md) (135,047 ⭐) — PowerToys is a collection of background-resident system utilities designed to extend native operating system functionality and streamline desktop workflows. It operates as a modular toolkit, utilizing a central plugin-based host architecture that allows users to dynamically enable or disable specific features for system configuration and automation. By leveraging native system hooking, the suite intercepts global input and window events to provide advanced control over the computing environment.

The project distinguishes itself through its focus on cross-device input orchestration and spatial window management. It enables users to synchronize peripherals and clipboard data across multiple networked computers, creating a unified multi-machine workstation. Additionally, it features a declarative window management engine that enforces custom grid zones and persistent overlay frames, allowing for granular control over window positioning and desktop organization.

The toolkit encompasses a broad range of productivity and system management capabilities, including keyboard-driven command launching, bulk file processing, and visual design aids. It integrates directly into the operating system shell to provide context-menu actions for file manipulation, image resizing, and registry inspection. Users can also customize system behavior through input remapping, environment variable management, and automated command-line tool suggestions.
- [keygraphhq/shannon](https://awesome-repositories.com/repository/keygraphhq-shannon.md) (44,672 ⭐) — Shannon is an integrated security platform designed for autonomous penetration testing, static and dynamic analysis, and automated vulnerability remediation within self-hosted, private infrastructure. It functions as a unified security suite that orchestrates the entire lifecycle of vulnerability management, from initial discovery and reachability prioritization to the generation and verification of code-level patches.

The platform distinguishes itself through its agentic approach to security, deploying autonomous agents to execute both black-box and white-box exploits against running applications to confirm vulnerabilities. It utilizes graph-based data flow analysis to trace execution paths from user inputs to sensitive sinks, ensuring that security findings are based on reachable threats rather than raw scan results. By operating in isolated or air-gapped environments, the system maintains strict data sovereignty and residency, ensuring that source code and sensitive analysis data remain within the local perimeter.

Beyond core testing, the platform provides comprehensive security observability and supply chain auditing. It correlates static code analysis with dynamic runtime exploitation to provide a unified view of risk, while automatically deduplicating findings to reduce alert noise. The system also supports the software supply chain by generating compliant manifests and inspecting container images without requiring a local container runtime.

The platform integrates directly into existing development workflows, delivering verified patches to source control and synchronizing remediation status with external project management tools. It includes robust support for compliance reporting, audit trails, and risk acceptance management to meet regulatory requirements.
- [chuongtrh/palette-from-image](https://awesome-repositories.com/repository/chuongtrh-palette-from-image.md) (37 ⭐) — Inspired by https://earthview.withgoogle.com/
- [canjie-luo/text-image-augmentation](https://awesome-repositories.com/repository/canjie-luo-text-image-augmentation.md) (0 ⭐) — A general geometric augmentation tool for text images, from the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition". The tool is provided to help avoid overfitting and improve the robustness of text recognizers.
- [docling-project/docling](https://awesome-repositories.com/repository/docling-project-docling.md) (61,674 ⭐) — Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing diverse input formats into a consistent internal representation, the library enables uniform processing across various document types.

The project distinguishes itself through a schema-driven approach that maps document regions to strongly-typed objects, ensuring data accuracy through validation against predefined templates. Its pipeline-based architecture supports pluggable processing backends, allowing for the dynamic integration of specialized engines for optical character recognition and complex visual layout analysis. Users can control parsing behavior and extraction parameters through declarative configuration files, facilitating integration into automated workflows and server-based architectures.

The library provides both a programmatic interface and a command-line toolkit to support automated document processing and format conversion. It utilizes optional dependency management to allow for modular installation of specific features, such as media rendering or advanced processing capabilities, depending on the requirements of the application.
- [google/langextract](https://awesome-repositories.com/repository/google-langextract.md) (36,898 ⭐) — Langextract is a framework designed to transform unstructured text into structured, machine-readable data using language model orchestration. It provides a high-performance pipeline that processes large volumes of narrative text by utilizing parallel execution and sequential extraction passes. The library is built to handle complex data extraction tasks, including specialized support for clinical information and medical entity relationship recognition.

The project distinguishes itself through a plugin-based architecture that supports both local hardware execution and cloud-hosted model endpoints. By providing a unified abstraction layer, it allows users to switch between different inference providers without modifying core application logic. The framework enforces output consistency through schema-guided generation and prompt-driven templates, ensuring that extracted entities adhere to predefined formats.

Beyond its core extraction capabilities, the library includes administrative utilities for managing model authentication, custom provider registration, and system integration testing. It supports scalable workflows through batch processing and chunked document analysis, while offering interactive visualization tools to verify extracted results against original source text. Data can be exported in standard formats to facilitate integration with external analysis environments.
- [othersideai/self-operating-computer](https://awesome-repositories.com/repository/othersideai-self-operating-computer.md) (10,153 ⭐) — This project is a computer control framework that uses multimodal vision models to simulate mouse and keyboard inputs for automating desktop tasks. It functions as an autonomous agent and vision-based orchestrator that interprets screen visuals to interact with user interfaces.

The system employs vision language models and object detection to locate and click interface elements. It utilizes visual grounding to overlay numerical markers on UI components and uses optical character recognition to map on-screen text to precise pixel coordinates.

The framework supports voice-controlled computing by translating spoken commands into text-based objectives. It manages a full automation loop encompassing state observation through screenshots, action planning via cloud or local APIs, and the execution of synthetic inputs.
- [rednote-hilab/dots.ocr](https://awesome-repositories.com/repository/rednote-hilab-dots-ocr.md) (7,695 ⭐) — dots.ocr is a suite of software utilities for document layout analysis, multilingual optical character recognition, and scene text digitization. It functions as an engine for extracting digital text and structured layout data from images and PDFs across various human scripts.

The project includes a specialized transformer for converting charts, diagrams, and chemical formulas from raster images into scalable vector graphics. It also provides a pipeline to transform extracted text and structural layout from documents and web screenshots into formatted Markdown files.

The system covers capabilities for identifying bounding boxes and categories of layout elements to produce structured JSON representations. It further includes tools for scene text detection within natural images and an evaluation framework for measuring text and table extraction accuracy against ground truth data.
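
To make the idea of a structured JSON layout representation concrete, the sketch below serialises a few layout elements with bounding boxes and categories. The field names, category labels, and the `line_tolerance` ordering heuristic are assumptions for illustration and do not reflect dots.ocr's actual output schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LayoutElement:
    bbox: tuple    # (x0, y0, x1, y1) in pixels, origin at the top-left corner
    category: str  # e.g. "Title", "Text", "Table" (illustrative labels)
    text: str


def to_layout_json(elements, line_tolerance=10):
    """Order elements top-to-bottom, then left-to-right, and serialise them."""
    ordered = sorted(
        elements,
        # Bucket the top edge so boxes on roughly the same row sort by x.
        key=lambda e: (e.bbox[1] // line_tolerance, e.bbox[0]),
    )
    return json.dumps([asdict(e) for e in ordered], ensure_ascii=False, indent=2)
```

Passing a body block followed by a title block returns JSON with the title first, since it sits higher on the page. The row bucketing is deliberately crude: two boxes that straddle a bucket boundary can still be ordered incorrectly, and multi-column pages need a real reading-order model.
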
- [meiguangjin/learning-to-extract-a-video-sequence-from-a-single-motion-blurred-image](https://awesome-repositories.com/repository/meiguangjin-learning-to-extract-a-video-sequence-from-a-single-motion-blurred-image.md) (32 ⭐) — This repository is a PyTorch implementation of the paper "Learning to Extract a Video Sequence from a Single Motion-Blurred Image" from CVPR 2018…
- [pantsudango/dango-translator](https://awesome-repositories.com/repository/pantsudango-dango-translator.md) (8,411 ⭐) — Dango-Translator is an OCR translation system and multi-engine translation client designed to extract text from images or screens and replace it with translated content. It functions as an image text translator and real-time screen translator, utilizing optical character recognition to convert text between different languages automatically.

The software distinguishes itself through coordinate-based image typesetting and a glossary manager. These tools allow for the replacement of original image content with translated text in the same area and the use of specialized dictionaries to ensure consistent translation of specific terms and phrases.

The system supports real-time screen polling to monitor visual changes, plugin-based adapters for integrating various third-party translation services or local language models, and cloud-synced configuration to maintain user preferences across devices.
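
Real-time screen polling of the kind described for Dango-Translator can be reduced to a loop that captures a region, compares it with the previous capture, and triggers work only when something changed. The sketch below assumes a caller-supplied `capture` function that returns raw bytes; that function, `watch_for_changes`, and the `max_polls` parameter are hypothetical and not part of Dango-Translator.

```python
import hashlib
import time


def watch_for_changes(capture, on_change, interval=0.5, max_polls=None):
    """Call on_change(frame) whenever the captured bytes differ from the last poll."""
    last_digest = None
    polls = 0
    while max_polls is None or polls < max_polls:
        frame = capture()
        digest = hashlib.sha256(frame).digest()
        if digest != last_digest:
            # The region changed (or this is the first poll): hand it on,
            # for example to an OCR and translation step.
            last_digest = digest
            on_change(frame)
        polls += 1
        time.sleep(interval)
```

An exact hash fires on any single-pixel difference, including a blinking cursor, so a practical tool would more likely compare frames with some tolerance. The hash keeps this sketch short and dependency-free.
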
- [firerpa/lamda](https://awesome-repositories.com/repository/firerpa-lamda.md) (7,834 ⭐) — This project is an Android RPA framework designed for automating user interfaces and system tasks on rooted Android devices using Python and ADB. It provides a suite of tools for rooted device management, allowing for programmatic control of system settings, application lifecycles, and shell command execution via a remote API.

The framework distinguishes itself through a combination of dynamic instrumentation and AI integration. It can inject scripts into running processes to hook Java interfaces and modify application behavior in real time. Additionally, it supports large language model integration through a standardized protocol, enabling the translation of natural language prompts into executable device actions.

The system covers a broad range of capabilities, including network traffic analysis via man-in-the-middle proxies, remote administration with real-time screen streaming and touch simulation, and a comprehensive security analysis toolset for binary patching and disassembly. It also provides an emulated Debian runtime environment for native code compilation and a variety of UI automation primitives such as optical character recognition and image-based element location.

The framework supports remote connectivity through VPNs, port forwarding, and a WebSocket-based control interface.
- [tesseract-ocr/tessdata](https://awesome-repositories.com/repository/tesseract-ocr-tessdata.md) (7,586 ⭐) — This repository provides the pre-trained neural network and legacy data files used by Tesseract to recognize and extract printed text from images. It serves as a multilingual training data repository and a collection of Long Short-Term Memory models designed for high-accuracy optical character recognition across various global scripts and languages.

The data includes specialized models for analyzing image layouts to determine text rotation and script direction. It provides the necessary language-specific datasets and linguistic patterns required to enable Tesseract OCR engines to function.

These files cover a wide range of capabilities including multilingual text extraction and document digitization. The repository contains trained models for a variety of specific languages and scripts, including Japanese, Korean, Portuguese, German, Latin, Filipino, and Armenian.
- [pot-app/pot-desktop](https://awesome-repositories.com/repository/pot-app-pot-desktop.md) (17,110 ⭐) — This application is a cross-platform desktop utility designed for automated translation, optical character recognition, and speech synthesis. It functions as a modular client that integrates various local and remote language services, allowing users to process text through hotkeys, clipboard monitoring, or direct input.

The software distinguishes itself through a plugin-based architecture and a built-in automation framework. By exposing a local network interface, it enables external applications and scripts to programmatically trigger its translation and recognition workflows. Users can further customize their experience by configuring proxy-based traffic routing to bypass regional restrictions and managing window positioning to ensure context-aware display across the desktop.

The application supports a wide range of language processing tasks, including automated language detection, text formatting, and the synchronization of vocabulary data with external study tools. It provides flexible input methods, such as screen capture and text selection integration, while offering silent background processing options to streamline multilingual workflows.
- [goldbergyoni/nodebestpractices](https://awesome-repositories.com/repository/goldbergyoni-nodebestpractices.md) (105,356 ⭐) — This project provides a comprehensive collection of industry-standard guidelines for developing, testing, and deploying Node.js applications. It covers the entire software lifecycle, offering actionable advice on code style, architectural patterns, and security measures to ensure maintainability and consistency across large-scale codebases.

The documentation details strategies for robust error management, containerization, and production readiness. It addresses operational requirements such as observability, scalability, and infrastructure configuration, while providing specific methodologies for validating software quality through automated testing and dependency management.
- [cinnamon/kotaemon](https://awesome-repositories.com/repository/cinnamon-kotaemon.md) (25,139 ⭐) — Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document processing, retrieval-augmented generation, and multi-step reasoning. It provides a comprehensive platform for developing document-based question answering systems, allowing users to chain language models, prompt templates, and external tools into complex, automated pipelines.

The system distinguishes itself through a highly modular architecture that emphasizes component-based composition and schema-driven data exchange. It supports autonomous agents capable of decomposing complex queries through iterative processing and tool-calling, while its hybrid retrieval orchestration combines vector similarity and full-text search with re-ranking to improve the accuracy of retrieved context. The framework also features event-driven streaming, which delivers incremental results from long-running pipelines to the user interface in real-time.

Beyond its core reasoning capabilities, the platform includes a suite of functional modules for the entire lifecycle of document-based applications. This includes multi-modal parsing for extracting text, tables, and visual elements from diverse file formats, as well as administrative tools for managing document collections, vector stores, and multi-user access. The system is designed to be interface-agnostic, allowing developers to wrap third-party libraries and external services into standardized, reusable processing units.

The project provides a web-based user interface for interactive querying and configuration, and it supports deployment of private, isolated instances through predefined templates.
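
The hybrid retrieval described in this entry (merging a vector-similarity ranking with a full-text ranking before re-ranking) can be sketched with reciprocal rank fusion. This is an illustrative assumption rather than Kotaemon's actual code: the function name `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Hypothetical helper, not a Kotaemon API.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up result lists standing in for the two retrievers.
vector_hits = ["doc_b", "doc_a", "doc_c"]    # e.g. from embedding similarity
fulltext_hits = ["doc_a", "doc_d", "doc_b"]  # e.g. from keyword search
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
```

Documents found by both retrievers rise to the top of `fused`, and a separate re-ranking model could then reorder only that shortlist.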
- [ripperhe/bob](https://awesome-repositories.com/repository/ripperhe-bob.md) (9,693 ⭐) — Bob is an extensible macOS utility designed for screen text extraction, translation aggregation, and speech synthesis. It functions as a wrapper that integrates multiple optical character recognition and translation services into a single interface, allowing users to capture screen areas, decode QR codes, and convert visual text into editable strings.

The tool distinguishes itself through a plugin-based architecture that supports the integration of custom translation, speech synthesis, and image recognition APIs. It enables multi-engine parallel execution, allowing a single request to be processed by several providers simultaneously for side-by-side result comparison.

The application covers a broad range of capabilities, including hybrid cloud and offline text recognition with layout restoration and silent text capture. It also provides text-to-speech synthesis using local system voices or cloud providers, and manages a history of translations and bookmarked favorites.
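
The multi-engine parallel execution mentioned in this entry can be illustrated with a thread pool that sends one request to several providers at once. This is only a sketch in Python for illustration (Bob itself is a macOS application): `run_engines` and the two toy "engines" are hypothetical stand-ins, not Bob's plugin API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_engines(engines, text, timeout=10):
    """Send the same text to every engine concurrently; return {name: result}.

    `engines` maps a provider name to a callable taking the text.
    A failing engine yields an error string instead of aborting the batch.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in engines.items()}
        results = {}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception as exc:  # keep the other providers' results
                results[name] = f"error: {exc}"
        return results


# Toy stand-ins for real translation or OCR providers.
engines = {"upper": lambda s: s.upper(), "reversed": lambda s: s[::-1]}
results = run_engines(engines, "hello")
```

Keeping results keyed by provider name is what makes a side-by-side comparison straightforward to render.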
- [xiaoyifang/goldendict-ng](https://awesome-repositories.com/repository/xiaoyifang-goldendict-ng.md) (2,516 ⭐) — GoldenDict-ng is a multi-source dictionary application and offline dictionary reader that enables users to search for word definitions across local files, DICT servers, and web sources in a single interface. It functions as a web-based definition browser, rendering entries using a browser engine to support HTML, CSS, and JavaScript for rich content presentation.

The project distinguishes itself by integrating with Anki flashcard systems to facilitate language learning workflows and offering specialized translation tools that support clipboard monitoring and character set conversion. It also provides advanced visual customization, allowing users to modify the lexicon interface through custom CSS and JavaScript injection.

The application covers a broad capability surface, including full-text search with Unicode normalization and stemming, OCR-based text lookup, and the management of multimedia pronunciations. It includes organizational tools for grouping dictionaries, exporting headword lists, and managing a hierarchical system of favorite words.

The software includes native Wayland support for optimized display and scaling on compatible environments.
- [getmaxun/maxun](https://awesome-repositories.com/repository/getmaxun-maxun.md) (15,049 ⭐) — Maxun is an open-source web scraping and automation platform designed to transform dynamic website content into structured data. By leveraging artificial intelligence to interpret natural language prompts, the system identifies page elements and extracts information without requiring manual selector configuration. It serves as a bridge between raw web content and intelligent workflows, providing structured outputs in formats optimized for large language model ingestion and agent-based applications.

The platform distinguishes itself through its ability to handle complex, authenticated, and dynamic web environments. It synchronizes local browser sessions to access password-protected content and employs proxy rotation and browser fingerprinting to bypass anti-scraping measures. Users can orchestrate multi-step browser interactions—such as clicking buttons and filling forms—to replicate human navigation, while the self-hosted infrastructure ensures full control over data pipelines and extraction robots.

Beyond core extraction, the platform supports a broad range of automation capabilities, including recurring task scheduling, web search integration, and visual content capture. It provides programmatic access through a command-line interface and a dedicated software development kit, allowing for seamless integration with external systems via webhooks. The platform also includes monitoring tools to track website changes and distill large volumes of information into actionable insights.
- [kba/awesome-ocr](https://awesome-repositories.com/repository/kba-awesome-ocr.md) (0 ⭐) — Awesome OCR
- [pratiksonone/ngx-i18n-scan](https://awesome-repositories.com/repository/pratiksonone-ngx-i18n-scan.md) (1 ⭐) — A powerful CLI tool for scanning Angular source code and managing i18n translation keys. It automatically extracts keys from your project and keeps your translation files (like en.json) clean and updated.
- [hillya51/lunatranslator](https://awesome-repositories.com/repository/hillya51-lunatranslator.md) (12,030 ⭐) — LunaTranslator is a real-time translation tool designed for visual novels and games. It functions as a multi-engine translation hub and text extractor that captures dialogue via memory hooking or optical character recognition to convert it into a target language.

The project distinguishes itself through specialized linguistic tools, including a Japanese text analyzer for sentence segmentation and phonetic readings. It also operates as a digital dictionary aggregator, querying multiple online and offline databases simultaneously to provide comprehensive vocabulary definitions for language learners.

The system covers a broad range of capabilities, including text-to-speech synthesis for audio generation and a sequential text processing pipeline for refining translations using custom glossaries and transformation rules. It supports various translation methods including commercial APIs, free online services, and offline engines, with the ability to extend functionality via custom translation scripting.

Text capture is handled through OCR engines, memory hooking, or system clipboard monitoring.
- [dair-ai/prompt-engineering-guide](https://awesome-repositories.com/repository/dair-ai-prompt-engineering-guide.md) (75,678 ⭐) — This project is a comprehensive educational resource and technical guide focused on the development, optimization, and application of large language models. It provides a structured curriculum for mastering prompt engineering, ranging from foundational principles of instruction design to advanced techniques for improving model reasoning, accuracy, and reliability.

The guide distinguishes itself by offering deep technical insights into agentic workflows and autonomous system design. It covers the implementation of multi-step reasoning chains, tool integration through function calling, and stateful memory management. Beyond basic prompting, it explores sophisticated frameworks that combine reasoning and acting, as well as methodologies for retrieval-augmented generation and the creation of synthetic datasets to address data scarcity in specialized domains.

The documentation also addresses the broader engineering surface of AI development, including defensive strategies for application security and automated evaluation loops for model verification. These resources are designed to support developers in building complex, task-oriented AI systems that can interact with external APIs and maintain continuity across long-running processes.
- [jaykali/maskphish](https://awesome-repositories.com/repository/jaykali-maskphish.md) (3,020 ⭐) — Maskphish is a comprehensive security toolkit that integrates capabilities for digital forensics, network vulnerability scanning, open-source intelligence, penetration testing, and social engineering. It functions as a multi-purpose framework for automating reconnaissance and executing security audits across diverse network environments.

The project features a specialized phishing and social engineering toolkit used for cloning websites, masking URLs, and deploying deceptive pages to capture user credentials. It also includes a remote access Trojan builder for generating platform-specific executables and mobile application packages to establish remote command sessions.

The framework covers a broad surface of capabilities, including web application penetration testing, OSINT reconnaissance, memory and disk forensics, and wireless network auditing. It provides tools for payload generation, credential theft, and the automation of information gathering from public data sources.

This project is implemented primarily as a shell-based application.
- [rapidai/rapidocr](https://awesome-repositories.com/repository/rapidai-rapidocr.md) (5,968 ⭐) — RapidOCR is an offline deep-learning OCR engine that detects and recognizes text in images using ONNX Runtime, operating entirely without an internet connection. It provides a unified inference pipeline that runs across multiple platforms including Windows, Linux, macOS, Android, and Raspberry Pi, with programming language bindings for Python, C++, Java, and C#.

The engine separates text detection and recognition into independent modules that can be swapped or fine-tuned individually, and abstracts the inference backend behind a unified interface allowing seamless switching between ONNX Runtime, OpenVINO, PaddlePaddle, PyTorch, MNN, and TensorRT. It supports over 80 languages by combining language-specific recognition models with a unified text detection backbone, and offers both lightweight mobile-optimized and higher-accuracy server-grade model variants selected at runtime.

The project includes a command-line tool for extracting text from images and URLs with bounding boxes and confidence scores, and provides structured programmatic output with separate fields for bounding boxes, recognized text, and confidence scores. It can classify text line orientation before recognition to improve accuracy, and visualize results by drawing detected text regions onto the original image.

For deployment, the OCR engine can be packaged into a Docker container for consistent environments across platforms, or bundled into a standalone executable using PyInstaller that removes the Python runtime dependency. The project also includes utilities for converting PaddleOCR models to ONNX format and fine-tuning them on custom data for specialized text recognition scenarios.
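
The structured output described in this entry (separate fields for bounding boxes, recognized text, and confidence scores) lends itself to simple post-processing. The sketch below assumes a hypothetical `OcrLine` record and helper functions; it does not reproduce RapidOCR's real return type, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class OcrLine:
    """One recognized text line (hypothetical structure, not RapidOCR's API)."""
    box: List[Point]  # four corner points of the detected region
    text: str
    score: float      # recognition confidence in [0, 1]


def keep_confident(lines, min_score=0.8):
    """Return the lines whose confidence is at least min_score."""
    return [line for line in lines if line.score >= min_score]


def to_plain_text(lines):
    """Join lines top-to-bottom by the y coordinate of their first corner."""
    ordered = sorted(lines, key=lambda line: line.box[0][1])
    return "\n".join(line.text for line in ordered)


# Made-up sample results, deliberately out of reading order.
sample = [
    OcrLine([(10, 40), (200, 40), (200, 60), (10, 60)], "Total: 42.00", 0.97),
    OcrLine([(10, 10), (200, 10), (200, 30), (10, 30)], "Invoice", 0.99),
    OcrLine([(10, 70), (200, 70), (200, 90), (10, 90)], "~~~", 0.31),
]
text = to_plain_text(keep_confident(sample))
```

Filtering on the confidence score before joining drops the low-scoring noise line, and sorting by box position restores reading order.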
- [yobix-ai/extractous](https://awesome-repositories.com/repository/yobix-ai-extractous.md) (1,756 ⭐) — Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
- [infisical/infisical](https://awesome-repositories.com/repository/infisical-infisical.md) (27,374 ⭐) — Infisical is a centralized secrets management platform designed to store, synchronize, and control access to sensitive credentials and configuration data across distributed development, staging, and production environments. It employs client-side encryption to ensure that secrets remain unreadable to the underlying storage infrastructure, while providing a hierarchical permission model to govern both user and machine access.

The platform distinguishes itself through dynamic credential provisioning, which generates short-lived access tokens that are automatically revoked after use. It supports complex security workflows by integrating with external identity providers for federated authentication and offering a reverse tunneling gateway that allows secure access to private network resources without exposing inbound ports. Additionally, the system includes an event-driven audit engine that maintains an immutable record of all configuration changes and access requests to support compliance requirements.

Beyond core secret storage, the platform provides comprehensive orchestration capabilities, including automated secret injection into containerized environments and infrastructure pipelines. It also features integrated public key infrastructure management for the lifecycle of digital certificates and automated scanning to detect hardcoded secrets in source code and CI pipelines.

The platform supports flexible deployment models, allowing teams to either utilize managed cloud services or self-host the infrastructure within their own private networks. It provides a broad ecosystem of SDKs and a command-line interface to facilitate integration across various programming languages and deployment workflows.
- [daybreak-u/chineseocr_lite](https://awesome-repositories.com/repository/daybreak-u-chineseocr-lite.md) (12,324 ⭐) — chineseocr_lite is a lightweight Chinese optical character recognition engine designed to detect text regions, analyze orientation, and convert Chinese characters from images into digital text. It supports both horizontal and vertical reading layouts and can be deployed as a web service for image uploads and result visualization.

The system utilizes a multi-backend inference framework that supports ncnn, mnn, and tnn, allowing it to run across diverse hardware and platforms. It is specifically engineered for lightweight deployment on mobile and desktop environments through the use of small model files.

The engine implements a pipeline for text orientation analysis and region detection. It also provides a command line interface for processing images and exporting structured data for automated document digitization.
- [honojs/hono](https://awesome-repositories.com/repository/honojs-hono.md) (30,994 ⭐) — Hono is a lightweight web framework built on Web Standard APIs that executes across JavaScript runtimes including Cloudflare Workers, Deno, Bun, and Node.js.
- [airbernard/scene-text-detection-with-spcnet](https://awesome-repositories.com/repository/airbernard-scene-text-detection-with-spcnet.md) (0 ⭐) — Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network](https://arxiv.org/abs/1811.08605) with TensorFlow. The network implementation mainly borrows from the Keras version of Mask-RCNN, and the training data interface references argman/EAST. The paper's authors introduce SPCNet in an article on Zhihu. Training data is placed under data/, and training data preparation is in data/icdar.py: …
