# yuliang-liu/monkeyocr

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/yuliang-liu-monkeyocr).**

6,487 stars · 448 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/Yuliang-Liu/MonkeyOCR
- awesome-repositories: https://awesome-repositories.com/repository/yuliang-liu-monkeyocr.md

## Description

MonkeyOCR is a GPU-accelerated document parsing server that converts PDFs and images into structured markdown while preserving the spatial layout of text, formulas, and tables. It provides both an interactive Gradio web interface for uploading files and viewing parsed output in real time, and a RESTful HTTP API endpoint that accepts document uploads and returns structured JSON results for programmatic consumption.
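As a rough sketch of the programmatic path described above, the snippet below builds a multipart upload and decodes a JSON reply using only the Python standard library. The server address, the `/parse` route, and the `file` form field name are assumptions made for illustration, not confirmed details of the MonkeyOCR API; check the project's README for the actual routes and parameters.

```python
import json
import uuid
import urllib.request

# Assumptions for illustration only: address, route, and form field name
# are placeholders, not confirmed details of the MonkeyOCR API.
BASE_URL = "http://localhost:8000"
PARSE_ROUTE = "/parse"


def build_upload_request(url: str, filename: str, payload: bytes,
                         field: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST request carrying one document."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(
        url, data=head + payload + tail, headers=headers, method="POST"
    )


def parse_document(path: str, url: str = BASE_URL + PARSE_ROUTE) -> dict:
    """Upload a file and decode the JSON reply (needs a running server)."""
    with open(path, "rb") as fh:
        request = build_upload_request(url, path.rsplit("/", 1)[-1], fh.read())
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the upload format easy to inspect without a GPU server running; `parse_document("sample.pdf")` would only succeed against a live instance that matches the assumed route.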

The system routes document pages through specialized OCR sub-models for text, formula, and table recognition based on the selected extraction task. Users can extract only text, formulas, or tables from a document page, or convert the full page into markdown that retains the spatial relationships among all content elements. The entire model stack is packaged into a Docker container for reproducible GPU-accelerated deployment.
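One plausible shape for the task-based routing mentioned above is a dispatch table that maps each extraction task to a recognizer. This is a minimal sketch, not MonkeyOCR's actual implementation: the function names, the task keys, and the stub return values are all hypothetical, and full-page markdown conversion is deliberately left out.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the text, formula, and table sub-models.
def recognize_text(page: bytes) -> str:
    return f"<text stub: {len(page)} bytes>"


def recognize_formula(page: bytes) -> str:
    return f"<formula stub: {len(page)} bytes>"


def recognize_table(page: bytes) -> str:
    return f"<table stub: {len(page)} bytes>"


# Task name -> recognizer; the keys are illustrative, not real API values.
RECOGNIZERS: Dict[str, Callable[[bytes], str]] = {
    "text": recognize_text,
    "formula": recognize_formula,
    "table": recognize_table,
}


def route_page(page: bytes, task: str) -> str:
    """Send one page image to the recognizer selected by the task."""
    try:
        recognizer = RECOGNIZERS[task]
    except KeyError:
        raise ValueError(f"unknown extraction task: {task!r}") from None
    return recognizer(page)
```

A table lookup keeps the supported tasks in one place, so adding a recognizer means adding one entry rather than another branch.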

The project offers a demo web interface for interactive use and a Docker deployment option for production environments. Both run the document parsing models on GPU hardware.

## Tags

### Artificial Intelligence & ML

- [OCR Acceleration](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-acceleration/ocr-acceleration.md) — A containerized server that uses GPU acceleration to perform optical character recognition on PDFs and images.
- [Document Processing Accelerators](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-acceleration/document-processing-accelerators.md) — Runs document parsing models on GPU hardware inside Docker containers for fast, reproducible extraction.
- [Multi-Model Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/computer-vision-segmentation-models/ocr-model-configurations/multi-model-pipelines.md) — Routes document pages through specialized OCR sub-models for text, formula, and table recognition.

### Content Management & Publishing

- [Document Parsing Services](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing-tools/document-automation-interfaces/document-parsing-services.md) — Exposes a RESTful endpoint that accepts document uploads and returns structured JSON parsed results. ([source](https://cdn.jsdelivr.net/gh/yuliang-liu/monkeyocr@main/README.md))
- [PDF to Markdown Converters](https://awesome-repositories.com/f/content-management-publishing/pdf-to-html-converters/pdf-to-markdown-converters.md) — Converts PDFs and images into markdown while preserving spatial relationships of text, formulas, and tables.

### DevOps & Infrastructure

- [Docker Container Deployments](https://awesome-repositories.com/f/devops-infrastructure/container-orchestration/container-runtimes/runtime-configuration-interfaces/docker-socket-orchestrators/docker-target-configurators/docker-container-deployments.md) — Packages the OCR model stack into a Docker container for reproducible GPU-accelerated document parsing. ([source](https://cdn.jsdelivr.net/gh/yuliang-liu/monkeyocr@main/README.md))
- [GPU-Accelerated Containers](https://awesome-repositories.com/f/devops-infrastructure/container-orchestration/container-runtimes/runtime-configuration-interfaces/docker-socket-orchestrators/docker-target-configurators/docker-container-deployments/gpu-accelerated-containers.md) — Packages the OCR model stack into a Docker image with GPU acceleration for reproducible deployment.

### Software Engineering & Architecture

- [API Gateways](https://awesome-repositories.com/f/software-engineering-architecture/api-gateways.md) — Exposes a RESTful API gateway that accepts document uploads and returns structured JSON results.

### User Interface & Experience

- [Gradio Interfaces](https://awesome-repositories.com/f/user-interface-experience/text-editors/graphical-frontends/web-based-debugger-frontends/gradio-interfaces.md) — Provides a Gradio web interface for uploading documents and viewing parsed markdown output in real time.
- [Interactive AI Demos](https://awesome-repositories.com/f/user-interface-experience/interactive-ai-demos.md) — Ships a Gradio-based demo web interface for uploading documents and viewing parsed markdown output interactively. ([source](https://cdn.jsdelivr.net/gh/yuliang-liu/monkeyocr@main/README.md))

### Data & Databases

- [Content Extraction](https://awesome-repositories.com/f/data-databases/content-extraction.md) — Selectively extracts only text, formulas, or tables from document pages based on user-specified tasks.
- [Selective Content Extractors](https://awesome-repositories.com/f/data-databases/content-extraction/selective-content-extractors.md) — Recognizes only text, formulas, or tables from a document page based on a user-selected extraction task. ([source](https://cdn.jsdelivr.net/gh/yuliang-liu/monkeyocr@main/README.md))
