MonkeyOCR | Awesome Repository

MonkeyOCR is a GPU-accelerated document parsing server that converts PDFs and images into structured markdown while preserving the spatial layout of text, formulas, and tables. It provides both an interactive Gradio web interface for uploading files and viewing parsed output in real time, and a RESTful HTTP API endpoint that accepts document uploads and returns structured JSON results for programmatic consumption.

The system routes document pages through specialized OCR sub-models for text, formula, and table recognition based on the selected extraction task, and packages the entire model stack into a Docker container for reproducible GPU-accelerated deployment. Users can selectively extract only text, formulas, or tables from a document page, or convert the full page into markdown that retains the spatial relationships among all content elements.
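The task-based routing described above can be sketched as a simple dispatch table. This is only an illustration under stated assumptions: the task names, the `route_page` function, and the placeholder recognizers are all hypothetical and are not taken from the MonkeyOCR codebase. In particular, the full-page placeholder merely concatenates the three outputs, whereas the real system is described as preserving spatial layout.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the specialized sub-models; each one would
# wrap a GPU-backed recognizer in a real deployment.
def recognize_text(page: str) -> str:
    return f"[text from {page}]"

def recognize_formula(page: str) -> str:
    return f"[formulas from {page}]"

def recognize_table(page: str) -> str:
    return f"[tables from {page}]"

def parse_full_page(page: str) -> str:
    # Placeholder only: joins the three outputs line by line and does not
    # model the layout-preserving markdown conversion.
    parts = (recognize_text, recognize_formula, recognize_table)
    return "\n".join(fn(page) for fn in parts)

# Assumed task names mapped to their handlers.
ROUTES: Dict[str, Callable[[str], str]] = {
    "text": recognize_text,
    "formula": recognize_formula,
    "table": recognize_table,
    "markdown": parse_full_page,
}

def route_page(page: str, task: str) -> str:
    """Send one page to the handler registered for the selected task."""
    try:
        handler = ROUTES[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(ROUTES)}"
        ) from None
    return handler(page)
```

Calling `route_page("page-1", "table")` would run only the table placeholder, while `route_page("page-1", "markdown")` would run all three. A dispatch table keeps the set of supported tasks in one place, so an unsupported task fails with a clear error instead of falling through silently.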

The project offers a demo web interface for interactive use and a Docker deployment option for production environments, both leveraging GPU hardware for fast document processing.

Features

OCR Acceleration - A containerized server that uses GPU acceleration to perform optical character recognition on PDFs and images.
Document Processing Accelerators - Runs document parsing models on GPU hardware inside Docker containers for fast, reproducible extraction.
Multi-Model Pipelines - Routes document pages through specialized OCR sub-models for text, formula, and table recognition.
Document Parsing Services - Exposes a RESTful endpoint that accepts document uploads and returns structured JSON parsed results.
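As a rough illustration of how a client might call the document parsing endpoint, the sketch below builds a multipart file upload with the Python standard library and decodes a JSON reply. The URL, the `file` form field name, and the function names are assumptions made for this example; the actual route and request schema should be taken from the project's own documentation.

```python
import json
import urllib.request
import uuid

def build_upload_request(url: str, filename: str, data: bytes,
                         field: str = "file") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying a single document."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="{field}"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="POST")

def parse_document(url: str, filename: str, data: bytes) -> dict:
    """Upload the document and decode the JSON response body."""
    request = build_upload_request(url, filename, data)
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Hypothetical usage; host, port, and path are placeholders:
# with open("sample.pdf", "rb") as f:
#     result = parse_document("http://localhost:8000/parse",
#                             "sample.pdf", f.read())
```

Building the request separately from sending it makes the upload easy to inspect or test without a running server.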
