MonkeyOCR is a GPU-accelerated document parsing server that converts PDFs and images into structured markdown while preserving the spatial layout of text, formulas, and tables. It provides both an interactive Gradio web interface for uploading files and viewing parsed output in real time, and a RESTful HTTP API endpoint that accepts document uploads and returns structured JSON results for programmatic consumption.
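As a rough illustration of programmatic use, the sketch below uploads a file to the HTTP API with only the Python standard library. The URL `http://localhost:8000/parse`, the form field name `file`, and the helper names `build_multipart` and `parse_document` are assumptions made for this example; the description above does not specify the actual route, port, field name, or response schema, so check the running server before relying on any of them.

```python
import json
import os
import uuid
import urllib.request

# Hypothetical endpoint: the real route and port are not given in this description.
API_URL = "http://localhost:8000/parse"


def build_multipart(field_name, filename, payload, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body.

    Returns a (body_bytes, content_type_header_value) tuple.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def parse_document(path, url=API_URL):
    """Upload a document and return the decoded JSON response."""
    with open(path, "rb") as fh:
        payload = fh.read()
    # The field name "file" is an assumption, not a documented parameter.
    body, content_type = build_multipart("file", os.path.basename(path), payload)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `parse_document("paper.pdf")` against a running server would return whatever JSON structure the server produces; the sketch makes no assumption about its keys.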
The system routes document pages through specialized OCR sub-models for text, formula, and table recognition according to the selected extraction task. Users can extract only the text, formulas, or tables from a page, or convert the full page into markdown that retains the spatial relationships among all content elements. The entire model stack is packaged in a Docker container for reproducible GPU-accelerated deployment.
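The task-based routing described above can be pictured as a small dispatch table. This is a minimal sketch, not the project's implementation: the task names (`text`, `formula`, `table`, `markdown`), the `(kind, content)` region format, and the placeholder recognizer functions are all hypothetical stand-ins for the real sub-models.

```python
from typing import Callable, Dict, List, Tuple

# A page region as (kind, content), listed in reading order. Hypothetical format.
Region = Tuple[str, str]


# Placeholder recognizers standing in for the OCR sub-models.
def recognize_text(content: str) -> str:
    return f"[text] {content}"


def recognize_formula(content: str) -> str:
    return f"[formula] {content}"


def recognize_table(content: str) -> str:
    return f"[table] {content}"


# Dispatch table: region kind -> recognizer.
ROUTES: Dict[str, Callable[[str], str]] = {
    "text": recognize_text,
    "formula": recognize_formula,
    "table": recognize_table,
}


def run_task(task: str, regions: List[Region]) -> str:
    """Run one extraction task over the regions of a page.

    "markdown" processes every region in order; "text", "formula", and
    "table" process only regions of that kind.
    """
    if task == "markdown":
        selected = regions
    elif task in ROUTES:
        selected = [region for region in regions if region[0] == task]
    else:
        raise ValueError(f"unknown task: {task!r}")
    # Each region goes to the recognizer registered for its kind.
    return "\n\n".join(ROUTES[kind](content) for kind, content in selected)
```

With regions `[("text", "Intro"), ("formula", "E=mc^2"), ("table", "a|b")]`, the `formula` task touches only the middle region, while the `markdown` task sends all three through their respective recognizers and joins the results in page order.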
Both the demo web interface, intended for interactive use, and the Docker deployment, intended for production environments, run document processing on GPU hardware.