dots.ocr is a suite of software utilities for document layout analysis, multilingual optical character recognition, and scene text digitization. It extracts text and structured layout data from images and PDFs across many writing systems.
The project includes a specialized transformer for converting charts, diagrams, and chemical formulas from raster images into scalable vector graphics. It also provides a pipeline to transform extracted text and structural layout from documents and web screenshots into formatted Markdown files.
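As a rough illustration of the layout-to-Markdown step, the sketch below renders a list of layout elements as Markdown. It is not the project's implementation: the function `layout_to_markdown`, the field names `bbox`, `category`, and `text`, the category labels, and the simple top-to-bottom reading order are all assumptions made for this example.

```python
import json

# Hypothetical category-to-Markdown mapping; the real category set may differ.
PREFIXES = {"title": "# ", "section_header": "## ", "list_item": "- "}


def layout_to_markdown(layout_json: str) -> str:
    """Render layout elements as Markdown in top-to-bottom, left-to-right order."""
    elements = json.loads(layout_json)
    # bbox is assumed to be [x1, y1, x2, y2]; sort by top edge, then left edge.
    elements.sort(key=lambda e: (e["bbox"][1], e["bbox"][0]))
    blocks = []
    for element in elements:
        text = element.get("text", "").strip()
        if not text:
            continue  # skip elements that carry no text, such as pictures
        blocks.append(PREFIXES.get(element["category"], "") + text)
    return "\n\n".join(blocks) + "\n"


# Made-up sample input, deliberately out of reading order.
sample = json.dumps([
    {"bbox": [40, 120, 560, 180], "category": "text", "text": "Body paragraph."},
    {"bbox": [40, 30, 560, 80], "category": "title", "text": "Report"},
    {"bbox": [40, 200, 560, 230], "category": "list_item", "text": "First point"},
])
print(layout_to_markdown(sample))
```

Sorting by the top edge of each box only works for single-column pages; multi-column documents would need a proper reading-order step, which this sketch leaves out.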
The system identifies the bounding boxes and categories of layout elements and emits them as structured JSON. It also includes tools for detecting scene text in natural images and an evaluation framework that measures text and table extraction accuracy against ground-truth data.
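One common way to score extracted text against ground truth is normalized edit distance. The sketch below shows that metric in plain Python; whether the project's evaluation framework uses this exact measure is an assumption, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only one previous row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted: str, reference: str) -> float:
    """Edit distance divided by the longer length: 0.0 is a perfect match."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are treated as identical
    return edit_distance(predicted, reference) / longest


score = normalized_edit_distance("Invoice tota1", "Invoice total")
```

Dividing by the longer string keeps the score in the range 0 to 1, so pages of different lengths can be averaged together.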