Pix2Text is an optical character recognition (OCR) system and document conversion tool that transforms images and PDFs into Markdown. It combines a multilingual OCR engine supporting over 80 languages, a LaTeX formula recognizer for mathematical notation, and a parser integrated with vision language models.
The project uses a hybrid pipeline that separates plain text from mathematical formulas and tabular structures in a single pass. Recognized formulas are converted into LaTeX expressions, and detected tables and layouts are rendered as structured Markdown.
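To make the output side of such a hybrid pipeline concrete, here is a minimal sketch of how separately recognized pieces could be stitched back into one Markdown string. It is not Pix2Text's internal code: the `Element` class, its `kind` labels, and the `to_markdown` and `table_to_markdown` helpers are hypothetical names invented for this illustration, and the `$...$` and `$$...$$` delimiters are one common convention rather than a confirmed default.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One recognized region (hypothetical structure, for illustration only)."""
    kind: str                      # "text", "inline_formula", "display_formula", or "table"
    content: str = ""              # plain text or LaTeX source
    rows: list = field(default_factory=list)  # table cells; first row is the header


def table_to_markdown(rows):
    """Render a list of rows as a pipe-delimited Markdown table."""
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def to_markdown(elements):
    """Merge text, formulas, and tables into Markdown in reading order."""
    blocks = []   # finished paragraphs, display formulas, and tables
    inline = []   # text and inline formulas of the paragraph being built

    def flush():
        # Close the current paragraph before emitting a block-level element.
        if inline:
            blocks.append(" ".join(inline))
            inline.clear()

    for el in elements:
        if el.kind == "text":
            inline.append(el.content.strip())
        elif el.kind == "inline_formula":
            inline.append(f"${el.content.strip()}$")
        elif el.kind == "display_formula":
            flush()
            blocks.append(f"$$\n{el.content.strip()}\n$$")
        elif el.kind == "table":
            flush()
            blocks.append(table_to_markdown(el.rows))
        else:
            raise ValueError(f"unknown element kind: {el.kind!r}")
    flush()
    return "\n\n".join(blocks)
```

Inline formulas stay inside the surrounding paragraph, while display formulas and tables end the paragraph and become their own blocks, which matches how Markdown renderers usually treat them.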
The system includes a command-line interface for document conversion and a local HTTP web API for programmatic image processing. It supports GPU acceleration to speed up model inference.
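As a rough sketch of what programmatic access to a local HTTP API can look like, the snippet below builds a multipart upload request using only the Python standard library. The URL, port, the `image` upload field, and the `file_type` form field are assumptions made for this example and should be checked against the project's own documentation; `build_ocr_request` is a hypothetical helper, not part of Pix2Text.

```python
import urllib.request
import uuid

# Assumed address of a locally running server; adjust to the real host, port, and path.
ASSUMED_URL = "http://127.0.0.1:8503/pix2text"


def build_ocr_request(image_bytes, filename, url=ASSUMED_URL, fields=None):
    """Build (but do not send) a multipart/form-data POST carrying one image."""
    boundary = uuid.uuid4().hex
    body = bytearray()

    # Plain form fields, e.g. an assumed "file_type" option.
    for name, value in (fields or {}).items():
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode()

    # The image itself, sent under the assumed field name "image".
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    body += image_bytes + f"\r\n--{boundary}--\r\n".encode()

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=bytes(body), headers=headers, method="POST")
```

With a server running, the request could be sent with `urllib.request.urlopen(req)` and the response body decoded as needed. Building the request separately from sending it keeps the example testable without a live server.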