Pix2Text

Features

Document to Markdown Converters - Transforms images and PDFs into formatted Markdown documents by extracting layout, tables, and formulas.

Image-to-LaTeX Converters - Automatically transcribes visual mathematical notation from images into structured LaTeX code.

OCR Pipelines - Implements a hybrid OCR pipeline that separates plain text and mathematical formulas in a single pass.

Multilingual Glyph Mappings - Uses extended language packs to map visual glyphs from over 80 different languages to digital text.

Multilingual OCR Systems - Implements an OCR system capable of processing diverse character sets for over 80 global languages.

Document Layout Analysis - Analyzes page layouts to distinguish between text blocks, tables, and mathematical formulas before recognition.

Optical Character Recognition - Extracts printed text from images across over 80 different global languages into digital format.

Multilingual Text Recognition - Recognizes and converts printed characters from over 80 different languages into digital text.

Plain Text Recognition - Converts images containing plain text into digital strings.

Formula Extractors - Isolates mathematical notation from document images for conversion into digital LaTeX expressions.

Formula Recognition Engines - Translates images containing mathematical formulas into standardized LaTeX code.

OCR Document Conversion - Provides a command line interface to convert images and PDFs into structured Markdown via OCR.

GPU-Accelerated Inference - Utilizes GPU hardware acceleration to increase the inference speed of image and document processing models.

Model Serving APIs - Wraps the recognition engine in a local web server to provide OCR capabilities via a REST API.

Image-to-Markdown Table Generators - Recognizes tabular data within images and converts it into formatted Markdown tables.

Vision-Based Document Parsers - Uses vision-language models to interpret document layouts and structural organization.

PDF to Markdown Conversion - Transforms PDF documents into structured Markdown files while preserving original text and table structures.

Document Reconstruction Serializers - Transforms recognized layout, tables, and formulas into a structured Markdown format for document reconstruction.

OCR Integration APIs - Provides a local HTTP service and API for programmatically processing images to extract text and formulas.

Tabular Data Extraction - Detects tables within documents and extracts content while preserving the original tabular structure.

OCR Web Services - Provides a local HTTP web API for programmatic image processing and OCR result retrieval.

Service Hosting - Exposes the OCR engine as a local HTTP service for programmatic image processing via API.

Pix2Text is an optical character recognition (OCR) system and document conversion tool that transforms images and PDFs into Markdown. It combines a multilingual OCR engine supporting over 80 languages, a LaTeX formula recognizer for mathematical notation, and a parser integrated with vision-language models.

The project uses a hybrid pipeline that separates plain text from mathematical formulas and tabular structures in a single pass. It converts recognized formulas into LaTeX expressions and renders detected tables and layouts as structured Markdown.
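To make the final serialization step concrete, here is a minimal sketch of how recognized blocks could be turned into Markdown. The `Block` class and both helper functions are hypothetical names invented for this illustration. They are not part of Pix2Text's API, and the real pipeline may represent and order its results differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """One recognized page region (hypothetical structure, not Pix2Text's own)."""
    kind: str                  # "text", "formula", or "table"
    text: str = ""             # plain text, or LaTeX source for a formula
    rows: List[List[str]] = field(default_factory=list)  # table cells; first row is the header


def table_to_markdown(rows: List[List[str]]) -> str:
    """Render a list of rows as a Markdown pipe table."""
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def blocks_to_markdown(blocks: List[Block]) -> str:
    """Join blocks in reading order, one Markdown paragraph per block."""
    parts = []
    for block in blocks:
        if block.kind == "formula":
            # Isolated formulas become display-math blocks holding the LaTeX.
            parts.append("$$\n" + block.text + "\n$$")
        elif block.kind == "table":
            parts.append(table_to_markdown(block.rows))
        else:
            parts.append(block.text)
    return "\n\n".join(parts)
```

Calling `blocks_to_markdown` on a text block, a formula block, and a table block yields three Markdown paragraphs separated by blank lines, with the formula wrapped in `$$` delimiters and the table rendered with a header separator row.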

The system includes a command line interface for document conversion and a local HTTP web API for programmatic image processing. It supports GPU acceleration to increase model inference speed.
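As an illustration of driving the command line interface from a script, the sketch below assembles an argument list for a PDF conversion. The `p2t predict` subcommand and its flags (`-l`, `--file-type`, `-d`, `-i`, `-o`) are assumptions based on the project's published CLI help and may differ between versions, so check them against `p2t predict --help`. The helper name `build_predict_command` and the file names are hypothetical.

```python
import shlex
from typing import List, Sequence


def build_predict_command(
    input_path: str,
    output_dir: str,
    file_type: str = "pdf",
    languages: Sequence[str] = ("en", "ch_sim"),
    device: str = "cpu",
) -> List[str]:
    """Build an argv list for the Pix2Text CLI (flag names are assumptions)."""
    return [
        "p2t", "predict",
        "-l", ",".join(languages),   # comma-separated OCR language codes
        "--file-type", file_type,    # e.g. "pdf" or "page"
        "-d", device,                # "cpu", or a GPU device for faster inference
        "-i", input_path,
        "-o", output_dir,
    ]


cmd = build_predict_command("paper.pdf", "output-md")
# Print the shell-quoted command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.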
