# breezedeus/pix2text

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/breezedeus-pix2text).**

3,012 stars · 261 forks · Jupyter Notebook · MIT

## Links

- GitHub: https://github.com/breezedeus/Pix2Text
- Homepage: https://p2t.breezedeus.com
- awesome-repositories: https://awesome-repositories.com/repository/breezedeus-pix2text.md

## Topics

`image-to-markdown` `latex` `latex-pdf` `layout-analysis` `math-formula` `math-formula-recognition` `math-ocr` `mathpix` `ocr` `python` `pytorch` `table-ocr`

## Description

Pix2Text is an optical character recognition (OCR) system and document conversion tool that turns images and PDFs into Markdown. It combines a multilingual OCR engine covering more than 80 languages, a formula recognizer that outputs LaTeX for mathematical notation, and a document parser that can also use vision language models.

The project uses a hybrid pipeline that separates plain text from mathematical formulas and tables in a single pass. Recognized formulas are emitted as LaTeX expressions, and detected tables and page layouts are reconstructed as structured Markdown.
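
To illustrate the final assembly step, here is a minimal sketch that joins recognized pieces into Markdown. It assumes a hypothetical list of segments, each labelled `text`, `inline` (a formula inside a sentence) or `display` (a standalone formula), with the LaTeX already produced by a formula recognizer. The `Segment` type and `segments_to_markdown` helper are illustrative assumptions, not Pix2Text's API.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical segment kinds: plain text, inline formula, display formula.
TEXT, INLINE, DISPLAY = "text", "inline", "display"


@dataclass
class Segment:
    kind: str     # one of TEXT, INLINE, DISPLAY
    content: str  # recognized text, or LaTeX for formulas
    line: int     # index of the text line the segment was found on


def segments_to_markdown(segments: Iterable[Segment]) -> str:
    """Join recognized segments into Markdown, wrapping formulas in math delimiters."""
    blocks: List[str] = []  # finished Markdown lines and blocks
    parts: List[str] = []   # pieces of the line currently being assembled
    current = None          # line index of the pieces held in `parts`

    def flush() -> None:
        # Emit the pending line, if any, as one space-joined string.
        if parts:
            blocks.append(" ".join(parts))
            parts.clear()

    for seg in segments:
        if seg.kind == DISPLAY:
            # Display formulas always stand alone between $$ fences.
            flush()
            blocks.append(f"$$\n{seg.content}\n$$")
            current = None
            continue
        if seg.line != current:
            # A new text line starts: close the previous one.
            flush()
            current = seg.line
        parts.append(f"${seg.content}$" if seg.kind == INLINE else seg.content)
    flush()
    return "\n".join(blocks)
```

Grouping by line index keeps the sketch short; a real pipeline would also have to order segments within a line by their horizontal position.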

The system includes a command-line interface for document conversion and a local HTTP web API for programmatic image processing. GPU acceleration is supported to speed up model inference.
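
A minimal client sketch for the local HTTP API, using only the Python standard library. It assumes the service was started with `p2t serve` on port 8503 and accepts a multipart POST at `/pix2text` with the image in a field named `image` plus form fields such as `file_type` and `resized_shape`. These names follow the project README at the time of writing and should be checked against the current documentation. The `build_pix2text_request` and `send` helpers are hypothetical. Building the request is kept separate from sending it, so the request can be inspected without a running server.

```python
import json
import urllib.request
import uuid
from typing import Dict

# Assumed endpoint: a local server started with `p2t serve` on port 8503.
DEFAULT_URL = "http://127.0.0.1:8503/pix2text"


def build_pix2text_request(
    image_bytes: bytes,
    filename: str,
    fields: Dict[str, str],
    url: str = DEFAULT_URL,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the OCR service."""
    boundary = uuid.uuid4().hex
    chunks = []
    # Plain form fields, e.g. file_type and resized_shape (names assumed from the README).
    for name, value in fields.items():
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The image itself goes in a file part named "image" (assumed field name).
    chunks.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
    )
    chunks.append(image_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=b"".join(chunks), headers=headers, method="POST")


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `send(build_pix2text_request(open("page.png", "rb").read(), "page.png", {"file_type": "page"}))` would post a page image. In the README's own `requests`-based example, the recognized content is read from the `results` key of the JSON reply.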

## Tags

### Content Management & Publishing

- [Document to Markdown Converters](https://awesome-repositories.com/f/content-management-publishing/document-to-markdown-converters.md) — Transforms images and PDFs into formatted Markdown documents by extracting layout, tables, and formulas. ([source](https://cdn.jsdelivr.net/gh/breezedeus/pix2text@main/README.md))
- [Vision-Based Document Parsers](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing-tools/document-automation-interfaces/plugin-based-document-parsers/vision-based-document-parsers.md) — Uses multimodal vision language models to interpret document layouts and structural organization.
- [PDF to Markdown Conversion](https://awesome-repositories.com/f/content-management-publishing/pdf-to-markdown-conversion.md) — Transforms PDF documents into structured Markdown files while preserving original text and table structures. ([source](https://pix2text.readthedocs.io/zh-cn/stable/usage/))

### Scientific & Mathematical Computing

- [Image-to-LaTeX Converters](https://awesome-repositories.com/f/scientific-mathematical-computing/numerical-mathematical-foundations/mathematical-typesetting-engines/mathematical-typesetting/latex-math-rendering/image-to-latex-converters.md) — Automatically transcribes visual mathematical notation from images into structured LaTeX code.
- [Formula Extractors](https://awesome-repositories.com/f/scientific-mathematical-computing/numerical-mathematical-foundations/mathematical-typesetting-engines/mathematical-typesetting/formula-typesetters/formula-extractors.md) — Isolates mathematical notation from document images for conversion into digital LaTeX expressions.
- [Formula Recognition Engines](https://awesome-repositories.com/f/scientific-mathematical-computing/numerical-mathematical-foundations/mathematical-typesetting-engines/mathematical-typesetting/latex-math-rendering/formula-recognition-engines.md) — Translates images containing mathematical formulas into standardized LaTeX code. ([source](https://pix2text.readthedocs.io/zh-cn/stable/usage/))

### Artificial Intelligence & ML

- [OCR Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/inference-engines/onnx-runtime-inference/ocr-pipelines.md) — Implements a hybrid OCR pipeline that separates plain text and mathematical formulas in a single pass.
- [Multilingual Glyph Mappings](https://awesome-repositories.com/f/artificial-intelligence-ml/multilingual-glyph-mappings.md) — Uses extended language packs to map visual glyphs from over 80 different languages to digital text.
- [Multilingual OCR Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/multilingual-ocr-systems.md) — Implements an OCR system capable of processing diverse character sets for over 80 global languages.
- [Document Layout Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/document-layout-analysis.md) — Analyzes page layouts to distinguish between text blocks, tables, and mathematical formulas before recognition.
- [Optical Character Recognition](https://awesome-repositories.com/f/artificial-intelligence-ml/optical-character-recognition.md) — Extracts printed text from images in more than 80 languages into digital form.
- [Multilingual Text Recognition](https://awesome-repositories.com/f/artificial-intelligence-ml/optical-character-recognition/multilingual-text-recognition.md) — Recognizes and converts printed characters from over 80 different languages into digital text. ([source](https://pix2text.readthedocs.io/zh-cn/stable/models/))
- [GPU-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-accelerated-inference.md) — Utilizes GPU hardware acceleration to increase the inference speed of image and document processing models.
- [Model Serving APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serving-apis.md) — Wraps the recognition engine in a local web server to provide OCR capabilities via a REST API.
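
Layout analysis runs before recognition: each detected region is labelled and then handed to the recognizer for that kind of content. The sketch below shows that routing step with hypothetical region labels (`text`, `formula`, `table`) and caller-supplied recognizer functions standing in for the real models. It illustrates the pattern and is not Pix2Text's internal code. Regions are sorted top-to-bottom, then left-to-right, a naive reading-order heuristic that breaks down on multi-column pages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A recognizer takes a cropped region (a string stands in for an image here)
# and returns its text, LaTeX or Markdown.
Recognizer = Callable[[str], str]


@dataclass
class Region:
    label: str                     # hypothetical layout label: "text", "formula", "table", ...
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    crop: str                      # stand-in for the cropped image of this region


def recognize_page(regions: List[Region], recognizers: Dict[str, Recognizer]) -> List[str]:
    """Route each region to the recognizer for its label, in simple reading order."""
    # Sort by top edge, then left edge (naive single-column reading order).
    ordered = sorted(regions, key=lambda r: (r.box[1], r.box[0]))
    outputs = []
    for region in ordered:
        recognize = recognizers.get(region.label)
        if recognize is None:
            continue  # no recognizer for this label (e.g. figures): skip it
        outputs.append(recognize(region.crop))
    return outputs
```

Keeping the recognizers in a dictionary makes the routing table explicit, so a new region type only needs a new entry rather than another branch.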

### Part of an Awesome List

- [Optical Character Recognitions](https://awesome-repositories.com/f/awesome-lists/more/text-extraction-and-ocr/optical-character-recognitions.md) — Uses optical character recognition to convert images containing plain text into digital strings. ([source](https://pix2text.readthedocs.io/zh-cn/stable/examples_en/))
- [Image-to-Markdown Table Generators](https://awesome-repositories.com/f/awesome-lists/devtools/linting-and-formatting/markdown-formatting/markdown-table-generators/image-to-markdown-table-generators.md) — Recognizes tabular data within images and converts it into formatted Markdown tables. ([source](https://cdn.jsdelivr.net/gh/breezedeus/pix2text@main/README.md))
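
Once a table recognizer has produced a grid of cell strings, emitting a Markdown table is mostly a matter of escaping and padding. The sketch below assumes a hypothetical input, a list of rows whose first row is the header, and renders it as a GitHub-flavored Markdown table, padding short rows and escaping pipe characters. The `rows_to_markdown` helper is illustrative only. It does not handle merged cells, which GitHub-flavored Markdown tables cannot express.

```python
from typing import List


def rows_to_markdown(rows: List[List[str]]) -> str:
    """Render a grid of recognized cell strings as a GitHub-flavored Markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row: List[str]) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the row,
        # then pad short rows with empty cells.
        cells = [c.replace("|", "\\|").replace("\n", " ").strip() for c in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)
```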

### Web Development

- [OCR Document Conversion](https://awesome-repositories.com/f/web-development/document-conversion-apis/ocr-document-conversion.md) — Provides a command line interface to convert images and PDFs into structured Markdown via OCR. ([source](https://pix2text.readthedocs.io/zh-cn/stable/command/))
- [OCR Web Services](https://awesome-repositories.com/f/web-development/document-conversion-apis/ocr-document-conversion/ocr-web-services.md) — Provides a local HTTP web API for programmatic image processing and OCR result retrieval.
- [Service Hosting](https://awesome-repositories.com/f/web-development/service-hosting.md) — Exposes the OCR engine as a local HTTP service for programmatic image processing via API. ([source](https://pix2text.readthedocs.io/))

### Data & Databases

- [Document Reconstruction Serializers](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-serialization/json-serializers/markdown-serializers/document-reconstruction-serializers.md) — Transforms recognized layout, tables, and formulas into a structured Markdown format for document reconstruction.

### Development Tools & Productivity

- [OCR Integration APIs](https://awesome-repositories.com/f/development-tools-productivity/api-development-sdks/software-development-kits/ocr-integration-apis.md) — Provides a local HTTP service and API for programmatically processing images to extract text and formulas.

### User Interface & Experience

- [Tabular Data Extraction](https://awesome-repositories.com/f/user-interface-experience/html-content-processing/pdf-and-html-content-extraction/tabular-data-extraction.md) — Detects tables within documents and extracts content while preserving the original tabular structure. ([source](https://pix2text.readthedocs.io/zh-cn/stable/models/))
