# ucas-haoranwei/got-ocr2.0

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/ucas-haoranwei-got-ocr2-0).**

8,141 stars · 703 forks · Python

## Links

- GitHub: https://github.com/Ucas-HaoranWei/GOT-OCR2.0
- awesome-repositories: https://awesome-repositories.com/repository/ucas-haoranwei-got-ocr2-0.md

## Description

GOT-OCR2.0 is an end-to-end optical character recognition system and document text extractor. It uses a single unified transformer architecture to recognize and extract plain and formatted text from diverse images and documents.

The system features a multi-crop processing method that divides high-resolution or dense documents into smaller sections to maintain recognition detail. It also includes a renderer that transforms recognized text into HTML to preserve the original structure and layout of the document.
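
The idea behind multi-crop processing can be illustrated with a minimal sketch that computes overlapping crop boxes for a large page. This is not the project's own cropping code: the function names `tile_starts` and `tile_boxes`, the 1024-pixel tile size, and the 128-pixel overlap are all assumptions chosen for illustration.

```python
# Hypothetical sketch of tiling a large image into overlapping crops.
# Names and default sizes are illustrative assumptions, not GOT-OCR2.0's API.

def tile_starts(length, tile, overlap):
    """Start offsets of tiles covering [0, length) along one axis."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]  # a single tile already covers this axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover the whole image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    for box in tile_boxes(2000, 1000):
        print(box)
```

Each box uses the (left, upper, right, lower) ordering that Pillow's `Image.crop` accepts, so the crops could be cut out and recognized one at a time before the results are merged.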

The project provides a framework for fine-tuning pre-trained models on custom datasets for specialized domains. It further includes utilities for model performance evaluation and benchmarking using multi-GPU acceleration.
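
As a rough sketch of what multi-GPU evaluation can involve, the snippet below splits a list of samples across workers and scores predictions with a character-level normalized edit distance. Both the round-robin sharding and the choice of metric are assumptions made for illustration; the repository's own evaluation scripts may split data and score results differently.

```python
# Hypothetical sketch: shard an evaluation set across workers and score OCR
# output. Function names and the metric are assumptions, not the repo's code.

def shard(items, num_workers):
    """Split items round-robin so each worker (e.g. one per GPU) gets a slice."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [items[i::num_workers] for i in range(num_workers)]


def edit_distance(a, b):
    """Levenshtein distance between two strings, using one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, ref):
    """Edit distance scaled to [0, 1]; 0.0 means an exact match."""
    if not pred and not ref:
        return 0.0
    return edit_distance(pred, ref) / max(len(pred), len(ref))


if __name__ == "__main__":
    print(shard(["a.png", "b.png", "c.png", "d.png", "e.png"], 2))
    print(normalized_edit_distance("GOT-0CR2.0", "GOT-OCR2.0"))
```

Round-robin slicing keeps shard sizes within one item of each other, which helps workers finish at about the same time when samples are of similar size.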

## Tags

### Artificial Intelligence & ML

- [End-to-End Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/end-to-end-architectures.md) — Utilizes a unified transformer architecture to process images directly into structured text sequences without multi-stage pipelines.
- [High-Resolution Document OCR](https://awesome-repositories.com/f/artificial-intelligence-ml/high-resolution-document-ocr.md) — Captures high-detail text from large or dense documents by processing the image in smaller cropped sections.
- [Image Tiling](https://awesome-repositories.com/f/artificial-intelligence-ml/tiled-processing/image-tiling.md) — Divides high-resolution images into smaller overlapping tiles to maintain pixel density for fine-grained text recognition.
- [Visual-Textual Alignments](https://awesome-repositories.com/f/artificial-intelligence-ml/cross-modal-representations/visual-textual-alignments.md) — Maps image features and text tokens into a shared latent space to correlate visual structure with linguistic meaning.
- [GPU-Accelerated Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-accelerated-inference.md) — Implements multi-GPU acceleration to increase throughput during large-scale document processing.
- [Vision Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/fine-tuning-and-alignment/fine-tuning-frameworks/vision-model-fine-tuning.md) — Provides a framework for adjusting pre-trained vision transformers using specialized datasets to improve domain-specific vocabulary recognition.
- [OCR Model Fine-Tuners](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/ocr-model-fine-tuners.md) — Provides procedures for retraining OCR models on custom datasets to improve recognition accuracy for specialized domains.

### Part of an Awesome List

- [Document Text Extractors](https://awesome-repositories.com/f/awesome-lists/data/document-parsing-and-extraction/document-text-extractors.md) — Captures plain and formatted text from diverse document types and complex image layouts.
- [Multi-Crop Processing](https://awesome-repositories.com/f/awesome-lists/data/document-parsing-and-extraction/ocr-document-parsers/multi-crop-processing.md) — Divides large or dense documents into smaller sections to maintain high recognition detail.
- [Text Extraction and OCR](https://awesome-repositories.com/f/awesome-lists/more/text-extraction-and-ocr.md) — Extracts plain and formatted text from diverse images and documents using a unified deep learning model.
- [Optical Character Recognitions](https://awesome-repositories.com/f/awesome-lists/more/text-extraction-and-ocr/optical-character-recognitions.md) — Recognizes plain and formatted text from diverse document types using a unified deep learning model. ([source](https://github.com/ucas-haoranwei/got-ocr2.0#readme))
- [Data Processing](https://awesome-repositories.com/f/awesome-lists/data/data-processing.md) — Optical character recognition model for document understanding.
- [Data Processing Tools](https://awesome-repositories.com/f/awesome-lists/data/data-processing-tools.md) — OCR model for document understanding and text extraction.

### Graphics & Multimedia

- [Layout Recovery](https://awesome-repositories.com/f/graphics-multimedia/format-preservation/document-format-preservations/layout-recovery.md) — Converts scanned images of complex documents into HTML outputs that preserve the original layout and formatting.
- [Multi-Crop Processing](https://awesome-repositories.com/f/graphics-multimedia/optical-character-recognition/multi-crop-processing.md) — Captures high-detail text across large or dense documents by dividing complex images into smaller sections. ([source](https://github.com/ucas-haoranwei/got-ocr2.0#readme))

### User Interface & Experience

- [HTML Document Renderers](https://awesome-repositories.com/f/user-interface-experience/html-document-renderers.md) — Transforms recognized text and layout coordinates into structured HTML to preserve the original document formatting. ([source](https://github.com/ucas-haoranwei/got-ocr2.0#readme))
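
To make the HTML rendering idea concrete, here is a minimal sketch that turns recognized text blocks into an escaped HTML page. It assumes the recognizer output has already been reduced to `(kind, text)` pairs, which is a simplification invented for this example; the `render_html` name and the block kinds are hypothetical and do not reflect the project's actual renderer.

```python
# Hypothetical sketch: render recognized text blocks as HTML.
# The (kind, text) input format is an assumption made for illustration.
from html import escape


def render_html(blocks, title="OCR result"):
    """Build a small HTML document from (kind, text) pairs."""
    tags = {"heading": "h2", "paragraph": "p"}
    body = []
    for kind, text in blocks:
        tag = tags.get(kind, "p")  # unknown kinds fall back to a paragraph
        body.append(f"<{tag}>{escape(text)}</{tag}>")
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n" + "\n".join(body) + "\n</body></html>"
    )


if __name__ == "__main__":
    page = render_html([
        ("heading", "Results & Discussion"),
        ("paragraph", "Accuracy improves when x < y."),
    ])
    print(page)
```

Escaping each block with `html.escape` keeps characters such as `&` and `<` in the recognized text from being interpreted as markup.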
