2 repos

Awesome GitHub Repositories: Structured Document Extraction

Processes that convert visual document layouts into machine-readable formats like JSON or Markdown.
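Here is a minimal sketch of the target formats. It assumes layout analysis and recognition have already produced typed regions in reading order. The `Block` dataclass, its `kind` values, and the helper functions are hypothetical, and they do not reproduce the output schema of any project listed here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Block:
    """One detected layout region (hypothetical schema for illustration)."""
    kind: str                         # "title", "text", or "equation" in this sketch
    text: str                         # recognized content of the region
    bbox: tuple[int, int, int, int]   # (x0, y0, x1, y1) in page pixels


def to_markdown(blocks: list[Block]) -> str:
    """Render blocks, assumed to be in reading order already, as Markdown."""
    parts = []
    for b in blocks:
        if b.kind == "title":
            parts.append(f"# {b.text}")
        elif b.kind == "equation":
            parts.append(f"$$\n{b.text}\n$$")  # display math
        else:
            parts.append(b.text)
    return "\n\n".join(parts)


def to_json(blocks: list[Block]) -> str:
    """Serialize the same blocks as a JSON array, keeping their coordinates."""
    return json.dumps([asdict(b) for b in blocks], ensure_ascii=False)


page = [
    Block("title", "Results", (50, 40, 300, 70)),
    Block("text", "Accuracy improves with scale.", (50, 90, 550, 120)),
    Block("equation", r"E = mc^2", (200, 140, 400, 170)),
]
print(to_markdown(page))
print(to_json(page))
```

Markdown keeps the reading flow for humans. JSON keeps the bounding boxes that downstream tools (or a debugging overlay) need.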

2 awesome GitHub repositories matching Artificial Intelligence & ML · Structured Document Extraction.


PaddlePaddle/PaddleOCR
70,931 GitHub stars
PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into independent…
Transforms visual document layouts into structured, machine-readable formats like JSON or Markdown while correcting for perspective and artifacts.
Python · ai4science · chinese · ocr · document-parsing
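The decoupled design described for PaddleOCR (preprocessing, detection, and recognition as separate stages) can be illustrated with a plain-Python sketch in which each stage is an interchangeable callable. The `OCRPipeline` class and the toy stages are hypothetical stand-ins. They are not PaddleOCR's API or models.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types: an "image" is a 2-D list of grayscale pixels, a box is (x0, y0, x1, y1).
Image = list[list[int]]
Box = tuple[int, int, int, int]


@dataclass
class OCRPipeline:
    """Chains independent stages; any stage can be swapped without touching the others."""
    preprocess: Callable[[Image], Image]
    detect: Callable[[Image], list[Box]]
    recognize: Callable[[Image, Box], str]

    def run(self, image: Image) -> list[dict]:
        clean = self.preprocess(image)
        boxes = self.detect(clean)
        return [{"bbox": box, "text": self.recognize(clean, box)} for box in boxes]


# Toy stand-ins for real models, for illustration only.
def binarize(img: Image) -> Image:
    """Map each pixel to pure black (0) or white (255)."""
    return [[255 if px > 127 else 0 for px in row] for row in img]


def detect_dark_rows(img: Image) -> list[Box]:
    """Treat each row containing any dark pixel as one full-width 'text line'."""
    width = len(img[0])
    return [(0, y, width, y + 1) for y, row in enumerate(img) if 0 in row]


def count_dark(img: Image, box: Box) -> str:
    """Placeholder 'recognizer' that reports how many dark pixels the box holds."""
    x0, y0, x1, y1 = box
    dark = sum(px == 0 for row in img[y0:y1] for px in row[x0:x1])
    return f"<{dark} dark px>"


pipeline = OCRPipeline(binarize, detect_dark_rows, count_dark)
page = [[200, 200, 200], [10, 250, 30], [220, 220, 220]]
print(pipeline.run(page))
```

A real detector or recognizer would replace one toy function without changing `run`. That isolation is what makes a modular pipeline easy to retrain or benchmark stage by stage.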
opendatalab/MinerU
54,523 GitHub stars
MinerU is a document parsing pipeline designed to transform unstructured files into machine-readable, structured data. It utilizes deep learning models to perform layout analysis, identifying document regions and extracting complex content such as mathematical expressions. By combining these neural network inferences…
Generates visual overlays that highlight detected text segments and reading order to verify parsing accuracy.
Python · ai4science · document-analysis · extract-data
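Layout analysis also has to decide the order in which detected regions are read. The sketch below is a simple heuristic for two-column pages:

- Full-width regions act as separators.
- Between separators, the left column is read before the right column.

The `reading_order` function is hypothetical. It is not MinerU's actual reading-order method, which this page does not describe.

```python
import math

Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), with y growing downward


def reading_order(boxes: list[Box], page_width: int) -> list[Box]:
    """Order regions on a two-column page (heuristic sketch)."""
    mid = page_width / 2

    def spans(b: Box) -> bool:
        # A region crossing the vertical midline is treated as full-width.
        return b[0] < mid < b[2]

    separators = sorted((b for b in boxes if spans(b)), key=lambda b: b[1])
    columns = [b for b in boxes if not spans(b)]

    ordered: list[Box] = []
    top = -math.inf
    for sep in [*separators, None]:
        bottom = sep[1] if sep is not None else math.inf
        # Column regions whose top edge lies between the previous separator and this one.
        band = [b for b in columns if top <= b[1] < bottom]
        left = sorted((b for b in band if b[2] <= mid), key=lambda b: b[1])
        right = sorted((b for b in band if b[0] >= mid), key=lambda b: b[1])
        ordered += left + right
        if sep is not None:
            ordered.append(sep)
            top = sep[1]
    return ordered


title = (50, 20, 550, 60)
left_a, left_b = (50, 80, 280, 200), (50, 210, 280, 400)
right_a, right_b = (320, 80, 550, 300), (320, 310, 550, 400)
footer = (50, 420, 550, 450)
print(reading_order([right_a, footer, left_b, title, right_b, left_a], page_width=600))
```

Pages with three or more columns, sidebars, or figures that straddle the gutter would need a more robust method, such as a learned ordering model.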

Explore sub-tags

Visual Debugging Utilities: tools that generate visual overlays to verify the accuracy of automated document parsing and text detection.
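A visual debugging overlay can be as simple as outlining each detected box and tagging it with its reading-order index. The hypothetical `overlay_svg` helper below does this by emitting an SVG layer, using only the standard library, which can be placed over a rendered page image. It is not the visualization tool of either repository.

```python
from xml.sax.saxutils import escape

Box = tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


def overlay_svg(boxes: list[Box], labels: list[str], width: int, height: int) -> str:
    """Return an SVG document that outlines each box and tags it with its
    1-based reading-order index and label."""
    shapes = []
    for i, ((x0, y0, x1, y1), label) in enumerate(zip(boxes, labels), start=1):
        shapes.append(
            f'<rect x="{x0}" y="{y0}" width="{x1 - x0}" height="{y1 - y0}" '
            'fill="none" stroke="red" stroke-width="2"/>'
        )
        # escape() keeps labels containing &, <, or > from breaking the XML.
        shapes.append(
            f'<text x="{x0 + 2}" y="{y0 + 12}" font-size="12" fill="red">'
            f"{i}: {escape(label)}</text>"
        )
    body = "\n  ".join(shapes)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f"  {body}\n</svg>"
    )


svg = overlay_svg([(50, 20, 550, 60), (50, 80, 280, 200)], ["title", "text & math"], 600, 800)
print(svg)
```

SVG is used here because it opens in any browser and avoids a dependency on an imaging library. Boxes in the wrong place or indices in an unexpected order show up immediately.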
