awesome-repositories.com
© 2026 Bringes Technology SRL · VAT RO45896025 · hello@bringes.io
Structured Document Extraction · Awesome GitHub Repositories

2 repos


Processes that convert visual document layouts into machine-readable formats like JSON or Markdown.

Explore 2 awesome GitHub repositories matching Artificial Intelligence & ML · Structured Document Extraction.


Awesome Structured Document Extraction GitHub Repositories

  • PaddlePaddle/PaddleOCR

    70,931 · GitHub

    PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into indepen

    Transforms visual document layouts into structured, machine-readable formats like JSON or Markdown while correcting for perspective and artifacts.

    Python · ai4science · chineseocr · document-parsing
  • opendatalab/MinerU

    54,523 · GitHub

    MinerU is a document parsing pipeline designed to transform unstructured files into machine-readable, structured data. It utilizes deep learning models to perform layout analysis, identifying document regions and extracting complex content such as mathematical expressions. By combining these neural network inferences w

    Generates visual overlays that highlight detected text segments and reading order to verify parsing accuracy.

    Python · ai4science · document-analysis · extract-data

Explore sub-tags

  • Visual Debugging Utilities: Tools that generate visual overlays to verify the accuracy of automated document parsing and text detection.