awesome-repositories.com
Data Extraction and Analysis · Awesome GitHub Repositories



Tools that convert unstructured visual or binary document content into structured, machine-readable data formats.

Explore 3 awesome GitHub repositories matching Content Management & Publishing · Data Extraction and Analysis.


Awesome Data Extraction and Analysis GitHub Repositories

  • opendatalab/MinerU

    54,523 · GitHub

    MinerU is a document parsing pipeline designed to transform unstructured files into machine-readable, structured data. It utilizes deep learning models to perform layout analysis, identifying document regions and extracting complex content such as mathematical expressions. By combining these neural network inferences w

    Python · ai4science · document-analysis · extract-data
  • docling-project/docling

    53,584 · GitHub

    Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing

    Python · ai · convert · document-parser
  • mozilla/pdf.js

    52,848 · GitHub

    This project is a portable document rendering engine designed to parse and display complex document layouts directly within standard web browser environments. It functions as a web-native viewer that enables the presentation of documents without requiring external software or browser plugins. The engine utilizes a can

    JavaScript

Explore sub-tags

  • Automated Data Extraction: Tools that convert scanned or digital documents into structured data formats for large-scale analysis.
  • Document Data Extraction: Utilities that extract text and visual data from documents locally within a browser environment.
  • Document Layout Analyzers: Tools that use computer vision and text processing to map spatial relationships within document layouts.
Layout Reconstruction Algorithms
Algorithms that apply geometric heuristics and spatial analysis to reassemble fragmented text blocks into coherent document structures.
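
As a rough illustration of the kind of geometric heuristic described above, the sketch below groups text fragments into lines by vertical overlap and then orders each line left to right. It is not taken from any repository listed on this page; the `Box` type, the `group_into_lines` function, and the 0.5 overlap threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A text fragment with a bounding box (origin top-left, y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def vertical_overlap(a: Box, b: Box) -> float:
    """Fraction of the shorter box's height that both boxes share."""
    shared = min(a.y1, b.y1) - max(a.y0, b.y0)
    shorter = min(a.y1 - a.y0, b.y1 - b.y0)
    return max(shared, 0.0) / shorter if shorter > 0 else 0.0


def group_into_lines(boxes, min_overlap=0.5):
    """Reassemble fragments into text lines (assumed threshold: 0.5)."""
    lines = []
    # Visit fragments top to bottom, then left to right.
    for box in sorted(boxes, key=lambda b: (b.y0, b.x0)):
        # Compare with the fragment most recently added to the current line.
        if lines and vertical_overlap(lines[-1][-1], box) >= min_overlap:
            lines[-1].append(box)
        else:
            lines.append([box])
    # Within each line, restore left-to-right reading order.
    return [
        " ".join(b.text for b in sorted(line, key=lambda b: b.x0))
        for line in lines
    ]


fragments = [
    Box("world", 60, 10, 100, 20),
    Box("Hello", 10, 11, 50, 21),
    Box("Second", 10, 30, 60, 40),
    Box("line", 70, 31, 100, 41),
]
print(group_into_lines(fragments))  # ['Hello world', 'Second line']
```

Real layout reconstruction typically also has to handle multiple columns, rotated text, and tables, which this single-column sketch ignores.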