awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
MCPSitemapPrivacyTerms
Document Processing Pipelines · Awesome GitHub Repositories

3 repos

Awesome GitHub RepositoriesDocument Processing Pipelines

Workflows that ingest, parse, and normalize diverse file formats into standardized content for downstream integration.

Explore 3 awesome GitHub repositories matching data & databases · Document Processing Pipelines. Refine with filters or upvote what's useful.

  1. Home
  2. Data & Databases
  3. Data Processing Pipelines
  4. Document and LLM Preparation
  5. Document Processing Pipelines

Awesome Document Processing Pipelines GitHub Repositories

Describe the repository you're looking for…
We'll search the best matching repositories with AI.
  • zylon-ai/private-gpt

    zylon-ai/private-gpt

    57,116GitHubView on GitHub↗

    This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to prov

    Python
  • opendatalab/MinerU

    opendatalab/MinerU

    54,523GitHubView on GitHub↗

    MinerU is a document parsing pipeline designed to transform unstructured files into machine-readable, structured data. It utilizes deep learning models to perform layout analysis, identifying document regions and extracting complex content such as mathematical expressions. By combining these neural network inferences w

    Pythonai4sciencedocument-analysisextract-data
  • docling-project/docling

    docling-project/docling

    53,584GitHubView on GitHub↗

    Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing

    Pythonaiconvertdocument-parser