awesome-repositories.com
© 2026 Bringes Technology SRL · VAT RO45896025 · hello@bringes.io
Document Processing Tools · Awesome GitHub Repositories

3 repos


Focuses on the parsing, conversion, and structural extraction of static files and documents rather than live web or telemetry streams.

Explore 3 GitHub repositories tagged Data & Databases · Document Processing Tools.


Awesome Document Processing Tools GitHub Repositories

  • papers-we-love/papers-we-love

    103,417 · GitHub

    Papers We Love is a community-driven repository and learning network dedicated to the study and discussion of foundational computer science literature. It functions as a centralized educational archive, providing a structured environment where software professionals can engage with academic research to bridge the gap b

    Shell · awesome · computer-science · meetup
  • microsoft/markitdown

    87,305 · GitHub

    This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine

    Python · autogen · autogen-extension · langchain
  • infiniflow/ragflow

    73,425 · GitHub

    This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin

    Python · agent · agentic · agentic-ai

Explore sub-tags

  • Academic Paper Downloaders: Automated scripts designed to parse documentation and retrieve external academic papers or research materials.
  • Automated Document Ingestion: Automated mechanisms for uploading and transforming diverse file formats into structured text for processing pipelines.
  • LLM-Powered Parsers: Extraction frameworks that leverage language models to interpret and parse complex document content.