awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Document Processing Tools · Awesome GitHub Repositories


Tools for automating document workflows, including format conversion, data extraction, and structure parsing.

Explore 6 GitHub repositories matching Content Management & Publishing · Document Processing Tools.


Awesome Document Processing Tools GitHub Repositories

  • avelino/awesome-go

    165,543 · GitHub

    This project serves as a comprehensive language ecosystem index, functioning as a centralized, community-curated directory for the Go programming language. It organizes a vast landscape of software components, libraries, and development tools into a structured, navigable hierarchy, enabling developers to efficiently di

    Go · awesome · awesome-list · go
  • oven-sh/bun

    87,491 · GitHub

    Bun is a high-performance runtime environment designed to execute JavaScript and TypeScript applications with minimal latency and high throughput. Built on a native core implemented in Zig, it provides a unified execution engine that leverages JavaScriptCore for efficient memory management and low-latency startup. The

    Zig · bun · bundler · javascript
  • microsoft/markitdown

    87,305 · GitHub

    This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine

    Python · autogen · autogen-extension · langchain
  • Stirling-Tools/Stirling-PDF

    74,357 · GitHub

    Stirling-PDF is a self-hosted document processing suite designed for secure, private file management. It functions as a comprehensive transformation engine that executes complex operations—such as merging, splitting, converting, and redacting documents—directly on the host machine. The platform provides both a browser-

    TypeScript · docker · hacktoberfest · java
  • infiniflow/ragflow

    73,425 · GitHub

    This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasonin

    Python · agent · agentic · agentic-ai
  • tesseract-ocr/tesseract

    72,460 · GitHub

    Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into d

    C++ · hacktoberfest · lstm · machine-learning

Explore sub-tags

  • Document Automation Interfaces (5 sub-tags): Programmatic interfaces and pipelines designed for integrating document processing tasks into larger software workflows.
  • Format Conversion Toolkits (5 sub-tags): Utilities for programmatic transformation between diverse file formats, including office suites, Markdown, and PDF.
  • Intelligent Extraction Frameworks (6 sub-tags): Systems utilizing machine learning and spatial analysis to interpret document structure and extract data from complex layouts.
  • Markup and Structure Parsers (2 sub-tags): Tools specifically for manipulating the internal structure, DOM, or lightweight markup syntax of documents.
  • PDF Manipulation Utilities (3 sub-tags): Tools for merging, splitting, and restructuring PDF document pages.
  • PDF Processing Engines (3 sub-tags): Server-side environments and orchestrators designed for high-volume, automated PDF transformation and pipeline management.