Tools for segmenting and preparing large documents for downstream processing engines.
Distinguishing note: Focuses on structural document manipulation for engine compatibility rather than general text parsing.
Explore 1 awesome GitHub repository matching software engineering & architecture · Document Processing Utilities.
Marker is a document processing platform that automates the conversion, extraction, and structuring of data from complex files. It acts as an orchestration engine that chains modular processing steps into versioned, reusable pipelines, so organizations can standardize document handling and automate repetitive business tasks at scale. It can also be deployed on private infrastructure: users run its containerized services in their own environments to keep their data private. It features specialized …
The platform splits large files into smaller page ranges so that each request stays within the engine's maximum page count.
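The page-range splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not Marker's own implementation. The function name `split_page_ranges` and the `max_pages` limit are hypothetical, and the engine's real page limit is not stated here. Pages are numbered from 1, and each range is inclusive.

```python
from typing import List, Tuple


def split_page_ranges(total_pages: int, max_pages: int) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (start, end) ranges.

    Hypothetical helper: no range is longer than max_pages, which stands in
    for the engine's per-request page limit (an assumed value).
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")

    ranges: List[Tuple[int, int]] = []
    # Step through the document in strides of max_pages pages.
    for start in range(1, total_pages + 1, max_pages):
        # The last range may be shorter than max_pages.
        end = min(start + max_pages - 1, total_pages)
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # A 250-page file under an assumed 100-page limit.
    for start, end in split_page_ranges(250, 100):
        print(f"request pages {start}-{end}")
```

With a 250-page file and a 100-page limit, this yields the ranges 1-100, 101-200 and 201-250, and each range would be sent as its own request. Fixed-size strides keep every request under the limit and cover every page exactly once.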