1 repo
Tools for dividing documents into logical sections based on content schemas.
Distinguishing note: Focuses on structural segmentation rather than general text splitting.
Marker is a comprehensive document processing platform designed to automate the conversion, extraction, and structuring of data from complex files. It functions as an orchestration engine that chains modular processing steps into versioned, reusable pipelines, allowing organizations to standardize document handling and automate repetitive business tasks at scale. The platform distinguishes itself through its support for secure, private infrastructure deployment, enabling users to run containerized services within their own environments to maintain strict data privacy. It features specialized
Divides long or batch documents into logical sections by defining a schema that identifies specific parts.
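As a rough illustration of schema-driven segmentation in general, the sketch below splits a plain-text document into named sections. It is not Marker's API: the SCHEMA dictionary, the segment function, and the "preamble" label are hypothetical names, and the sketch assumes each section is introduced by a heading line that a regular expression can recognize.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical schema: section name -> regex matching the heading line
# that opens that section. These names and patterns are assumptions.
SCHEMA: Dict[str, str] = {
    "summary": r"^\s*(executive\s+)?summary\s*$",
    "terms": r"^\s*terms(\s+and\s+conditions)?\s*$",
    "signatures": r"^\s*signatures?\s*$",
}


def segment(text: str, schema: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split text into (section_name, body) pairs using heading patterns."""
    compiled = {name: re.compile(p, re.IGNORECASE) for name, p in schema.items()}
    sections: List[Tuple[str, str]] = []
    name, body = "preamble", []  # text before the first recognized heading

    for line in text.splitlines():
        # Find the first schema entry whose pattern matches this line.
        hit = next((n for n, rx in compiled.items() if rx.match(line)), None)
        if hit is None:
            body.append(line)
            continue
        # A heading closes the section collected so far (skip an empty preamble).
        if body or name != "preamble":
            sections.append((name, "\n".join(body).strip()))
        name, body = hit, []

    # Flush the final section.
    if body or name != "preamble":
        sections.append((name, "\n".join(body).strip()))
    return sections


if __name__ == "__main__":
    doc = (
        "ACME Contract\n"
        "Summary\n"
        "This covers delivery.\n"
        "Terms and Conditions\n"
        "Net 30.\n"
        "Signatures\n"
        "J. Doe"
    )
    for section_name, section_body in segment(doc, SCHEMA):
        print(f"[{section_name}] {section_body}")
```

Keeping the schema as data rather than code means the same function can be reused across document types by swapping the dictionary. Documents without reliable heading lines would need a different way of identifying each part.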