Utilities that transform diverse file formats into standardized, machine-readable outputs for automated data pipelines.
One GitHub repository is listed under Development Tools & Productivity · Document Conversion Toolkits.
Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing