pdf2htmlEX is a tool that converts PDF documents into HTML while preserving the original text, fonts, and layout. It uses CSS positioning and embedded fonts to reproduce the PDF's appearance in a browser, and the page content renders without JavaScript. The tool can generate a single self-contained HTML file with all resources embedded, or split the document into one HTML file per page so that pages can be loaded and navigated individually.
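As a minimal sketch, assuming pdf2htmlEX is on your PATH, the choice between one self-contained file and per-page output comes down to the `--split-pages` flag. The flag name is taken from the tool's `--help` listing, so check it against your installed version. The helper `pdf2htmlex_argv` and the file `report.pdf` are hypothetical, and the code only builds the argument list, so it runs even where pdf2htmlEX is not installed.

```python
import shlex


def pdf2htmlex_argv(pdf_path: str, split_pages: bool = False) -> list[str]:
    """Build a pdf2htmlEX command line for single-file or per-page output.

    Without extra flags, pdf2htmlEX writes one HTML file with resources
    embedded; --split-pages 1 asks it to write each page to its own file.
    """
    argv = ["pdf2htmlEX"]
    if split_pages:
        argv += ["--split-pages", "1"]
    argv.append(pdf_path)
    return argv


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(shlex.join(pdf2htmlex_argv("report.pdf")))
    print(shlex.join(pdf2htmlex_argv("report.pdf", split_pages=True)))
```

To actually convert, pass the list to `subprocess.run(argv, check=True)`. Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.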
The converter offers extensive control over its output. Fonts can be embedded directly in the HTML as base64-encoded Data URIs or kept as separate files that browsers can cache. It supports page range selection, a configurable output directory, and image fallback rendering when vector conversion fails. It also provides options for custom CSS overrides, template customization, and control over which resources (CSS, fonts, images, JavaScript, and the outline) are embedded, letting users trade a larger single file against additional HTTP requests.
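The embedding and page-range options map onto command-line flags. The sketch below assumes, as listed in pdf2htmlEX's `--help`, that `--embed` takes one letter per resource (`c` CSS, `f` fonts, `i` images, `j` JavaScript, `o` outline), lowercase to embed and uppercase to write a separate file, and that `-f`/`-l` select the first and last page and `--dest-dir` sets the output directory. The helper names `embed_spec` and `conversion_argv` are hypothetical.

```python
import shlex
from collections.abc import Iterable
from typing import Optional

# One letter per resource for pdf2htmlEX's --embed option (per its --help):
# lowercase embeds the resource in the HTML, uppercase keeps it as a file.
EMBED_LETTERS = {
    "css": "c",
    "font": "f",
    "image": "i",
    "javascript": "j",
    "outline": "o",
}


def embed_spec(external: Iterable[str] = ()) -> str:
    """Return an --embed value; resources named in `external` stay separate."""
    external = set(external)
    unknown = external - set(EMBED_LETTERS)
    if unknown:
        raise ValueError(f"unknown resource(s): {sorted(unknown)}")
    return "".join(
        letter.upper() if name in external else letter
        for name, letter in EMBED_LETTERS.items()
    )


def conversion_argv(
    pdf_path: str,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    dest_dir: Optional[str] = None,
    external: Iterable[str] = (),
) -> list[str]:
    """Build a pdf2htmlEX command for a page range, output dir and embed policy."""
    argv = ["pdf2htmlEX", "--embed", embed_spec(external)]
    if first_page is not None:
        argv += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        argv += ["-l", str(last_page)]    # last page to convert
    if dest_dir is not None:
        argv += ["--dest-dir", dest_dir]  # directory for the output files
    argv.append(pdf_path)
    return argv


if __name__ == "__main__":
    # Pages 3-5 of book.pdf into out/, with fonts kept as cacheable files.
    print(shlex.join(conversion_argv("book.pdf", 3, 5, "out", {"font"})))
```

For example, `embed_spec({"font"})` yields `cFijo`: fonts become separate files that a browser can cache across documents, while the remaining resources stay inline, at the cost of extra HTTP requests for the font files.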
Additional capabilities include font metadata inspection, optimization of duplicate fonts, and preservation of font-size precision. The output can retain the original PDF's hyperlinks, bookmarks, and print support, and vertical writing mode is supported for some text layouts. For deployment, the tool can run in a Docker container, and its output can be served with HTTP compression and optimized for mobile devices.