# coolwanglu/pdf2htmlex

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/coolwanglu-pdf2htmlex).**

10,603 stars · 1,844 forks · HTML · NOASSERTION · archived

## Links

- GitHub: https://github.com/coolwanglu/pdf2htmlEX
- Homepage: http://coolwanglu.github.com/pdf2htmlEX/
- awesome-repositories: https://awesome-repositories.com/repository/coolwanglu-pdf2htmlex.md

## Description

pdf2htmlEX is a tool that converts PDF documents into HTML while preserving the original text, fonts, and layout. It uses CSS positioning and font embedding to replicate the PDF's appearance in a browser, producing output that works without JavaScript. The tool can generate a single self-contained HTML file with all resources embedded, or split the document into separate HTML files per page for individual loading and navigation.
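
The choice between a single file and per-page files can be sketched as a small helper that only builds the command line. This is a minimal sketch, not the project's own tooling: `build_command` and the file names are hypothetical, and the `--split-pages` and `--dest-dir` flags are written as recalled from the project's command-line documentation, so check them against `pdf2htmlEX --help` for your build.

```python
from pathlib import Path


def build_command(pdf_path, split_pages=False, dest_dir="out"):
    """Return the argv list for a pdf2htmlEX run (nothing is executed here)."""
    cmd = ["pdf2htmlEX", "--dest-dir", str(Path(dest_dir))]
    if split_pages:
        # Ask for one HTML file per page instead of a single combined document.
        cmd += ["--split-pages", "1"]
    cmd.append(str(Path(pdf_path)))
    return cmd


single = build_command("report.pdf")
per_page = build_command("report.pdf", split_pages=True)
print(" ".join(single))
print(" ".join(per_page))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would start the conversion on a machine where the tool is installed.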

The converter offers extensive control over the output, including the ability to embed fonts directly into the HTML using base64-encoded Data URIs, or keep them as separate files for caching. It supports page range selection, output location configuration, and image fallback rendering when vector conversion fails. The tool also provides options for custom CSS overrides, template customization, and resource embedding control to balance file size against HTTP requests.
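
As a rough illustration of those controls, the sketch below assembles arguments for a page range with fonts and images kept as separate, cacheable files. The helper `conversion_args` and the file name are hypothetical; the flags (`--first-page`, `--last-page`, `--embed-font`, `--embed-image`, `--fallback`) are given as recalled from the project's command-line documentation and should be treated as assumptions until verified against your installed version.

```python
def conversion_args(pdf, first_page=None, last_page=None,
                    embed_fonts=True, embed_images=True, fallback=False):
    """Build an argv list for pdf2htmlEX; the caller decides whether to run it."""
    args = ["pdf2htmlEX"]
    if first_page is not None:
        args += ["--first-page", str(first_page)]
    if last_page is not None:
        args += ["--last-page", str(last_page)]
    # 1 embeds the resource in the HTML, 0 keeps it as a separate file.
    args += ["--embed-font", "1" if embed_fonts else "0"]
    args += ["--embed-image", "1" if embed_images else "0"]
    if fallback:
        # Request fallback mode for the conversion.
        args += ["--fallback", "1"]
    args.append(pdf)
    return args


cacheable = conversion_args("manual.pdf", first_page=1, last_page=3,
                            embed_fonts=False, embed_images=False)
print(cacheable)
```

Keeping fonts and images external trades a smaller HTML file for more HTTP requests, which is the balance the paragraph above describes.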

Additional capabilities include inspecting font metadata, optimizing duplicate fonts, and maintaining font-size precision. The output can preserve hyperlinks, bookmarks, and print functionality from the original PDF, and supports vertical writing mode for certain text layouts. For deployment, the tool can be run in a Docker container and supports HTTP compression and mobile optimization.

## Tags

### Content Management & Publishing

- [PDF to HTML Converters](https://awesome-repositories.com/f/content-management-publishing/pdf-to-html-converters/pdf-to-html-converters.md) — Renders PDF files as single HTML documents, preserving text, fonts, layout, and embedded elements. ([source](https://cdn.jsdelivr.net/gh/coolwanglu/pdf2htmlex@master/README.md))
- [Self-Contained Document Exports](https://awesome-repositories.com/f/content-management-publishing/content-formats-exporting/export-formats/html-exports/self-contained-document-exports.md) — Embeds all resources like fonts and images into a single HTML file for portable distribution.
- [Dimension Adjustments](https://awesome-repositories.com/f/content-management-publishing/dimension-adjustments.md) — Sets zoom factor or maximum page dimensions to scale rendered HTML output. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Command-Line-Options))
- [Page Layout Adjustments](https://awesome-repositories.com/f/content-management-publishing/page-layout-adjustments.md) — Adjusts page scaling to accommodate differences between PDF and HTML formats. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Unfeatures))

### Part of an Awesome List

- [Self-Contained HTML Reports](https://awesome-repositories.com/f/awesome-lists/data/report-generation/financial-report-generators/web-report-servers/self-contained-html-reports.md) — Generates single-file HTML outputs with embedded fonts and images for portable distribution.
- [PDF Layout Preservers](https://awesome-repositories.com/f/awesome-lists/media/pdf/text-extraction/font-and-position-metadata/pdf-layout-preservers.md) — Maintains exact font sizes, styles, and text positioning from PDF in the HTML output.
- [Document Output Appearance Configurations](https://awesome-repositories.com/f/awesome-lists/security/logging-and-configuration/log-appearance-configurations/document-output-appearance-configurations.md) — Adjusts the visual style of the generated HTML through configuration options for fonts, colors, and layout. ([source](http://coolwanglu.github.io/pdf2htmlEX/doc/tb108wang.html))

### Business & Productivity Software

- [Page-to-File Splitting](https://awesome-repositories.com/f/business-productivity-software/file-splitting-and-joining/page-to-file-splitting.md) — Splits PDFs into separate HTML files per page for individual loading and navigation.
- [Semantic HTML Structures](https://awesome-repositories.com/f/business-productivity-software/recipe-repositories/semantic-html-structures.md) — Converts PDF documents into structured HTML that preserves text, fonts, and layout for web display. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Related-Projects))
- [PDF Navigational Bookmarks](https://awesome-repositories.com/f/business-productivity-software/bookmark-managers/pdf-navigational-bookmarks.md) — Retains hyperlinks and table-of-contents outlines from PDFs for navigation in HTML. ([source](https://cdn.jsdelivr.net/gh/coolwanglu/pdf2htmlex@master/README.md))

### Graphics & Multimedia

- [PDF Interactive Elements](https://awesome-repositories.com/f/graphics-multimedia/pdf-interactive-elements.md) — Preserves hyperlinks, bookmarks, and print functionality from PDFs in HTML output. ([source](https://cdn.jsdelivr.net/gh/coolwanglu/pdf2htmlex@master/README.md))

### User Interface & Experience

- [Data URI Embeddings](https://awesome-repositories.com/f/user-interface-experience/image-embeddings/data-uri-embeddings.md) — Embeds fonts and images as data URIs in HTML for self-contained documents. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Feature-List))
- [Font Data URI Embedders](https://awesome-repositories.com/f/user-interface-experience/image-embeddings/data-uri-embeddings/font-data-uri-embedders.md) — Embeds fonts as base64 Data URIs to preserve original typography without external dependencies.
- [PDF Layout Replicators](https://awesome-repositories.com/f/user-interface-experience/layout-management/component-layout-containers/css-driven-layout-containers/pdf-layout-replicators.md) — Provides CSS-based layout replication to preserve exact PDF page geometry in HTML output.
- [PDF Font Embedding](https://awesome-repositories.com/f/user-interface-experience/styling-theming-systems/typography-and-iconography/typography/font-libraries/custom-font-registrars/pdf-font-embedding.md) — Embeds fonts from PDF files directly into HTML output to maintain original typography.
- [CSS Styling](https://awesome-repositories.com/f/user-interface-experience/css-styling.md) — Allows overriding default styles with custom CSS to control the visual appearance of converted output. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Feature-List))
- [Conversion Output Template Customizers](https://awesome-repositories.com/f/user-interface-experience/layout-utilities/presentation-engines/template-engines/server-side-rendering-engines/html-template-renderers/conversion-output-template-customizers.md) — Modifies the HTML, CSS, and JavaScript templates that control how the PDF content is rendered and styled in the browser. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Customizing-Output))
- [Conversion Output CSS Overrides](https://awesome-repositories.com/f/user-interface-experience/third-party-client-styling/internal-css-overrides/conversion-output-css-overrides.md) — Includes custom CSS after the default styles to override the visual appearance of elements extracted from the PDF. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Customizing-Output))
- [Output Zoom Setters](https://awesome-repositories.com/f/user-interface-experience/viewport-navigation-controls/zoom-animations/exact-zoom-level-setting/output-zoom-setters.md) — Sets a zoom factor to scale rendered page dimensions in the output HTML. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Quick-Start))
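
The data URI embedding named in the entries above can be illustrated with the general technique, independent of the tool. This is a minimal sketch, not pdf2htmlEX's actual output: the helper `font_face_rule`, the family name `ff1`, the placeholder bytes, and the `font/woff` MIME type are all assumptions chosen for the example.

```python
import base64


def font_face_rule(family, font_bytes, mime="font/woff"):
    """Return a CSS @font-face rule with the font inlined as a base64 data URI."""
    payload = base64.b64encode(font_bytes).decode("ascii")
    return (
        f"@font-face{{font-family:{family};"
        f"src:url(data:{mime};base64,{payload});}}"
    )


# Placeholder bytes stand in for a real font file read from disk.
rule = font_face_rule("ff1", b"wOFF-demo-bytes")
print(rule)
```

Because the font travels inside the stylesheet, the page needs no extra request for it, but the browser also cannot cache it separately from the HTML.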

### Web Development

- [PDF Layout Preservation Patterns](https://awesome-repositories.com/f/web-development/css-layout-patterns/pdf-layout-preservation-patterns.md) — Uses CSS positioning and font embedding to replicate PDF layout and text flow in the browser.
- [JavaScript-Free Web Interfaces](https://awesome-repositories.com/f/web-development/javascript-free-web-interfaces.md) — Generates fully functional HTML output that displays correctly without JavaScript, relying solely on CSS.
- [PDF Rendering Configurators](https://awesome-repositories.com/f/web-development/renderer-output-customizers/renderer-output-customizers/pdf-rendering-configurators.md) — Configures zoom, page range, and output width to control PDF rendering in HTML. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Quick-Start))
- [Conversion Resource Embedding Controls](https://awesome-repositories.com/f/web-development/web-element-embedding/conversion-resource-embedding-controls.md) — Chooses which elements like CSS, fonts, images, JavaScript, and outlines to embed in the HTML or keep as separate files. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Command-Line-Options))
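
The zoom and output-size settings listed above could be passed along the following lines. This is a sketch under stated assumptions: `scaling_args` and the file name are hypothetical, and `--zoom`, `--fit-width`, and `--fit-height` are written as recalled from the project's Quick Start and option list, so confirm them with `pdf2htmlEX --help`.

```python
def scaling_args(pdf, zoom=None, fit_width=None, fit_height=None):
    """Build an argv list that sets a zoom factor or maximum page dimensions."""
    args = ["pdf2htmlEX"]
    options = (
        ("--zoom", zoom),              # direct zoom factor
        ("--fit-width", fit_width),    # maximum page width
        ("--fit-height", fit_height),  # maximum page height
    )
    for flag, value in options:
        if value is not None:
            args += [flag, str(value)]
    args.append(pdf)
    return args


zoomed = scaling_args("slides.pdf", zoom=1.3)
fitted = scaling_args("slides.pdf", fit_width=1024, fit_height=768)
print(zoomed)
print(fitted)
```

Building the argument list separately from running it keeps the option handling easy to check before a long conversion is started.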

### Artificial Intelligence & ML

- [HTML Page Decomposition](https://awesome-repositories.com/f/artificial-intelligence-ml/structured-data-extraction/asynchronous-extraction-engines/document-extraction-engines/per-page-content-separators/html-page-decomposition.md) — Splits PDF into separate HTML files per page for lazy loading and dynamic navigation via AJAX.

### Data & Databases

- [Document Page Loaders](https://awesome-repositories.com/f/data-databases/on-load-data-fetchers/on-demand-subset-loading/document-page-loaders.md) — Loads individual pages on scroll to reduce initial download size and wait time. ([source](https://cdn.jsdelivr.net/gh/coolwanglu/pdf2htmlex@master/README.md))

### Development Tools & Productivity

- [PDF Interactive Feature Preservers](https://awesome-repositories.com/f/development-tools-productivity/documentation-navigation/document-structure-navigators/pdf-interactive-feature-preservers.md) — Preserves hyperlinks, bookmarks, and page structure from the original PDF for interactive browsing.

### Programming Languages & Runtimes

- [PDF-to-HTML Performance Optimizations](https://awesome-repositories.com/f/programming-languages-runtimes/performance-optimization-libraries/document-performance-optimizations/pdf-to-html-performance-optimizations.md) — Optimizes PDF-to-HTML output for performance, caching, and mobile rendering.

### Software Engineering & Architecture

- [Batch Document Processing](https://awesome-repositories.com/f/software-engineering-architecture/batch-document-processing.md) — Converts multiple PDF pages or documents with configurable output settings and resource management.
- [Resource Separation](https://awesome-repositories.com/f/software-engineering-architecture/resource-separation.md) — Stores fonts, images, CSS, and JavaScript in external files for browser caching. ([source](https://github.com/coolwanglu/pdf2htmlEX/wiki/Quick-Start))
- [Browser Cache Optimizers](https://awesome-repositories.com/f/software-engineering-architecture/resource-separation/browser-cache-optimizers.md) — Outputs CSS, fonts, and images as separate files so the browser can cache each one independently, at the cost of extra HTTP requests.
