# pdf2htmlex/pdf2htmlex

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/pdf2htmlex-pdf2htmlex).**

5,412 stars · 501 forks · HTML · other

## Links

- GitHub: https://github.com/pdf2htmlEX/pdf2htmlEX
- Homepage: https://pdf2htmlEX.github.io/pdf2htmlEX/
- awesome-repositories: https://awesome-repositories.com/repository/pdf2htmlex-pdf2htmlex.md

## Topics

`html` `pdf` `pdf-document-processor` `pdf-viewer`

## Description

pdf2htmlEX is a PDF-to-HTML converter that transforms documents into web pages while preserving the original layout, fonts, and formatting. It acts as both a layout engine and a text extractor, mapping PDF coordinate data to absolutely positioned HTML and CSS to maintain visual fidelity.
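As a rough illustration of what coordinate mapping involves, the sketch below converts a rectangle from PDF user space (origin at the bottom-left, units in points) into CSS absolute positioning (origin at the top-left, units in pixels). It is not taken from the pdf2htmlEX source; `PdfBox`, `to_css`, and the `zoom` parameter are hypothetical names used only for this example.

```python
from dataclasses import dataclass

# CSS defines 1in = 96px and 1in = 72pt, so one point is 96/72 pixels.
PT_TO_PX = 96.0 / 72.0


@dataclass(frozen=True)
class PdfBox:
    """A rectangle in PDF user space: origin at bottom-left, units in points.

    Hypothetical helper type for this sketch, not part of pdf2htmlEX.
    """
    x: float       # left edge
    y: float       # bottom edge
    width: float
    height: float


def to_css(box: PdfBox, page_height_pt: float, zoom: float = 1.0) -> str:
    """Return an inline style that places the box inside a page container."""
    scale = PT_TO_PX * zoom
    left = box.x * scale
    # Flip the vertical axis: CSS measures downward from the top of the page.
    top = (page_height_pt - box.y - box.height) * scale
    return (
        f"position:absolute;left:{left:.2f}px;top:{top:.2f}px;"
        f"width:{box.width * scale:.2f}px;height:{box.height * scale:.2f}px"
    )
```

For a US Letter page (792pt tall), `to_css(PdfBox(72, 720, 144, 12), 792)` yields `position:absolute;left:96.00px;top:80.00px;width:192.00px;height:16.00px`. A real converter also has to handle rotation, text matrices, and font metrics, which this sketch ignores.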

The tool converts PDF content into searchable and selectable native HTML text by embedding original document fonts. It maintains document interactivity by preserving internal links, bookmarks, and outlines, converting them into functional web navigation.
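To make the idea of turning bookmarks into web navigation concrete, here is a minimal sketch that renders a nested outline as an HTML list of in-page links. The `OutlineItem` type and the `#page-N` anchor scheme are assumptions made for illustration; the markup and element ids that pdf2htmlEX actually emits may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape


@dataclass
class OutlineItem:
    """One bookmark: a title, a 1-based target page, and nested children.

    Hypothetical structure for this sketch, not a pdf2htmlEX type.
    """
    title: str
    page: int
    children: list[OutlineItem] = field(default_factory=list)


def render_outline(items: list[OutlineItem]) -> str:
    """Render bookmarks as nested <ul> lists linking to page anchors."""
    if not items:
        return ""
    parts = ["<ul>"]
    for item in items:
        parts.append(
            # The "#page-N" anchor format is an assumption of this example.
            f'<li><a href="#page-{item.page}">{escape(item.title)}</a>'
            f"{render_outline(item.children)}</li>"
        )
    parts.append("</ul>")
    return "".join(parts)
```

Titles are HTML-escaped because bookmark text comes from the source document and may contain characters such as `&` or `<`.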

The conversion supports several output structures: a document can be generated as a single file, or split into one file per page with a main index so that pages can be lazy loaded. Styles, fonts, and images can be stored as separate files in a target directory so that browsers can cache them. Specific pages or page ranges can be converted on their own, and for complex files a fallback mode renders pages as images with a hidden text layer that keeps the text searchable and selectable.
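The options described above might be combined on the command line roughly as follows. This sketch only assembles an argument list and does not run the converter. The flag names (`--first-page`, `--last-page`, `--split-pages`, `--embed-css`, `--embed-font`, `--embed-image`, `--fallback`, `--dest-dir`) reflect my understanding of the pdf2htmlEX CLI and should be checked against `pdf2htmlEX --help` for the installed version; `build_command` and its parameters are hypothetical.

```python
from typing import List, Optional


def build_command(
    pdf_path: str,
    dest_dir: Optional[str] = None,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
    split_pages: bool = False,
    external_assets: bool = False,
    fallback: bool = False,
) -> List[str]:
    """Assemble a pdf2htmlEX argument list (flag names are assumptions)."""
    cmd = ["pdf2htmlEX"]
    # Selective page export: convert only the requested range.
    if first_page is not None:
        cmd += ["--first-page", str(first_page)]
    if last_page is not None:
        cmd += ["--last-page", str(last_page)]
    # One HTML file per page, intended for lazy loading.
    if split_pages:
        cmd += ["--split-pages", "1"]
    # Write styles, fonts, and images as separate, cacheable files.
    if external_assets:
        cmd += ["--embed-css", "0", "--embed-font", "0", "--embed-image", "0"]
    # Render pages as images with a hidden text layer.
    if fallback:
        cmd += ["--fallback", "1"]
    if dest_dir is not None:
        cmd += ["--dest-dir", dest_dir]
    cmd.append(pdf_path)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the flags have been verified. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.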

## Tags

### Content Management & Publishing

- [PDF to HTML Converters](https://awesome-repositories.com/f/content-management-publishing/pdf-to-html-converters.md) — Transforms static PDF documents into accessible HTML pages while maintaining precise layout, fonts, and formatting. ([source](https://cdn.jsdelivr.net/gh/pdf2htmlex/pdf2htmlex@master/README.md))
- [Document Link Mapping](https://awesome-repositories.com/f/content-management-publishing/content-management-systems/content-architecture-modeling/document-models/document-sectioning/anchor-links/reference-anchors/document-link-mapping.md) — Translates PDF-specific destination offsets into HTML anchors to preserve internal document links and bookmarks.
- [PDF Structural Elements](https://awesome-repositories.com/f/content-management-publishing/documentation-knowledge-management/pdf-structural-elements.md) — Retains interactive elements such as links, outlines, bookmarks, and backgrounds during the PDF to HTML conversion process. ([source](https://cdn.jsdelivr.net/gh/pdf2htmlex/pdf2htmlex@master/README.md))
- [Web Font Re-encoding](https://awesome-repositories.com/f/content-management-publishing/embedded-font-extraction/web-font-re-encoding.md) — Extracts PDF embedded fonts and converts them into web-compatible formats to ensure consistent character rendering across browsers.
- [Coordinate-Based Layout Mapping](https://awesome-repositories.com/f/content-management-publishing/pdf-to-html-converters/pdf-to-html-converters/coordinate-based-layout-mapping.md) — Maps PDF coordinate data to HTML and CSS to maintain precise visual fidelity of the original document.
- [Searchable Text Preservation](https://awesome-repositories.com/f/content-management-publishing/searchable-text-preservation.md) — Embeds native text within the HTML output to enable selecting, copying, and searching of the document content. ([source](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Feature-List))
- [Digital Publication Renderers](https://awesome-repositories.com/f/content-management-publishing/digital-publication-renderers.md) — Provides a high-fidelity rendering engine that transforms PDF documents into web-compatible formats while preserving professional typesetting.
- [HTML Layout Configurations](https://awesome-repositories.com/f/content-management-publishing/html-layout-configurations.md) — Produces output as a single file with embedded assets, separate asset files, or individual page files. ([source](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Feature-List))
- [Output Structure Configuration](https://awesome-repositories.com/f/content-management-publishing/output-structure-configuration.md) — Allows choosing between generating the converted document as a single unified file or as separate files for each page. ([source](https://pdf2htmlEX.github.io/pdf2htmlEX/))
- [Selective Page Export](https://awesome-repositories.com/f/content-management-publishing/selective-page-export.md) — Enables the conversion of specific page ranges or individual pages instead of processing the entire document. ([source](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Quick-Start))

### Part of an Awesome List

- [Text Extractors](https://awesome-repositories.com/f/awesome-lists/media/pdf/text-extractors.md) — Retrieves written text and structural metadata from PDF layers to generate searchable native HTML text.

### Data & Databases

- [Outline Extraction](https://awesome-repositories.com/f/data-databases/document-extraction-tools/outline-extraction.md) — Converts the PDF table of contents into a structured web outline for easier navigation. ([source](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Feature-List))
- [PDF Parsers](https://awesome-repositories.com/f/data-databases/pdf-parsers.md) — Parses PDF files to extract outlines, bookmarks, and internal links for use in web-based navigation.
- [Web Asset Distribution](https://awesome-repositories.com/f/data-databases/file-asset-management/distribution-asset-managers/web-asset-distribution.md) — Stores fonts, images, and styles as separate files in a target directory to optimize browser caching. ([source](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Quick-Start))

### Graphics & Multimedia

- [Page Coordinate Mapping](https://awesome-repositories.com/f/graphics-multimedia/visualization-mapping/visualization-frameworks/coordinate-systems/page-coordinate-mapping.md) — Maps PDF text and image locations to absolute page coordinates using CSS for precise visual fidelity.
- [PDF to Image Rendering](https://awesome-repositories.com/f/graphics-multimedia/pdf-to-image-rendering.md) — Creates high-accuracy fallback versions by rendering PDF pages as images with hidden text layers. ([source](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Quick-Start))

### User Interface & Experience

- [Converted Document Interactivity](https://awesome-repositories.com/f/user-interface-experience/converted-document-interactivity.md) — Keeps functional links and text selection capabilities active within the converted web page. ([source](https://pdf2htmlEX.github.io/pdf2htmlEX/))
- [Web-Compatible Font Embedding](https://awesome-repositories.com/f/user-interface-experience/font-configurations/font-overrides/pdf-font-optimizers/web-compatible-font-embedding.md) — Re-encodes and embeds fonts into the HTML output to ensure text renders correctly across different web browsers. ([source](https://pdf2htmlEX.github.io/pdf2htmlEX/))
- [PDF and HTML Content Extraction](https://awesome-repositories.com/f/user-interface-experience/html-content-processing/pdf-and-html-content-extraction.md) — Extracts PDF text and structural elements into native HTML to ensure content is searchable, selectable, and accessible.
- [Internal Page Link Resolvers](https://awesome-repositories.com/f/user-interface-experience/links/internal-page-link-resolvers.md) — Converts internal and external PDF references into functional HTML navigation links that jump to specific pages or URLs. ([source](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Feature-List))
- [Accessible Text Layers](https://awesome-repositories.com/f/user-interface-experience/text-display-widgets/parallel-text-viewers/parallel-text-readers/accessible-text-layers.md) — Places searchable native text on a hidden layer above page images to maintain accessibility and selection capabilities.

### Artificial Intelligence & ML

- [HTML Page Decomposition](https://awesome-repositories.com/f/artificial-intelligence-ml/structured-data-extraction/asynchronous-extraction-engines/document-extraction-engines/per-page-content-separators/html-page-decomposition.md) — Splits a single PDF into separate HTML files per page to allow for lazy loading and improved browser performance.

### Business & Productivity Software

- [Page-to-File Splitting](https://awesome-repositories.com/f/business-productivity-software/file-splitting-and-joining/page-to-file-splitting.md) — Generates individual files for each page and a main index to enable lazy loading and custom organization. ([source](https://github.com/pdf2htmlEX/pdf2htmlEX/wiki/Quick-Start))

### Web Development

- [Asset Distribution Strategies](https://awesome-repositories.com/f/web-development/asset-distribution-strategies.md) — Separates styles, fonts, and images into dedicated directories to enable browser caching and reduce redundant data.
- [HTML Delivery Formats](https://awesome-repositories.com/f/web-development/html-delivery-formats.md) — Generates either a single all-in-one file or a multi-page site with on-demand loading. ([source](https://cdn.jsdelivr.net/gh/pdf2htmlex/pdf2htmlex@master/README.md))