DOCX to HTML Converters - Reads .docx files and produces clean, semantic HTML by mapping document styles to HTML elements.
Document Text Extractors - Strips all formatting and returns only the plain text content of a .docx file with paragraph separation.
Document Data Extraction - Extracts plain text from Word documents with paragraph separation for further processing or analysis.
DOCX to HTML Converters - Transforms Word documents into clean HTML by mapping semantic styles rather than replicating visual formatting.
Document Text Extractors - Extracts plain text from DOCX files by stripping all formatting and returning content with paragraph separation.
Plain Text Extractors - Strips all formatting from DOCX files and outputs only plain text with paragraph separators.
Base64 Asset Embedding - Embeds images as base64 data URIs directly into HTML output for self-contained documents.
Style-to-Element Mappings - Lets users define rules that convert named paragraph or run styles into specified HTML tags with optional CSS classes.
Document Style Mappings - Lets users define custom rules mapping named DOCX paragraph and run styles to specified HTML tags and CSS classes.
Document Style Mappings - Maps document styles to HTML elements using a configurable style map that defines transformation rules.
Document Conversion - Applies user-defined functions to the document's internal representation to modify paragraphs or runs before generating HTML.
Pre-Conversion Hooks - Applies user-defined functions to modify paragraphs and runs in the document model before HTML generation.
HTML Document Transformation - Applies user-defined functions to modify paragraphs and runs in a .docx file's internal structure before HTML generation.
Document Transformation Pipelines - Applies custom transformations to the internal document structure before generating the final HTML output.
Data URI Handlers - Embeds image data directly into HTML output as base64-encoded data URIs for self-contained documents.
Data URI Embeddings - Includes images from DOCX files as inline data URIs in the HTML output for self-contained documents.
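
The entries above describe four mechanisms: plain text extraction with paragraph separation, style-to-element mapping, pre-conversion transforms, and base64 data URI embedding. The following is a minimal sketch of how they could fit together, using only the Python standard library. It is not the API of any particular converter: the function names (`read_paragraphs`, `extract_text`, `to_html`, `image_data_uri`, `build_demo_docx`), the `(tag, css_class)` style map format, and the `transform` callback signature are all assumptions made for illustration. It reads only paragraph-level styles from `word/document.xml` and ignores run styles, tables, and images stored in the archive.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_paragraphs(docx_bytes):
    """Return a list of (style_id, text) pairs, one per paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val") if style is not None else None
        # Concatenate the text of every run; all run formatting is dropped.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        paragraphs.append((style_id, text))
    return paragraphs


def extract_text(docx_bytes):
    """Plain text only, with paragraphs separated by a blank line."""
    return "\n\n".join(text for _, text in read_paragraphs(docx_bytes))


def to_html(docx_bytes, style_map, transform=None):
    """Map paragraph styles to HTML tags via a hypothetical style map.

    style_map: {style_id: (tag, css_class_or_None)}; unmapped styles become <p>.
    transform: optional hook called as transform(style_id, text) before
               HTML generation; it returns a new (style_id, text) pair.
    """
    parts = []
    for style_id, text in read_paragraphs(docx_bytes):
        if transform is not None:
            style_id, text = transform(style_id, text)
        tag, css_class = style_map.get(style_id, ("p", None))
        attr = f' class="{escape(css_class)}"' if css_class else ""
        parts.append(f"<{tag}{attr}>{escape(text)}</{tag}>")
    return "\n".join(parts)


def image_data_uri(image_bytes, content_type):
    """Encode image bytes as a base64 data URI for self-contained HTML."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{content_type};base64,{encoded}"


def build_demo_docx():
    """Build a stand-in archive holding only word/document.xml.

    This is enough for the sketch above but is not a complete Word file.
    """
    xml = (
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        '<w:r><w:t>Title</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


if __name__ == "__main__":
    docx = build_demo_docx()
    print(extract_text(docx))
    print(to_html(docx, {"Heading1": ("h1", "title")}))
    print(image_data_uri(b"abc", "image/png"))
```

Keeping the style map as plain data and the transform as an ordinary function keeps the three concerns (reading, modifying, rendering) separate, which mirrors the split between style mappings and pre-conversion hooks in the list above.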