# mwilliamson/mammoth.js

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/mwilliamson-mammoth-js).**

6,101 stars · 647 forks · JavaScript · BSD-2-Clause

## Links

- GitHub: https://github.com/mwilliamson/mammoth.js
- awesome-repositories: https://awesome-repositories.com/repository/mwilliamson-mammoth-js.md

## Tags

### Content Management & Publishing

- [DOCX to HTML Converters](https://awesome-repositories.com/f/content-management-publishing/pdf-to-html-converters/pdf-to-html-converters/docx-to-html-converters.md) — Reads .docx files and produces clean, semantic HTML by mapping document styles to HTML elements. ([source](https://github.com/mwilliamson/mammoth.js#readme))
- [Document Data Extraction](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing/data-extraction-analysis/document-data-extraction.md) — Extracts plain text from Word documents with paragraph separation for further processing or analysis.
- [PDF to HTML Converters](https://awesome-repositories.com/f/content-management-publishing/pdf-to-html-converters/pdf-to-html-converters.md) — Transforms Word documents into clean HTML by mapping semantic styles rather than replicating visual formatting. ([source](https://github.com/mwilliamson/mammoth.js#readme))
- [Document Text Extractors](https://awesome-repositories.com/f/content-management-publishing/plain-text-persistence/document-text-extractors.md) — Extracts plain text from DOCX files by stripping all formatting and returning content with paragraph separation.
- [Extractors](https://awesome-repositories.com/f/content-management-publishing/plain-text-persistence/extractors.md) — Strips all formatting from DOCX files and outputs only plain text with paragraph separators.
- [Document Conversion](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-conversion.md) — Applies user-defined functions to the document's internal representation to modify paragraphs or runs before generating HTML. ([source](https://github.com/mwilliamson/mammoth.js#readme))
- [Pre-Conversion Hooks](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-conversion/pre-conversion-hooks.md) — Applies user-defined functions to modify paragraphs and runs in the document model before HTML generation.
- [HTML Document Transformation](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing-tools/markup-and-structure-parsers/html-document-transformation.md) — Applies user-defined functions to modify paragraphs and runs in a .docx file's internal structure before HTML generation.
- [Document Transformation Pipelines](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-transformation-pipelines.md) — Applies custom transformations to the internal document structure before generating the final HTML output.
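
The pre-conversion hooks described in the entries above can be sketched as a plain function over a paragraph element. This is a minimal illustration, not the library's implementation: the element fields (`type`, `alignment`, `styleId`, `children`) follow the example in the mammoth.js README, which itself flags the document-transform API as unstable, and `walkParagraphs` is a hypothetical stand-in for the library's own traversal helper so that the sketch runs without mammoth installed.

```javascript
// Hypothetical rule: treat centre-aligned paragraphs with no explicit style
// as second-level headings.
function promoteCenteredParagraph(element) {
  if (element.alignment === "center" && !element.styleId) {
    // Return a copy rather than mutating the input element.
    return { ...element, styleId: "Heading2" };
  }
  return element;
}

// Hypothetical stand-in for a library helper: applies `transform` to every
// paragraph in a tree of { type, children } nodes, visiting children first.
function walkParagraphs(transform) {
  return function visit(element) {
    const children = (element.children || []).map(visit);
    const rebuilt = { ...element, children };
    return rebuilt.type === "paragraph" ? transform(rebuilt) : rebuilt;
  };
}

// Assumed sample data standing in for a parsed .docx document model.
const sampleDocument = {
  type: "document",
  children: [
    { type: "paragraph", alignment: "center", styleId: null, children: [] },
    { type: "paragraph", alignment: "left", styleId: null, children: [] },
  ],
};

const transformed = walkParagraphs(promoteCenteredParagraph)(sampleDocument);
console.log(transformed.children.map((p) => p.styleId));
```

With the library installed, the README shows a function like `promoteCenteredParagraph` being wrapped with `mammoth.transforms.paragraph(...)` and passed as the `transformDocument` option to `mammoth.convertToHtml`.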

### Part of an Awesome List

- [Document Text Extractors](https://awesome-repositories.com/f/awesome-lists/data/document-parsing-and-extraction/document-text-extractors.md) — Strips all formatting and returns only the plain text content of a .docx file with paragraph separation. ([source](https://github.com/mwilliamson/mammoth.js#readme))
- [Document Processing](https://awesome-repositories.com/f/awesome-lists/devtools/document-processing.md) — Converts Word documents into clean HTML.

### Development Tools & Productivity

- [Base64 Asset Embedding](https://awesome-repositories.com/f/development-tools-productivity/asset-pipelines/base64-asset-embedding.md) — Embeds images as base64 data URIs directly into HTML output for self-contained documents.

### User Interface & Experience

- [Style-to-Element Mappings](https://awesome-repositories.com/f/user-interface-experience/buttons/custom-html-elements/style-to-element-mappings.md) — Lets users define rules that convert named paragraph or run styles into specified HTML tags with optional CSS classes. ([source](https://github.com/mwilliamson/mammoth.js#readme))
- [Data URI Embeddings](https://awesome-repositories.com/f/user-interface-experience/image-embeddings/data-uri-embeddings.md) — Includes images from DOCX files as inline data URIs in the HTML output for self-contained documents.

### Web Development

- [Document Style Mappings](https://awesome-repositories.com/f/web-development/modular-architectures/css-style-modules/class-mapping/document-style-mappings.md) — Lets users define custom rules mapping named DOCX paragraph and run styles to specified HTML tags and CSS classes.
- [Document Style Mappings](https://awesome-repositories.com/f/web-development/rendering-templating/server-side-rendering-utilities/style-extraction-utilities/dom-based-style-mapping/document-style-mappings.md) — Maps document styles to HTML elements using a configurable style map that defines transformation rules.
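
A style map of the kind these entries describe is a list of rules, each pairing a document-style matcher with an HTML path. The sketch below only assembles such rules as strings; `buildStyleMap` is a hypothetical helper and the style names are illustrative, while the rule syntax (`p[style-name='...'] => h1:fresh`) is taken from the mammoth.js README.

```javascript
// Hypothetical helper: turns { matcher: htmlPath } pairs into style-map lines
// of the form "matcher => htmlPath".
function buildStyleMap(rules) {
  return Object.entries(rules).map(
    ([matcher, htmlPath]) => `${matcher} => ${htmlPath}`
  );
}

// "p[...]" matches paragraph styles, "r[...]" matches run styles.
// The style names are assumptions about what the source document defines.
const styleMap = buildStyleMap({
  "p[style-name='Section Title']": "h1:fresh",
  "p[style-name='Aside Text']": "div.aside > p:fresh",
  "r[style-name='Code']": "code",
});

console.log(styleMap.join("\n"));
```

The resulting array is what would be supplied as the `styleMap` option when calling `mammoth.convertToHtml`; the README also accepts the same rules as a single newline-separated string.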

### Graphics & Multimedia

- [Data URI Handlers](https://awesome-repositories.com/f/graphics-multimedia/data-uri-handlers.md) — Embeds image data directly into HTML output as base64-encoded data URIs for self-contained documents. ([source](https://github.com/mwilliamson/mammoth.js#readme))
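
The data URI embedding mentioned in these entries amounts to base64-encoding the image bytes and prefixing them with the content type. The following is a minimal sketch of that encoding step under stated assumptions: `toDataUri` and `fakeImage` are hypothetical names, and the code uses Node.js's global `Buffer` rather than any mammoth API.

```javascript
// Builds a data URI from a MIME type and raw bytes.
function toDataUri(contentType, bytes) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${contentType};base64,${base64}`;
}

// Hypothetical image record: only the first four bytes of a PNG signature,
// not a complete image, which is enough to show the encoding.
const fakeImage = {
  contentType: "image/png",
  bytes: Buffer.from([0x89, 0x50, 0x4e, 0x47]),
};

// Attributes that an <img> element in the output HTML would carry.
const imgAttributes = { src: toDataUri(fakeImage.contentType, fakeImage.bytes) };
console.log(imgAttributes.src); // data:image/png;base64,iVBORw==
```

In mammoth.js itself, the README describes a `convertImage` option built with `mammoth.images.imgElement`, whose callback reads the image (for example via `image.read("base64")`) and returns the `<img>` attributes; the sketch above only mirrors the string construction.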
