30 open-source projects similar to phpoffice/phpword, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best PHPWord alternative.
python-docx is an OOXML document manipulation library used for creating, reading, and updating Microsoft Word files. It functions as a generator for building formatted documents and a parser for extracting text, metadata, and structural elements from existing files. The project provides a comprehensive style management system for defining and applying character, paragraph, and table styles within OpenXML documents. It allows programmatic control of document appearance through an object-oriented approach to the underlying XML schema. Capabilities cover a wide range of document generation…
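It can help to see what these OOXML libraries manipulate underneath. Here is a minimal sketch that uses only the Python standard library rather than python-docx. A .docx file is a ZIP package whose body lives in `word/document.xml`, with paragraphs as `w:p`, runs as `w:r` and text in `w:t`. The helper names `build_minimal_docx` and `read_paragraphs` are hypothetical. The package written here omits `[Content_Types].xml` and the relationship parts, so Word itself would not open it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by the w:p, w:r and w:t elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_minimal_docx(paragraphs: list[str]) -> bytes:
    """Write a stripped-down package containing only word/document.xml.

    Hypothetical helper: a file Word can open also needs [Content_Types].xml
    and _rels/.rels, which are left out to keep the focus on the body XML.
    """
    ET.register_namespace("w", W_NS)
    doc = ET.Element(f"{{{W_NS}}}document")
    body = ET.SubElement(doc, f"{{{W_NS}}}body")
    for text in paragraphs:
        p = ET.SubElement(body, f"{{{W_NS}}}p")  # paragraph
        r = ET.SubElement(p, f"{{{W_NS}}}r")     # run (span of shared formatting)
        t = ET.SubElement(r, f"{{{W_NS}}}t")     # text node
        t.text = text
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        xml_bytes = ET.tostring(doc, xml_declaration=True, encoding="UTF-8")
        zf.writestr("word/document.xml", xml_bytes)
    return buf.getvalue()


def read_paragraphs(docx_bytes: bytes) -> list[str]:
    """Join the w:t text of every run, returning one string per w:p paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


data = build_minimal_docx(["Hello", "World"])
print(read_paragraphs(data))  # ['Hello', 'World']
```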
PyMuPDF is a comprehensive PDF manipulation library and document analysis tool. It serves as a text extractor with OCR support and as an image converter, providing a programmatic interface to edit, merge, split, and optimize PDF and Office documents. The project distinguishes itself through high performance, using bindings to the MuPDF C library for low-level manipulation and parallelized page processing to accelerate workloads. It provides specialized conversion paths, such as transforming PDF content into Markdown for retrieval-augmented generation and large language model pipelines.
docx is a JavaScript and TypeScript library for the programmatic generation and manipulation of Word documents. It serves as an OOXML document generator, allowing developers to create formatted Office files through code instead of manual editing. The library enables document automation in both Node.js and web browser environments. It supports client-side document export, so users can generate and download files directly in the browser without a backend server. Capabilities include defining page layouts, margins, and orientation. Users can programmatically insert document…
unioffice is a comprehensive document processing suite that provides a PDF document processor, an Open XML document library, a document security toolkit, and a document content extractor. It is designed to programmatically create, read, and modify Word, Excel, and PowerPoint files, as well as generate and edit PDF documents. The project is distinguished by its pure Go implementation of the Open XML standard, which avoids native binary dependencies and simplifies container deployments. It features advanced capabilities for digital document security, including hardware-based PDF signing…
This repository contains the HTML specification, which defines the core standards for web page structuring, content organization, and document rendering. It establishes the fundamental algorithms for state-machine-based tokenization, tree construction for the document object model, and origin-based security isolation. The specification provides a framework for defining custom elements with independent lifecycles and registries. It also details the requirements for cross-document communication, session history management, and the synchronization of interface properties with content attributes.
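State-machine-based tokenization means the parser reads one character at a time, and its current state decides how that character is interpreted. The Python sketch below is a heavily simplified illustration of the idea. The `tokenize` function and its four states are hypothetical. The specification itself defines far more states and error-recovery rules, so this is not a conforming tokenizer.

```python
def tokenize(html: str) -> list[tuple[str, str]]:
    """Toy state-machine tokenizer emitting ("start" | "end" | "text", value) tokens.

    Only four states are modelled. Attributes are skipped, and comments,
    character references, self-closing tags and a quoted '>' inside an
    attribute value are not handled.
    """
    tokens: list[tuple[str, str]] = []
    state = "data"
    buf = ""
    is_end = False
    for ch in html:
        if state == "data":
            if ch == "<":
                if buf:
                    tokens.append(("text", buf))
                buf = ""
                state = "tag_open"
            else:
                buf += ch
        elif state == "tag_open":
            if ch == "/":
                is_end = True
                state = "tag_name"
            elif ch.isalpha():
                is_end = False
                buf = ch.lower()
                state = "tag_name"
            else:
                # Not a tag after all: treat '<' as ordinary text.
                buf = "<" + ch
                state = "data"
        elif state == "tag_name":
            if ch == ">":
                tokens.append(("end" if is_end else "start", buf))
                buf = ""
                state = "data"
            elif ch.isspace():
                state = "attrs"
            else:
                buf += ch.lower()
        elif state == "attrs":
            # Skip attribute text until the tag closes.
            if ch == ">":
                tokens.append(("end" if is_end else "start", buf))
                buf = ""
                state = "data"
    if state == "data" and buf:
        tokens.append(("text", buf))
    return tokens


print(tokenize("<p class='x'>Hi <b>there</b></p>"))
```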
OfficeCLI is a headless office suite and automation tool designed for programmatically reading, editing, and generating Microsoft Office documents. It functions as an OOXML manipulation library and a document templating engine, providing a standalone binary that allows for the management of Word, Excel, and PowerPoint files without requiring a local installation of office software. The project distinguishes itself by exposing document operations as tools for AI agents via a JSON-RPC server and the Model Context Protocol. It enables advanced customization through raw XML manipulation using XPath…
SoftwareCopyright-Skill is a software copyright application generator and documentation tool designed to automate the creation of registration materials and operation manuals. It analyzes local project source code to produce formatted documents and source code extracts required for official software copyright filings. The project synthesizes software operation manuals by translating project business logic and technical features into descriptive functional text. It utilizes template-based generation to inject extracted project data into standardized Word documents, facilitating the creation of…
PyPDF2 is a pure Python library for reading, writing, and manipulating PDF files. It functions as a document manipulator, text extractor, and encryption tool, allowing users to process PDF files without relying on external C libraries or native binaries. The library provides specialized tools for modifying document structures, such as merging multiple files into one, splitting documents into separate files, and transforming page layouts through cropping. It also includes capabilities for securing documents via passwords and encryption. Additional capabilities include the extraction of written…
This project is a static site generator and documentation builder designed to transform markdown files into styled HTML documents. It functions as a programmatic conversion engine, allowing developers to integrate markdown processing and layout rendering directly into automated build scripts and content workflows. The tool distinguishes itself through a pipeline-oriented architecture that supports custom plugin integration and metadata-driven template injection. Users can define global or per-file metadata to dynamically populate page content, while the system’s template-based generation allows…
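A pipeline-oriented design of this kind can be sketched as a list of plugins, each taking and returning the whole set of files. The sketch below shows the general pattern under stated assumptions. It is not this project's API. `front_matter`, `markdown`, `layout` and `build` are hypothetical names, the Markdown step only understands `# ` headings and plain paragraphs, and nothing is HTML-escaped.

```python
from string import Template
from typing import Callable

# Hypothetical data model: path -> {"meta": {...}, "body": str}
Files = dict[str, dict]
Plugin = Callable[[Files], Files]


def front_matter(files: Files) -> Files:
    """Move 'key: value' lines above a '---' separator into per-file metadata."""
    for f in files.values():
        head, sep, body = f["body"].partition("\n---\n")
        if sep:
            for line in head.splitlines():
                key, _, value = line.partition(":")
                f["meta"][key.strip()] = value.strip()
            f["body"] = body
    return files


def markdown(files: Files) -> Files:
    """Toy converter: '# ' lines become <h1>, other non-blank lines become <p>."""
    for f in files.values():
        out = []
        for line in f["body"].splitlines():
            if line.startswith("# "):
                out.append(f"<h1>{line[2:]}</h1>")
            elif line.strip():
                out.append(f"<p>{line}</p>")
        f["body"] = "\n".join(out)
    return files


def layout(template: str) -> Plugin:
    """Return a plugin that injects each file's metadata and body into a template."""
    def apply(files: Files) -> Files:
        for f in files.values():
            f["body"] = Template(template).safe_substitute(f["meta"], content=f["body"])
        return files
    return apply


def build(files: Files, plugins: list[Plugin]) -> Files:
    """Run the plugins in order; each one sees the previous one's output."""
    for plugin in plugins:
        files = plugin(files)
    return files


site = build(
    {"index.md": {"meta": {}, "body": "title: Home\n---\n# Welcome\nHello."}},
    [front_matter, markdown, layout("<title>$title</title>\n$content")],
)
print(site["index.md"]["body"])
```

Every plugin has the same signature. Adding a step, such as a table-of-contents generator, therefore means inserting one more function into the list.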
pypdf is a Python library for parsing, manipulating, and generating PDF documents. It provides high-level operations for document processing, such as merging multiple files into one or splitting a single document into smaller files. The project includes specialized tools for managing interactive elements, including the creation and modification of annotations, hyperlinks, and form fields. It also supports advanced metadata management, allowing for the extraction and modification of standard document properties and XML-based XMP metadata. Beyond basic structural changes, the library covers…
This project is an open-source office suite that provides a collection of productivity applications for word processing, spreadsheets, presentations, and databases. It implements the OpenDocument Format (ODF) standard to ensure interoperability between different office productivity tools. The suite includes a formula editor for mathematical equations, a vector graphics editor for scalable images, and a relational database manager for organizing structured data. The project covers broad functional domains including text document generation, spreadsheet management, presentation…
python-docx-template is a template engine for generating Microsoft Word documents by merging .docx files with data contexts using a logic-based markup syntax. It functions as a document automator that injects variables, images, and sub-documents into Word files while maintaining the original styling. The project uses a rendering system based on Jinja2 to apply template logic and filters to Office Open XML files. It allows for the creation of custom template filters to transform data during the rendering phase and includes a command-line interface for producing documents by passing a template…
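python-docx-template uses Jinja2. The core idea of merging a data context into the document XML can still be shown with the standard library's `string.Template`, a deliberately simpler substitute with no loops, conditionals or filters. The `render_xml` helper below is hypothetical. It illustrates one detail any such engine must handle: escaping values before they land in `word/document.xml`.

```python
from string import Template
from xml.sax.saxutils import escape


def render_xml(template_xml: str, context: dict[str, str]) -> str:
    """Fill ${name} placeholders, XML-escaping each value first.

    Values inserted into word/document.xml must be escaped; otherwise
    characters such as & or < would produce a malformed XML part.
    """
    safe = {key: escape(str(value)) for key, value in context.items()}
    return Template(template_xml).substitute(safe)


print(render_xml("<w:t>Dear ${name},</w:t>", {"name": "Smith & Sons"}))
# <w:t>Dear Smith &amp; Sons,</w:t>
```

Working on raw XML like this is fragile in practice, because Word may split a single placeholder across several `w:r` runs. Handling that is part of what a dedicated template engine does.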
TinyXML-2 is a lightweight C++ library for parsing, manipulating, and generating XML documents. It functions as a UTF-8 XML processor that represents data through a hierarchical Document Object Model. The library provides tools for both DOM parsing and direct document generation via data streams. It includes capabilities for navigating the XML tree to locate specific elements, modifying attributes and content, and resolving character entities and Unicode numeric references into UTF-8 text. The processor includes syntax validation and diagnostic utilities that track line-number metadata for…
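The DOM operations listed for TinyXML-2 are parsing, navigating, modifying, serializing, and decoding numeric character references. Here is how they look with Python's built-in `xml.etree.ElementTree`. This is an analogy in a different language, not TinyXML-2's C++ API, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

SOURCE = '<menu><item id="1">caf&#233;</item><item id="2">tea</item></menu>'

root = ET.fromstring(SOURCE)          # parse the text into an in-memory tree
first = root.find("item")             # navigate: first <item> child of <menu>
print(first.text)                     # café (numeric reference decoded to text)

tea = root.find("item[@id='2']")      # locate an element by attribute value
tea.set("price", "3")                 # modify an attribute
tea.text = "green tea"                # modify element content

print(ET.tostring(root, encoding="unicode"))  # serialize the tree back to text
```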
pdfminer.six is a programmatic tool for extracting text, layout information, and metadata from PDF documents into machine-readable formats. It functions as a document parser that converts internal PDF objects and structures into accessible data objects for analysis. The project includes utilities for decrypting RC4 and AES encrypted files to enable content extraction. It also provides a layout analyzer to identify fonts, colors, and text locations to determine the organizational structure of pages. The system covers a broad range of extraction capabilities, including the retrieval of embedded…
NPOI is a pure .NET library for reading and writing Microsoft Office files in both legacy binary (.xls) and modern OpenXML (.xlsx, .docx) formats, operating entirely without requiring Microsoft Office or COM interop. It runs on Windows and Linux under .NET Standard and .NET Framework runtimes, using only managed code to parse and generate Office documents. The library provides comprehensive spreadsheet capabilities, including creating, editing, and reading Excel workbooks in both .xls and .xlsx formats, with support for cell formatting, styles, and formulas. It includes a streaming row-by-row…
Kraken is a cross-platform UI framework and web standards runtime designed to build native applications using standard web markup and styling. It utilizes a Flutter-based rendering engine to process HTML and CSS, producing visually consistent user interfaces across mobile and desktop platforms. The system distinguishes itself by compiling the runtime to machine code and employing a synchronous rasterization pipeline to ensure animations and scrolling match the fluidity of native applications. It further integrates high-performance native components directly into a web-standard document object model…
This project is an Open XML document library and generator used to programmatically create and modify Microsoft Office files. It enables the production of Word, Excel, and PowerPoint documents by manipulating the underlying Open XML structure. The library provides capabilities for Open XML document processing, including the automated modification of existing files and the generation of formatted office reports and spreadsheets for server-side production.
This project is a technical educational guide focused on browser architecture and the internal processes used to render web pages. It provides a detailed breakdown of the web request lifecycle, from the initial networking phase to the final visual output on a screen. The guide covers specific technical sequences including the DNS resolution process across browser, operating system, and ISP caches, and the establishment of secure connections through the TLS handshake. It also details the communication flow between clients and servers using the HTTP protocol and server-side request handling.
pdfmake is a JavaScript PDF generation library and declarative document engine that transforms structured JavaScript objects into formatted PDF files. It functions as a layout engine capable of producing documents on both the client side within a web browser and on the server side using Node.js. The library utilizes a declarative approach to translate object-based document definitions into final PDFs. It distinguishes itself through a virtual layout engine that calculates element positions and page breaks and an inheritance-based style system that uses dictionaries to maintain visual consistency…
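A page-break calculation of the kind a layout engine performs can be reduced, for illustration, to a greedy pass over block heights. The `paginate` function below is a hypothetical sketch under simple assumptions: block heights are fixed and blocks are never split across pages. pdfmake's actual layout logic is not reproduced here.

```python
def paginate(heights: list[float], page_height: float) -> list[list[int]]:
    """Greedy page breaking: return the block indices placed on each page.

    A block that does not fit in the remaining space starts a new page.
    A block taller than a whole page gets a page to itself and overflows;
    a real engine would have to split or clip it.
    """
    pages: list[list[int]] = [[]]
    used = 0.0
    for i, h in enumerate(heights):
        if pages[-1] and used + h > page_height:
            pages.append([])  # break the page before this block
            used = 0.0
        pages[-1].append(i)
        used += h
    return pages


print(paginate([300, 300, 300, 100], page_height=700))  # [[0, 1], [2, 3]]
```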
This project is a formal Markdown specification that provides a detailed syntax definition and a definitive set of rules for parsing plain text into consistent HTML output. It establishes a standardized grammar for structural blocks and inline elements to ensure uniform rendering across different software implementations. The specification is supported by a parser conformance suite and reference implementations in C and JavaScript to verify that implementations adhere to the standard. It includes a system for implementation verification that compares transformed input strings…
This toolkit provides a suite of tools, templates, and guidelines for preparing software copyright registration documents required by Chinese authorities. It automates the creation of necessary legal filings and technical documentation to facilitate software copyright registration within the Chinese regulatory system. The system includes a code metric calculator to extract quantitative data and line counts from source files for application forms. It also features a legal compliance checklist and verification utilities to ensure submission materials adhere to official formatting, pagination…
Prawn is a Ruby library and document layout tool used for the programmatic generation of PDF files. It functions as a vector graphics engine that allows for the creation of portable documents containing formatted text, custom shapes, and organized page layouts. The library differentiates itself through a coordinate-based vector rendering system that supports multi-stop gradient fills, complex polygons, and layer-based blending. It provides a comprehensive typography system capable of embedding TrueType and OpenType fonts to support UTF-8 characters and right-to-left text for multilingual…
Career-ops is an AI-driven job search automation system designed to manage the entire application lifecycle, from discovery to tracking. It functions as a career copilot that utilizes autonomous agents to identify vacancies, evaluate professional fit, and generate tailored application materials. The project distinguishes itself through a multi-archetype persona management system and writing style calibration, allowing users to maintain different professional identities and a consistent voice across documents. It employs a multi-dimensional weighted scoring system to evaluate job suitability…
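A multi-dimensional weighted score is typically a weighted mean of per-dimension ratings. The description does not specify the project's actual scoring formula. The dimension names and weights below are hypothetical, so treat this as a sketch of the general technique.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores; a missing score counts as 0."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * scores.get(dim, 0.0) for dim, w in weights.items()) / total_weight


# Hypothetical dimensions and weights, scored on a 1-5 scale.
print(weighted_score(
    {"skills": 4.0, "salary": 2.0, "location": 5.0},
    {"skills": 0.5, "salary": 0.25, "location": 0.25},
))  # 3.75
```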
Asciidoctor is a Ruby-based text processing engine and command-line toolchain designed to convert AsciiDoc content into structured publishing formats, including HTML5 and DocBook 5. It functions as a static content publishing toolchain that transforms raw source files into formatted documents. The system utilizes a pluggable converter interface and template-driven output, allowing the default conversion logic to be overridden via custom converters or templates. This enables the generation of specific document structures and the export of content into various publishing formats for diverse…
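A pluggable converter interface usually means that the output for each node type comes from an overridable method. A custom converter can then replace one rule and inherit the rest. The Python sketch below shows that dispatch-and-override pattern with hypothetical class names and dict-based nodes. Asciidoctor's real converters are Ruby classes with their own registration API.

```python
class HtmlConverter:
    """Dispatch each node to a convert_<node type> method (hypothetical design)."""

    def convert(self, node: dict) -> str:
        handler = getattr(self, f"convert_{node['type']}", None)
        if handler is None:
            raise NotImplementedError(f"no converter for {node['type']!r}")
        return handler(node)

    def convert_document(self, node: dict) -> str:
        # Children go back through self.convert, so subclass overrides apply.
        return "\n".join(self.convert(child) for child in node["children"])

    def convert_section(self, node: dict) -> str:
        return f"<h2>{node['title']}</h2>"

    def convert_paragraph(self, node: dict) -> str:
        return f"<p>{node['text']}</p>"


class CustomConverter(HtmlConverter):
    """Override one node type; everything else falls back to the base class."""

    def convert_paragraph(self, node: dict) -> str:
        return f'<div class="para">{node["text"]}</div>'


doc = {"type": "document", "children": [
    {"type": "section", "title": "Intro"},
    {"type": "paragraph", "text": "Hello"},
]}
print(HtmlConverter().convert(doc))
print(CustomConverter().convert(doc))
```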
thuthesis is a LaTeX thesis template and dissertation layout engine designed to format academic dissertations according to university and national standards. It applies these formatting requirements across various academic disciplines, professional fields, and engineering domains. The project includes an academic bibliography manager that implements national formatting standards for references, patents, and author-year styles. It also provides a mathematical typography toolkit to manage math fonts, symbols, and equation numbering using unicode-math. The system covers academic thesis formatting…
tbls is a Go-based command-line utility used for documenting, analyzing, and linting relational database schemas. It functions as a documentation tool that generates structured reports and entity-relationship diagrams in Markdown, JSON, or Excel formats, as well as a schema diff tool for identifying discrepancies between a live database and its documentation. The project allows for schema augmentation and the definition of virtual relationships through external configuration files, enabling metadata overrides and table connections without requiring database migrations or native constraints.
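As a rough illustration of generating schema documentation from a live database, the sketch below reads SQLite's catalog with the standard `sqlite3` module and `PRAGMA table_info`, then emits one Markdown table per database table. It is only a stand-in built on assumptions. tbls is written in Go, supports many database engines, and uses a different output format. `schema_markdown` is a hypothetical name.

```python
import sqlite3


def schema_markdown(conn: sqlite3.Connection) -> str:
    """Render one Markdown table per user table, with columns from PRAGMA table_info."""
    lines: list[str] = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    for table in tables:
        lines += [f"## {table}", "",
                  "| Column | Type | Nullable | PK |", "| --- | --- | --- | --- |"]
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        # Table names come from sqlite_master; a name containing '"' would need escaping.
        for _cid, name, ctype, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            lines.append(
                f"| {name} | {ctype} | {'NO' if notnull else 'YES'} | {'yes' if pk else ''} |"
            )
        lines.append("")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
print(schema_markdown(conn))
```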
This project is a command-line utility designed to automate the creation of formatted project documentation. It functions as a markdown generator that produces structured files by combining interactive user prompts with metadata extracted from package and git files. The tool uses a template-based generation system, allowing the application of custom layout files to ensure consistent structural organization across different software projects. It automates the collection of project details to populate documentation values and suggest defaults. The system covers operational workflows for project…
This project is a plugin-based WYSIWYG document layout engine and rich text editor that uses Canvas and SVG for rendering. It functions as a collaborative editor utilizing conflict-free replicated data types to enable real-time synchronization across multiple users. The system serves as an interactive form builder, allowing for the embedding of input controls such as checkboxes and date pickers directly into documents. It is designed for high-fidelity output, ensuring the visual representation during editing matches the final format for PDF and image exports. The editor covers broad capabilities…
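The description does not say which CRDTs the editor uses, and text editing relies on sequence CRDTs that are considerably more involved. The property that makes any CRDT converge can still be shown with the simplest one, a grow-only counter (G-Counter). The class below is a generic sketch, not this project's code.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""
    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # max is commutative, associative and idempotent, so replicas converge
        # regardless of the order or number of times state is exchanged.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

Because merging takes the element-wise maximum, replicas can exchange state in any order, or more than once, and still agree.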
InvenTree is an open-source inventory management platform built on Django, designed for tracking parts, stock levels, and supply chain operations through a web interface and REST API. The system uses barcodes (including QR codes, 1D barcodes, and Data Matrix codes) as primary identifiers for scanning, linking, and triggering inventory actions, and extends core functionality through a Python plugin framework supporting custom actions, UI panels, barcode handlers, and scheduled tasks. The platform distinguishes itself through a comprehensive plugin-based extensibility system that allows custom…
FriendsDontLetFriends is a scientific data visualization guide and framework designed to help users create accurate plots while avoiding common data representation mistakes. It provides a collection of scripts and guidelines for selecting distribution plots, color scales, and layouts that accurately represent complex experimental data. The project distinguishes itself through specialized toolkits for revealing hidden patterns in large datasets. It includes systems for heatmap optimization via dimension reordering and outlier management, as well as spatial layout algorithms to improve the…