30 open-source projects similar to librepdf/openpdf, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best OpenPDF alternative.
PyMuPDF is a comprehensive PDF manipulation library and document analysis tool. It serves as a text extraction tool, OCR engine, and image converter, providing a programmatic interface to edit, merge, split, and optimize PDF and Office documents. The project distinguishes itself through high-performance capabilities, including the use of C-bindings for low-level manipulation and parallelized page processing to accelerate workloads. It provides specialized conversion paths, such as transforming PDF content into Markdown for retrieval-augmented generation and large language model pipelines. It
unioffice is a comprehensive document processing suite that provides a PDF document processor, an Open XML document library, a document security toolkit, and a document content extractor. It is designed to programmatically create, read, and modify Word, Excel, and PowerPoint files, as well as generate and edit PDF documents. The project is distinguished by its native language implementation of the Open XML standard, which removes native binary dependencies to simplify container deployments. It features advanced capabilities for digital document security, including hardware-based PDF signing,
mPDF is a PHP library that transforms UTF-8 encoded HTML and CSS into formatted PDF documents. It serves as a PDF generation engine and document architect capable of converting web pages and HTML forms into professional files. The project is distinguished by its multilingual rendering capabilities, providing comprehensive support for bidirectional text, right-to-left scripts, and CJK languages using Unicode font embedding and OpenType layout processing. It further enables professional print-ready design through advanced color modeling in CMYK, precise page dimensioning, and compliance with PD
pypdf is a Python library for parsing, manipulating, and generating PDF documents. It provides high-level operations for document processing, such as merging multiple files into one or splitting a single document into smaller files. The project includes specialized tools for managing interactive elements, including the creation and modification of annotations, hyperlinks, and form fields. It also supports advanced metadata management, allowing for the extraction and modification of standard document properties and XML-based XMP metadata. Beyond basic structural changes, the library covers pa
PyPDF2 is a pure Python library for transforming, securing, and extracting data from PDF documents. It provides a comprehensive suite of tools to modify page layouts, manage document security, and retrieve embedded metadata without relying on external C libraries. The toolkit enables document assembly through the merging of multiple files and the splitting of documents into smaller parts. It also supports page-level transformations, including the ability to rotate pages and adjust visible crop areas. The library includes capabilities for security management via password-based encryption and
pdf-lib is a JavaScript PDF manipulation library used for creating, modifying, and editing PDF documents programmatically. It functions as a cross-runtime tool compatible with Node, Browser, Deno, and mobile JavaScript environments. The library provides a programmatic interface for document editing and form generation. It supports building interactive PDF forms, populating existing fields with custom data, and flattening forms into static content. Its broader capabilities include generating new documents from scratch, rearranging or copying pages between files, and managing document metadata
This is a Go-based PDF library used for the programmatic generation of PDF documents without relying on external C dependencies. It functions as a document generator, layout engine, security tool, and vector graphics engine for creating files containing text, images, and geometric shapes. The project distinguishes itself through a cell-based layout engine that manages automatic text wrapping, page breaks, and structured positioning. It provides specialized capabilities for vector graphics, including the rendering of Bézier curves and polygons, as well as a security toolkit for applying passwo
TCPDF is a PHP library for the programmatic generation of PDF documents, allowing developers to create files containing text, graphics, and forms directly from application code. It includes a typography engine that supports Unicode strings, bidirectional scripts, and the embedding of TrueType or OpenType fonts. The library provides specialized tools for producing documents that adhere to PDF/A, PDF/X, and PDF/UA standards for archiving, printing, and accessibility. It features a digital signing system that applies CMS signatures and RFC 3161 timestamps, supporting long-term validation through
PyPDF2 is a pure Python library for reading, writing, and manipulating PDF files. It functions as a document manipulator, text extractor, and encryption tool, allowing users to process PDF files without relying on external C libraries or native binaries. The library provides specialized tools for modifying document structures, such as merging multiple files into one, splitting documents into separate files, and transforming page layouts through cropping. It also includes capabilities for securing documents via passwords and encryption. Additional capabilities include the extraction of writte
Pdfcraft is a containerized service for self-managed PDF processing, editing, and conversion. It provides a toolkit for document manipulation, a multi-format converter, and OCR software to transform scanned documents into searchable and editable text. The project features a visual, node-based workflow editor that allows users to build automated pipelines by chaining together various PDF conversion and optimization operations. The service covers a broad range of capabilities, including document management for merging and splitting files, format conversion between PDFs and office documents or
pdfsam is a desktop application for splitting, merging, rotating, and extracting pages from PDF documents. It functions as a PDF editor, converter, and security tool, providing capabilities to modify document structures and manage file formats. The project distinguishes itself through specialized processing capabilities, including an OCR document processor for extracting editable text from scanned images and PDF interleaving to alternate pages from multiple files. It also provides a security suite for encrypting documents, managing access permissions, an
PDF-Guru is an AI-powered document processor and study material converter designed to transform textbooks, research papers, and multimedia content into structured flashcards for spaced repetition systems like Anki. It functions as a content pipeline that uses language models to extract key concepts and facts from unstructured documents to generate question-and-answer pairs, cloze deletions, and multiple-choice cards. The system distinguishes itself through a comprehensive PDF management suite and multi-format parsing. It provides advanced document utilities including optical character recogni
Snappy is a PHP library that acts as a wrapper for the wkhtmltopdf and wkhtmltoimage binaries. It provides a utility suite for transforming HTML strings and remote URLs into PDF documents and image snapshots. The project enables the generation of downloadable or streamable PDFs and the creation of web page thumbnails. It supports the aggregation of multiple separate URLs into a single unified PDF document. Users can adjust document output through configuration settings for JavaScript execution, cookie handling, background removal, and custom stylesheet application.
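As a rough illustration of what such a wrapper does, the sketch below assembles a wkhtmltopdf command line from a handful of settings. It is not Snappy's API (Snappy is written in PHP); the helper name `build_wkhtmltopdf_command` and its parameters are hypothetical, and the flags reflect the wkhtmltopdf command-line options as commonly documented, so check them against `wkhtmltopdf --extended-help` for the installed version.

```python
from typing import Dict, List, Optional, Sequence


def build_wkhtmltopdf_command(
    sources: Sequence[str],
    output_path: str,
    *,
    enable_javascript: bool = True,
    cookies: Optional[Dict[str, str]] = None,
    no_background: bool = False,
    user_stylesheet: Optional[str] = None,
    binary: str = "wkhtmltopdf",
) -> List[str]:
    """Build an argument list for wkhtmltopdf (hypothetical helper).

    Several sources are passed in order so that they end up in a
    single output PDF. Nothing is executed here.
    """
    if not sources:
        raise ValueError("at least one source URL or file is required")

    command: List[str] = [binary]

    # JavaScript execution toggle.
    command.append(
        "--enable-javascript" if enable_javascript else "--disable-javascript"
    )

    # Each cookie is passed as: --cookie <name> <value>.
    for name, value in (cookies or {}).items():
        command += ["--cookie", name, value]

    # Background removal.
    if no_background:
        command.append("--no-background")

    # Custom stylesheet applied to every page.
    if user_stylesheet:
        command += ["--user-style-sheet", user_stylesheet]

    # Inputs first, output file last.
    command += list(sources)
    command.append(output_path)
    return command


if __name__ == "__main__":
    # Example URLs are placeholders, not taken from the project.
    print(build_wkhtmltopdf_command(
        ["https://example.com/a", "https://example.com/b"],
        "combined.pdf",
        no_background=True,
    ))
```

The returned list could be handed to `subprocess.run(command, check=True)` on a machine where the binary is installed. Building the arguments as a list rather than a shell string avoids quoting problems with URLs and cookie values.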
Gutenberg is a CSS print formatting framework and web-to-PDF layout engine. It provides a system for managing stylesheets, element visibility, and structural layout rules to ensure web content maintains a consistent visual structure when converted to physical documents or PDF files. The project manages print-specific rendering through a stylesheet manager that filters out non-essential interface elements and prevents the automatic expansion of URLs. It includes utilities to enforce the rendering of background colors and images, which browsers typically strip during the printing process. The
QuestPDF is a C# PDF generation library and layout engine used to create structured documents, reports, and invoices. It utilizes a fluent API and a component-based layout approach to convert code into high-fidelity PDF and XPS files. The library distinguishes itself with a dedicated layout debugger that provides real-time previews, hot-reload capabilities, and visual boundary tools to map rendered elements back to source code. It also functions as an accessibility tool, providing semantic tagging and navigational aids to ensure documents comply with international accessibility and archival s
Embed PDF Viewer is a browser-based PDF rendering library that uses a WebAssembly port of the PDFium engine to display documents entirely on the client side, with no server-side processing required. It provides a framework-agnostic core engine layer that manages the PDF document lifecycle, memory allocation, and WebAssembly resource cleanup, with dedicated integration hooks for React and Vue 3 that handle initialization, document loading, and reactive state management. The library offers both a pre-built, embeddable viewer that can be inserted into any web page with a single initialization ca
This project provides a Markdown penetration testing report template and a set of utilities for converting security documentation into formatted PDF reports. It serves as a security certification documentation template designed to meet professional reporting standards for certifications such as the OSCP. The toolkit includes a Markdown-to-PDF report generator that uses static CSS styling for code block highlighting and a specialized exam submission packaging tool. This packaging utility compresses report files into archives and generates integrity hashes to ensure data consistency during subm
pdfme is a schema-based PDF generation engine and dynamic document builder. It provides a system for producing PDF documents by merging predefined templates with dynamic input data across different runtime environments. The project includes a browser-based WYSIWYG PDF editor and template designer, allowing for the arrangement of elements via a drag-and-drop interface. It distinguishes itself through a plugin-based architecture that enables schema extensions and custom rendering logic for new content types. The capability surface covers dynamic content generation, including variable placehold
Prawn is a Ruby library and document layout tool used for the programmatic generation of PDF files. It functions as a vector graphics engine that allows for the creation of portable documents containing formatted text, custom shapes, and organized page layouts. The library differentiates itself through a coordinate-based vector rendering system that supports multi-stop gradient fills, complex polygons, and layer-based blending. It provides a comprehensive typography system capable of embedding TrueType and OpenType fonts to support UTF-8 characters and right-to-left text for multilingual publ
docetl is an AI-powered document ETL tool and map-reduce orchestrator designed to transform large collections of unstructured documents into structured, queryable tables using language models. It provides a declarative pipeline framework for extracting, cleaning, and transforming data from sources such as PDFs and text files into predefined schemas. The project distinguishes itself through a semantic data integration suite that enables joining datasets and resolving duplicate entities based on embedding-based similarity. It includes an interactive prompt playground for developing and optimizi
DesktopEditors is an office suite application designed for creating and editing text documents, spreadsheets, and presentations across different operating systems. It serves as an OOXML-compatible editor, ensuring that files are read and written according to Office Open XML standards for cross-platform document exchange. The suite functions as a collaborative document platform featuring real-time co-authoring, version tracking, and integrated communication tools. It also acts as an AI-powered document assistant and PDF editor, providing capabilities for content generation, automated spreadshe
Reader is an iOS PDF rendering engine and user interface component designed to display PDF documents on Apple platforms. It functions as a document viewer that enables users to navigate pages and interact with embedded content. The framework manages encrypted files by requesting and verifying security credentials to unlock protected document streams. It utilizes tiled layer rendering and multi-threaded processing to maintain scrolling performance and responsiveness. The system covers document navigation via gestures and thumbnails, as well as the resolution of internal and external links. It
pdf2htmlEX is a PDF to HTML converter that transforms documents into web pages while preserving the original layout, fonts, and formatting. It functions as a layout engine and text extractor, mapping PDF coordinate data to HTML and CSS to maintain visual fidelity. The tool converts PDF content into searchable and selectable native HTML text by embedding original document fonts. It maintains document interactivity by preserving internal links, bookmarks, and outlines, converting them into functional web navigation. The conversion process supports flexible output structures, allowing documents
Kreuzberg is a document extraction engine that converts PDFs, Office files, images, and over 90 other formats into clean, structured text and metadata. It is built around a compiled Rust core that can be used as a native library, a command-line tool, a REST API server, or a WebAssembly module for browser-based processing. The system is designed to run entirely on self-hosted infrastructure, with no data leaving the user's environment. What distinguishes Kreuzberg is its breadth of integration surfaces and its pipeline architecture. It exposes extraction capabilities through native bindings fo
pdfminer is a Python library for parsing PDF files to extract text, analyze layouts, decrypt content, and convert documents into HTML or XML formats. It functions as a text extraction engine and layout analysis tool designed to retrieve characters and words while preserving the structural organization of the original document. The project provides utilities for converting PDF content into structured HTML or XML to maintain visual layout and a decryption tool for unlocking restricted documents using encryption keys. It identifies the positions and groupings of text elements to reconstruct page
pdfcpu is a Go PDF processing library and command-line interface designed for programmatically manipulating, optimizing, and validating PDF files. It provides a toolkit for document content modification and structural management. The project distinguishes itself as an optimization tool and layout engine, capable of reducing file sizes and improving loading speeds by streamlining internal structures. It also functions as a security manager, providing password-based encryption, decryption, and digital signature verification. Its capability surface includes page management for merging, splittin
wkhtmltopdf is a command-line utility that renders web pages into PDF documents or image files. It functions as a headless browser engine, utilizing the Qt WebKit rendering environment to process HTML, CSS, and JavaScript into visual representations suitable for server-side tasks. The tool distinguishes itself by translating standard web styling rules into physical page dimensions and layout constraints, allowing for the creation of structured documents from web-based source files. It supports the generation of automated tables of contents and provides granular control over document layout, i
qpdf is a collection of specialized utility tools for the structural transformation, metadata inspection, file optimization, and cryptographic management of PDF documents. It provides a command line tool for transforming and inspecting internal PDF structures, a structural transformer for reorganizing pages and merging documents, and an encryption engine for managing passwords and restrictions. The project distinguishes itself through a technical approach to document manipulation, utilizing an object-based structural representation to modify files as a graph of unique objects. It includes a m
Pdfarranger is a PDF page organizer, document editor, image converter, and booklet generator. It provides a visual drag-and-drop interface to reorder, merge, split, and delete pages within PDF documents. The application includes specialized tools for creating booklet printing layouts and converting image files into PDF pages or exporting PDF pages as PNG and JPEG images. It allows for the modification of document metadata while preserving internal outlines and hyperlinks. The software covers a range of structural manipulations, including page rotation, resizing, cropping, and overlaying. It
Stirling-PDF is a self-hosted document processing suite designed for secure, private file management. It functions as a comprehensive transformation engine that executes complex operations—such as merging, splitting, converting, and redacting documents—directly on the host machine. The platform provides both a browser-based interface for interactive editing and a programmatic, API-first architecture that allows for the automation of document workflows through standard HTTP requests. The project distinguishes itself through its focus on private, infrastructure-agnostic deployment and granular