30 open-source projects similar to pdfcpu/pdfcpu, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best pdfcpu alternative.
pypdf is a Python library for parsing, manipulating, and generating PDF documents. It provides high-level operations for document processing, such as merging multiple files into one or splitting a single document into smaller files. The project includes specialized tools for managing interactive elements, including the creation and modification of annotations, hyperlinks, and form fields. It also supports advanced metadata management, allowing for the extraction and modification of standard document properties and XML-based XMP metadata. Beyond basic structural changes, the library covers pa
qpdf is a collection of specialized utility tools for the structural transformation, metadata inspection, file optimization, and cryptographic management of PDF documents. It provides a command line tool for transforming and inspecting internal PDF structures, a structural transformer for reorganizing pages and merging documents, and an encryption engine for managing passwords and restrictions. The project distinguishes itself through a technical approach to document manipulation, utilizing an object-based structural representation to modify files as a graph of unique objects. It includes a m
PyMuPDF is a comprehensive PDF manipulation library and document analysis tool. It serves as a text extraction tool, OCR engine, and image converter, providing a programmatic interface to edit, merge, split, and optimize PDF and Office documents. The project distinguishes itself through high-performance capabilities, including the use of C-bindings for low-level manipulation and parallelized page processing to accelerate workloads. It provides specialized conversion paths, such as transforming PDF content into Markdown for retrieval-augmented generation and large language model pipelines. It
unioffice is a comprehensive document processing suite that provides a PDF document processor, an Open XML document library, a document security toolkit, and a document content extractor. It is designed to programmatically create, read, and modify Word, Excel, and PowerPoint files, as well as generate and edit PDF documents. The project is distinguished by its native language implementation of the Open XML standard, which removes native binary dependencies to simplify container deployments. It features advanced capabilities for digital document security, including hardware-based PDF signing,
PyPDF2 is a pure Python library for reading, writing, and manipulating PDF files. It functions as a document manipulator, text extractor, and encryption tool, allowing users to process PDF files without relying on external C libraries or native binaries. The library provides specialized tools for modifying document structures, such as merging multiple files into one, splitting documents into separate files, and transforming page layouts through cropping. It also includes capabilities for securing documents via passwords and encryption. Additional capabilities include the extraction of writte
pdfsam is a PDF manipulation software and desktop application designed for splitting, merging, rotating, and extracting pages from PDF documents. It functions as a PDF editor, converter, and security tool, providing capabilities to modify document structures and manage file formats. The project distinguishes itself through specialized processing capabilities, including an OCR document processor for extracting editable text from scanned images and PDF interleaving to alternate pages from multiple files. It also provides a security suite for encrypting documents, managing access permissions, an
Stirling-PDF is a self-hosted document processing suite designed for secure, private file management. It functions as a comprehensive transformation engine that executes complex operations—such as merging, splitting, converting, and redacting documents—directly on the host machine. The platform provides both a browser-based interface for interactive editing and a programmatic, API-first architecture that allows for the automation of document workflows through standard HTTP requests. The project distinguishes itself through its focus on private, infrastructure-agnostic deployment and granular
Pdfarranger is a PDF page organizer, document editor, image converter, and booklet generator. It provides a visual drag-and-drop interface to reorder, merge, split, and delete pages within PDF documents. The application includes specialized tools for creating booklet printing layouts and converting image files into PDF pages or exporting PDF pages as PNG and JPEG images. It allows for the modification of document metadata while preserving internal outlines and hyperlinks. The software covers a range of structural manipulations, including page rotation, resizing, cropping, and overlaying. It
PyPDF2 is a pure Python library for transforming, securing, and extracting data from PDF documents. It provides a comprehensive suite of tools to modify page layouts, manage document security, and retrieve embedded metadata without relying on external C libraries. The toolkit enables document assembly through the merging of multiple files and the splitting of documents into smaller parts. It also supports page-level transformations, including the ability to rotate pages and adjust visible crop areas. The library includes capabilities for security management via password-based encryption and
SumatraPDF is a lightweight, multi-format document viewer designed for rendering PDF, eBook, and comic book files within a unified interface. It functions as both a graphical reading environment and a command-line document processor, enabling users to automate file conversion, merging, and extraction tasks without requiring a graphical interface. The application distinguishes itself through a single-executable binary distribution that utilizes direct-to-GDI rendering and memory-mapped file access to maintain high performance and minimal memory overhead. Users can personalize their workspace b
sd is a command line text manipulation utility designed for searching and replacing text patterns across multiple files. It functions as a regex-based find and replace tool that allows for in-place file editing directly from the terminal. The project supports both regular expression replacements, including the use of capture groups for complex transformations, and fixed string replacement for literal text substitutions. It specifically handles multi-line text replacement by processing file contents as single blocks to match patterns that span across newline characters. The tool provides capa
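The replacement modes described for sd can be illustrated with a minimal sketch. This is not sd's implementation or its command-line syntax: it uses Python's standard `re` module, and the helper names `replace_regex`, `replace_literal`, and `edit_in_place` are hypothetical, chosen only for this illustration. It shows a capture-group rewrite, a literal substitution, and a pattern that spans a newline because the whole text is handled as one block.

```python
import re
from pathlib import Path


def replace_regex(text: str, pattern: str, replacement: str) -> str:
    """Regex mode: the text is processed as a single block,
    so a pattern containing \\n can match across line boundaries."""
    return re.sub(pattern, replacement, text)


def replace_literal(text: str, needle: str, replacement: str) -> str:
    """Fixed-string mode: no character in `needle` is treated as regex syntax."""
    return text.replace(needle, replacement)


def edit_in_place(path: str, pattern: str, replacement: str) -> None:
    """Read the whole file, apply the regex replacement, write it back."""
    file = Path(path)
    file.write_text(replace_regex(file.read_text(), pattern, replacement))


# Capture groups reorder the parts of a date (Python writes them as \1, \2, \3).
swapped = replace_regex("2024-01-31", r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1")

# The pattern spans a newline, joining a value that was wrapped onto the next line.
joined = replace_regex("name: foo\nversion:\n  1.2\n", r"version:\n\s+", "version: ")

# In literal mode the dot is an ordinary character, not "any character".
literal = replace_literal("a.b.c", ".", "-")
```

Reading the file into memory before matching is what makes the multi-line case work; a line-by-line loop would never see the newline inside the pattern.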
DesktopEditors is an office suite application designed for creating and editing text documents, spreadsheets, and presentations across different operating systems. It serves as an OOXML compatible editor, ensuring that files are read and written according to Office Open XML standards for cross-platform document exchange. The suite functions as a collaborative document platform featuring real-time co-authoring, version tracking, and integrated communication tools. It also acts as an AI-powered document assistant and PDF editor, providing capabilities for content generation, automated spreadshe
Pdfcraft is a containerized service for self-managed PDF processing, editing, and conversion. It provides a toolkit for document manipulation, a multi-format converter, and OCR software to transform scanned documents into searchable and editable text. The project features a visual, node-based workflow editor that allows users to build automated pipelines by chaining together various PDF conversion and optimization operations. The service covers a broad range of capabilities, including document management for merging and splitting files, format conversion between PDFs and office documents or
pdfminer is a Python library for parsing PDF files to extract text, analyze layouts, decrypt content, and convert documents into HTML or XML formats. It functions as a text extraction engine and layout analysis tool designed to retrieve characters and words while preserving the structural organization of the original document. The project provides utilities for converting PDF content into structured HTML or XML to maintain visual layout and a decryption tool for unlocking restricted documents using encryption keys. It identifies the positions and groupings of text elements to reconstruct page
pdfminer.six is a programmatic tool for extracting text, layout information, and metadata from PDF documents into machine-readable formats. It functions as a document parser that converts internal PDF objects and structures into accessible data objects for analysis. The project includes utilities for decrypting RC4 and AES encrypted files to enable content extraction. It also provides a layout analyzer to identify fonts, colors, and text locations to determine the organizational structure of pages. The system covers a broad range of extraction capabilities, including the retrieval of embedde
This project is a community-curated directory of open-source software designed for deployment in private server environments and home labs. It serves as a comprehensive resource for discovering independent, self-hosted alternatives to mainstream cloud services, enabling users to maintain full data ownership and control over their digital infrastructure. The directory is structured through a hierarchical taxonomy that organizes a vast collection of applications into logical categories, ranging from media management and data analytics to private communication and team productivity tools. It dis
QuestPDF is a C# PDF generation library and layout engine used to create structured documents, reports, and invoices. It utilizes a fluent API and a component-based layout approach to convert code into high-fidelity PDF and XPS files. The library distinguishes itself with a dedicated layout debugger that provides real-time previews, hot-reload capabilities, and visual boundary tools to map rendered elements back to source code. It also functions as an accessibility tool, providing semantic tagging and navigational aids to ensure documents comply with international accessibility and archival s
OpenPDF is a Java library and document processor used for creating, editing, rendering, and encrypting PDF documents. It functions as a toolkit for generating new files from scratch, modifying existing document structures, and extracting text content. The project includes a dedicated engine for transforming HTML and CSS content into PDF documents by parsing markup and applying styles. It also provides a rendering engine to convert PDF pages into image formats for thumbnails and previews, alongside a security utility for protecting content via document encryption. The library supports the add
PDFPatcher is a specialized suite of PDF utility tools designed for editing navigational bookmarks, modifying document structure, managing metadata, and processing pages. It provides a toolkit for altering PDF structures and properties without changing the original content stream. The project is distinguished by its focus on bookmark management, featuring bulk editing and the ability to generate clickable bookmarks from visual tables of contents using optical character recognition. It also includes capabilities for font optimization through substitution and embedding to ensure consistent char
Remotion is a programmatic video framework that enables the creation of video content using component-based logic and standard web technologies. By leveraging a declarative animation engine, it allows developers to structure visual content as a hierarchy of reusable components, ensuring that animations and state updates remain consistent through deterministic frame execution. The framework distinguishes itself by utilizing a headless browser renderer that captures visual output frame-by-frame to generate high-quality video files. This architecture supports a cloud-native media pipeline, allow
This project is a privacy-focused VPN manager and WireGuard client application designed to establish encrypted tunnels that mask user IP addresses and activity. It focuses on maintaining anonymity through a system that supports account creation without personal identifying information. The application distinguishes itself with advanced privacy tools, including a multi-hop orchestrator for routing traffic through multiple sequential servers and a network traffic obfuscator that uses Shadowsocks, TCP, and QUIC to bypass deep packet inspection and censorship. It also implements quantum-resistant
Nextcloud is a self-hosted platform designed for private cloud storage, file synchronization, and collaborative team workspaces. It provides a comprehensive suite of tools for document editing, groupware services like calendars and contacts, and secure data management, all while ensuring users maintain full control over their infrastructure and data sovereignty. The platform distinguishes itself through a decentralized federated architecture that allows independent server instances to securely share data and collaborate across a network. It features a highly modular plugin ecosystem, enabling
A package for easily walking a filesystem concurrently.
Golang wrapper for ExifTool: extracts as much metadata as possible (EXIF, ...) from files (pictures, PDFs, office documents, ...).