pypdf is a Python library for parsing, manipulating, and generating PDF documents. It provides high-level operations for document processing, such as merging multiple files into one or splitting a single document into smaller files.
The project includes tools for interactive elements, covering the creation and modification of annotations, hyperlinks, and form fields. It also supports metadata management: both the standard document properties and XML-based XMP metadata can be extracted and modified.
The library also covers page-level operations such as rotation, cropping, and scaling, as well as text and image extraction with layout-preserving options. It provides security utilities for document encryption and decryption, and optimization tools that reduce file size by removing images or applying lossless compression.