Paperless is a self-hosted document management system for digitizing, indexing, and archiving paper documents. It runs optical character recognition (OCR) on scanned images and PDFs to build a searchable digital library, and provides a web-based interface for querying and retrieving documents from its database.
The system features an automated ingestion pipeline that monitors specified directories and email inboxes, then processes and imports new documents without manual uploading. To keep the archive private, it supports on-disk encryption for sensitive files. It can also organize stored files on disk using metadata-driven filename templates.
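As a rough illustration of what a metadata-driven filename template does, the sketch below fills a template string from a document's metadata and sanitizes each value so it is safe to use as a path component. The function name `render_storage_path`, the placeholder names (`created_year`, `correspondent`, `title`), and the sanitizing rule are all assumptions made for this example; they are not taken from Paperless itself.

```python
import re
from pathlib import PurePosixPath


def render_storage_path(template: str, metadata: dict) -> PurePosixPath:
    """Fill a filename template from document metadata (hypothetical helper)."""

    def safe(value) -> str:
        # Collapse anything that is not a word character, dot, or hyphen
        # (including path separators) into a single hyphen.
        return re.sub(r"[^\w.-]+", "-", str(value)).strip("-") or "none"

    parts = {key: safe(value) for key, value in metadata.items()}
    return PurePosixPath(template.format(**parts))


# Hypothetical metadata for one archived document.
doc = {
    "created_year": 2024,
    "correspondent": "ACME Corp / Billing",
    "title": "Invoice 0042",
}
path = render_storage_path("{created_year}/{correspondent}/{title}.pdf", doc)
print(path)  # 2024/ACME-Corp-Billing/Invoice-0042.pdf
```

Sanitizing each value before substitution means a slash inside a metadata field cannot create an unintended subdirectory; only the separators written in the template itself shape the directory layout.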
The platform also covers document preprocessing, including image cleaning that removes speckles and corrects skew to improve text recognition. It provides tools for exporting archived documents to local directories for external backups, and it allows user interface customization through custom styles and scripts.
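To make the idea of speckle removal concrete, here is a minimal sketch on a tiny black-and-white bitmap: any ink pixel with no ink pixel among its eight neighbours is treated as noise and cleared. This is a deliberately simple stand-in written for illustration, assuming a page represented as nested lists of 0 and 1. It is not the cleaning method Paperless uses, and the name `despeckle` is hypothetical.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = ink, 0 = background


def despeckle(image: Bitmap) -> Bitmap:
    """Return a copy of the bitmap with isolated ink pixels cleared."""
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # leave the input untouched

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Look at the 8-connected neighbourhood, clipped at the borders.
            has_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # lone speckle
    [0, 0, 0, 1, 1],  # part of a pen stroke
    [0, 0, 0, 1, 0],
]
cleaned_page = despeckle(page)
```

In this example the lone pixel in the second row is cleared while the three connected stroke pixels are kept. Real scans need more robust filtering (larger speckles, grayscale input), which is why dedicated image-processing tools are normally used for this step.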
The application is packaged as a containerized deployment to ensure consistent installation across different environments.