ArchiveBox is a self-hosted web archiving system designed to capture and preserve permanent static copies of webpages, media, and PDFs on personal infrastructure. It functions as a digital content curator and personal web archive manager, allowing users to import URLs from bookmarks, RSS feeds, and browser history to create a centralized, searchable knowledge base.
A distinguishing feature is its ability to archive private, paywalled, or login-protected content by reusing browser cookies and persisted authenticated sessions. To keep content available long term, it saves each page in several formats at once, including HTML, PDF, and PNG, and can automatically mirror these local snapshots to external preservation services.
The system also supports extraction of multimedia assets, full-text indexing of the archive, and scheduled mirroring of content. Users can manage their collections through a web-based interface, a command-line interface, or a remote API, and can export the entire collection as a standalone static HTML site for offline browsing.
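
To make the import step concrete, the following is a minimal sketch of how URLs from a bookmarks export and an RSS feed might be collected and de-duplicated before being handed to an archiver. It is an illustration only and does not reflect ArchiveBox's internal parsers: the function names, the sample inputs, and the choice to accept only http(s) links are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class BookmarkLinkParser(HTMLParser):
    """Collects href values from <a> tags in a bookmarks HTML export."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)


def urls_from_bookmarks(html_text):
    """Return every link found in a bookmarks HTML export (hypothetical helper)."""
    parser = BookmarkLinkParser()
    parser.feed(html_text)
    return parser.urls


def urls_from_rss(xml_text):
    """Return the <link> of each <item> in an RSS 2.0 feed (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    urls = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            urls.append(link.strip())
    return urls


def merge_urls(*sources):
    """Merge URL lists, keeping first-seen order and dropping duplicates.

    Assumption for this sketch: only http(s) URLs are kept.
    """
    seen = set()
    merged = []
    for source in sources:
        for url in source:
            if url.startswith(("http://", "https://")) and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Made-up sample inputs, for illustration only.
SAMPLE_BOOKMARKS = (
    '<DL><DT><A HREF="https://example.com/a">A</A>'
    '<DT><A HREF="https://example.com/b">B</A></DL>'
)
SAMPLE_RSS = (
    '<rss version="2.0"><channel><link>https://example.com/</link>'
    "<item><link>https://example.com/b</link></item>"
    "<item><link>https://example.com/c</link></item>"
    "</channel></rss>"
)

if __name__ == "__main__":
    queue = merge_urls(
        urls_from_bookmarks(SAMPLE_BOOKMARKS),
        urls_from_rss(SAMPLE_RSS),
    )
    for url in queue:
        print(url)
```

The merged list could then be passed to whichever interface is in use (web, command line, or API). Order is preserved so that the earliest source of a URL decides its position in the queue.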