Utilities for capturing and maintaining long-term accessible versions of web content.
Distinguishing note: Specifically targets the creation of multiple preservation-grade formats like PDFs and screenshots for long-term access.
ArchiveBox is a self-hosted archiving tool designed for personal digital preservation and research data management. It functions as an automated web preservation engine that monitors URL inputs from bookmarks, browser history, or manual entries to capture and store permanent, offline copies of web content. By utilizing headless browser automation, the system renders dynamic web pages to ensure that captured snapshots, PDFs, and media assets remain accurate and accessible even if the original source disappears. The project distinguishes itself through a modular extractor pipeline and a task-qu
Create multiple versions of web pages including screenshots, PDFs, and media files to ensure that content remains readable and accessible for long-term digital preservation and reference.