1 repo
Systems for structuring and managing file-based data collections and metadata.
Distinguishing note: Focuses on the directory-level organization of archived assets and metadata for local retrieval.
1 GitHub repository matching Content Management & Publishing · Data Organization Tools.
ArchiveBox is a self-hosted archiving tool designed for personal digital preservation and research data management. It functions as an automated web preservation engine that monitors URL inputs from bookmarks, browser history, or manual entries to capture and store permanent, offline copies of web content. By utilizing headless browser automation, the system renders dynamic web pages to ensure that captured snapshots, PDFs, and media assets remain accurate and accessible even if the original source disappears. The project distinguishes itself through a modular extractor pipeline and a task-qu
Structure saved data into organized directories that keep metadata, snapshots, and raw assets clearly separated to simplify file retrieval and local management of your archived collections.
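As a rough illustration of the separation described above, the following minimal sketch creates one directory per archived entry with distinct subdirectories for metadata, snapshots, and raw assets, then looks entries up by URL from the metadata alone. The layout, the `SUBDIRS` names, the `index.json` file, and the helpers `create_entry` and `find_by_url` are all hypothetical and chosen for illustration; this is not ArchiveBox's actual on-disk format or API.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout for illustration only (not ArchiveBox's real format):
#   <root>/<entry_id>/metadata/index.json
#   <root>/<entry_id>/snapshots/
#   <root>/<entry_id>/assets/
SUBDIRS = ("metadata", "snapshots", "assets")


def create_entry(root: Path, entry_id: str, url: str) -> Path:
    """Create the directory skeleton for one entry and write its metadata."""
    entry = root / entry_id
    for name in SUBDIRS:
        (entry / name).mkdir(parents=True, exist_ok=True)
    index = entry / "metadata" / "index.json"
    index.write_text(json.dumps({"id": entry_id, "url": url}, indent=2),
                     encoding="utf-8")
    return entry


def find_by_url(root: Path, url: str) -> list[Path]:
    """Return entry directories whose metadata records the given URL."""
    matches = []
    for index in sorted(root.glob("*/metadata/index.json")):
        record = json.loads(index.read_text(encoding="utf-8"))
        if record.get("url") == url:
            # index.json lives two levels below the entry directory.
            matches.append(index.parent.parent)
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        create_entry(root, "0001", "https://example.com")
        for entry in find_by_url(root, "https://example.com"):
            print(entry.name)
```

Keeping metadata in its own subdirectory means a lookup only has to read small JSON files and never touches the larger snapshot or asset files.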