Self-hosted applications and browser extensions for saving web articles to read in a distraction-free environment.
Omnivore is an open-source, self-hostable read-it-later application designed to centralize web articles, newsletters, and digital documents into a personal library. It captures web pages and stores them locally, so saved content stays readable regardless of internet connectivity. The platform distinguishes itself through an event-sourced synchronization engine that keeps state consistent across multiple devices by replaying user actions. A headless web scraping service extracts clean text and metadata from raw web pages, providing a uniform reading experience. Users can manage their collections through a research-oriented workflow that supports highlighting passages and attaching personal notes to saved content. Capabilities include offline reading, cross-device progress synchronization, and structured data persistence. Because it can be self-hosted, users keep full control over their personal data and reading history.
Omnivore is a self-hostable read-it-later platform that provides article parsing, offline reading, full-text search, and organization tools, so it matches your requirements directly.
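The event-sourced synchronization described for Omnivore can be illustrated with a minimal sketch. This is not Omnivore's actual implementation: the Event schema, the event kinds, and the replay function are all hypothetical names chosen for illustration. The idea shown is only that each device rebuilds its library by applying the same ordered log of user actions.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One recorded user action (hypothetical schema, not Omnivore's)."""
    seq: int                  # position in the shared event log
    kind: str                 # "save", "progress", or "highlight"
    article_id: str
    payload: dict = field(default_factory=dict)


def replay(events):
    """Rebuild library state by applying events in log order."""
    library = {}
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.kind == "save":
            library[ev.article_id] = {"progress": 0, "highlights": []}
        elif ev.kind == "progress" and ev.article_id in library:
            library[ev.article_id]["progress"] = ev.payload["percent"]
        elif ev.kind == "highlight" and ev.article_id in library:
            library[ev.article_id]["highlights"].append(ev.payload["text"])
    return library


log = [
    Event(1, "save", "a1"),
    Event(2, "progress", "a1", {"percent": 40}),
    Event(3, "highlight", "a1", {"text": "key passage"}),
]

# Devices may receive the events in different orders; sorting by sequence
# number before replaying makes both arrive at the same state.
phone = replay(log)
laptop = replay(list(reversed(log)))
```

Because state is derived from the log rather than stored independently on each device, a new device only needs the log to catch up.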
Wallabag is a self-hosted, open-source bookmark manager designed to archive web content for later reading. It functions as a personal knowledge management tool, allowing users to collect, store, and organize web pages into a centralized, searchable library. The platform provides a distraction-free reading experience by extracting the primary text and images from web pages while removing advertisements and navigation menus. This process ensures that saved articles remain accessible for offline reading, preserving the content even if the original source is removed from the internet. The system supports a range of organizational features, including tagging and full-text storage, to help manage large collections of research materials. It utilizes a standardized interface for external client interaction and employs asynchronous processing to handle resource-intensive tasks like content parsing and image fetching.
Wallabag is a dedicated self-hosted application that provides article parsing, offline reading, tagging, and full-text search, making it a comprehensive solution for your read-it-later needs.
This project is a self-hosted RSS feed aggregator and reader designed to collect and organize content from RSS, Atom, and JSON feeds. It functions as a privacy-focused client that blocks pixel trackers and strips URL parameters to prevent third-party tracking and referrer leakage. The system is built as a REST API feed reader, exposing its data and user accounts through a programmable interface for third-party clients. It maintains compatibility with the OPML standard for importing and exporting subscriptions and provides tools for web content extraction using readability parsers and custom rules to retrieve main body text. Broad capabilities include full-text article search via database indexing, content filtering using regular expressions, and automated workflows through webhooks. The project also includes multi-method user authentication and tools for user account administration. The application is distributed via Docker container images, cross-platform binaries, and Linux system packages.
This is a self-hosted RSS feed reader that provides article parsing and full-text search, serving as a capable alternative for managing and reading web content even though it is primarily feed-based rather than a manual bookmarking service.
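Stripping tracking parameters from URLs, as this reader is described as doing, can be sketched with the Python standard library. The parameter list below is an illustrative assumption, not the project's actual rule set, and strip_tracking is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative examples only; a real reader would maintain a longer list.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url):
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_NAMES
        and not name.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_tracking("https://example.com/post?id=7&utm_source=feed&fbclid=abc")
# cleaned == "https://example.com/post?id=7"
```

Parameters that are not on the list, such as id above, are preserved so that the link still resolves to the same article.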
ArchiveBox is a self-hosted web archiving system designed to capture and preserve permanent static copies of webpages, media, and PDFs on personal infrastructure. It functions as a digital content curator and personal web archive manager, allowing users to import URLs from bookmarks, RSS feeds, and browser history to create a centralized, searchable knowledge base. The project is distinguished by its ability to archive private, paywalled, or login-protected content using browser cookies and authenticated session persistence. It ensures long-term availability by saving pages in multiple concurrent formats, including HTML, PDF, and PNG, and can automatically mirror these local snapshots to external preservation services. The system includes capabilities for multimedia asset extraction, full-text archive indexing, and scheduled content mirroring. Users can manage their collections through a web-based interface, a command-line interface, or a remote API, with options to export the entire collection as a standalone static HTML site for offline browsing.
ArchiveBox is a self-hosted web archiving system that excels at preserving permanent copies of articles for offline access and full-text search, though it focuses more on archival preservation than the reading-optimized interface typical of read-it-later services.
Feeder is an RSS and Atom feed reader that aggregates content into a single interface. It functions as a full-text content extractor that removes website clutter to isolate the main body of articles, and as a self-hosted feed synchronizer that maintains subscription lists and read statuses across devices via a private backend server. The application integrates with external AI services, configured through API keys, to translate articles and generate concise summaries of long-form pieces. It also features a text-to-speech reader that uses system engines with automatic language detection to convert written content into spoken audio. The system includes tools for content curation such as bookmarks, pinned entries, and a customizable blocklist to filter out unwanted items. It provides offline reading access by caching feed lists and full article text locally. Additional capabilities cover URL tracking parameter cleansing, subscription import and export, and reading appearance customization.
This is a self-hosted feed reader that provides core read-it-later features like full-text extraction, offline caching, and bookmarking, though it is primarily designed for RSS/Atom feeds rather than manual URL-based bookmarking.
ArchiveBox is a self-hosted archiving tool designed for personal digital preservation and research data management. It functions as an automated web preservation engine that monitors URL inputs from bookmarks, browser history, or manual entries to capture and store permanent, offline copies of web content. By utilizing headless browser automation, the system renders dynamic web pages to ensure that captured snapshots, PDFs, and media assets remain accurate and accessible even if the original source disappears. The project distinguishes itself through a modular extractor pipeline and a task-queue-based processing model, which allow it to handle long-running ingestion jobs reliably and at scale. It organizes all captured data into a predictable, file-system-based directory structure, ensuring that archives remain portable and accessible without the need for a dedicated database engine. This architecture supports the generation of static, self-contained archives that can be hosted on any standard web server. To maintain high fidelity across diverse web environments, the system includes configuration-driven dependency management that coordinates the necessary browser binaries and command-line tools. The platform provides a comprehensive suite of command-line interfaces, configuration options, and core modules to support operational management and integration. Detailed documentation is available to guide users through installation, dependency maintenance, and the security considerations of managing archived web content.
This is a powerful self-hosted web archiving tool that excels at preserving permanent offline copies of articles, though it focuses more on long-term digital preservation than the reading-focused interface typical of a traditional read-it-later service.
Readest is a comprehensive digital reading platform designed to manage, annotate, and consume electronic books across multiple devices. It functions as a versatile library manager and reading environment, supporting a wide range of user needs from standard ebook consumption to specialized study and accessibility-focused workflows. The platform distinguishes itself through advanced features like parallel text study, which enables side-by-side document rendering with synchronized scrolling, and a robust text-to-speech engine that provides hands-free reading with synchronized visual highlighting. It offers deep customization for the reading experience, including granular control over typography, page layouts, and E-ink display optimization, alongside the ability to inject custom styles for personalized visual overrides. Beyond core reading, the application includes extensive tools for library organization, such as metadata tagging and automated grouping, as well as content processing capabilities like CJK text support and web article clipping. Users can maintain a unified library state across devices through state synchronization middleware, while also securing their collections with PIN-based access controls and managing data through portable backups and annotation exports.
Readest is a self-hostable digital reading platform that includes web article clipping and full-text search, making it a capable tool for managing and reading saved content even though its primary focus is on ebook library management.
TagSpaces is an offline-first file tagging and organization platform that lets you manage local files with portable metadata stored directly in filenames or sidecar JSON files, eliminating the need for a central database. It combines a full-text file search engine, a Kanban board file organizer, a local AI file assistant, an S3-compatible cloud file manager, and a web clipper and bookmark manager in a single application. The project distinguishes itself through a local-first architecture where all file operations, indexing, and AI processing run entirely on the device, with cloud storage treated as an optional remote mount point. It integrates with a locally running Ollama engine for on-device AI tasks such as automatic tagging, summarization, and image analysis, keeping all data private. A plugin-based file viewer system renders over 50 file formats, and because metadata travels with the files, it stays portable across devices and sync services. TagSpaces also offers a command-line interface for programmatic file operations and search indexing, supports S3-compatible object storage and WebDAV servers for remote file management, and provides a browser extension for capturing web pages, screenshots, and bookmarks as local files with automatic tagging. The application includes built-in viewers and editors for documents, images, audio, video, 3D models, and Markdown files, along with geo-tagging on interactive maps and full-text search with fuzzy matching and saved queries. It can be installed on Windows, macOS, and Linux, run in portable mode, or self-hosted as a static web app on personal servers or cloud platforms like Cloudflare Pages and AWS Amplify.
TagSpaces is a local-first file management and organization platform that functions as a self-hostable bookmarking and web-clipping tool, allowing you to archive web content as local files with tagging and full-text search capabilities.
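Storing tags in the filename itself, as described for TagSpaces, means a tool can recover them without any database. The sketch below assumes a bracketed, space-separated tag group before the extension (for example "report[work 2024].pdf"). That exact convention is an assumption made for illustration, and split_tags is a hypothetical helper, not part of TagSpaces.

```python
import re
from pathlib import PurePath

# Assumed convention: one bracketed group of space-separated tags in the stem.
TAG_GROUP = re.compile(r"\[([^\[\]]*)\]")


def split_tags(filename):
    """Split a filename into (title, tags, extension)."""
    path = PurePath(filename)
    match = TAG_GROUP.search(path.stem)
    tags = match.group(1).split() if match else []
    title = TAG_GROUP.sub("", path.stem).strip()
    return title, tags, path.suffix


title, tags, ext = split_tags("trip-report[travel 2024 todo].pdf")
# title == "trip-report", tags == ["travel", "2024", "todo"], ext == ".pdf"
```

Since the tags are part of the name, they survive copying, syncing, or moving the file between devices, which is the portability property the description emphasizes.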
Yuedu is an Android application designed to aggregate and manage web-based articles and reading content within a single interface. It functions as a content reader that collects information from various online sources, including RSS feeds, and organizes them for personal consumption. The application distinguishes itself through a plugin-driven architecture that utilizes custom parsing rules to extract and format unstructured web data. This modular approach allows users to define how the application interacts with diverse websites, ensuring that content is transformed into a standardized format for consistent display. To support flexible reading habits, the software includes local-first data persistence, which stores subscribed content and user metadata to enable offline access. Beyond standard reading features, the application provides tools for customizing the display of individual content feeds and managing directory details. It also supports external service interoperability by delegating text-to-speech tasks to system-level voice synthesis engines, allowing users to convert written articles into audible speech.
This is an Android-based RSS reader and content aggregator rather than a self-hostable web service for bookmarking and archiving articles, making it a different type of reading tool.
dn is a self-hosted personal web archiving system that automatically intercepts and stores web pages on a local device. It uses a proxy-based request interception model to capture browser traffic and save content for offline access. A local full-text search engine indexes all saved page content for retrieval across the collection. A dedicated browser interface serves archived files as if the device were online, mimicking the original live web environment. Administrative control is provided through a web-based interface for managing storage configurations and maintaining domain blacklists that exclude specific websites from archiving.
This is a self-hosted web archiving system that provides full-text search and offline access to saved pages, serving as a robust tool for building a personal library of web content.
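A domain blacklist of the kind mentioned for dn can be sketched as a check that an intercepting proxy runs before archiving a request. The function name and the subdomain-matching rule are assumptions for illustration and say nothing about how dn actually matches domains.

```python
from urllib.parse import urlsplit


def is_blocked(url, blocked_domains):
    """Return True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in blocked_domains
    )


blocked = {"example.com"}
skip_mail = is_blocked("https://mail.example.com/inbox", blocked)    # subdomain
skip_other = is_blocked("https://notexample.com/", blocked)          # different domain
```

Matching on a leading dot keeps a lookalike host such as notexample.com from being treated as a subdomain of example.com.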
Folo is a centralized RSS feed aggregator designed to consolidate digital content from multiple sources into a single, unified reading interface. It utilizes a local-first data architecture, employing a relational database to store feed metadata and article content, which ensures that information remains accessible and searchable even without an active internet connection. The application distinguishes itself through an integrated intelligent content processor that leverages asynchronous pipelines to translate foreign languages and generate concise summaries of long-form articles. To maintain a consistent reading experience, it employs a modular parsing architecture that converts diverse web formats into a standardized representation, while a sandboxed browser component renders complex media, including videos and audio, directly within the interface. The system maintains data currency through a background synchronization engine that performs periodic polling of remote sources. An internal event-driven observer pattern propagates these updates across the application, ensuring that the user interface reflects the latest feed information immediately upon arrival.
Folo is a self-hostable RSS aggregator that provides offline reading and content parsing, serving as a capable alternative for managing and reading web content even though it focuses on feed-based aggregation rather than manual bookmarking.