The visitor wants a self-hostable application for bookmarking, archiving, and reading web articles offline.

omnivore-app/omnivore is the closest match. Other strong matches: wallabag/wallabag, miniflux/v2, pirate/archivebox, and spacecowboy/feeder.

Why does omnivore-app/omnivore match “a read-later article saver”?

Omnivore is a comprehensive, self-hostable read-it-later platform with article parsing, offline reading, full-text search, and robust organization tools, so it covers bookmarking, archiving, and offline reading in a single application.

Why does wallabag/wallabag match “a read-later article saver”?

Wallabag is a dedicated self-hosted application that provides article parsing, offline reading, tagging, and full-text search, making it a comprehensive solution for your read-it-later needs.

Why does miniflux/v2 match “a read-later article saver”?

This is a self-hosted RSS feed reader that provides article parsing and full-text search, serving as a capable alternative for managing and reading web content even though it is primarily feed-based rather than a manual bookmarking service.

Why does pirate/archivebox match “a read-later article saver”?

ArchiveBox is a self-hosted web archiving system that excels at preserving permanent copies of articles for offline access and full-text search, though it focuses more on archival preservation than the reading-optimized interface typical of read-it-later services.

Why does spacecowboy/feeder match “a read-later article saver”?

Feeder is an Android feed reader that can sync through a self-hosted backend server. It provides core read-it-later features such as full-text extraction, offline caching, and bookmarking, though it is designed primarily for RSS/Atom feeds rather than manual URL-based bookmarking.

Open Source Read Later Services

Self-hosted applications and browser extensions for saving web articles to read in a distraction-free environment.


omnivore-app/omnivore
15,882 stars
Omnivore is an open-source, self-hostable read-it-later application that centralizes web articles, newsletters, and other digital documents in a personal library. It captures and stores copies of saved web pages so they remain readable without an internet connection. The platform distinguishes itself through an event-sourced synchronization engine that keeps state consistent across multiple devices by replaying user actions. A headless web-scraping service extracts clean text and metadata from raw web pages, giving every article a uniform reading view. A research-oriented workflow lets users highlight passages and attach personal notes to saved content. Other capabilities include offline reading and cross-device reading-progress synchronization, and because the project is open source, users keep full control over their personal data and reading history.
TypeScript, Self-Hosted, Self-Hosted Applications, Web Content Archivers
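Omnivore's description credits it with an event-sourced synchronization engine that keeps devices consistent by replaying user actions. The sketch below illustrates that general pattern only; it is not Omnivore's code, and the `Event` kinds (`save`, `highlight`, `progress`) and the `replay` helper are hypothetical. Any device that replays the same ordered log rebuilds the same library state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    url: str
    highlights: list[str] = field(default_factory=list)
    progress: float = 0.0  # fraction of the article read, 0.0 to 1.0


@dataclass(frozen=True)
class Event:
    # Hypothetical event schema for illustration, not Omnivore's.
    kind: str  # "save", "highlight", or "progress"
    article_id: str
    payload: str | float


def apply_event(library: dict[str, Article], event: Event) -> None:
    """Mutate the library to reflect a single user action."""
    if event.kind == "save":
        # Saving the same article twice keeps the existing state.
        library.setdefault(event.article_id, Article(url=str(event.payload)))
    elif event.kind == "highlight":
        library[event.article_id].highlights.append(str(event.payload))
    elif event.kind == "progress":
        library[event.article_id].progress = float(event.payload)
    else:
        raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: list[Event]) -> dict[str, Article]:
    """Rebuild library state from scratch by applying events in log order."""
    library: dict[str, Article] = {}
    for event in events:
        apply_event(library, event)
    return library


log = [
    Event("save", "a1", "https://example.com/post"),
    Event("highlight", "a1", "a passage worth keeping"),
    Event("progress", "a1", 0.4),
    Event("progress", "a1", 0.75),
]
library = replay(log)
print(library["a1"].progress)  # 0.75
```

Because state is derived only from the log, a device that was offline can catch up by fetching the events it missed and applying them in order; resolving conflicting edits from different devices would need ordering rules this sketch leaves out.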
wallabag/wallabag
12,777 stars
Wallabag is a self-hosted, open-source bookmark manager designed to archive web content for later reading. It functions as a personal knowledge management tool, allowing users to collect, store, and organize web pages in a centralized, searchable library. The platform provides a distraction-free reading experience by extracting the primary text and images from web pages while removing advertisements and navigation menus. Because the extracted content is stored, saved articles remain available for offline reading even if the original source is removed from the internet. Organizational features such as tagging and full-text storage help manage large collections of research materials. Wallabag exposes a REST API for external clients and uses asynchronous processing for resource-intensive tasks such as content parsing and image fetching.
PHP, Content Extraction Engines, Offline Access, Web Content Archivers
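Wallabag's description says it extracts the primary text of a page while dropping ads and navigation. Production extractors rely on much richer readability-style scoring; the sketch below is a deliberately naive stand-in built on Python's standard `html.parser`, and the `SKIP_TAGS` set is an assumption chosen for illustration. It keeps the text of `<p>` elements that are not nested inside navigation, sidebar, header, footer, form, script, or style elements.

```python
from html.parser import HTMLParser

# Elements whose contents this simplified sketch treats as clutter.
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer", "form"}


class MainTextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a clutter element
        self.in_paragraph = False
        self.paragraphs: list[str] = []
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p" and self.in_paragraph:
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self._buffer.append(data)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)


page = """
<html><body>
  <nav><p>Home | About | Subscribe</p></nav>
  <article>
    <p>First <em>real</em> paragraph.</p>
    <aside><p>Buy now!</p></aside>
    <p>Second   paragraph.</p>
  </article>
  <script>trackVisit();</script>
</body></html>
"""
print(extract_main_text(page))
```

The sketch assumes every paragraph is explicitly closed with `</p>`; real-world HTML often omits closing tags, which is one reason dedicated extraction libraries exist.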
miniflux/v2
9,389 stars
This project is a self-hosted feed aggregator and reader that collects and organizes content from RSS, Atom, and JSON feeds. It is privacy-focused: it blocks tracking pixels and strips tracking parameters from URLs to prevent third-party tracking and referrer leakage. It also exposes its data and user accounts through a REST API for third-party clients. Subscriptions can be imported and exported as OPML, and the main body text of articles can be extracted with readability-style parsing and custom scraper rules. Broader capabilities include full-text article search via database indexing, content filtering with regular expressions, and automated workflows through webhooks, along with several user authentication methods and tools for user account administration. The application is distributed as Docker container images, cross-platform binaries, and Linux system packages.
Go, Content Extraction Engines, Full Text Search, Self-Hosted Applications
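OPML support is what lets subscriptions move between feed readers such as Miniflux. As a rough sketch of what an importer and exporter handle (written in Python with the standard `xml.etree.ElementTree` module, not Miniflux's Go code), the example below assumes feeds are `<outline>` elements carrying an `xmlUrl` attribute and that category folders are outlines without one, nested at most one level deep. The `Feed` type and both function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    title: str
    xml_url: str
    category: str | None = None


def _label(outline: ET.Element) -> str:
    # OPML files in the wild use either "title" or "text" for the display name.
    return outline.get("title") or outline.get("text", "")


def parse_opml(document: str) -> list[Feed]:
    """Collect feeds from top-level outlines and one level of category folders."""
    body = ET.fromstring(document).find("body")
    feeds: list[Feed] = []
    if body is None:
        return feeds
    for outline in body.findall("outline"):
        url = outline.get("xmlUrl")
        if url:
            feeds.append(Feed(_label(outline), url))
            continue
        # No xmlUrl: treat this outline as a category folder.
        for child in outline.findall("outline"):
            child_url = child.get("xmlUrl")
            if child_url:
                feeds.append(Feed(_label(child), child_url, _label(outline)))
    return feeds


def to_opml(feeds: list[Feed]) -> str:
    """Serialize feeds to OPML 2.0, grouping categorized feeds into folders."""
    root = ET.Element("opml", version="2.0")
    ET.SubElement(ET.SubElement(root, "head"), "title").text = "Subscriptions"
    body = ET.SubElement(root, "body")
    folders: dict[str, ET.Element] = {}
    for feed in feeds:
        parent = body
        if feed.category:
            if feed.category not in folders:
                folders[feed.category] = ET.SubElement(body, "outline", text=feed.category)
            parent = folders[feed.category]
        ET.SubElement(parent, "outline", type="rss", text=feed.title, xmlUrl=feed.xml_url)
    return ET.tostring(root, encoding="unicode")


feeds = [
    Feed("Go Blog", "https://go.dev/blog/feed.atom", "Tech"),
    Feed("Example", "https://example.com/rss.xml"),
]
print(parse_opml(to_opml(feeds)) == feeds)  # True
```

Grouping into folders means a round trip preserves the set of feeds but can reorder them when categorized and uncategorized feeds are interleaved. The Python documentation also recommends a hardened parser such as `defusedxml` for untrusted XML files.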
pirate/ArchiveBox
27,721 stars
ArchiveBox is a self-hosted web archiving system designed to capture and preserve permanent static copies of webpages, media, and PDFs on personal infrastructure. It functions as a digital content curator and personal web archive manager, allowing users to import URLs from bookmarks, RSS feeds, and browser history to create a centralized, searchable knowledge base. The project is distinguished by its ability to archive private, paywalled, or login-protected content using browser cookies and authenticated session persistence. It supports long-term availability by saving each page in several formats, including HTML, PDF, and PNG, and can automatically mirror these local snapshots to external preservation services. The system includes capabilities for multimedia asset extraction, full-text archive indexing, and scheduled content mirroring. Users can manage their collections through a web-based interface, a command-line interface, or a remote API, with options to export the entire collection as a standalone static HTML site for offline browsing.
Python, Full Text Search, Web Content Archivers
spacecowboy/Feeder
2,641 stars
Feeder is an RSS and Atom feed reader that aggregates content into a single interface. It extracts full article text, removing website clutter to isolate the main body, and it can synchronize subscription lists and read status across devices through a private, self-hosted backend server. The application can use external AI services, accessed with user-supplied API keys, to translate articles and generate concise summaries of long-form content. It also includes a text-to-speech reader that uses the system's speech engines with automatic language detection to convert written content into spoken audio. Content curation tools include bookmarks, pinned entries, and a customizable blocklist for filtering out unwanted items. Feed lists and full article text are cached locally for offline reading. Additional capabilities include removal of URL tracking parameters, subscription import and export, and customizable reading appearance.
Kotlin, Offline Access
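Both Miniflux and Feeder are described as removing tracking parameters from URLs. A minimal sketch of that idea using only `urllib.parse` follows; the parameters treated as tracking (`utm_*`, `fbclid`, `gclid`, and Mailchimp's `mc_cid`/`mc_eid`) are an illustrative assumption, not either project's actual rule set, and `strip_tracking_params` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative rule set; real readers maintain longer, curated lists.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = frozenset({"fbclid", "gclid", "mc_cid", "mc_eid"})


def is_tracking_param(name: str) -> bool:
    lowered = name.lower()
    return lowered.startswith(TRACKING_PREFIXES) or lowered in TRACKING_KEYS


def strip_tracking_params(url: str) -> str:
    """Return the URL with tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking_param(name)
    ]
    # An empty query string makes urlunsplit drop the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking_params(
    "https://example.com/post?id=42&utm_source=rss&utm_medium=feed&fbclid=abc#top"
))  # https://example.com/post?id=42#top
```

Re-encoding with `urlencode` can change how the remaining values are percent-encoded (for example, spaces become `+`), which is usually harmless for article links.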
ArchiveBox/ArchiveBox
26,876 stars
ArchiveBox is a self-hosted archiving tool designed for personal digital preservation and research data management. It functions as an automated web preservation engine that monitors URL inputs from bookmarks, browser history, or manual entries to capture and store permanent, offline copies of web content. By utilizing headless browser automation, the system renders dynamic web pages to ensure that captured snapshots, PDFs, and media assets remain accurate and accessible even if the original source disappears. The project distinguishes itself through a modular extractor pipeline and a task-queue-based processing model, which allow it to handle long-running ingestion jobs reliably and at scale. It organizes all captured data into a predictable, file-system-based directory structure, so archives remain portable and accessible without requiring a separate database server. This architecture supports the generation of static, self-contained archives that can be hosted on any standard web server. To maintain high fidelity across diverse web environments, the system includes configuration-driven dependency management that coordinates the necessary browser binaries and command-line tools. The platform provides a comprehensive suite of command-line interfaces, configuration options, and core modules to support operational management and integration. Detailed documentation is available to guide users through installation, dependency maintenance, and the security considerations of managing archived web content.
This is a powerful self-hosted web archiving tool that excels at preserving permanent offline copies of articles, though it focuses more on long-term digital preservation than the reading-focused interface typical of a traditional read-it-later service.
Python, Web Content Archivers
readest/readest
21,502 stars
Readest is a comprehensive digital reading platform designed to manage, annotate, and consume electronic books across multiple devices. It functions as a versatile library manager and reading environment, supporting a wide range of user needs from standard ebook consumption to specialized study and accessibility-focused workflows. The platform distinguishes itself through advanced features like parallel text study, which enables side-by-side document rendering with synchronized scrolling, and a robust text-to-speech engine that provides hands-free reading with synchronized visual highlighting. It offers deep customization for the reading experience, including granular control over typography, page layouts, and E-ink display optimization, alongside the ability to inject custom styles for personalized visual overrides. Beyond core reading, the application includes extensive tools for library organization, such as metadata tagging and automated grouping, as well as content processing capabilities like CJK text support and web article clipping. Users can maintain a unified library state across devices through state synchronization middleware, while also securing their collections with PIN-based access controls and managing data through portable backups and annotation exports.
Readest is a self-hostable digital reading platform that includes web article clipping and full-text search, making it a capable tool for managing and reading saved content even though its primary focus is on ebook library management.
TypeScript, Full Text Search, Web Content Archivers
tagspaces/tagspaces
4,935 stars
TagSpaces is an offline-first file tagging and organization platform that manages local files with portable metadata stored directly in filenames or in sidecar JSON files, so no central database is needed and the metadata travels with the files across devices and sync services. It combines a full-text file search engine, a Kanban-board file organizer, a local AI file assistant, an S3-compatible cloud file manager, and a web clipper and bookmark manager in a single application. Its local-first architecture runs all file operations, indexing, and AI processing on the device, treating cloud storage as an optional remote mount point. It integrates with a locally running Ollama engine for on-device AI tasks such as automatic tagging, summarization, and image analysis, keeping all data private. A plugin-based viewer system renders over 50 file formats. TagSpaces also offers a command-line interface for programmatic file operations and search indexing, supports S3-compatible object storage and WebDAV servers for remote file management, and provides a browser extension that captures web pages, screenshots, and bookmarks as local files with automatic tagging. Built-in viewers and editors cover documents, images, audio, video, 3D models, and Markdown files, alongside geo-tagging on interactive maps, Kanban-style task management, and full-text search with fuzzy matching and saved queries. The application runs on Windows, macOS, and Linux, can run in portable mode, and can be self-hosted as a static web app on personal servers or on cloud platforms such as Cloudflare Pages and AWS Amplify.
TagSpaces is a local-first file management and organization platform that functions as a self-hostable bookmarking and web-clipping tool, allowing you to archive web content as local files with tagging and full-text search capabilities.
TypeScript, Full Text Search, Self-Hosted Applications, Web Content Archivers
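TagSpaces is described as storing tags directly in filenames so the metadata travels with the file. The sketch below assumes a bracketed, space-separated convention such as `article[web read-later].html`; treat that format as an assumption for illustration rather than a specification of TagSpaces' on-disk format. The `read_tags` and `add_tags` helpers are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches an assumed trailing "[tag1 tag2]" group at the end of the file stem.
TAG_GROUP = re.compile(r"\[([^\[\]]*)\]$")


def read_tags(filename: str) -> list[str]:
    """Return tags embedded in the filename, or an empty list."""
    match = TAG_GROUP.search(PurePosixPath(filename).stem)
    return match.group(1).split() if match else []


def add_tags(filename: str, *new_tags: str) -> str:
    """Return a new filename with the given tags merged into the bracket group."""
    path = PurePosixPath(filename)
    tags = read_tags(filename)
    for tag in new_tags:
        if tag not in tags:
            tags.append(tag)
    if not tags:
        return filename  # nothing to embed, keep the name unchanged
    base = TAG_GROUP.sub("", path.stem)
    return str(path.with_name(f"{base}[{' '.join(tags)}]{path.suffix}"))


print(add_tags("clips/article[web].html", "read-later"))
# clips/article[web read-later].html
```

Keeping tags in the name means any file manager or sync service preserves them for free, at the cost of renaming the file whenever its tags change.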
XIU2/Yuedu
11,647 stars
Yuedu is an Android application designed to aggregate and manage web-based articles and reading content within a single interface. It functions as a content reader that collects information from various online sources, including RSS feeds, and organizes them for personal consumption. The application distinguishes itself through a plugin-driven architecture that utilizes custom parsing rules to extract and format unstructured web data. This modular approach allows users to define how the application interacts with diverse websites, ensuring that content is transformed into a standardized format for consistent display. To support flexible reading habits, the software includes local-first data persistence, which stores subscribed content and user metadata to enable offline access. Beyond standard reading features, the application provides tools for customizing the display of individual content feeds and managing directory details. It also supports external service interoperability by delegating text-to-speech tasks to system-level voice synthesis engines, allowing users to convert written articles into audible speech.
This is an Android-based RSS reader and content aggregator rather than a self-hostable web service for bookmarking and archiving articles, making it a different type of reading tool.
Content Extraction Engines, Offline Access
DO-SAY-GO/dn
3,905 stars
dn is a self-hosted personal web archiving system that automatically intercepts and stores web pages on a local device. It uses a proxy-based request interception model to capture browser traffic and save content for offline access without an internet connection. The system features a local full-text search engine that indexes all saved page content for information retrieval across the collection. It includes a dedicated browser interface that simulates online connectivity to serve archived files, mimicking the original live web environment. Administrative control is provided through a web-based interface for managing storage configurations and maintaining domain blacklists to filter specific websites from the archiving process.
This is a self-hosted web archiving system that provides full-text search and offline access to saved pages, serving as a robust tool for building a personal library of web content.
JavaScript, Offline Web Page Archivers, Automatic Web Archiving, Full-Text Search Engines
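dn is described as indexing the full text of every saved page so the archive can be searched. An inverted index, which maps each word to the set of pages containing it, is the textbook structure behind that kind of search. The sketch below is a generic in-memory version (AND semantics, no ranking) and does not reflect dn's actual JavaScript implementation; `PageIndex` and its methods are hypothetical names.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return TOKEN.findall(text.lower())


class PageIndex:
    def __init__(self) -> None:
        # word -> URLs of archived pages that contain the word
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query: str) -> list[str]:
        """Return URLs containing every query word, sorted for stable output."""
        words = tokenize(query)
        if not words:
            return []
        # .get avoids inserting empty entries for words never seen.
        matches = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(matches)


index = PageIndex()
index.add("https://a.example/1", "Self-hosted archiving of web pages")
index.add("https://b.example/2", "Reading web articles offline")
print(index.search("web offline"))  # ['https://b.example/2']
```

A persistent archive would store the postings on disk and add ranking (for example, by term frequency), but the lookup-and-intersect step stays the same.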
RSSNext/Folo
38,546 stars
Folo is a centralized RSS feed aggregator designed to consolidate digital content from multiple sources into a single, unified reading interface. It utilizes a local-first data architecture, employing a relational database to store feed metadata and article content, which ensures that information remains accessible and searchable even without an active internet connection. The application distinguishes itself through an integrated intelligent content processor that leverages asynchronous pipelines to translate foreign languages and generate concise summaries of long-form articles. To maintain a consistent reading experience, it employs a modular parsing architecture that converts diverse web formats into a standardized representation, while a sandboxed browser component renders complex media, including videos and audio, directly within the interface. The system maintains data currency through a background synchronization engine that performs periodic polling of remote sources. An internal event-driven observer pattern propagates these updates across the application, ensuring that the user interface reflects the latest feed information immediately upon arrival.
Folo is a self-hostable RSS aggregator that provides offline reading and content parsing, serving as a capable alternative for managing and reading web content even though it focuses on feed-based aggregation rather than manual bookmarking.
TypeScript, Content Aggregators, Local-First Storage, AI Documentation Assistants
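Folo's description says background polling feeds an internal observer pattern that pushes new entries to the interface as soon as they arrive. The sketch below shows that publish/subscribe shape in miniature; `FeedEvents`, the `"entries-added"` topic, and the callback signature are hypothetical and unrelated to Folo's TypeScript code.

```python
from collections import defaultdict
from typing import Any, Callable

Callback = Callable[[Any], None]


class FeedEvents:
    """Minimal observer registry: subscribers register per topic and receive every publish."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> Callable[[], None]:
        self._subscribers[topic].append(callback)

        def unsubscribe() -> None:
            self._subscribers[topic].remove(callback)

        return unsubscribe

    def publish(self, topic: str, payload: Any) -> None:
        # Iterate over a copy so callbacks may unsubscribe while being notified.
        for callback in list(self._subscribers[topic]):
            callback(payload)


events = FeedEvents()
unread: list[str] = []
stop = events.subscribe("entries-added", unread.extend)

# A polling loop would publish whatever it fetched; here the result is faked.
events.publish("entries-added", ["Post A", "Post B"])
stop()
events.publish("entries-added", ["Post C"])
print(unread)  # ['Post A', 'Post B']
```

The poller only needs to know the topic name, not which views are listening, which is what lets new UI components react to fresh entries without changes to the synchronization code.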
