What are the best open-source alternatives to Readability?

30 open-source projects similar to mozilla/readability, ranked by shared features. Top picks: kepano/defuddle, adbar/trafilatura, postlight/parser, readyouapp/readyou, grangier/python-goose, gsh199449/spider, lorien/web-scraping, drawrowfly/tiktok-scraper, minbrowser/min, deathau/markdown-clipper.

Is kepano/defuddle a good alternative to Readability?

Defuddle is a command line web parser and content extractor designed to isolate the primary article body from web pages and convert the result into standardized markdown. It functions as a content cleaner that removes layout clutter, such as sidebars and headers, to retrieve the main text and assoc…

Is adbar/trafilatura a good alternative to Readability?

Trafilatura is a Python library and command-line tool for extracting clean, structured text and metadata from web pages. It downloads HTML content, identifies the main body of text, and strips away navigation, ads, and other boilerplate, returning the core article content along with fields like tit…

Is postlight/parser a good alternative to Readability?

Postlight Parser is a command-line tool that extracts the main article content from any web page URL, returning clean structured data including the title, author, date, excerpt, and lead image while stripping away ads and clutter. It uses a readability-based heuristic that scores HTML elements on t…

Is readyouapp/readyou a good alternative to Readability?

ReadYou is an Android reading application and RSS feed aggregator that centralizes content from multiple web sources. It functions as a full-text RSS reader, extracting the complete body text from web pages to provide a distraction-free reading experience. The application includes specialized a…
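As a rough illustration of the feed-aggregation side of an RSS reader, the following sketch merges the items of several RSS 2.0 documents into one list, newest first. It uses only the Python standard library. The helper name `merge_feeds` and the sample feeds are hypothetical, and nothing here describes how ReadYou itself is implemented. Full-text extraction of each linked page is out of scope.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def merge_feeds(feed_documents):
    """Hypothetical helper: merge the <item>s of several RSS 2.0 documents, newest first."""
    dated, undated = [], []
    for xml_text in feed_documents:
        channel = ET.fromstring(xml_text).find("channel")
        if channel is None:
            continue  # not an RSS 2.0 document; skipped in this sketch
        source = channel.findtext("title", default="")
        for item in channel.findall("item"):
            pub_date = item.findtext("pubDate")
            entry = {
                "source": source,
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                # RSS 2.0 pubDate uses RFC 822 dates, which email.utils can parse.
                # Assumption: feeds give explicit offsets (e.g. +0000); a "-0000"
                # offset yields a naive datetime that cannot be compared with aware ones.
                "published": parsedate_to_datetime(pub_date) if pub_date else None,
            }
            if entry["published"] is not None:
                dated.append(entry)
            else:
                undated.append(entry)
    dated.sort(key=lambda e: e["published"], reverse=True)
    return dated + undated  # items without a date go last
```

A real reader would also fetch feeds over HTTP, deduplicate items by link or GUID, and handle Atom feeds; the sketch only shows the merge-and-sort step.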

Is grangier/python-goose a good alternative to Readability?

python-goose is a Python library for web scraping and content extraction. It functions as an HTML boilerplate remover and article parser designed to isolate primary text and metadata from web pages by stripping away navigation, layout noise, and non-essential elements. The tool features multilingu…

Is gsh199449/spider a good alternative to Readability?

Spider is a web-based platform designed for automated data extraction, providing a centralized framework to collect, process, and route structured information from websites. It functions as a comprehensive pipeline that manages the entire lifecycle of data gathering, from initial configuration to f…

Is lorien/web-scraping a good alternative to Readability?

This project is a comprehensive resource directory for web data extraction, providing a curated collection of tools and libraries for parsing data, automating browsers, and managing network operations. It serves as a guide for extracting structured information from HTML, XML, JSON, and PDF formats.…

Is drawrowfly/tiktok-scraper a good alternative to Readability?

This project is a specialized TikTok API scraper and data extractor. It functions as a proxy-based web scraper designed to collect user metadata, video posts, and trend feeds, while providing a webhook data pipeline to route scraped information to external URLs via HTTP requests. The tool includes…
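To make the "webhook data pipeline" idea concrete: forwarding a scraped record to an external URL usually amounts to an HTTP POST with a JSON body. The sketch below only builds such a request with Python's standard `urllib.request`. The webhook URL, the payload fields and the helper name `build_webhook_request` are hypothetical and are not tiktok-scraper's own options.

```python
import json
import urllib.request


def build_webhook_request(webhook_url, record):
    """Hypothetical helper: wrap one scraped record as a JSON POST request."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is a separate step, e.g. urllib.request.urlopen(req, timeout=10),
# which this sketch deliberately leaves out.
req = build_webhook_request("https://hooks.example.com/scraped", {"id": "123", "likes": 42})
```

Keeping request construction separate from sending makes it easy to add retries or batching later without touching how each record is serialized.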

Is minbrowser/min a good alternative to Readability?

Min is a minimalist, privacy-focused web browser designed to limit data collection and remove interface clutter. It serves as an ad-blocking tool that prevents tracking scripts and advertisements from loading to improve page speed and protect user identity. The browser differentiates itself throug…

Is deathau/markdown-clipper a good alternative to Readability?

markdown-clipper is a browser extension that converts website content into markdown files for offline storage and personal knowledge bases. It functions as a content extractor and HTML to markdown converter that removes layout clutter to isolate primary text. The tool includes a specific integrati…
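A very small HTML-to-Markdown conversion, of the kind a clipper performs after isolating the main content, can be sketched with Python's built-in `html.parser`. This is a hypothetical toy (`MarkdownConverter` and `to_markdown` are invented names) that handles only headings, paragraphs, links, bold/italic text and flat lists. It is not markdown-clipper's code, which runs inside the browser.

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Toy converter: headings, paragraphs, links, bold/italic and flat lists only."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    BLOCK_TAGS = {"p", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.out.append(self.HEADINGS[tag])
        elif tag == "li":
            self.out.append("- ")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")
        elif tag == "br":
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None
        elif tag == "li":
            self.out.append("\n")
        elif tag in self.BLOCK_TAGS:
            self.out.append("\n\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace roughly the way a browser renders them
            self.out.append(re.sub(r"\s+", " ", data))


def to_markdown(html):
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    lines = [line.strip() for line in "".join(converter.out).splitlines()]
    # Keep at most one blank line between blocks
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

Nested lists, tables, images and code blocks are the usual next steps; production converters handle them with per-tag rules rather than a single flat `if` chain.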


Open-source alternatives to Readability

30 open-source projects similar to mozilla/readability, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Readability alternative.

kepano/defuddle
3,189 GitHub stars
Defuddle is a command line web parser and content extractor designed to isolate the primary article body from web pages and convert the result into standardized markdown. It functions as a content cleaner that removes layout clutter, such as sidebars and headers, to retrieve the main text and associated metadata. The tool provides a terminal interface that processes content from remote URLs, local files, or piped HTML streams. It supports custom content targeting, allowing users to specify CSS selectors to manually define the main content area when automatic detection is insufficient. The sy…
TypeScript
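The selector-override behaviour described for Defuddle (an explicit CSS selector when automatic detection falls short) can be illustrated with a standard-library Python sketch. Everything here is hypothetical: `extract_main` and its selector support (only `#id`, `.class` or a bare tag name) are invented for illustration, the "automatic" step is a naive `<article>`/`<main>` lookup, and none of it reflects Defuddle's actual TypeScript API or CLI flags.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Element:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # mix of Element and str, in document order

    def text(self):
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(p for p in parts if p)

    def iter(self):
        yield self
        for c in self.children:
            if isinstance(c, Element):
                yield from c.iter()


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Element("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        el = Element(tag, attrs, self.current)
        self.current.children.append(el)
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.current = el

    def handle_endtag(self, tag):
        node = self.current
        while node is not None and node.tag != tag:  # tolerate unclosed tags
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip() and self.current.tag not in ("script", "style"):
            self.current.children.append(data.strip())


def matches(el, selector):
    """Supports only '#id', '.class' or a bare tag name (a deliberate simplification)."""
    if selector.startswith("#"):
        return el.attrs.get("id") == selector[1:]
    if selector.startswith("."):
        return selector[1:] in (el.attrs.get("class") or "").split()
    return el.tag == selector


def extract_main(html, selector=None):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    elements = list(builder.root.iter())
    # 1. A user-supplied selector wins when it matches something
    if selector:
        for el in elements:
            if matches(el, selector):
                return el.text()
    # 2. Otherwise fall back to a naive automatic guess: <article>, then <main>
    for tag in ("article", "main"):
        for el in elements:
            if el.tag == tag:
                return el.text()
    # 3. Last resort: all visible text
    return builder.root.text()
```

The design point is the ordering: an explicit selector is tried first, and automatic detection only runs when no selector is given or it matches nothing.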
adbar/trafilatura
5,319 GitHub stars
Trafilatura is a Python library and command-line tool for extracting clean, structured text and metadata from web pages. It downloads HTML content, identifies the main body of text, and strips away navigation, ads, and other boilerplate, returning the core article content along with fields like title, author, date, and URL. The tool can also extract user comments and test whether a page contains extractable text, making it a general-purpose web text extraction library. What distinguishes Trafilatura from simpler extractors is its configurable extraction pipeline, which offers high-speed, high…
Python · article-extractor, corpus-builder, corpus-tools
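Trafilatura returns fields such as title, author, date and URL alongside the text. One common place those fields live is a page's `<title>`, `<meta>` and canonical `<link>` tags, and the following sketch reads them with Python's standard `html.parser`. The tag-to-field mapping in `META_FIELDS` and the name `extract_metadata` are assumptions chosen for illustration; this is not Trafilatura's API and makes no claim about how Trafilatura itself gathers metadata.

```python
from html.parser import HTMLParser

# Hypothetical mapping from common <meta> name/property keys to output fields
META_FIELDS = {
    "author": "author",
    "article:author": "author",
    "og:title": "title",
    "article:published_time": "date",
    "date": "date",
    "og:url": "url",
}


class MetadataParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.metadata = {}
        self.in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            field = META_FIELDS.get(key)
            content = attrs.get("content")
            # First match wins, so earlier tags take precedence
            if field and content and field not in self.metadata:
                self.metadata[field] = content.strip()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical" and attrs.get("href"):
            self.metadata.setdefault("url", attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def extract_metadata(html):
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    meta = parser.metadata
    title = "".join(parser.title_parts).strip()
    if title:
        meta.setdefault("title", title)  # og:title, if present, takes priority
    return meta
```

Real pages are messier (JSON-LD blocks, bylines inside the body, dates in URLs), so a sketch like this is only a first pass.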
postlight/parser
5,786 GitHub stars
Postlight Parser is a command-line tool that extracts the main article content from any web page URL, returning clean structured data including the title, author, date, excerpt, and lead image while stripping away ads and clutter. It uses a readability-based heuristic that scores HTML elements on text density and structural cues to identify the article body, and can accept pre-fetched HTML strings directly for parsing instead of fetching the URL. The tool distinguishes itself through a modular architecture that supports domain-specific extractor overrides, allowing custom JavaScript modules t…
JavaScript · jest, labs, mercury
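The "readability-based heuristic that scores HTML elements on text density" mentioned above can be sketched in a few dozen lines. The scoring below is loosely modelled on the classic Readability approach: each substantial paragraph earns a base point, one point per comma and up to three points for length; that score goes fully to its parent and at half weight to its grandparent, and each candidate is then discounted by its link density. The exact weights, the 25-character threshold and the name `best_candidate` are illustrative assumptions; this is a simplified Python sketch, not Postlight Parser's code.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children, self.texts = [], []
        self.score = 0.0

    def inner_text(self):
        # Document order is not preserved; only used for length and comma counts
        parts = self.texts + [c.inner_text() for c in self.children]
        return " ".join(p for p in parts if p)

    def link_text_length(self):
        if self.tag == "a":
            return len(self.inner_text())
        return sum(c.link_text_length() for c in self.children)

    def descendants(self):
        for c in self.children:
            yield c
            yield from c.descendants()


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not None and node.tag != tag:  # tolerate unclosed tags
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip() and self.current.tag not in ("script", "style"):
            self.current.texts.append(data.strip())


def best_candidate(html, min_paragraph_len=25):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    candidates = []
    for p in builder.root.descendants():
        if p.tag != "p":
            continue
        text = p.inner_text()
        if len(text) < min_paragraph_len:
            continue
        # Base point + one per comma + up to 3 points for length
        score = 1 + text.count(",") + min(len(text) // 100, 3)
        grandparent = p.parent.parent
        for ancestor, weight in ((p.parent, 1.0), (grandparent, 0.5)):
            if ancestor is None or ancestor is builder.root:
                continue
            if ancestor not in candidates:
                candidates.append(ancestor)
            ancestor.score += score * weight
    best, best_score = None, 0.0
    for node in candidates:
        link_density = node.link_text_length() / (len(node.inner_text()) or 1)
        final = node.score * (1 - link_density)  # link-heavy blocks (menus) sink
        if final > best_score:
            best, best_score = node, final
    return best
```

The link-density discount is what keeps a comma-rich navigation menu from beating the article: its text is almost entirely anchor text, so its final score drops to near zero.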


kepano/defuddle

adbar/trafilatura

postlight/parser

ReadYouApp/ReadYou

grangier/python-goose

gsh199449/spider

lorien/web-scraping

drawrowfly/tiktok-scraper

minbrowser/min

deathau/markdown-clipper

webclipper/web-clipper

feedbin/feedbin

wechatsync/Wechatsync

spacecowboy/Feeder

steipete/summarize

kkdai/youtube

samuelclay/NewsBlur

vikiboss/60s

gh0stkey/HaE

nashsu/llm_wiki

standardnotes/app

iflow-ai/iflow-cli

Nemo2011/bilibili-api

TransparentLC/WechatMomentScreenshot

tridactyl/tridactyl

dataabc/weiboSpider

qunash/chatgpt-advanced

Show-Me-the-Code/python

npm/ini

andrejewski/himalaya