What are the best open-source alternatives to Parser?

30 open-source projects similar to postlight/parser, ranked by shared features. Top picks: kepano/defuddle, mozilla/readability, deathau/markdown-clipper, ericchiang/pup, gsh199449/spider, danburzo/percollate, adbar/trafilatura, rchipka/node-osmosis, grangier/python-goose, wechatsync/Wechatsync.

Is kepano/defuddle a good alternative to Parser?

Defuddle is a command line web parser and content extractor designed to isolate the primary article body from web pages and convert the result into standardized markdown. It functions as a content cleaner that removes layout clutter, such as sidebars and headers, to retrieve the main text and assoc…

Is mozilla/readability a good alternative to Parser?

Readability is a JavaScript library designed for web content extraction. It functions as a DOM parsing utility and article metadata extractor that isolates the primary text of a webpage by removing clutter such as advertisements and navigation bars. The library employs a heuristic-based content de…

Is deathau/markdown-clipper a good alternative to Parser?

markdown-clipper is a browser extension that converts website content into markdown files for offline storage and personal knowledge bases. It functions as a content extractor and HTML to markdown converter that removes layout clutter to isolate primary text. The tool includes a specific integrati…

Is ericchiang/pup a good alternative to Parser?

Pup is a command line tool for extracting and filtering data from HTML documents using CSS selectors. It functions as a parser and selector engine that isolates specific elements based on tags, IDs, classes, and attributes. The project provides utilities for converting selected HTML nodes into pla…
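pup is written in Go and supports a much fuller CSS selector syntax than shown here. As a minimal sketch of what selector-based filtering involves, the Python code below walks the parse events from the standard `html.parser` module and collects the text of elements that match one simple selector: a tag name, `#id`, or `.class`. The class `SimpleSelector` and the function `select_text` are hypothetical names. The sketch assumes well-formed HTML and folds nested matches into the outermost one.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth count.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class SimpleSelector(HTMLParser):
    """Collects the text of elements matching 'tag', '#id' or '.class'."""

    def __init__(self, selector):
        super().__init__()
        self.selector = selector
        self.depth = 0  # nesting depth inside the current match; 0 = outside
        self.buffer = []
        self.matches = []

    def _is_match(self, tag, attrs):
        attrs = dict(attrs)
        if self.selector.startswith("#"):
            return attrs.get("id") == self.selector[1:]
        if self.selector.startswith("."):
            return self.selector[1:] in (attrs.get("class") or "").split()
        return tag == self.selector

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1  # nested element inside an existing match
        elif self._is_match(tag, attrs):
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.matches.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def select_text(html, selector):
    """Return the text content of every element matching the selector."""
    parser = SimpleSelector(selector)
    parser.feed(html)
    parser.close()
    return parser.matches
```

For example, `select_text('<ul><li class="x">One</li><li>Two</li></ul>', '.x')` returns `['One']`.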

Is gsh199449/spider a good alternative to Parser?

Spider is a web-based platform designed for automated data extraction, providing a centralized framework to collect, process, and route structured information from websites. It functions as a comprehensive pipeline that manages the entire lifecycle of data gathering, from initial configuration to f…

Is danburzo/percollate a good alternative to Parser?

Percollate is a command-line tool for converting web pages and RSS feeds into structured files. It functions as a web content converter, static document generator, and page bundler that transforms online content into PDF, EPUB, HTML, or Markdown formats. The tool creates self-contained documents b…

Is adbar/trafilatura a good alternative to Parser?

Trafilatura is a Python library and command-line tool for extracting clean, structured text and metadata from web pages. It downloads HTML content, identifies the main body of text, and strips away navigation, ads, and other boilerplate, returning the core article content along with fields like tit…
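Trafilatura's own entry points are `trafilatura.fetch_url()` and `trafilatura.extract()`, and its heuristics go well beyond simple tag filtering. As a rough, standard-library-only illustration of the boilerplate-stripping step described above, the sketch below drops text inside an assumed set of layout elements (`nav`, `header`, `footer`, `aside`, and similar), splits the remaining text into paragraphs at block-level closing tags, and keeps the `<title>` as metadata. `BoilerplateStripper` and `extract_main_text` are hypothetical names, not part of Trafilatura.

```python
from html.parser import HTMLParser

# Illustrative assumption: text inside these elements is treated as boilerplate.
BOILERPLATE = {"nav", "header", "footer", "aside", "script", "style", "form"}
# Closing one of these elements ends the current paragraph.
BLOCKS = {"p", "h1", "h2", "h3", "h4", "li", "div", "article", "section"}


class BoilerplateStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.in_title = False
        self.title = ""
        self.buffer = []
        self.paragraphs = []

    def flush(self):
        # Collapse whitespace and keep the paragraph only if it has text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.paragraphs.append(text)
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in BOILERPLATE:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "title":
            self.in_title = False
        elif tag in BLOCKS and not self.skip_depth:
            self.flush()

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif not self.skip_depth:
            self.buffer.append(data)


def extract_main_text(html):
    """Return the page title and the paragraphs left after dropping boilerplate."""
    parser = BoilerplateStripper()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text that sits outside a block element
    return {"title": parser.title.strip(), "paragraphs": parser.paragraphs}
```

A fixed tag list is the simplest possible rule; it fails on sites that put navigation in plain `div` elements, which is why dedicated extractors score content rather than trusting element names.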

Is rchipka/node-osmosis a good alternative to Parser?

This project is a Node.js web scraping framework designed to automate data extraction through a programmatic workflow of requests, parsing, and document interaction. It functions as a headless web crawler, an HTTP request manager, and a DOM parser and extractor. The framework distinguishes itself…

Is grangier/python-goose a good alternative to Parser?

python-goose is a Python library for web scraping and content extraction. It functions as an HTML boilerplate remover and article parser designed to isolate primary text and metadata from web pages by stripping away navigation, layout noise, and non-essential elements. The tool features multilingu…

Is wechatsync/Wechatsync a good alternative to Parser?

Wechatsync is a multi-platform content synchronizer and cross-platform publishing tool. It extracts articles from webpages and distributes them to multiple social media and blogging platforms simultaneously. The system utilizes a web content extractor with reader-mode logic to strip advertisements…


Open-source alternatives to Parser

30 open-source projects similar to postlight/parser, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Parser alternative.

kepano/defuddle
TypeScript · 3,189 stars
Defuddle is a command line web parser and content extractor designed to isolate the primary article body from web pages and convert the result into standardized markdown. It functions as a content cleaner that removes layout clutter, such as sidebars and headers, to retrieve the main text and associated metadata. The tool provides a terminal interface that processes content from remote URLs, local files, or piped HTML streams. It supports custom content targeting, allowing users to specify CSS selectors to manually define the main content area when automatic detection is insufficient. The sy…
mozilla/readability
JavaScript · 11,298 stars
Readability is a JavaScript library designed for web content extraction. It functions as a DOM parsing utility and article metadata extractor that isolates the primary text of a webpage by removing clutter such as advertisements and navigation bars. The library employs a heuristic-based content detector to predict if a webpage contains a parseable article before performing full extraction. It uses a parsing workflow to convert complex HTML documents into a simplified format, facilitating the implementation of distraction-free reader views. The tool covers several capability areas, including…
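In the JavaScript library this pre-check is exposed separately as `isProbablyReaderable(document)`. The Python sketch below only approximates the idea and is not a port: it measures the text length of each `<p>` and `<pre>` element, ignores short ones, adds the square root of each remaining element's excess length to a running score, and reports a likely article once the score passes a threshold. The thresholds (140 characters, score 20) and the names `BlockLengths` and `is_probably_article` are assumptions for illustration.

```python
import math
from html.parser import HTMLParser


class BlockLengths(HTMLParser):
    """Records the text length of every <p> and <pre> element."""

    CANDIDATES = {"p", "pre"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside a candidate element
        self.current = []
        self.lengths = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CANDIDATES:
            self.depth += 1
            if self.depth == 1:
                self.current = []

    def handle_endtag(self, tag):
        if tag in self.CANDIDATES and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.lengths.append(len("".join(self.current).strip()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def is_probably_article(html, min_length=140, min_score=20.0):
    """Cheap pre-check: does the page look like it holds a readable article?"""
    parser = BlockLengths()
    parser.feed(html)
    parser.close()
    score = 0.0
    for length in parser.lengths:
        if length < min_length:
            continue  # short blocks such as captions or bylines add nothing
        score += math.sqrt(length - min_length)
        if score > min_score:
            return True  # enough evidence; full extraction can proceed
    return False
```

Because the loop returns as soon as the threshold is crossed, the check stays cheap compared with running a full extraction on every page.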
deathau/markdown-clipper
JavaScript · 3,928 stars
markdown-clipper is a browser extension that converts website content into markdown files for offline storage and personal knowledge bases. It functions as a content extractor and HTML to markdown converter that removes layout clutter to isolate primary text. The tool includes a specific integration for sending clipped web content directly into vaults and folders within the Obsidian note-taking application. It also supports batch processing to convert all open browser tabs into individual markdown files. The extension covers a broad range of extraction capabilities, including capturing selec…
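markdown-clipper runs as a browser extension, so its own conversion code is not reproduced here. To illustrate the kind of mapping an HTML-to-Markdown converter performs, the sketch below handles only a small, assumed subset of tags (headings, paragraphs, bold and italic, links, list items) using Python's standard `html.parser`. `MiniMarkdown` and `html_to_markdown` are hypothetical names; real converters also cover tables, code, images, and nested lists.

```python
import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Converts a small subset of HTML into Markdown (illustrative only)."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.out.append("#" * int(tag[1]) + " ")
        elif tag in {"strong", "b"}:
            self.out.append("**")
        elif tag in {"em", "i"}:
            self.out.append("*")
        elif tag == "li":
            self.out.append("- ")  # ordered lists also become bullets here
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self.out.append("\n\n")
        elif tag in {"strong", "b"}:
            self.out.append("**")
        elif tag in {"em", "i"}:
            self.out.append("*")
        elif tag in {"li", "ul", "ol"}:
            self.out.append("\n")
        elif tag == "a":
            self.out.append(f"]({self.href})")

    def handle_data(self, data):
        if not data.strip() and "\n" in data:
            return  # newline-only text between block elements in the source
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip() + "\n"
```

Mapping each tag to an opening and a closing token keeps the converter to a single pass over the parse events; output that depends on context, such as numbering `ol` items, would need extra state in the parser.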


kepano/defuddle

mozilla/readability

deathau/markdown-clipper

ericchiang/pup

gsh199449/spider

danburzo/percollate

adbar/trafilatura

rchipka/node-osmosis

grangier/python-goose

wechatsync/Wechatsync

pterm/pterm

ReadYouApp/ReadYou

yargs/yargs

spectresystems/spectre.console

Yomguithereal/react-blessed

lorien/web-scraping

nashsu/llm_wiki

Nemo2011/bilibili-api

standardnotes/app

mgdm/htmlq

PuerkitoBio/goquery

lysyi3m/macos-terminal-themes

qiye45/wechatDownload

freeok/so-novel

Marak/colors.js

IonicaBizau/scrape-it

Alfred1984/interesting-python

googlecodelabs/tools

quarto-dev/quarto-cli

fatih/color