jsdom is a pure-JavaScript DOM implementation for Node.js that emulates a headless browser environment. It implements W3C and WHATWG web standards and exposes the window and document objects in a non-browser runtime, giving programs an environment for parsing HTML and manipulating content. It also serves as an HTML parser and serializer, transforming HTML strings into document structures and exporting those structures back to text. The
This project is an HTML and XML DOM parser for loading and navigating web documents and extracting specific data points. It works as a web scraping utility, locating elements with a CSS and XPath selector engine. Its URI resolver converts relative links found in documents into absolute addresses using a base URI. It also provides tools for retrieving text, attributes, and media sources from parsed content. The toolset covers document hierarchy traversal, selector-based filtering, and text extraction with
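The relative-to-absolute link resolution described in this entry can be illustrated with a minimal sketch. It does not use the project's own API, which is not shown here. It relies on the standard WHATWG `URL` constructor available as a global in Node.js and browsers. The helper name `resolveLinks` and the example addresses are assumptions made for illustration.

```typescript
// Hypothetical helper (not part of the library described above):
// resolve each href against a base URI with the standard URL constructor.
function resolveLinks(baseUri: string, hrefs: string[]): string[] {
  return hrefs.map((href) => new URL(href, baseUri).href);
}

// Example inputs are made up for illustration.
const resolved = resolveLinks("https://example.com/docs/guide/index.html", [
  "../images/logo.png",         // parent-relative path
  "/about",                     // root-relative path
  "chapter2.html#intro",        // sibling document with a fragment
  "https://other.example/page", // already absolute, returned unchanged
]);

console.log(resolved);
// [
//   "https://example.com/docs/images/logo.png",
//   "https://example.com/about",
//   "https://example.com/docs/guide/chapter2.html#intro",
//   "https://other.example/page"
// ]
```

In a scraping workflow the base URI would typically be the address the document was fetched from, so that extracted links remain usable outside the page.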
htmlparser2 is a collection of tools for high-performance markup parsing, DOM manipulation, and incremental stream processing. It functions as an HTML and XML parser that converts markup strings into structured object trees, alongside a streaming markup parser designed for memory-efficient processing of large documents. The project includes a DOM manipulation library for querying, modifying, and serializing document object model trees. It also provides a web feed parser to extract structured metadata and entries from RSS, RDF, and Atom feeds. The library covers broad capabilities in data par
parse5 is a WHATWG HTML parser and serializer for Node.js. It transforms HTML strings into a document object model and converts those trees back into valid HTML strings, following the logic defined by the HTML Living Standard. The project functions as a streaming HTML processor, using incremental parsing to handle large documents in chunks. It includes an HTML5 compliant tokenizer that uses a state-machine approach to break input into tokens according to official web specifications. The toolset covers HTML document parsing, serialization, and real-time rewriting via streams. These capabiliti
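To illustrate the idea of a state-machine tokenizer fed in chunks, here is a deliberately tiny sketch. It is not parse5's tokenizer and does not follow the HTML Living Standard. It recognizes only plain start tags, end tags, and text, and it ignores attributes, comments, and character references. The names `ToyTokenizer`, `Token`, and `State` are assumptions made for illustration.

```typescript
// Toy token and state types (hypothetical, for illustration only).
type Token =
  | { type: "startTag"; name: string }
  | { type: "endTag"; name: string }
  | { type: "text"; value: string };

type State = "data" | "tagOpen" | "tagName" | "endTagName";

class ToyTokenizer {
  private state: State = "data";
  private buffer = "";
  readonly tokens: Token[] = [];

  // Feed one chunk; state and buffer persist between calls,
  // so a tag or text run may be split across chunks.
  write(chunk: string): void {
    for (let i = 0; i < chunk.length; i++) {
      this.step(chunk.charAt(i));
    }
  }

  // Signal end of input: emit any pending text. An unfinished tag is dropped.
  end(): void {
    if (this.state === "data") this.flushText();
  }

  private flushText(): void {
    if (this.buffer.length > 0) {
      this.tokens.push({ type: "text", value: this.buffer });
    }
    this.buffer = "";
  }

  // Consume one character and move between states.
  private step(ch: string): void {
    switch (this.state) {
      case "data":
        if (ch === "<") {
          this.flushText();
          this.state = "tagOpen";
        } else {
          this.buffer += ch;
        }
        break;
      case "tagOpen":
        if (ch === "/") {
          this.state = "endTagName";
        } else {
          this.buffer = ch;
          this.state = "tagName";
        }
        break;
      case "tagName":
      case "endTagName":
        if (ch === ">") {
          const name = this.buffer.toLowerCase();
          if (this.state === "tagName") {
            this.tokens.push({ type: "startTag", name });
          } else {
            this.tokens.push({ type: "endTag", name });
          }
          this.buffer = "";
          this.state = "data";
        } else {
          this.buffer += ch;
        }
        break;
    }
  }
}

// The input arrives in three chunks that split both the text and the end tag.
const tokenizer = new ToyTokenizer();
tokenizer.write("<p>Hel");
tokenizer.write("lo</");
tokenizer.write("p>");
tokenizer.end();
console.log(tokenizer.tokens);
// [ { type: "startTag", name: "p" },
//   { type: "text", value: "Hello" },
//   { type: "endTag", name: "p" } ]
```

Because the current state and the partial buffer live on the instance, the caller never needs the whole document in memory at once, which is the property that makes chunked processing of large inputs possible.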