1 Repo
Programmatically analyzing and extracting information from sets of HTML files using structured patterns.
Distinct from Document Analysis: the candidates shortlisted there focus on AI-driven analysis or PDF forms, whereas this topic covers CLI-based extraction from HTML using CSS patterns.
htmlq is a suite of command-line utilities for querying and extracting data from HTML documents using CSS selectors. It works as a query tool for HTML structures and attributes, retrieving specific information from documents in the terminal. It can extract text content, specific HTML attributes, and document fragments. It also includes an HTML document formatter that cleans and reformats output with consistent indentation, as well as utilities that strip tags to isolate plain text. The software handles structural HTML processing
Enables programmatic extraction of specific information from large sets of HTML files using CSS patterns.
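To make the idea of pattern-based extraction concrete, here is a minimal sketch in Python using only the standard library. It is not htmlq's implementation or interface: the names `SimpleSelectorExtractor`, `extract`, and `extract_from_files` are hypothetical, and the sketch only understands three selector forms (`tag`, `.class`, and `tag.class`), a small subset of real CSS selectors.

```python
from html.parser import HTMLParser
from pathlib import Path

# Elements that never have a closing tag, so they are never kept "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SimpleSelectorExtractor(HTMLParser):
    """Collect elements matching `tag`, `.class`, or `tag.class` (hypothetical helper)."""

    def __init__(self, selector):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag = tag or None
        self.cls = cls or None
        self.matches = []  # one dict per matched element
        self._stack = []   # (tag, record or None) for every open element

    def _is_match(self, tag, attrs):
        if self.tag and tag != self.tag:
            return False
        classes = (attrs.get("class") or "").split()
        return not self.cls or self.cls in classes

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        record = None
        if self._is_match(tag, attrs):
            record = {"attrs": attrs, "text": ""}
            self.matches.append(record)
        if tag not in VOID_TAGS:
            self._stack.append((tag, record))

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag name, if any.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Text belongs to every matched element that is currently open.
        for _, record in self._stack:
            if record is not None:
                record["text"] += data


def extract(html, selector, attribute=None):
    """Return the text of each match, or the given attribute's value if requested."""
    parser = SimpleSelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    if attribute is not None:
        return [m["attrs"][attribute] for m in parser.matches
                if attribute in m["attrs"]]
    return [m["text"].strip() for m in parser.matches]


def extract_from_files(paths, selector, attribute=None):
    """Apply the same selector to a set of HTML files, keyed by file path."""
    return {str(p): extract(Path(p).read_text(encoding="utf-8"), selector, attribute)
            for p in paths}


sample = ('<ul><li class="item"><a href="/a">First</a></li>'
          '<li class="item"><a href="/b">Second</a></li><li>Other</li></ul>')
print(extract(sample, "li.item"))              # ['First', 'Second']
print(extract(sample, "a", attribute="href"))  # ['/a', '/b']
```

Running the same selector over many files, as in `extract_from_files(Path("pages").glob("*.html"), "li.item")` (with `pages` standing in for any directory of HTML files), is what turns a single query into extraction across a large set of documents. A dedicated tool with a full selector engine would be the better choice for anything beyond this simple subset.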