Defuddle is a command-line web parser and content extractor that isolates the primary article body of a web page and converts it to standardized Markdown. It acts as a content cleaner, removing layout clutter such as sidebars and headers so that only the main text and its associated metadata remain.
The tool provides a terminal interface that processes content from remote URLs, local files, or piped HTML streams. It supports custom content targeting, allowing users to specify CSS selectors to manually define the main content area when automatic detection is insufficient.
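To make the three input modes concrete, here is a minimal Python sketch of how a tool of this kind might decide where its HTML comes from. It is not Defuddle's own code: the function names `classify_source` and `read_html` are hypothetical, and treating a missing argument or `-` as "read from standard input" is an assumed convention, not something the tool is documented to do.

```python
import sys
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse
from urllib.request import urlopen


def classify_source(source: Optional[str]) -> str:
    """Return 'stdin', 'url', or 'file' for a source argument (hypothetical helper)."""
    # Assumption: no argument, or "-", means the HTML is piped in.
    if source is None or source == "-":
        return "stdin"
    # Only http and https are treated as remote; anything else is a local path.
    if urlparse(source).scheme.lower() in ("http", "https"):
        return "url"
    return "file"


def read_html(source: Optional[str]) -> str:
    """Load raw HTML from a pipe, a remote URL, or a local file."""
    kind = classify_source(source)
    if kind == "stdin":
        return sys.stdin.read()
    if kind == "url":
        with urlopen(source) as response:
            # Fall back to UTF-8 when the server declares no charset.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    return Path(source).read_text(encoding="utf-8", errors="replace")
```

Keeping the classification separate from the reading step means the dispatch logic can be checked without touching the network or the file system.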
The system uses heuristics to identify the core content and sanitizes the DOM tree to standardize page elements. It also parses metadata schemas to extract structured information such as titles, authors, and publication dates.
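The following Python sketch illustrates the general idea of clutter removal and metadata parsing using only the standard library. It is a deliberately simple stand-in, not Defuddle's actual algorithm: the tag list in `CLUTTER_TAGS`, the `META_KEYS` mapping, and the names `ArticleSketch` and `extract` are all assumptions chosen for illustration, and a real extractor would use far richer heuristics.

```python
from html.parser import HTMLParser

# Assumed list of layout elements to discard; a real tool would score nodes instead.
CLUTTER_TAGS = {"nav", "aside", "header", "footer", "script", "style", "form"}

# Assumed mapping from <meta> name/property values to output field names.
META_KEYS = {"author": "author", "article:published_time": "published"}


class ArticleSketch(HTMLParser):
    """Collects text outside clutter elements and a few metadata fields."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0      # > 0 while inside any clutter element
        self.in_title = False
        self.text_parts = []
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in CLUTTER_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True
        elif tag == "meta":
            key = attr.get("name") or attr.get("property")
            if key in META_KEYS and attr.get("content"):
                # First occurrence wins.
                self.metadata.setdefault(META_KEYS[key], attr["content"])

    def handle_endtag(self, tag):
        if tag in CLUTTER_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_title:
            self.metadata.setdefault("title", text)
        elif self.skip_depth == 0:
            self.text_parts.append(text)


def extract(html: str):
    """Return (metadata dict, main text) for an HTML string."""
    parser = ArticleSketch()
    parser.feed(html)
    parser.close()
    return parser.metadata, "\n".join(parser.text_parts)
```

Given a page whose body contains a `<nav>`, an `<article>`, and a `<footer>`, `extract` returns only the article text, along with whatever title, author, and publication date it found in the document head.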