Pipet | Awesome Repository

pipet is a command-line tool that treats web scraping as a data flow piped through Unix filters. It provides specialized scrapers for CSS selector extraction, headless-browser JavaScript rendering, JSON API querying, and change monitoring. Each scraper outputs structured data that can be transformed by chaining additional commands.

The tool uses declarative selectors (CSS and JSON path expressions) to define what to extract, automatically follows pagination links to collect data across multiple pages, and serializes results into JSON, custom-delimited text, or rendered templates. It can rerun a scraping pipeline on a schedule and trigger a custom command whenever the output changes from the previous run. Headless browser automation allows scraping JavaScript-heavy pages, executing custom scripts, and replicating authenticated sessions by reusing browser request headers.
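The change-monitoring behavior described above (rerun on a schedule, trigger a command when the output differs from the previous run) can be illustrated with a small Python sketch. This is not pipet's implementation: the names `ChangeDetector` and `watch` are hypothetical, the commands are passed in as argument lists by the caller, and treating the first run as a baseline rather than a change is an assumption.

```python
import hashlib
import subprocess
import time


class ChangeDetector:
    """Remembers a digest of the last output and reports when it changes."""

    def __init__(self):
        self.last_digest = None

    def changed(self, output: str) -> bool:
        digest = hashlib.sha256(output.encode("utf-8")).hexdigest()
        # Assumption: the first run only records a baseline, since there is
        # no previous run to compare against.
        is_change = self.last_digest is not None and digest != self.last_digest
        self.last_digest = digest
        return is_change


def watch(scrape_cmd, on_change_cmd, interval_seconds, max_runs=None):
    """Rerun scrape_cmd on a fixed interval; run on_change_cmd when output differs.

    Both commands are argument lists (hypothetical placeholders chosen by the
    caller). The new output is passed to on_change_cmd on standard input.
    """
    detector = ChangeDetector()
    runs = 0
    while max_runs is None or runs < max_runs:
        result = subprocess.run(scrape_cmd, capture_output=True, text=True, check=True)
        if detector.changed(result.stdout):
            subprocess.run(on_change_cmd, input=result.stdout, text=True, check=False)
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
```

Comparing digests rather than full outputs keeps the stored state small; a real tool might instead keep the previous output so it can show what changed.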

pipet also supports nested iteration when extracting data from HTML pages and uses path syntax to query JSON API endpoints, with results available in any of the output formats above. It is designed to fit naturally into existing command-line workflows, treating each scraping job as a composable pipe.
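To make the idea of querying JSON with path syntax and iterating over arrays concrete, here is a minimal Python sketch. The dot-separated path format and the helper names (`get_path`, `extract_rows`, `to_delimited`) are assumptions made for illustration; they are a simplified stand-in and not necessarily the path syntax pipet accepts.

```python
def get_path(value, path):
    """Follow a dot-separated path; numeric segments index into lists."""
    for segment in path.split("."):
        if isinstance(value, list):
            value = value[int(segment)]
        else:
            value = value[segment]
    return value


def extract_rows(document, array_path, field_paths):
    """Iterate over the array at array_path and pull each field path from every item."""
    return [
        [get_path(item, field) for field in field_paths]
        for item in get_path(document, array_path)
    ]


def to_delimited(rows, separator="\t"):
    """Serialize rows as text, one row per line, cells joined by separator."""
    return "\n".join(separator.join(str(cell) for cell in row) for row in rows)
```

Output in this one-row-per-line form is what makes the result easy to hand to filters such as sort, cut, or awk.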

Features

Web Scraping - Extracts structured data from websites using CSS selectors and pipes results to Unix commands.
Web Page Scraping Extractors - Parses HTML documents using CSS selectors and nested iterations, extracts structured fields, and pipes the output to Unix commands.
Command Piping - Data moves through a chain of Unix filters where each stage transforms the scraped output.
Array Iteration Clients - Extracts data from JSON endpoints using path syntax, iterates over arrays, and pipes results to external tools.
