pipet is a command-line tool that treats web scraping as a Unix pipeline: each scrape emits structured data that ordinary filters can transform. It provides a set of specialized scrapers (CSS selector extraction, headless-browser JavaScript rendering, JSON API querying, and change monitoring), each of which writes structured output that can be piped into further commands.
The tool uses declarative selectors (CSS and JSON path expressions) to define what to extract, automatically follows pagination links to collect data across multiple pages, and serializes results into JSON, custom-delimited text, or rendered templates. It can rerun a scraping pipeline on a schedule and trigger a custom command whenever the output changes from the previous run. Headless browser automation allows scraping JavaScript-heavy pages, executing custom scripts, and replicating authenticated sessions by reusing browser request headers.
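The change-monitoring behaviour described above can be pictured as a comparison between the current run's output and the previous one. The following is a minimal sketch of that idea in Python, not pipet's actual implementation: the `ChangeMonitor` class, the `observe` method, and the use of a SHA-256 digest are all illustrative assumptions, as is the choice to treat the first run as a baseline that triggers nothing.

```python
import hashlib
from typing import Callable, Optional


def output_digest(output: str) -> str:
    """Return a stable fingerprint for one run's serialized output."""
    return hashlib.sha256(output.encode("utf-8")).hexdigest()


class ChangeMonitor:
    """Hypothetical helper: calls `on_change` when output differs from the last run."""

    def __init__(self, on_change: Callable[[str], None]) -> None:
        self._on_change = on_change
        self._last_digest: Optional[str] = None  # None until the first run

    def observe(self, output: str) -> bool:
        """Record one run's output; return True if it changed since the previous run."""
        digest = output_digest(output)
        # Assumption: the first run only establishes a baseline and never fires.
        changed = self._last_digest is not None and digest != self._last_digest
        self._last_digest = digest
        if changed:
            self._on_change(output)
        return changed


if __name__ == "__main__":
    monitor = ChangeMonitor(lambda out: print("output changed:", out))
    for run_output in ["price: 10", "price: 10", "price: 12"]:
        monitor.observe(run_output)
```

Storing only a digest keeps the state small, at the cost of not being able to show what changed; a real scheduler would also sleep between runs and persist the digest across restarts.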
HTML extraction supports nested iterations, and JSON API endpoints are queried with path syntax. pipet is designed to fit naturally into existing command-line workflows, treating each scraping job as a composable pipe.
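To make the idea of querying JSON with path expressions concrete, here is a small Python sketch of a dotted-path resolver. It is a simplified illustration, not pipet's actual path syntax: the `query` function is hypothetical, and the convention that `#` iterates over every element of an array is an assumption chosen for the example.

```python
from typing import Any


def query(data: Any, path: str) -> Any:
    """Resolve a dotted path such as "items.#.title" against parsed JSON.

    Hypothetical mini-syntax: a name selects an object key, a number selects
    an array index, and "#" applies the rest of the path to every element.
    Returns None when the path does not match.
    """
    current = data
    parts = path.split(".")
    for i, key in enumerate(parts):
        if key == "#":
            if not isinstance(current, list):
                return None
            rest = ".".join(parts[i + 1:])
            # Recurse so that a later "#" produces a nested iteration.
            return [query(item, rest) if rest else item for item in current]
        if isinstance(current, list):
            if not key.isdigit() or int(key) >= len(current):
                return None
            current = current[int(key)]
        elif isinstance(current, dict):
            if key not in current:
                return None
            current = current[key]
        else:
            return None
    return current


if __name__ == "__main__":
    doc = {"items": [{"title": "A", "tags": ["x", "y"]},
                     {"title": "B", "tags": []}]}
    print(query(doc, "items.#.title"))
    print(query(doc, "items.0.tags.1"))
```

Because the resolver returns plain Python values, its result can be serialized with `json.dumps` and handed to the next command in a pipeline.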