# bjesus/pipet

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/bjesus-pipet).**

4,662 stars · 220 forks · Go · MIT

## Links

- GitHub: https://github.com/bjesus/pipet
- awesome-repositories: https://awesome-repositories.com/repository/bjesus-pipet.md

## Topics

`css` `curl` `gjson` `json` `playwright` `scraper` `scraping`

## Description

pipet is a command-line tool that treats web scraping as a data flow piped through Unix filters. It provides specialized scrapers for CSS selector extraction, headless-browser JavaScript rendering, and JSON API querying, along with change monitoring. Each scraper outputs structured data that can be transformed by chaining additional commands.

The tool uses declarative selectors (CSS and JSON path expressions) to define what to extract, automatically follows pagination links to collect data across multiple pages, and serializes results into JSON, custom-delimited text, or rendered templates. It can rerun a scraping pipeline on a schedule and trigger a custom command whenever the output changes from the previous run. Headless browser automation allows scraping JavaScript-heavy pages, executing custom scripts, and replicating authenticated sessions by reusing browser request headers.
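The change-monitoring behaviour described above comes down to comparing each run's output with the previous one. The following is a minimal Python sketch of that idea, not pipet's actual implementation (pipet is written in Go). The names `make_change_detector` and `on_change` are hypothetical, the sample outputs are made up, and it assumes that the first run only records a baseline without triggering anything.

```python
import hashlib
from typing import Callable, List, Optional


def make_change_detector(on_change: Callable[[str], None]) -> Callable[[str], bool]:
    """Return a checker that calls on_change when output differs from the previous run."""
    previous_digest: Optional[str] = None

    def check(output: str) -> bool:
        nonlocal previous_digest
        digest = hashlib.sha256(output.encode("utf-8")).hexdigest()
        # Assumption: the first run only stores a baseline, so it never counts as a change.
        changed = previous_digest is not None and digest != previous_digest
        previous_digest = digest
        if changed:
            on_change(output)
        return changed

    return check


# Simulated outputs from four successive scheduled runs (hypothetical data).
runs = ["price: 10", "price: 10", "price: 12", "price: 12"]
notifications: List[str] = []
check = make_change_detector(notifications.append)
results = [check(output) for output in runs]
```

Only the third run differs from its predecessor, so the callback fires once. Storing a digest rather than the full text keeps the comparison cheap when the scraped output is large.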

Selectors can be nested to iterate over repeated elements in an HTML page or over arrays in a JSON response. pipet is designed to fit naturally into existing command-line workflows, treating each scraping job as a composable stage in a pipe.
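To illustrate path-based querying of a JSON response and the choice between JSON and delimited-text output, here is a small Python sketch. It uses a simplified dot-separated path syntax invented for this example; it is not the gjson syntax the project's topics refer to, and it is not pipet's code. The helpers `get_path`, `extract_rows`, and `to_text` and the sample response are all hypothetical.

```python
import json
from typing import Any, List


def get_path(data: Any, path: str) -> Any:
    """Walk a dot-separated path; numeric segments index into lists."""
    current = data
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


def extract_rows(data: Any, array_path: str, fields: List[str]) -> List[List[str]]:
    """Iterate over the array at array_path and pull each field from every item."""
    return [
        [str(get_path(item, field)) for field in fields]
        for item in get_path(data, array_path)
    ]


def to_text(rows: List[List[str]], separator: str = "\t") -> str:
    """Serialize rows as one line per row, joining columns with a custom separator."""
    return "\n".join(separator.join(row) for row in rows)


# Hypothetical API response used only for illustration.
response = json.loads(
    '{"items": [{"name": "alpha", "stats": {"count": 3}},'
    ' {"name": "beta", "stats": {"count": 7}}]}'
)
rows = extract_rows(response, "items", ["name", "stats.count"])
as_text = to_text(rows, separator=" | ")
as_json = json.dumps(rows)
```

The delimited form suits line-oriented Unix tools such as `sort` or `cut`, while the JSON form preserves structure for downstream JSON processors.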

## Tags

### Web Development

- [Web Scraping](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping.md) — Extracts structured data from websites using CSS selectors and pipes results to Unix commands.
- [Pagination Crawlers](https://awesome-repositories.com/f/web-development/custom-page-frameworks/page-content-injections/pagination-navigators/pagination-crawlers.md) — Automatically follows pagination links to scrape data from successive pages.
- [Headless Browsers](https://awesome-repositories.com/f/web-development/headless-browsers.md) — Controls a headless browser to render JavaScript-heavy pages and execute custom scripts.
- [Path Expression Clients](https://awesome-repositories.com/f/web-development/json-apis/path-expression-clients.md) — Queries JSON APIs with path expressions and iterates over arrays from the command line.
- [Web Scraping Selectors](https://awesome-repositories.com/f/web-development/web-scraping-selectors.md) — Scrapes HTML pages using CSS selectors with nested iterations and pagination, outputting structured data to Unix pipes.
- [JSON APIs](https://awesome-repositories.com/f/web-development/json-apis.md) — Extracts data from JSON APIs using path syntax, iterates over arrays, and pipes results to external tools. ([source](https://github.com/bjesus/pipet#readme))

### Content Management & Publishing

- [Web Page Scraping Extractors](https://awesome-repositories.com/f/content-management-publishing/web-page-scraping-extractors.md) — Parses HTML documents using CSS selectors and nested iterations, extracts structured fields, and pipes the output to Unix commands. ([source](https://github.com/bjesus/pipet#readme))
- [JavaScript Rendering](https://awesome-repositories.com/f/content-management-publishing/web-page-scraping-extractors/javascript-rendering.md) — Navigates pages with a headless browser, executes custom JavaScript, and returns serialized data or triggers UI actions. ([source](https://github.com/bjesus/pipet#readme))

### Development Tools & Productivity

- [Command Piping](https://awesome-repositories.com/f/development-tools-productivity/command-piping.md) — Data moves through a chain of Unix filters where each stage transforms the scraped output.
- [Array Iteration Clients](https://awesome-repositories.com/f/development-tools-productivity/json-api-clients/array-iteration-clients.md) — Extracts data from JSON endpoints using path syntax, iterates over arrays, and pipes results to external tools.
- [Multi-Format Data Exports](https://awesome-repositories.com/f/development-tools-productivity/multi-format-data-exports.md) — Outputs scraped data in JSON format, plain text with custom separators, or as a rendered template file. ([source](https://github.com/bjesus/pipet#readme))

### DevOps & Infrastructure

- [Page Change Monitors](https://awesome-repositories.com/f/devops-infrastructure/webhook-triggers/page-change-monitors.md) — Reruns scraping on a set interval and executes a command when scraped content differs from the previous result. ([source](https://github.com/bjesus/pipet#readme))

### System Administration & Monitoring

- [Content Diff Monitors](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/periodical-comparisons/content-diff-monitors.md) — Reruns a scraping pipeline on a schedule and triggers a command when content changes.
- [Web Change Monitors](https://awesome-repositories.com/f/system-administration-monitoring/web-change-monitors.md) — Runs scraping on a schedule and triggers a command when the scraped content differs from the previous result.

### User Interface & Experience

- [CSS Selectors](https://awesome-repositories.com/f/user-interface-experience/css-selectors.md) — Uses CSS selectors to declaratively define which elements to extract from documents.
