pyWhat is a Python-based data extraction tool that uses regular expressions to scan files and text for sensitive identifiers, credentials, and network artifacts. It works as both a pattern-matching engine and a PII scanner, and can search directories and binary files for personal identifiers and other sensitive data patterns.
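To make the regex-driven approach concrete, here is a minimal sketch of scanning text against named patterns. It does not use pyWhat's own API or its pattern database: the `PATTERNS` table and the `scan_text` helper are hypothetical names introduced for illustration, and the two regular expressions are deliberately simplified.

```python
import re

# Hypothetical, simplified patterns for illustration only;
# these are NOT pyWhat's actual regular expressions.
PATTERNS = {
    "Email Address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPv4 Address": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}


def scan_text(text):
    """Return (pattern name, matched string) pairs for every match in text."""
    findings = []
    for name, regex in PATTERNS.items():
        for match in regex.finditer(text):
            findings.append((name, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = "contact admin@example.com from 192.168.0.1"
    for name, value in scan_text(sample):
        print(f"{name}: {value}")
```

A real scanner would carry many more patterns and would also read from files, but the core loop of trying each named expression against the input stays the same.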
The project specializes in identifying unknown data formats from their file signatures and in extracting high-value identifiers, such as URLs, IP addresses, and phone numbers, from network capture files. To reduce false positives, it filters candidate matches by a rarity score and by pattern tags.
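The filtering idea can be sketched as follows. This is an illustration under stated assumptions, not pyWhat's implementation: the `Pattern` class, the `filter_patterns` function, and the 0.0 to 1.0 rarity scale (higher meaning more specific, so less likely to be a false positive) are all assumed for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern record; field names are assumptions."""
    name: str
    rarity: float  # assumed scale: 0.0 (matches almost anything) to 1.0 (very specific)
    tags: frozenset = field(default_factory=frozenset)


def filter_patterns(patterns, min_rarity=0.0, max_rarity=1.0,
                    include_tags=None, exclude_tags=None):
    """Keep patterns inside the rarity window that satisfy the tag rules."""
    include = set(include_tags or [])
    exclude = set(exclude_tags or [])
    kept = []
    for pattern in patterns:
        if not (min_rarity <= pattern.rarity <= max_rarity):
            continue  # outside the requested rarity window
        if include and not (pattern.tags & include):
            continue  # an include list was given and no tag matches it
        if pattern.tags & exclude:
            continue  # carries an explicitly excluded tag
        kept.append(pattern)
    return kept


if __name__ == "__main__":
    catalog = [
        Pattern("URL", 0.1, frozenset({"Networking"})),
        Pattern("API Key", 0.9, frozenset({"Credentials"})),
        Pattern("Phone Number", 0.5, frozenset({"Identifiers"})),
    ]
    for pattern in filter_patterns(catalog, min_rarity=0.5):
        print(pattern.name)
```

Raising `min_rarity` drops the generic patterns that tend to match ordinary text, while the tag lists narrow a scan to one category of finding.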
The tool supports recursive directory scanning, data type identification, and digital forensic processing. Findings can be sorted by custom criteria and exported as structured JSON or plain text for use in external security pipelines.
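As a rough sketch of the sort-then-export step, the snippet below orders a list of findings and serializes it to JSON with the standard library. The `export_findings` function and the `name`, `rarity`, and `match` keys are assumptions made for this example; pyWhat's actual output schema may differ.

```python
import json


def export_findings(findings, sort_key="rarity", reverse=True):
    """Sort finding dicts by one of their keys and return a JSON string."""
    ordered = sorted(findings, key=lambda finding: finding[sort_key], reverse=reverse)
    return json.dumps(ordered, indent=2)


if __name__ == "__main__":
    # Hypothetical findings; the keys are illustrative, not a documented schema.
    sample = [
        {"name": "URL", "rarity": 0.1, "match": "http://example.com"},
        {"name": "API Key", "rarity": 0.9, "match": "abc123"},
    ]
    print(export_findings(sample))
```

Sorting by rarity in descending order puts the most specific findings first, which is usually what a downstream triage step wants to see.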