CeWL is a custom wordlist generator: a security tool that crawls websites and extracts unique words and metadata from them. It also serves as an OSINT metadata extractor and security scanner, building lists of candidate passwords and usernames from the HTML and JavaScript content it analyzes.
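The core idea of pulling unique words out of a page can be sketched in a few lines of Python. This is an illustration only, not CeWL's own code: the names `TextCollector` and `unique_words`, the `min_length` parameter, and the choice to skip `<style>` bodies while keeping script text are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects text nodes, ignoring the bodies of the tags in skip_tags."""

    def __init__(self, skip_tags=("style",)):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def unique_words(html, min_length=3):
    """Return the sorted set of alphabetic words of at least min_length letters."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.chunks)
    words = re.findall(r"[A-Za-z]+", text)
    return sorted({w for w in words if len(w) >= min_length})


if __name__ == "__main__":
    page = "<body><h1>Acme Rockets</h1><p>Acme builds rockets.</p></body>"
    print(unique_words(page))  # ['Acme', 'Rockets', 'builds', 'rockets']
```

Case is preserved here because capitalization variants can matter in a password list; a real tool would likely make that configurable.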
CeWL stands out by combining recursive spidering with metadata extraction: it collects email addresses, author names, and creator metadata from web pages and linked files. It also captures domains, subdomains, and path components for inclusion in the generated lists.
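As a rough illustration of the email and URL-component harvesting described above, the following Python sketch uses only the standard library. The function names `extract_emails` and `url_components` and the deliberately simple email pattern are assumptions for this example and do not reflect how CeWL implements these features.

```python
import re
from urllib.parse import urlparse

# A permissive pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the sorted, de-duplicated email addresses found in text."""
    return sorted(set(EMAIL_RE.findall(text)))


def url_components(url):
    """Split a URL into its host, dot-separated host labels, and path segments."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    labels = [label for label in host.split(".") if label]
    segments = [seg for seg in parsed.path.split("/") if seg]
    return {"host": host, "labels": labels, "path_segments": segments}


if __name__ == "__main__":
    print(extract_emails("Contact alice@example.com or bob@example.org."))
    # ['alice@example.com', 'bob@example.org']
    print(url_components("https://shop.example.com/products/rockets/index.html"))
```

Host labels such as `shop` and path segments such as `rockets` are the kind of tokens that could be appended to a generated list alongside the words taken from page text.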
Its broader capabilities include spidering web applications with depth control and regular-expression filtering, and managing network requests with custom headers and proxy authentication. It can reach restricted sites via Basic or Digest authentication, and it provides data-processing utilities for word-frequency analysis and list formatting.
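Depth control, regular-expression filtering, and word-frequency counting fit together as in the Python sketch below. It is a simplified model under stated assumptions, not CeWL's implementation: `crawl`, `max_depth`, and `exclude` are hypothetical names, links are matched with a naive `href` pattern, and pages come from a caller-supplied `fetch` function (here an in-memory dictionary) so the example runs without network access.

```python
import re
from collections import Counter, deque

LINK_RE = re.compile(r'href="([^"]+)"')  # naive link matcher, illustration only
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[A-Za-z]+")


def crawl(start_url, fetch, max_depth=2, exclude=None):
    """Breadth-first crawl up to max_depth, counting lowercased words.

    fetch(url) returns the page HTML or None; exclude is an optional
    regular expression, and links matching it are not followed.
    """
    exclude_re = re.compile(exclude) if exclude else None
    seen = {start_url}
    queue = deque([(start_url, 0)])
    counts = Counter()
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        text = TAG_RE.sub(" ", html)
        counts.update(word.lower() for word in WORD_RE.findall(text))
        if depth >= max_depth:
            continue  # depth limit reached: count words but follow no links
        for link in LINK_RE.findall(html):
            if link in seen or (exclude_re and exclude_re.search(link)):
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return counts


if __name__ == "__main__":
    pages = {
        "/": '<a href="/about">About</a> <a href="/logout">Logout</a> rockets',
        "/about": '<a href="/team">Team</a> rockets engines',
        "/team": "engineers rockets",
        "/logout": "goodbye",
    }
    counts = crawl("/", pages.get, max_depth=1, exclude=r"logout")
    print(counts.most_common(1))  # [('rockets', 2)]
```

With `max_depth=1` the crawl reads `/` and `/about` but never reaches `/team`, and the `exclude` pattern keeps it away from `/logout`. Injecting `fetch` also leaves room for a real HTTP client that adds custom headers, a proxy, or Basic or Digest credentials.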
The project is available as a containerized security scanner, packaged as a portable image to eliminate manual environment setup.