so-novel is a web novel downloader and scraping engine that extracts structured text from websites and converts it into e-book formats. A single shared backend is exposed through three interfaces: a web-based management dashboard, a terminal user interface (TUI), and a command-line interface (CLI).
Extraction is rule-driven: CSS selectors and XPath rules defined in external configuration files map web page elements to specific data fields, so a site can be supported by supplying rules for it rather than changing code. To maintain access to content, requests can be routed through a proxy, which helps with regional restrictions and anti-scraping protections.
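
The rule-driven approach described above can be sketched as follows. This is a minimal illustration, not so-novel's actual rule format or code: the JSON layout, the field names (`title`, `author`, `chapters`), the `many` flag, and the `extract` function are all assumptions made for the example, and it uses only the limited XPath subset in Python's standard library on well-formed sample markup.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical rule file content: each field maps to an XPath expression.
# "many" marks fields that should collect every match instead of the first.
RULES_JSON = """
{
  "title":    {"xpath": ".//h1[@class='book-title']", "many": false},
  "author":   {"xpath": ".//span[@class='author']",   "many": false},
  "chapters": {"xpath": ".//ul[@id='toc']/li/a",      "many": true}
}
"""

# Well-formed sample markup standing in for a fetched page.
SAMPLE_PAGE = """
<html><body>
  <h1 class="book-title">Example Novel</h1>
  <span class="author">Jane Doe</span>
  <ul id="toc">
    <li><a href="/c/1">Chapter 1</a></li>
    <li><a href="/c/2">Chapter 2</a></li>
  </ul>
</body></html>
"""


def extract(page: str, rules: dict) -> dict:
    """Apply each rule's XPath to the page and collect the matched text."""
    root = ET.fromstring(page.strip())
    result = {}
    for field, rule in rules.items():
        texts = [(node.text or "").strip() for node in root.findall(rule["xpath"])]
        if rule["many"]:
            result[field] = texts
        else:
            result[field] = texts[0] if texts else None
    return result


if __name__ == "__main__":
    print(extract(SAMPLE_PAGE, json.loads(RULES_JSON)))
```

Because the selectors live in data rather than code, supporting another site in this sketch means writing a new rule set; `extract` itself stays unchanged. Real-world HTML is rarely well-formed XML, so a production extractor would need a tolerant HTML parser with full CSS and XPath support.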
The project converts raw extracted text into e-book and document formats such as EPUB, PDF, TXT, and HTML. It also includes utilities for converting content between Simplified and Traditional Chinese characters, and it can search multiple web sources at the same time and aggregate the results.
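
Searching several sources at once and merging the results can be sketched as below. This is an illustrative pattern under stated assumptions, not so-novel's implementation: the source names, the in-memory `SOURCES` catalogs (standing in for remote sites), and the functions `search_source` and `search_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory catalogs standing in for remote book sources.
SOURCES = {
    "source_a": ["Example Novel", "Another Story"],
    "source_b": ["Example Novel II", "Unrelated Title"],
    "source_c": [],
}


def search_source(name: str, keyword: str) -> list:
    """Return (source, title) pairs whose title contains the keyword."""
    needle = keyword.lower()
    return [(name, title) for title in SOURCES[name] if needle in title.lower()]


def search_all(keyword: str) -> list:
    """Query every source concurrently and merge the hits into one sorted list."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(search_source, name, keyword) for name in SOURCES]
        hits = []
        for future in futures:
            hits.extend(future.result())
    return sorted(hits)


if __name__ == "__main__":
    for source, title in search_all("example"):
        print(source, title)
```

With real network sources, each `search_source` call would spend most of its time waiting on I/O, which is where running the queries in a thread pool pays off. A fuller version would also need per-source timeouts and error handling so that one slow or failing source does not block the combined result.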
Deployment options include container images, shell scripts, and compiling the application into a standalone native binary.