# freeok/so-novel

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/freeok-so-novel).**

7,049 stars · 565 forks · Java · AGPL-3.0

## Links

- GitHub: https://github.com/freeok/so-novel
- awesome-repositories: https://awesome-repositories.com/repository/freeok-so-novel.md

## Topics

`cli` `content-export` `document-parser` `ebook` `novel` `offline-reader` `tui`

## Description

so-novel is a web novel downloader and scraping engine that extracts structured text from websites and converts it into e-book formats. A single shared backend is exposed through three interfaces: a web-based management dashboard, a terminal user interface (TUI), and a command-line interface (CLI).

Extraction is rule-driven: CSS selectors and XPath rules defined in external configuration files map web elements to specific data fields. To maintain access to content, requests can be routed through a proxy to get past regional restrictions and anti-scraping protections.
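The rule-driven idea can be illustrated with a minimal sketch. It assumes a hypothetical rule set held in a `Map` of field name to XPath expression; the project's real rule file format, field names, and parser are not shown here. The sketch uses only the JDK's XPath support, so it needs well-formed markup; real-world HTML and CSS selectors would need an HTML parsing library such as jsoup.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RuleExtractor {

    /** Applies each (field name -> XPath) rule to the page and returns field name -> extracted text. */
    public static Map<String, String> extract(String xhtml, Map<String, String> rules) throws Exception {
        Document page = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            // evaluate(String, Object) returns the string value of the first matching node.
            fields.put(rule.getKey(), xpath.evaluate(rule.getValue(), page).trim());
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical page and rules; a real rule set would be loaded from a configuration file.
        String page = "<html><body>"
                + "<h1 class=\"title\">Example Novel</h1>"
                + "<span id=\"author\">Jane Doe</span>"
                + "<div id=\"content\">Chapter text.</div>"
                + "</body></html>";

        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("title", "//h1[@class='title']");
        rules.put("author", "//span[@id='author']");
        rules.put("content", "//div[@id='content']");

        System.out.println(extract(page, rules));
    }
}
```

Because the rules are plain data, supporting another site means supplying a different map rather than changing the extraction code.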

The project converts raw extracted text into structured electronic formats such as EPUB, PDF, TXT, and HTML. It also includes utilities for converting text between Simplified and Traditional Chinese and supports searching multiple aggregated web sources simultaneously.
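Simultaneous searching across sources can be sketched as a fan-out and merge. The class name `AggregatedSearch` and the modelling of each source as a `Function<String, List<String>>` are assumptions made for illustration, not the project's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class AggregatedSearch {

    /** Runs the same query against every source in parallel and merges the hits in source order. */
    public static List<String> searchAll(String query, List<Function<String, List<String>>> sources) {
        List<CompletableFuture<List<String>>> pending = new ArrayList<>();
        for (Function<String, List<String>> source : sources) {
            pending.add(CompletableFuture
                    .supplyAsync(() -> source.apply(query))
                    // A failing source contributes no results instead of failing the whole search.
                    .exceptionally(error -> List.<String>of()));
        }

        List<String> merged = new ArrayList<>();
        for (CompletableFuture<List<String>> future : pending) {
            merged.addAll(future.join());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for real site lookups.
        Function<String, List<String>> siteA = q -> List.of("siteA: " + q);
        Function<String, List<String>> siteB = q -> { throw new IllegalStateException("site down"); };
        Function<String, List<String>> siteC = q -> List.of("siteC: " + q);

        System.out.println(searchAll("example", List.of(siteA, siteB, siteC)));
    }
}
```

Collecting the futures first and joining afterwards lets all lookups run concurrently while keeping the merged results in a predictable order.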

Deployment options include containerized images, shell scripts, and the ability to compile the application into a standalone native binary.

## Tags

### Development Tools & Productivity

- [Web Scraping](https://awesome-repositories.com/f/development-tools-productivity/web-scraping.md) — Uses customizable CSS selectors and XPath rules to extract structured content from diverse web pages.
- [System Command Dispatchers](https://awesome-repositories.com/f/development-tools-productivity/command-lifecycle-managers/system-command-dispatchers.md) — Implements a shared backend that routes commands across web, terminal, and command line interfaces.

### Web Development

- [Web Scraping](https://awesome-repositories.com/f/web-development/web-automation-scraping/web-scraping-automation/web-scraping.md) — Extracts structured information from websites using customizable rule files based on CSS selectors, XPath, and regular expressions. ([source](https://github.com/freeok/so-novel#readme))
- [Web Scraping Engines](https://awesome-repositories.com/f/web-development/web-scraping-engines.md) — Implements a high-performance engine for automated extraction of structured text using CSS and XPath.
- [Offline Web Page Archivers](https://awesome-repositories.com/f/web-development/offline-web-applications/offline-web-page-archivers.md) — Captures web-based novels and saves them as local files for offline reading on electronic devices.
- [Website Crawlers and Scrapers](https://awesome-repositories.com/f/web-development/website-crawlers-and-scrapers.md) — Functions as a deployment-ready system for recursively retrieving and structuring web novel content.

### Content Management & Publishing

- [Artwork and Novel Downloaders](https://awesome-repositories.com/f/content-management-publishing/community-content-feeds/community-content-downloaders/artwork-and-novel-downloaders.md) — Downloads online novels and chapters from various websites for offline archival and reading.
- [Web Content Extractors](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/content-parsers/web-content-extractors.md) — Transforms unstructured web pages into structured data formats based on predefined extraction rules. ([source](https://github.com/freeok/so-novel/blob/main/README.md))
- [Document Format Converters](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing/format-specific-parsers/office-document-parsers/document-format-converters.md) — Converts raw extracted text into structured electronic formats including EPUB, PDF, and TXT.
- [Document Format Conversions](https://awesome-repositories.com/f/content-management-publishing/document-format-conversions.md) — Transforms extracted web text into structured electronic document formats such as EPUB, PDF, and TXT.
- [eBook Compilers](https://awesome-repositories.com/f/content-management-publishing/ebook-compilers.md) — Assembles scraped web content into standardized eBook formats such as EPUB and PDF.
- [Multi-Format Exporters](https://awesome-repositories.com/f/content-management-publishing/html-to-markdown-converters/multi-format-exporters.md) — Converts extracted web content into standardized formats including EPUB, PDF, TXT, and HTML with optimized layouts. ([source](https://github.com/freeok/so-novel/blob/main/README.md))
- [Web Element Mappings](https://awesome-repositories.com/f/content-management-publishing/web-element-mappings.md) — Supports mapping web elements using CSS selectors or XPaths in configuration files to identify target content. ([source](https://github.com/freeok/so-novel/tree/main/bundle))
- [Manga Chapter Downloaders](https://awesome-repositories.com/f/content-management-publishing/community-content-feeds/community-content-downloaders/restricted-content-downloaders/manga-chapter-downloaders.md) — High-performance downloader for fetching multiple novel chapters simultaneously for offline reading. ([source](https://github.com/freeok/so-novel/blob/main/CHANGELOG_ALL.md))
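
The format conversion described in the entries above can be sketched for the two simplest targets, TXT and HTML. The class and method names are hypothetical, and this is not the project's exporter; EPUB and PDF output require packaging and layout steps that are out of scope here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChapterExporter {

    /** Escapes the characters that are significant in HTML text content. */
    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders chapters (title -> body) as plain text with a blank line between blocks. */
    public static String toTxt(Map<String, String> chapters) {
        StringBuilder out = new StringBuilder();
        chapters.forEach((title, body) -> out.append(title).append("\n\n").append(body).append("\n\n"));
        return out.toString().trim() + "\n";
    }

    /** Renders chapters as a single HTML page, one heading per chapter and one paragraph per line. */
    public static String toHtml(String bookTitle, Map<String, String> chapters) {
        StringBuilder out = new StringBuilder();
        out.append("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"><title>")
           .append(escapeHtml(bookTitle)).append("</title></head><body>\n");
        chapters.forEach((title, body) -> {
            out.append("<h2>").append(escapeHtml(title)).append("</h2>\n");
            for (String paragraph : body.split("\n")) {
                out.append("<p>").append(escapeHtml(paragraph)).append("</p>\n");
            }
        });
        return out.append("</body></html>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> chapters = new LinkedHashMap<>();
        chapters.put("Chapter 1", "First line.\nSecond line.");
        chapters.put("Chapter 2", "Rock & roll.");

        System.out.print(toTxt(chapters));
        System.out.print(toHtml("Example Novel", chapters));
    }
}
```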

### Data & Databases

- [Document Format Transformations](https://awesome-repositories.com/f/data-databases/structured-data-extraction/document-format-transformations.md) — Transforms extracted raw web data into structured electronic document formats for storage and reading. ([source](https://github.com/freeok/so-novel/blob/main/bundle/DISCLAIMER.md))
- [Pattern-Based Extraction](https://awesome-repositories.com/f/data-databases/text-processing-utilities/text-extraction/pattern-based-extraction.md) — Defines and switches between data extraction patterns via configuration files to support diverse sources. ([source](https://github.com/freeok/so-novel/blob/main/bundle%2Freadme.txt))

### Networking & Communication

- [Proxy Routing](https://awesome-repositories.com/f/networking-communication/request-proxies/proxy-routing.md) — Routes outbound network traffic through configurable proxies to bypass regional restrictions and anti-scraping protections.
- [Extraction Rule Sets](https://awesome-repositories.com/f/networking-communication/traffic-rule-sets/extraction-rule-sets.md) — Manages serialized logic used to parse and extract data from various target websites. ([source](https://github.com/freeok/so-novel/blob/main/BOOK_SOURCES.md))
- [Web Element Mapping Rules](https://awesome-repositories.com/f/networking-communication/traffic-rule-sets/extraction-rule-sets/content-identification-rule-sets/web-element-mapping-rules.md) — Maps website elements to specific data fields by switching between predefined configuration sets for different sources.
- [Custom Extraction Rule Definitions](https://awesome-repositories.com/f/networking-communication/traffic-rule-sets/extraction-rule-sets/custom-extraction-rule-definitions.md) — Allows the creation and loading of site-specific rules that define how content is parsed from different websites. ([source](https://github.com/freeok/so-novel/blob/main/CHANGELOG_ALL.md))
- [Network Proxy Configurations](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-infrastructure-configuration/network-configuration/network-proxy-configurations.md) — Routes outbound network traffic through configurable proxy hosts to bypass regional access restrictions. ([source](https://github.com/freeok/so-novel/blob/main/BOOK_SOURCES.md))
- [Multi-threaded Downloading](https://awesome-repositories.com/f/networking-communication/resumable-downloads/multi-threaded-downloading.md) — Fetches multiple web resources simultaneously using a configurable concurrency model and retry logic.
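
A configurable concurrency model with retry logic, as mentioned in the last entry, might look like the following sketch. The names `ChapterDownloader`, `threads`, and `maxAttempts` are illustrative assumptions, and the fetcher is passed in as a function so the example runs without network access; none of this is taken from the project's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ChapterDownloader {

    /** Calls the fetcher up to maxAttempts times and fails with the last error as the cause. */
    static String fetchWithRetry(String url, Function<String, String> fetcher, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetcher.apply(url);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("Gave up on " + url + " after " + maxAttempts + " attempts", last);
    }

    /** Downloads all chapters on a fixed-size pool and returns them in the original URL order. */
    public static List<String> downloadAll(List<String> urls, Function<String, String> fetcher,
                                           int threads, int maxAttempts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetchWithRetry(url, fetcher, maxAttempts);
                futures.add(pool.submit(task));
            }
            List<String> chapters = new ArrayList<>();
            for (Future<String> future : futures) {
                chapters.add(future.get());
            }
            return chapters;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical fetcher that fails on the first attempt for every URL.
        Map<String, AtomicInteger> attempts = new ConcurrentHashMap<>();
        Function<String, String> flaky = url -> {
            int n = attempts.computeIfAbsent(url, k -> new AtomicInteger()).incrementAndGet();
            if (n == 1) {
                throw new IllegalStateException("simulated timeout");
            }
            return "text of " + url;
        };

        System.out.println(downloadAll(List.of("ch1", "ch2", "ch3"), flaky, 2, 3));
    }
}
```

Keeping the futures in submission order means chapters come back in reading order even when they finish out of order.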

### Software Engineering & Architecture

- [CSS Selector Data Extractors](https://awesome-repositories.com/f/software-engineering-architecture/syntax-query-definitions/css-selector-engines/css-selector-data-extractors.md) — Parses unstructured web pages using configurable CSS selectors and XPath rules defined in external files.

### Security & Cryptography

- [Anti-Bot Evasion](https://awesome-repositories.com/f/security-cryptography/bot-detection/anti-bot-evasion.md) — Implements techniques to bypass bot detection and security layers using proxy-routed requests.
- [Protection Bypassers](https://awesome-repositories.com/f/security-cryptography/traffic-protection/protection-bypassers.md) — Circumvents anti-scraping security layers and bot protections by routing requests through proxy services. ([source](https://github.com/freeok/so-novel/blob/main/bundle%2Freadme.txt))

### System Administration & Monitoring

- [Web Management Dashboards](https://awesome-repositories.com/f/system-administration-monitoring/administrative-operations/remote-access-interface-tools/administrative-interfaces/management-interfaces/web-management-dashboards.md) — Provides a browser-based dashboard for managing data extraction tasks and system configurations. ([source](https://github.com/freeok/so-novel/tree/main/bundle))

### User Interface & Experience

- [Multi-Interface Access](https://awesome-repositories.com/f/user-interface-experience/multi-interface-access.md) — Offers a shared backend accessible via a web-based dashboard, a terminal user interface, and a command line interface. ([source](https://github.com/freeok/so-novel/blob/main/CHANGELOG_ALL.md))
