# hect0x7/jmcomic-crawler-python

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/hect0x7-jmcomic-crawler-python).**

6,371 stars · 11,065 forks · Python · MIT

## Links

- GitHub: https://github.com/hect0x7/JMComic-Crawler-Python
- Homepage: https://jmcomic.readthedocs.io/zh-cn/latest/option_file_syntax/#
- awesome-repositories: https://awesome-repositories.com/repository/hect0x7-jmcomic-crawler-python.md

## Topics

`18comic` `asyncio` `crawler` `downloader` `github-actions` `jmcomic` `pypi` `python` `readthedocs`

## Description

JMComic-Crawler-Python is a high-performance asynchronous web scraper and API client that programmatically retrieves images and metadata from a comic hosting service. It doubles as a media archiving tool that batch-downloads albums and chapters and saves them to the local filesystem.

The project stands out for its ability to reverse server-side pixel obfuscation: it reconstructs images that the server has sliced and shuffled. To keep connections stable, it uses dynamic domain rotation and proxy routing to get around bot protections and network blocks.
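
The slice-and-shuffle reversal can be illustrated with a minimal sketch. It assumes the obfuscation cut the image into equal-height horizontal strips and reversed their order, with any leftover rows attached to the bottom strip. The function name `restore_strips`, the row-list representation of an image, and the strip layout are all illustrative assumptions, not the project's confirmed algorithm.

```python
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def restore_strips(rows: Sequence[Row], num_strips: int) -> List[Row]:
    """Rebuild an image whose horizontal strips were stored in reverse order.

    `rows` is the scrambled image as a sequence of pixel rows (assumption:
    any per-row representation works, since rows are only moved, not edited).
    """
    if num_strips <= 1:
        return list(rows)

    height = len(rows)
    strip_height = height // num_strips
    remainder = height % num_strips  # rows that do not divide evenly

    restored: List[Row] = []
    for i in range(num_strips):
        # Walk the scrambled image from the bottom strip upward.
        start = height - strip_height * (i + 1) - remainder
        end = start + strip_height
        if i == 0:
            end += remainder  # assumed: the bottom strip carries the leftovers
        restored.extend(rows[start:end])
    return restored


# Seven rows in three strips; integers stand in for rows of pixels.
scrambled = [5, 6, 3, 4, 0, 1, 2]
print(restore_strips(scrambled, 3))  # → [0, 1, 2, 3, 4, 5, 6]
```

In a real pipeline the rows would come from an image library and the strip count from per-image metadata; both are outside the scope of this sketch.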

The crawler also handles content management, including converting downloaded images into PDF, ZIP, or long-strip formats. Other features include account authentication via browser cookie imports, asynchronous content search, and automated synchronization of new chapters. It can be extended through a plugin-based event system and custom HTTP client implementations.

Users can execute downloads directly via a command line interface or automate workflows using continuous integration platforms.

## Tags

### Content Management & Publishing

- [Comic and Manga Downloaders](https://awesome-repositories.com/f/content-management-publishing/community-content-feeds/community-content-downloaders/artwork-and-novel-downloaders/comic-and-manga-downloaders.md) — Provides a high-performance utility for batch downloading digital comic and manga chapters from hosting services. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))
- [Manga Chapter Downloaders](https://awesome-repositories.com/f/content-management-publishing/community-content-feeds/community-content-downloaders/restricted-content-downloaders/manga-chapter-downloaders.md) — Downloads manga chapters to local storage for offline reading with automated organization.
- [Manga Scrapers](https://awesome-repositories.com/f/content-management-publishing/content-aggregation-curation/content-aggregators/media-aggregators/manga-scrapers.md) — Implements a high-performance scraper to normalize manga content and metadata from hosting services. ([source](https://jmcomic.readthedocs.io/zh-cn/latest))
- [Comic Archive Converters](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing-tools/pdf-manipulation-utilities/pdf-editors/pdf-content-converters/pdf-image-conversion/image-to-pdf-converters/comic-archive-converters.md) — Transforms retrieved comic images into PDF, ZIP, or long-strip image formats for archiving. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/))
- [Content Ranking Aggregators](https://awesome-repositories.com/f/content-management-publishing/content-ranking-aggregators.md) — Fetches popularity rankings and curated lists of comic content based on view counts. ([source](https://github.com/hect0x7/JMComic-Crawler-Python/blob/master/assets/docs/sources/tutorial/0_common_usage.md))
- [Local-Remote Media Synchronizers](https://awesome-repositories.com/f/content-management-publishing/local-remote-media-synchronizers.md) — Synchronizes online manga libraries with local storage by monitoring and downloading new chapters.
- [Manga Metadata Retrieval](https://awesome-repositories.com/f/content-management-publishing/manga-metadata-retrieval.md) — Fetches image links and descriptive metadata for manga titles via asynchronous requests. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/))
- [Scheduled Content Updates](https://awesome-repositories.com/f/content-management-publishing/scheduled-content-updates.md) — Periodically checks remote sources for new chapters and automates content updates. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))
- [Web Content Scraping](https://awesome-repositories.com/f/content-management-publishing/web-content-scraping.md) — Automates the retrieval of images and metadata from comic websites for local storage.
- [Web Media Scrapers](https://awesome-repositories.com/f/content-management-publishing/web-media-scrapers.md) — Uses asynchronous requests and multi-threading to efficiently extract media and metadata from the web.
- [Incremental Downloaders](https://awesome-repositories.com/f/content-management-publishing/community-content-feeds/community-content-downloaders/artwork-and-novel-downloaders/comic-and-manga-downloaders/incremental-downloaders.md) — Provides functionality to identify and download only new chapters added to a comic since the last update. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/plugin/))
- [Content Formats and Exporting](https://awesome-repositories.com/f/content-management-publishing/content-formats-exporting.md) — Converts downloaded images into portable formats such as PDF, ZIP, or long-strip images. ([source](https://jmcomic.readthedocs.io/zh-cn/latest))
- [Automated File Organizers](https://awesome-repositories.com/f/content-management-publishing/media-management/file-management-systems/automated-file-organizers.md) — Automatically organizes downloaded files into directory structures based on predefined naming rules and placeholders. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/option_file_syntax/))

### Graphics & Multimedia

- [Obfuscated Image Reconstruction](https://awesome-repositories.com/f/graphics-multimedia/obfuscated-image-reconstruction.md) — Reconstructs images that have been sliced and shuffled by server-side obfuscation to restore them to a viewable state.
- [Media Category Browsing](https://awesome-repositories.com/f/graphics-multimedia/media-category-browsing.md) — Provides navigation systems to explore comic content via categories and time-based rankings. ([source](https://github.com/hect0x7/JMComic-Crawler-Python/blob/master/assets/docs/sources/tutorial/0_common_usage.md))
- [Format Converters](https://awesome-repositories.com/f/graphics-multimedia/media-processing-analysis/media-manipulation/media-processing-workflows/image-processing-pipelines/image-format-decoders/format-converters.md) — Converts downloaded comic images between different formats, such as converting a series of images into a long PNG strip. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))
- [Media Content Archivers](https://awesome-repositories.com/f/graphics-multimedia/media-production-suites/media-management-production/media-archiving/media-content-archivers.md) — Batch downloads digital media from web sources and organizes them for offline use.
- [Obfuscated Image Decryptions](https://awesome-repositories.com/f/graphics-multimedia/obfuscated-image-decryptions.md) — Processes encrypted image files into a viewable format so that content is accessible after downloading. ([source](https://jmcomic.readthedocs.io/zh-cn/latest))
- [Comic Format Transformations](https://awesome-repositories.com/f/graphics-multimedia/comic-format-transformations.md) — Transforms sequences of comic images into reader-friendly formats like PDF and long-strip images. ([source](https://jmcomic.readthedocs.io/zh-cn/latest))

### Security & Cryptography

- [Media Decryption](https://awesome-repositories.com/f/security-cryptography/data-decryption/media-decryption.md) — Implements cryptographic processes to decrypt protected images and reverse server-side pixel obfuscation. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))
- [Protection Bypassers](https://awesome-repositories.com/f/security-cryptography/traffic-protection/protection-bypassers.md) — Circumvents automated firewalls and manages domain switching to maintain stable connectivity. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))
- [Session Authentication](https://awesome-repositories.com/f/security-cryptography/session-authentication.md) — Handles authentication sessions and manages access to personal favorite collections. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))
- [Browser Cookie Imports](https://awesome-repositories.com/f/security-cryptography/session-cookie-handlers/browser-cookie-imports.md) — Extracts session cookies directly from local web browsers to enable authenticated API access. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/plugin/))
- [Cookie-Based Authentication Bridges](https://awesome-repositories.com/f/security-cryptography/session-cookie-handlers/cookie-based-authentication-bridges.md) — Bridges browser cookies and session data into the crawler to access restricted content and bypass bot protection.
- [User Account Management](https://awesome-repositories.com/f/security-cryptography/user-account-management.md) — Manages user credentials and authentication states to establish secure sessions. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/client/))

### Artificial Intelligence & ML

- [Keyword Search Crawlers](https://awesome-repositories.com/f/artificial-intelligence-ml/bot-platforms/platform-normalization-adapters/platform-search-adapters/keyword-search-crawlers.md) — Queries the platform's search API by keywords and tags to collect matching manga content. ([source](https://github.com/hect0x7/JMComic-Crawler-Python/blob/master/assets/docs/sources/tutorial/0_common_usage.md))

### Data & Databases

- [Asynchronous Content Retrievers](https://awesome-repositories.com/f/data-databases/asynchronous-content-retrievers.md) — Uses asynchronous requests and generators to retrieve manga images and metadata at high speed. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/tutorial/14_async_usage/))
- [Iterator-Based Pagination](https://awesome-repositories.com/f/data-databases/iterator-based-pagination.md) — Navigates large datasets of comic albums through automated page-by-page retrieval using generators. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/entity/))
- [Search and Indexing](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing.md) — Integrates search and indexing to retrieve album details, rankings, and personal collection metadata. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))
- [HTML Parsing and Extraction](https://awesome-repositories.com/f/data-databases/url-crawl-queues/url-filtering-strategies/content-and-language-filtering/html-parsing-and-extraction.md) — Extracts structured data, tags, and result counts from HTML search and category pages. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/toolkit/))
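
The iterator-based pagination entry above can be sketched with a plain generator. This is a generic illustration, not the library's API: `iter_pages`, `fake_fetch`, and the in-memory `FAKE_PAGES` data are hypothetical stand-ins for a real page-fetching call.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(fetch_page: Callable[[int], List[str]], start: int = 1) -> Iterator[str]:
    """Yield items page by page until a page comes back empty."""
    page = start
    while True:
        items = fetch_page(page)
        if not items:
            return  # an empty page is treated as the end of the results
        yield from items
        page += 1


# Hypothetical stand-in for a network call that returns one page of album IDs.
FAKE_PAGES: Dict[int, List[str]] = {1: ["a1", "a2"], 2: ["a3"]}


def fake_fetch(page: int) -> List[str]:
    return FAKE_PAGES.get(page, [])


print(list(iter_pages(fake_fetch)))  # → ['a1', 'a2', 'a3']
```

Because the generator is lazy, a caller that stops iterating early never triggers requests for the remaining pages.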

### Development Tools & Productivity

- [Media Category Filters](https://awesome-repositories.com/f/development-tools-productivity/search-query-filters/media-category-filters.md) — Filters search results by predefined media categories and ranking metrics before retrieval. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/tutorial/14_async_usage/))
- [Dynamic Domain Resolution](https://awesome-repositories.com/f/development-tools-productivity/web-development-utilities/official-domain-directories/dynamic-domain-resolution.md) — Programmatically resolves current web addresses from permanent URLs to bypass frequent domain changes. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/config/))
- [CLI Execution](https://awesome-repositories.com/f/development-tools-productivity/headless-execution-environments/cli-execution.md) — Offers a command-line interface for executing downloads and metadata lookups without a GUI. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))
- [Custom Plugin Registrations](https://awesome-repositories.com/f/development-tools-productivity/plugin-systems/custom-plugin-registrations.md) — Implements a registration system for custom plugins to add specialized behaviors to the crawler. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/tutorial/6_plugin/))
- [Python API Clients](https://awesome-repositories.com/f/development-tools-productivity/rest-apis/rest-api-clients/python-api-clients.md) — Implements a native Python library for object-oriented interaction with the comic service API.

### Networking & Communication

- [Batch Image Downloads](https://awesome-repositories.com/f/networking-communication/download-automation/batch-image-downloads.md) — Processes bulk retrieval of multiple albums and chapters in a single operation with fault tolerance. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/download/))
- [Media Content Retrievers](https://awesome-repositories.com/f/networking-communication/media-content-retrievers.md) — Fetches content programmatically using mobile API endpoints and automatic domain rotation. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/roadmap/))
- [Domain Rotation Strategies](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-infrastructure-configuration/network-management/proxy-tunneling-clients/geographic-restriction-bypasses/domain-rotation-strategies.md) — Uses dynamic domain rotation and proxy routing to circumvent network blocks and bot protections.
- [Bot Detection Bypass](https://awesome-repositories.com/f/networking-communication/request-header-configuration/request-header-overrides/bot-detection-bypass.md) — Mimics legitimate browser behavior and uses app-level encryption to evade bot detection systems. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))
- [Custom Client Implementations](https://awesome-repositories.com/f/networking-communication/custom-client-implementations.md) — Allows users to replace the default HTTP networking layer with custom client implementations for specialized request handling.
- [CI-Based](https://awesome-repositories.com/f/networking-communication/download-automation/ci-based.md) — Executes content retrieval and exports using CI tools to bypass regional network restrictions. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/))
- [Multi-Interface Content Retrieval](https://awesome-repositories.com/f/networking-communication/multi-interface-content-retrieval.md) — Toggles between web and API interfaces to optimize request efficiency and circumvent IP restrictions. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/option_file_syntax/))
- [Network Proxy Configurations](https://awesome-repositories.com/f/networking-communication/network-infrastructure-routing/network-infrastructure-configuration/network-configuration/network-proxy-configurations.md) — Supports routing network requests through custom HTTP and HTTPS proxies to bypass connectivity restrictions. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/option_file_syntax/))
- [Bot-Protection Bypassing Downloader](https://awesome-repositories.com/f/networking-communication/remote-file-downloads/bot-protection-bypassing-downloader.md) — Bypasses bot detection systems using proxy routing and dynamic domain rotation.
- [Multi-threaded Downloading](https://awesome-repositories.com/f/networking-communication/resumable-downloads/multi-threaded-downloading.md) — Increases overall download speed by fetching content segments in parallel across multiple threads. ([source](https://github.com/hect0x7/JMComic-Crawler-Python/blob/master/assets/docs/sources/tutorial/0_common_usage.md))
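
As a rough illustration of the multi-threaded downloading entry above, the sketch below fans a list of URLs out over a bounded thread pool from the standard library. The names `download_all` and `fake_fetch` and the `example.invalid` URLs are hypothetical; a real `fetch` would perform an HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> Dict[str, bytes]:
    """Fetch every URL in parallel, with at most `max_workers` in flight."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        bodies = pool.map(fetch, url_list)
        return dict(zip(url_list, bodies))


# Hypothetical fetcher that returns the URL itself as bytes instead of downloading.
def fake_fetch(url: str) -> bytes:
    return url.encode("utf-8")


result = download_all(
    ["https://example.invalid/1.jpg", "https://example.invalid/2.jpg"],
    fake_fetch,
    max_workers=2,
)
print(sorted(result))
```

Capping `max_workers` also limits how many requests hit the server at once, which is one simple way to avoid exhausting local resources or triggering blocks.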

### Software Engineering & Architecture

- [Asynchronous Pipelines](https://awesome-repositories.com/f/software-engineering-architecture/command-pipelines/request-pipelines/asynchronous-pipelines.md) — Implements an asynchronous pipeline that separates network I/O from decryption to maximize data throughput. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/download/))
- [Image Metadata Retrievers](https://awesome-repositories.com/f/software-engineering-architecture/component-lifecycle-management/component-detail-retrievers/content-detail-retrievers/image-metadata-retrievers.md) — Retrieves technical image metadata, including specific scramble IDs required for decryption. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/client/))
- [Dynamic Domain Rotation](https://awesome-repositories.com/f/software-engineering-architecture/dynamic-domain-rotation.md) — Uses dynamic domain rotation and failure tracking to maintain connectivity and bypass network blocks.
- [Asynchronous Request Pipelines](https://awesome-repositories.com/f/software-engineering-architecture/asynchronous-request-pipelines.md) — Implements an asynchronous pipeline that decouples network I/O from image decryption to maximize throughput.
- [Event-Driven Plugin Logic](https://awesome-repositories.com/f/software-engineering-architecture/event-driven-plugin-logic.md) — Runs custom logic automatically when specific events occur, such as downloading images or albums. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/))
- [Plugin Extenders](https://awesome-repositories.com/f/software-engineering-architecture/integration-extensibility/extensibility/plugin-architectures/developer-authoring-interfaces/custom-module-implementations/module-functionality-extenders/plugin-extenders.md) — Supports augmenting core application functionality by loading custom external libraries at runtime. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/))
- [Core Capability Extensions](https://awesome-repositories.com/f/software-engineering-architecture/integration-extensibility/extensibility/plugin-architectures/developer-authoring-interfaces/custom-module-implementations/module-functionality-extenders/plugin-extenders/core-capability-extensions.md) — Allows extending the fundamental operational logic of the crawler through custom plugins and modules. ([source](https://jmcomic.readthedocs.io/zh-cn/latest))
- [Multi-threaded Data Extraction](https://awesome-repositories.com/f/software-engineering-architecture/multi-threaded-data-extraction.md) — Implements concurrency patterns to distribute web scraping tasks across multiple workers for faster data extraction. ([source](https://github.com/hect0x7/JMComic-Crawler-Python/blob/master/assets/docs/sources/tutorial/0_common_usage.md))
- [Hook-Based Plugin Systems](https://awesome-repositories.com/f/software-engineering-architecture/software-architecture/architectural-patterns/plugin-module-systems/modular-plugin-architectures/plugin-based-architectures/hook-based-plugin-systems.md) — Provides a hook-based plugin system to execute custom logic at specific stages of the download lifecycle.
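
A hook-based plugin system of the kind described above can be sketched in a few lines. This is a generic pattern, not the project's actual plugin API: the `HookRegistry` class and the `after_album` event name are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class HookRegistry:
    """Maps event names to callbacks and runs them when the event fires."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that registers a callback for `event`."""
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._hooks[event].append(func)
            return func
        return decorator

    def emit(self, event: str, **payload: Any) -> List[Any]:
        """Call every callback registered for `event`, in registration order."""
        return [hook(**payload) for hook in self._hooks.get(event, [])]


hooks = HookRegistry()
log: List[str] = []


@hooks.on("after_album")  # hypothetical event name
def record_album(album_id: str) -> str:
    log.append(album_id)
    return album_id


hooks.emit("after_album", album_id="123")
print(log)  # → ['123']
```

A downloader built this way would call `emit` at each lifecycle stage, so plugins can add behaviour without changes to the core download code.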

### User Interface & Experience

- [Manga Album Browsers](https://awesome-repositories.com/f/user-interface-experience/artist-and-album-browsers/manga-album-browsers.md) — Allows searching for manga using filter criteria and retrieving album details, rankings, and category lists. ([source](https://cdn.jsdelivr.net/gh/hect0x7/jmcomic-crawler-python@master/README.md))

### Web Development

- [High-Volume Web Scrapers](https://awesome-repositories.com/f/web-development/high-volume-web-scrapers.md) — Implements a high-performance crawler optimized for large-scale collection and anti-blocking.
- [Resource Detail Retrievals](https://awesome-repositories.com/f/web-development/resource-detail-retrievals.md) — Retrieves detailed metadata and information for specific manga albums or chapters using their unique IDs. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/client/))
- [Web Crawlers](https://awesome-repositories.com/f/web-development/web-crawlers.md) — Functions as a programmable system for asynchronously visiting URLs and extracting manga metadata.
- [Asynchronous API Clients](https://awesome-repositories.com/f/web-development/asynchronous-api-clients.md) — Provides an asynchronous Python interface for interacting with comic service web and mobile endpoints.
- [Retry and Backoff Logic](https://awesome-repositories.com/f/web-development/http-client-wrappers/retry-and-backoff-logic.md) — Implements automated retry strategies and domain blacklisting to recover from network timeouts. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/tutorial/12_domain_strategy/))
- [Decryption](https://awesome-repositories.com/f/web-development/rest-apis/api-response-processing/decryption.md) — Decrypts encrypted API response payloads to retrieve plain JSON metadata. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/toolkit/))
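
The retry and domain blacklisting behaviour listed above can be approximated as follows. This sketch assumes a simple failure counter per domain and no delay between attempts; `DomainRotator`, `fake_request`, and the `.example` domains are hypothetical, and the project's real strategy may differ.

```python
from typing import Callable, Dict, Iterable, List


class DomainRotator:
    """Tries each domain in turn and skips those that failed too often."""

    def __init__(self, domains: Iterable[str], max_failures: int = 2) -> None:
        self.domains: List[str] = list(domains)
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {d: 0 for d in self.domains}

    def available(self) -> List[str]:
        """Domains that have not yet reached the failure limit."""
        return [d for d in self.domains if self.failures[d] < self.max_failures]

    def fetch(self, path: str, request: Callable[[str], str]) -> str:
        last_error = None
        for domain in self.available():
            try:
                return request(f"https://{domain}{path}")
            except OSError as exc:  # covers TimeoutError and ConnectionError
                self.failures[domain] += 1
                last_error = exc
        raise ConnectionError(f"all domains failed for {path}") from last_error


# Hypothetical transport: one domain always times out, the other answers.
def fake_request(url: str) -> str:
    if "bad.example" in url:
        raise TimeoutError(url)
    return f"ok:{url}"


rotator = DomainRotator(["bad.example", "good.example"])
print(rotator.fetch("/album/1", fake_request))  # → ok:https://good.example/album/1
```

After `max_failures` errors a domain drops out of `available()`, so later requests go straight to the domains that still respond.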

### Part of an Awesome List

- [Content Sync Actions](https://awesome-repositories.com/f/awesome-lists/devops/ci-automation/content-sync-actions.md) — Integrates with GitHub Actions to schedule automated content updates and library synchronization.

### Business & Productivity Software

- [Storage Path Templates](https://awesome-repositories.com/f/business-productivity-software/download-parameter-templates/storage-path-templates.md) — Generates local storage directories and filenames using dynamic metadata templates.
- [Media Post-Processing Utilities](https://awesome-repositories.com/f/business-productivity-software/media-downloaders/media-post-processing-utilities.md) — Transforms raw image bytes into PDF, ZIP, or long-strip formats after the download process completes.
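
To make the storage path templates entry concrete, here is a small sketch that fills a template from metadata using Python's built-in `str.format`. The placeholder names (`author`, `title`, `index`), the `build_path` helper, and the sanitising rule are assumptions for illustration; the project defines its own template syntax in its option file.

```python
import re
from pathlib import PurePosixPath
from typing import Any, Mapping


def sanitize(segment: str) -> str:
    """Replace characters that are unsafe in file names on common systems."""
    return re.sub(r'[\\/:*?"<>|]', "_", segment).strip()


def build_path(base_dir: str, template: str, metadata: Mapping[str, Any]) -> PurePosixPath:
    """Expand a slash-separated template into a path under `base_dir`."""
    # Each segment is formatted and sanitised on its own, so a "/" inside a
    # metadata value cannot create an extra directory level.
    parts = [sanitize(segment.format(**metadata)) for segment in template.split("/")]
    return PurePosixPath(base_dir, *parts)


path = build_path(
    "downloads",
    "{author}/{title}/{index:03d}.jpg",
    {"author": "A/B", "title": "My: Comic", "index": 7},
)
print(path)  # → downloads/A_B/My_ Comic/007.jpg
```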

### DevOps & Infrastructure

- [Concurrency Controllers](https://awesome-repositories.com/f/devops-infrastructure/concurrency-controllers.md) — Limits the number of simultaneous image and chapter requests to prevent resource exhaustion and server blocks. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/option_file_syntax/))
- [API Token Generators](https://awesome-repositories.com/f/devops-infrastructure/infrastructure/operational-observability-access/api-token-management/api-token-generators.md) — Generates signed tokens from secrets and timestamps to authenticate API requests. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/api/toolkit/))
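
Signing a request from a secret and a timestamp, as in the API token generators entry, might look like the sketch below. It uses HMAC-SHA256 from the standard library as an illustrative choice; the project's actual digest algorithm, message layout, and header names are not stated here, so `sign_request`, `build_headers`, `X-Timestamp`, and `X-Token` are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional


def sign_request(secret: str, timestamp: int) -> str:
    """Return a hex token derived from the secret and the timestamp."""
    message = str(timestamp).encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def build_headers(secret: str, timestamp: Optional[int] = None) -> Dict[str, str]:
    """Build request headers carrying the timestamp and its matching token."""
    ts = int(time.time()) if timestamp is None else timestamp
    return {"X-Timestamp": str(ts), "X-Token": sign_request(secret, ts)}


headers = build_headers("demo-secret", timestamp=1700000000)
print(headers["X-Timestamp"], len(headers["X-Token"]))  # → 1700000000 64
```

A server holding the same secret can recompute the token from the timestamp it receives and compare the two with `hmac.compare_digest`.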

### System Administration & Monitoring

- [Domain Connectivity Tests](https://awesome-repositories.com/f/system-administration-monitoring/domain-connectivity-tests.md) — Verifies if specific service domains are reachable from the current IP to manage domain rotation. ([source](https://jmcomic.readthedocs.io/zh-cn/latest/))
