This is a collection of Python scripts for extracting data from popular Chinese websites and mobile apps. Together they form a multi-platform toolkit that automates tasks such as downloading videos from Bilibili and Douyin, scraping product reviews and images from e-commerce sites like Taobao and JD.com, and booking train tickets through 12306, China's official railway ticketing system.
The project stands out for automating specific, high-value tasks within the Chinese internet ecosystem. It can solve CAPTCHA challenges from Chinese providers such as GEETEST, remove watermarks from downloaded videos, and build a pool of proxy IPs to avoid being blocked during large-scale scraping. One notable feature assists with live quiz games: it captures questions from streaming apps, searches for answers online, and broadcasts the results in real time over WebSocket.
Beyond these differentiators, the toolkit covers a broad range of standard scraping techniques. It handles both static pages and dynamic, JavaScript-rendered content, manages session-based authentication for sites like Taobao, and provides utilities for downloading images, music, and novels. It also includes scripts for querying university academic systems and for scheduling automated actions, such as booking tickets at a precise time.
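The idea of firing an action at a precise time can be illustrated with a minimal sketch. It is not taken from the project: `run_at` and `spin_window` are hypothetical names, and the actual booking request is left out and replaced by an arbitrary callable. The helper sleeps coarsely until shortly before the target, then busy-waits for the final moments, because `time.sleep` alone may overshoot depending on the OS scheduler.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, TypeVar

T = TypeVar("T")


def run_at(
    target: datetime,
    action: Callable[[], T],
    spin_window: timedelta = timedelta(milliseconds=50),
) -> T:
    """Block until the local wall clock reaches `target`, then return action().

    Hypothetical helper (not the project's API): sleeps coarsely until
    `spin_window` before the target, then busy-waits so the call fires as
    close to `target` as the clock allows.
    """
    while True:
        remaining = target - datetime.now()
        if remaining <= timedelta(0):
            break  # target reached (or it was already in the past)
        if remaining > spin_window:
            # Coarse phase: sleep, leaving a margin for sleep overshoot.
            time.sleep((remaining - spin_window).total_seconds())
        # Otherwise busy-wait: loop straight back and re-check the clock.
    return action()


if __name__ == "__main__":
    # Placeholder action; a real script would send its request here instead.
    fire_time = datetime.now() + timedelta(seconds=1)
    fired_at = run_at(fire_time, datetime.now)
    print(f"scheduled {fire_time.time()}, fired {fired_at.time()}")
```

Because the sketch compares against `datetime.now()`, it follows the local wall clock, so a clock adjustment during the wait would shift the firing time. The busy-wait trades some CPU for accuracy; a larger `spin_window` tolerates more sleep overshoot at the cost of spinning longer.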