This is a collection of Python scripts for extracting data from popular Chinese websites and mobile applications. As a multi-platform toolkit, it automates tasks such as downloading videos from Bilibili and Douyin, scraping product reviews and images from e-commerce sites like Taobao and JD.com, and booking train tickets on the 12306 railway system. The project distinguishes itself through its focus on automating specific, high-value tasks within the Chinese internet ecosystem. It includes capabilities for solving Chinese CAPTCHA c
This is a tool for downloading videos, images, and audio from the Douyin social media platform using shareable URLs or profile links. It can download individual posts, entire user profiles including all posts and liked content, collections, and music tracks, with options for watermark-free and high-quality output. The tool also supports live stream recording, comment collection, and keyword-based content search with JSONL export. The project distinguishes itself through an integrated REST API server that accepts download and transcription requests, tracks job status, and exposes health check
Spider_XHS is a data extraction and automation tool built specifically for the Xiaohongshu social platform. It orchestrates multi-step workflows that combine comment tree traversal, cookie-based session reuse, high-resolution media retrieval, keyword search, proxy-backed retries, QR-code login, structured file export, and aggregated user profile collection into a single pipeline. The tool distinguishes itself through its integrated authentication and publishing capabilities, supporting login via QR code scanning or phone verification codes to establish and maintain authenticated sessions. It
This project is a TikTok API scraper and data extractor. It collects user metadata, video posts, and trend feeds through proxies, and provides a webhook pipeline that forwards scraped data to external URLs via HTTP requests. The tool includes a watermark-free video downloader that saves high-definition content to local storage. It signs requests cryptographically for server authentication and combines session cookie authentication with proxy rotation to manage network traffic and avoid rate limits. Capabilities cover bulk