weiboSpider is a Python crawler that extracts user profiles, posts, and engagement metrics from Sina Weibo. It serves as an automated data pipeline for academic research and trend analysis, collecting long-form text and multimedia content.
The tool authenticates its requests with browser session cookies, which lets it access protected profiles. It paces requests with randomized delays and global pauses to manage traffic and avoid platform rate limits, and it supports incremental crawling, using timestamps to capture only new content.
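The pacing and incremental-crawl ideas described above can be sketched in a few lines of Python. This is a minimal illustration, not weiboSpider's actual code: the function names `next_delay` and `select_new_posts`, the delay ranges, the `pause_every` interval, and the `published_at` field are all assumptions made for the example.

```python
import random
from datetime import datetime


def next_delay(request_count, rng, short_range=(1.0, 3.0),
               long_range=(6.0, 10.0), pause_every=5):
    """Return the number of seconds to wait before the next request.

    Every `pause_every`-th request gets a longer "global" pause; every other
    request gets a short randomized delay. All defaults are illustrative.
    """
    if request_count > 0 and request_count % pause_every == 0:
        return rng.uniform(*long_range)
    return rng.uniform(*short_range)


def select_new_posts(posts, last_seen):
    """Keep only posts published strictly after the last recorded timestamp.

    `posts` is a list of dicts with a hypothetical `published_at` datetime.
    """
    return [post for post in posts if post["published_at"] > last_seen]


if __name__ == "__main__":
    rng = random.Random()
    for count in range(1, 7):
        print(count, round(next_delay(count, rng), 2))

    posts = [
        {"id": "a", "published_at": datetime(2024, 1, 1)},
        {"id": "b", "published_at": datetime(2024, 2, 1)},
    ]
    print(select_new_posts(posts, last_seen=datetime(2024, 1, 15)))
```

In a real crawl loop the returned value would be passed to `time.sleep` between requests, and the newest `published_at` seen would be stored so the next run can use it as `last_seen`.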
Capabilities include keyword-based post searches within defined time windows, downloading of original images and videos, and social network mapping through extracted follower lists. Extracted data can be filtered by date or originality and then written to flat files, relational databases, or document databases, or sent as token-authenticated JSON payloads to remote API endpoints.
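As a rough sketch of the filtering and API-delivery steps, the Python example below filters a list of posts by date and originality and then builds (without sending) an HTTP request that carries them as JSON. The names `filter_posts` and `build_request`, the `published_at` and `is_original` fields, the `{"posts": [...]}` payload shape, and the use of a bearer token in the `Authorization` header are assumptions for illustration; the tool's real payload format and token placement may differ.

```python
import json
import urllib.request
from datetime import datetime


def filter_posts(posts, since=None, original_only=False):
    """Drop posts older than `since` and, optionally, reposts."""
    kept = []
    for post in posts:
        if since is not None and post["published_at"] < since:
            continue
        if original_only and not post["is_original"]:
            continue
        kept.append(post)
    return kept


def build_request(posts, api_url, api_token):
    """Build a POST request with a JSON body; the caller decides when to send it."""
    serializable = [
        {**post, "published_at": post["published_at"].isoformat()}
        for post in posts
    ]
    body = json.dumps({"posts": serializable}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed scheme: the token travels as a bearer credential.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    posts = [
        {"id": "a", "published_at": datetime(2023, 12, 1), "is_original": True},
        {"id": "b", "published_at": datetime(2024, 3, 1), "is_original": True},
        {"id": "c", "published_at": datetime(2024, 3, 2), "is_original": False},
    ]
    kept = filter_posts(posts, since=datetime(2024, 1, 1), original_only=True)
    request = build_request(kept, "https://example.invalid/api", "secret")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Separating request construction from sending keeps the example testable offline; passing the result to `urllib.request.urlopen` would perform the actual upload.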