2 repos
Tools and frameworks for extracting data from websites and social media platforms.
Distinguishing note: Focuses on the high-level capability of social media content extraction.
MediaCrawler is an automated web scraping framework designed to extract public posts, comments, and creator metadata from various social media platforms. It functions as a headless browser automator, utilizing real browser instances to render dynamic content and execute the client-side scripts necessary for interacting with modern web interfaces. The system distinguishes itself through a focus on session persistence and network flexibility. It supports remote debugging to reuse active browser sessions and cookies, which helps minimize the risk of triggering platform security challenges. To ma
Collects posts, comments, and creator details from social platforms using a unified interface.
RSSHub is a headless, server-side engine designed to generate standardized RSS and Atom feeds from websites that do not natively provide them. By acting as an extensible data aggregator, it enables the automated collection of web content, allowing users to monitor updates from disparate sources through centralized feed readers or workflow automation tools. The platform distinguishes itself through a route-based data extraction framework that maps specific URL patterns to custom scraping logic. This modular architecture is supported by a middleware-driven request pipeline and declarative route
Provides routing logic and parsing tools to extract structured data from websites lacking native feeds.
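As a rough illustration of the route-based idea described for RSSHub, mapping URL patterns to custom extraction logic and emitting a standard feed, the following is a minimal Python sketch using only the standard library. It is not RSSHub's actual API. The `route` decorator, `dispatch`, `to_rss`, the `/example/:user` pattern, and the hard-coded items are all hypothetical names and data chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

Item = Dict[str, str]
Handler = Callable[..., List[Item]]

# Hypothetical route table: (compiled path pattern, handler) pairs.
ROUTES: List[Tuple[re.Pattern, Handler]] = []


def route(pattern: str):
    """Register a handler for a path pattern such as '/example/:user'."""
    # Turn each ':name' segment into a named regex group.
    regex = re.compile("^" + re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern) + "$")

    def decorator(func: Handler) -> Handler:
        ROUTES.append((regex, func))
        return func

    return decorator


@route("/example/:user")
def example_posts(user: str) -> List[Item]:
    # A real handler would fetch and parse a page; this one returns fixed data.
    return [{"title": f"Post by {user}", "link": f"https://example.com/{user}/1"}]


def dispatch(path: str) -> Optional[List[Item]]:
    """Run the first handler whose pattern matches the path, else return None."""
    for regex, handler in ROUTES:
        match = regex.match(path)
        if match:
            return handler(**match.groupdict())
    return None


def to_rss(title: str, items: List[Item]) -> str:
    """Serialize items into a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")
```

Calling `to_rss("Demo feed", dispatch("/example/alice"))` would produce a small feed with one item. Keeping each route as an independent function is what makes this kind of design modular: supporting a new site means registering one more handler, without touching the dispatcher or the serializer.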