1 repo
Modular architectures for building custom data extraction plugins and storage backends.
Distinguishing note: Focuses on the extensibility of the extraction architecture, distinct from the scrapers themselves.
MediaCrawler is an automated web scraping framework designed to extract public posts, comments, and creator metadata from various social media platforms. It functions as a headless browser automator, utilizing real browser instances to render dynamic content and execute the client-side scripts necessary for interacting with modern web interfaces. The system distinguishes itself through a focus on session persistence and network flexibility. It supports remote debugging to reuse active browser sessions and cookies, which helps minimize the risk of triggering platform security challenges. To ma
Provides a modular architecture for integrating new platforms and custom storage backends.
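To make the idea of pluggable platforms and storage backends concrete, here is a minimal Python sketch of one common way to structure such a design: abstract interfaces plus name-based registries. Everything in it (Extractor, StorageBackend, register, DemoExtractor, MemoryBackend, run) is hypothetical and invented for illustration. It is not MediaCrawler's actual API, and the demo extractor returns canned records instead of driving a browser.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Type


class Extractor(ABC):
    """Hypothetical plugin interface: one subclass per platform."""

    @abstractmethod
    def extract(self, query: str) -> Iterable[dict]:
        """Yield one dict per extracted record."""


class StorageBackend(ABC):
    """Hypothetical storage interface: one subclass per destination."""

    @abstractmethod
    def save(self, records: Iterable[dict]) -> int:
        """Persist the records and return how many were written."""


# Registries map a short name to a plugin class (assumed design, not from the source).
EXTRACTORS: Dict[str, Type[Extractor]] = {}
BACKENDS: Dict[str, Type[StorageBackend]] = {}


def register(registry: dict, name: str):
    """Class decorator that stores the decorated class under `name`."""
    def decorator(cls):
        registry[name] = cls
        return cls
    return decorator


@register(EXTRACTORS, "demo")
class DemoExtractor(Extractor):
    def extract(self, query: str) -> Iterable[dict]:
        # Canned data stands in for real scraping logic.
        for i in range(2):
            yield {"platform": "demo", "query": query, "post_id": i}


@register(BACKENDS, "memory")
class MemoryBackend(StorageBackend):
    def __init__(self) -> None:
        self.rows: List[dict] = []

    def save(self, records: Iterable[dict]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before


def run(platform: str, backend: str, query: str) -> StorageBackend:
    """Look up both plugins by name, extract, store, and return the backend."""
    extractor = EXTRACTORS[platform]()
    store = BACKENDS[backend]()
    store.save(extractor.extract(query))
    return store
```

Under this sketch, adding a new platform or storage target means writing one subclass and registering it under a new name. The run function itself does not change.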