Systems for collecting, normalizing, and unifying data from disparate web sources.
Distinguishing note: Focuses on the aggregation and standardization of data, distinct from raw storage.
MediaCrawler is an automated web scraping framework designed to extract public posts, comments, and creator metadata from various social media platforms. It functions as a headless browser automator, utilizing real browser instances to render dynamic content and execute the client-side scripts necessary for interacting with modern web interfaces. The system distinguishes itself through a focus on session persistence and network flexibility. It supports remote debugging to reuse active browser sessions and cookies, which helps minimize the risk of triggering platform security challenges. To ma
Standardizes data retrieval from multiple services into a unified format for consistent processing.
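As a rough illustration of what "a unified format" can mean in practice, the sketch below maps records from two sources into one shared schema. Everything here is assumed for the example: the `UnifiedPost` type, the source names `platform_a` and `platform_b`, and the raw field names are hypothetical and are not taken from MediaCrawler or any real platform API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class UnifiedPost:
    """Hypothetical common schema that every source is mapped into."""
    source: str
    post_id: str
    author: str
    text: str
    created_at: datetime  # always stored as timezone-aware UTC


def from_platform_a(raw: Dict[str, Any]) -> UnifiedPost:
    # Assumed shape: flat fields, timestamp in epoch seconds.
    return UnifiedPost(
        source="platform_a",
        post_id=str(raw["id"]),
        author=raw["user"],
        text=raw["content"].strip(),
        created_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def from_platform_b(raw: Dict[str, Any]) -> UnifiedPost:
    # Assumed shape: nested author object, ISO 8601 timestamp with offset.
    return UnifiedPost(
        source="platform_b",
        post_id=str(raw["note_id"]),
        author=raw["author"]["nickname"],
        text=raw["desc"].strip(),
        created_at=datetime.fromisoformat(raw["time"]).astimezone(timezone.utc),
    )


# Registry of per-source normalizers; adding a source means adding one entry.
NORMALIZERS: Dict[str, Callable[[Dict[str, Any]], UnifiedPost]] = {
    "platform_a": from_platform_a,
    "platform_b": from_platform_b,
}


def normalize_all(records: Iterable[Tuple[str, Dict[str, Any]]]) -> List[UnifiedPost]:
    """Convert (source, raw_record) pairs into UnifiedPost objects."""
    unified = []
    for source, raw in records:
        if source not in NORMALIZERS:
            raise ValueError(f"no normalizer registered for source: {source}")
        unified.append(NORMALIZERS[source](raw))
    return unified
```

Downstream code then only deals with `UnifiedPost`, so sorting, deduplication, or export logic does not need to know which service a record came from.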