# NanmiCoder/MediaCrawler

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/nanmicoder-mediacrawler).**

44,037 stars · 9,677 forks · Python · other

## Links

- GitHub: https://github.com/NanmiCoder/MediaCrawler
- Homepage: https://nanmicoder.github.io/MediaCrawler/
- awesome-repositories: https://awesome-repositories.com/repository/nanmicoder-mediacrawler.md

## Description

MediaCrawler is an automated web scraping framework that extracts public posts, comments, and creator metadata from various social media platforms. It works as a headless browser automator, using real browser instances to render dynamic content and run the client-side scripts that modern web interfaces depend on.

The system distinguishes itself through a focus on session persistence and network flexibility. It supports remote debugging to reuse active browser sessions and cookies, which helps minimize the risk of triggering platform security challenges. To maintain stable data collection at scale, the tool integrates proxy-based request routing, allowing users to distribute traffic across external IP services to bypass rate limits and geographic restrictions.
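
The session-reuse idea can be sketched with the Chrome DevTools Protocol. This is a minimal illustration, not MediaCrawler's own code: the helper names are hypothetical, port 9222 is an assumed example value, and the last function assumes Playwright is installed. The Chrome flags and the `/json/version` endpoint are standard Chrome features.

```python
import json
import urllib.request


def chrome_debug_args(port: int, profile_dir: str) -> list[str]:
    """Chrome flags that expose a DevTools endpoint and keep the profile on disk."""
    return [
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",  # cookies and logins persist here
    ]


def cdp_http_endpoint(port: int, host: str = "127.0.0.1") -> str:
    """HTTP address of the DevTools endpoint for a locally running browser."""
    return f"http://{host}:{port}"


def fetch_websocket_url(endpoint: str, timeout: float = 3.0) -> str:
    """Ask an already running browser for its DevTools WebSocket URL."""
    with urllib.request.urlopen(f"{endpoint}/json/version", timeout=timeout) as resp:
        return json.load(resp)["webSocketDebuggerUrl"]


def list_cookie_names(endpoint: str) -> list[str]:
    """Attach to the running browser and read cookie names from its session.

    Assumes Playwright is installed; imported lazily so the rest of the
    sketch works without it.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint)
        context = browser.contexts[0]  # the browser's existing default context
        return [cookie["name"] for cookie in context.cookies()]
```

Starting the browser by hand with the flags from `chrome_debug_args(9222, "./browser_profile")`, logging in once, and then attaching to `cdp_http_endpoint(9222)` lets later runs reuse the same cookies instead of logging in again.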

The architecture is modular and extensible: a provider pattern lets developers add new platforms or custom storage backends through standardized interfaces. Users drive scraping workflows from command-line configuration, choosing targets and storage formats (such as JSON, CSV, or various database systems) without modifying the core logic. The project also includes utilities for data visualization, such as generating word clouds from collected comments.
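
A provider pattern of this kind might look like the sketch below. The class names, the `STORES` registry, and the `--store` flag are hypothetical and chosen for illustration; they are not taken from the project's code or its CLI.

```python
import argparse
import csv
import io
import json
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical storage interface: each backend serializes a list of records."""

    @abstractmethod
    def dumps(self, records: list[dict]) -> str:
        ...


class JsonStore(Store):
    def dumps(self, records: list[dict]) -> str:
        return json.dumps(records, ensure_ascii=False)


class CsvStore(Store):
    def dumps(self, records: list[dict]) -> str:
        if not records:
            return ""
        buf = io.StringIO()
        # Column order follows the keys of the first record.
        writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()


# Registry: adding a backend means adding one entry, not editing the core logic.
STORES: dict[str, type[Store]] = {"json": JsonStore, "csv": CsvStore}


def create_store(name: str) -> Store:
    try:
        return STORES[name]()
    except KeyError:
        raise ValueError(f"unknown store {name!r}; choose from {sorted(STORES)}") from None


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Provider-pattern sketch")
    # Hypothetical flag name; the valid choices come straight from the registry.
    parser.add_argument("--store", choices=sorted(STORES), default="json")
    return parser
```

With this layout, `create_store(build_parser().parse_args(["--store", "csv"]).store)` yields a CSV backend, and a database backend would be another `Store` subclass registered under its own key.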

Installation involves setting up the required runtimes: a JavaScript engine for handling complex client-side rendering, and the appropriate browser automation drivers.

## Tags

### Web Development

- [Web Scrapers](https://awesome-repositories.com/f/web-development/web-scrapers.md) — Collects posts, comments, and creator details from social platforms using a unified interface. ([source](https://nanmicoder.github.io/MediaCrawler/%E9%A1%B9%E7%9B%AE%E6%9E%B6%E6%9E%84%E6%96%87%E6%A1%A3.html))
- [Web Scraping Frameworks](https://awesome-repositories.com/f/web-development/web-scraping-frameworks.md) — Implements automated pipelines for navigating websites and collecting data at scale.
- [Browser Automation](https://awesome-repositories.com/f/web-development/browser-automation.md) — Controls real web browsers to render dynamic content and execute client-side scripts.
- [Headless Browser Controllers](https://awesome-repositories.com/f/web-development/headless-browser-controllers.md) — Manages headless browser instances to navigate dynamic content and bypass security challenges.
- [Social Media Scrapers](https://awesome-repositories.com/f/web-development/social-media-scrapers.md) — Automates browser interaction to extract posts, comments, and creator metadata from social platforms.
- [Browser Session Persistence](https://awesome-repositories.com/f/web-development/browser-session-persistence.md) — Maintains persistent login states to minimize detection and avoid repetitive security challenges.
- [Media Crawlers](https://awesome-repositories.com/f/web-development/media-crawlers.md) — Automates media retrieval across various social platforms using command-line configuration. ([source](https://nanmicoder.github.io/MediaCrawler/))
- [Task Execution Engines](https://awesome-repositories.com/f/web-development/task-execution-engines.md) — Retrieves specific content or user contributions by running crawling tasks across supported platforms. ([source](https://nanmicoder.github.io/MediaCrawler/%E9%A1%B9%E7%9B%AE%E6%9E%B6%E6%9E%84%E6%96%87%E6%A1%A3.html))
- [Social Media Extraction Tools](https://awesome-repositories.com/f/web-development/social-media-extraction-tools.md) — Extracts public posts, comments, and creator metadata from various social platforms.

### Development Tools & Productivity

- [Automation Scripts](https://awesome-repositories.com/f/development-tools-productivity/automation-scripts.md) — Automates scraping tasks via command-line arguments and configuration files. ([source](https://nanmicoder.github.io/MediaCrawler/))
- [Configuration Management](https://awesome-repositories.com/f/development-tools-productivity/configuration-management.md) — Decouples task logic from the runtime environment using external configuration files and command-line arguments.

### Networking & Communication

- [Proxy Management](https://awesome-repositories.com/f/networking-communication/proxy-management.md) — Distributes network traffic across external IP services to circumvent rate limits and access restricted content.
- [Proxy Management Services](https://awesome-repositories.com/f/networking-communication/proxy-management-services.md) — Routes network requests through external proxies to bypass rate limits and geo-blocking.
- [Proxy-Aware Clients](https://awesome-repositories.com/f/networking-communication/proxy-aware-clients.md) — Routes traffic through external services to manage rate limits and access geo-restricted content.
- [Proxy Configuration Tools](https://awesome-repositories.com/f/networking-communication/proxy-configuration-tools.md) — Configures network requests through external proxy services to bypass rate limits. ([source](https://nanmicoder.github.io/MediaCrawler/%E4%BB%A3%E7%90%86%E4%BD%BF%E7%94%A8.html))
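
The proxy routing described in the entries above can be illustrated with a simple round-robin pool. This is a sketch using only the Python standard library, under the assumption that proxies are given as a fixed list of URLs; `ProxyPool`, `opener_for`, and the example addresses are hypothetical and do not reflect the project's actual proxy implementation.

```python
import itertools
import urllib.request


class ProxyPool:
    """Round-robin over a fixed list of proxy URLs (hypothetical helper)."""

    def __init__(self, proxies: list[str]):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self) -> str:
        """Return the next proxy URL, wrapping around at the end of the list."""
        return next(self._cycle)


def opener_for(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that sends HTTP and HTTPS requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder addresses; a real pool would come from an external IP service.
pool = ProxyPool(["http://proxy-a.example:8080", "http://proxy-b.example:8080"])
opener = opener_for(pool.next_proxy())  # each new opener uses the next proxy in turn
```

Rotating per request spreads traffic evenly across the pool. A production setup would also need to drop proxies that fail or expire, which this sketch leaves out.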

### Data & Databases

- [Data Exporters](https://awesome-repositories.com/f/data-databases/data-exporters.md) — Saves collected information into formats like CSV, JSON, SQLite, MySQL, or MongoDB. ([source](https://nanmicoder.github.io/MediaCrawler/%E9%A1%B9%E7%9B%AE%E6%9E%B6%E6%9E%84%E6%96%87%E6%A1%A3.html))
- [Data Storage Adapters](https://awesome-repositories.com/f/data-databases/data-storage-adapters.md) — Persists data through interchangeable drivers that abstract the underlying database implementation.
- [Data Aggregation Pipelines](https://awesome-repositories.com/f/data-databases/data-aggregation-pipelines.md) — Standardizes data retrieval from multiple services into a unified format for consistent processing.

### Security & Cryptography

- [Remote Debugging Tools](https://awesome-repositories.com/f/security-cryptography/remote-debugging-tools.md) — Connects to existing browser instances to reuse cookies and login sessions. ([source](https://nanmicoder.github.io/MediaCrawler/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98.html))
- [Session Management](https://awesome-repositories.com/f/security-cryptography/session-management.md) — Attaches to existing browser instances to reuse active cookies and login sessions.
