awesome-repositories.com
Distributed Crawling Systems · Awesome GitHub Repositories

2 repos

Frameworks for managing high-volume, asynchronous web crawling across multiple nodes.

This page lists 2 GitHub repositories under Data & Databases · Distributed Crawling Systems.

Awesome Distributed Crawling Systems GitHub Repositories

  • unclecode/crawl4ai

    60,452 · GitHub

    Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs.

    Python
  • scrapy/scrapy

    59,824 · GitHub

    Scrapy is a comprehensive framework designed for automated web data extraction and large-scale crawling. It operates on an asynchronous, event-driven engine that manages non-blocking network requests and data processing tasks, allowing for the efficient retrieval of structured information from web documents using path-

    Python · crawler · crawling · framework