awesome-repositories.com
© 2026 Bringes Technology SRL · VAT RO45896025 · hello@bringes.io
Web Scraping · Awesome GitHub Repositories

3 repos

Tools and frameworks for extracting structured information and media from websites through defined rules or automated processes.

Explore 3 awesome GitHub repositories matching Web Development · Web Scraping.

Awesome Web Scraping GitHub Repositories

  • firecrawl/firecrawl

    84,034 · GitHub

    Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveragi

    TypeScript · ai · ai-agents · ai-crawler
  • scrapy/scrapy

    59,824 · GitHub

    Scrapy is a comprehensive framework designed for automated web data extraction and large-scale crawling. It operates on an asynchronous, event-driven engine that manages non-blocking network requests and data processing tasks, allowing for the efficient retrieval of structured information from web documents using path-

    Python · crawler · crawling · framework
  • soimort/you-get

    56,737 · GitHub

    This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It functions by parsing page metadata and utilizing modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media f

    Python

Explore sub-tags

  • Batch Scrapers: Tools for processing multiple URLs in a single operation.
  • Crawler Health Monitoring: Tools and metrics for tracking the operational status, performance, and resource usage of web scraping processes.
  • Crawler Middleware: Components that intercept and modify request and response flows during the crawling process.
  • Crawling Optimization: Techniques and configurations for managing memory, request rates, and concurrency in large-scale data collection tasks.
  • Media Extractors: Modular scripts for parsing web pages to retrieve direct media stream URLs.
  • State Persistence: Mechanisms for maintaining session or interaction state during web scraping tasks.
  • Web Crawlers: Automated systems that traverse websites to discover and extract content from multiple pages.
  • Web Scraping APIs: Managed services that provide programmatic access to scraped website content.