awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Web Scraping and Automation · Awesome GitHub Repositories

6 repos

Systems for automating browser interactions and crawling web content at scale.

Explore 6 awesome GitHub repositories matching Web Development · Web Scraping and Automation.

Awesome Web Scraping and Automation GitHub Repositories

  • openclaw/openclaw

    211,971 GitHub stars

    Openclaw is a platform for managing agent execution environments, providing the infrastructure to control agent lifecycles, session state, and workspace persistence. It features a centralized gateway that handles model loops, tool invocation, and streaming events, while supporting multi-agent routing and persistent memory…

    TypeScript · ai, assistant, crustacean
  • firecrawl/firecrawl

    84,034 GitHub stars

    Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data-gathering tasks. By leveraging…

    TypeScript · ai, ai-agents, ai-crawler
  • OpenHands/OpenHands

    67,974 GitHub stars

    OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system…

    Python · agent, artificial-intelligence, chatgpt
  • unclecode/crawl4ai

    60,452 GitHub stars

    Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs.

    Python
  • scrapy/scrapy

    59,824 GitHub stars

    Scrapy is a comprehensive framework designed for automated web data extraction and large-scale crawling. It operates on an asynchronous, event-driven engine that manages non-blocking network requests and data processing tasks, allowing for the efficient retrieval of structured information from web documents using path-…

    Python · crawler, crawling, framework
  • soimort/you-get

    56,737 GitHub stars

    This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It works by parsing page metadata and using modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media from…

    Python
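The soimort/you-get entry describes modular, site-specific scripts that pull direct stream URLs out of each platform's pages. The dispatch half of that design can be sketched as a registry keyed by hostname. This is a minimal illustration under stated assumptions, not you-get's actual module layout: the `extractor` decorator, `find_streams`, and the `example-video.com` domain are all hypothetical, and the "stream URL" is derived from the URL path instead of from real page metadata.

```python
from typing import Callable, Dict, List
from urllib.parse import urlparse

# Hypothetical registry mapping a hostname to the extractor for that site.
# Illustrates the per-site plugin pattern only; not you-get's real code.
EXTRACTORS: Dict[str, Callable[[str], List[str]]] = {}


def extractor(*hosts: str):
    """Register the decorated function as the extractor for the given hosts."""
    def register(func: Callable[[str], List[str]]) -> Callable[[str], List[str]]:
        for host in hosts:
            EXTRACTORS[host] = func
        return func
    return register


@extractor("example-video.com", "www.example-video.com")
def extract_example_video(url: str) -> List[str]:
    # A real extractor would download the page and parse its metadata;
    # here a fake stream URL is built from the last path segment (the video id).
    video_id = urlparse(url).path.rstrip("/").split("/")[-1]
    return [f"https://cdn.example-video.com/streams/{video_id}.mp4"]


def find_streams(url: str) -> List[str]:
    """Dispatch to the extractor registered for the URL's hostname."""
    host = urlparse(url).hostname or ""
    try:
        return EXTRACTORS[host](url)
    except KeyError:
        raise ValueError(f"no extractor registered for {host!r}") from None


print(find_streams("https://www.example-video.com/watch/abc123"))
```

Adding support for a new platform then means adding one decorated function, without touching the dispatcher.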
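The scrapy/scrapy entry mentions an asynchronous, event-driven engine that keeps many requests in flight without blocking. Scrapy itself is built on Twisted and exposes its own Spider API; the sketch below only illustrates the general idea with Python's `asyncio`, assuming a hypothetical in-memory `FAKE_PAGES` dictionary and a simulated `fetch` in place of real HTTP requests.

```python
import asyncio
from typing import Dict, List

# Hypothetical in-memory "pages" standing in for real HTTP responses.
FAKE_PAGES: Dict[str, str] = {
    f"https://example.com/item/{i}": f"<h1>Item {i}</h1>" for i in range(10)
}


async def fetch(url: str) -> str:
    """Simulate a non-blocking network request."""
    await asyncio.sleep(0.01)  # yields to the event loop while "waiting"
    return FAKE_PAGES[url]


def parse(html: str) -> Dict[str, str]:
    """Extract a structured record from the page (naive string slicing)."""
    title = html.split("<h1>", 1)[1].split("</h1>", 1)[0]
    return {"title": title}


async def crawl(urls: List[str], max_concurrency: int = 4) -> List[Dict[str, str]]:
    semaphore = asyncio.Semaphore(max_concurrency)  # caps in-flight requests

    async def worker(url: str) -> Dict[str, str]:
        async with semaphore:
            html = await fetch(url)
        return {"url": url, **parse(html)}

    # gather() runs the workers concurrently and returns results in input order.
    return await asyncio.gather(*(worker(u) for u in urls))


results = asyncio.run(crawl(list(FAKE_PAGES)))
print(results[0])
```

The semaphore is the main design choice here: it bounds concurrency so the crawler stays polite to the target server while still overlapping network waits.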
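Converting pages into clean, LLM-ready markdown, as the firecrawl/firecrawl entry describes, comes down to walking the HTML, emitting markdown for content-bearing elements, and dropping scripts and styles. A minimal sketch using only the standard library's `html.parser`; it handles just headings, paragraphs, links, and list items, and it is not Firecrawl's code. `MarkdownConverter` and `html_to_markdown` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional


class MarkdownConverter(HTMLParser):
    """Convert a small HTML subset (h1-h3, p, a, ul/ol/li) to Markdown."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self) -> None:
        super().__init__()
        self.out: List[str] = []
        self.href: Optional[str] = None
        self.skip = 0  # depth inside <script>/<style>, whose text is dropped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")  # blank line before a list
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip -= 1
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)  # collapse runs of whitespace
        if not self.skip and text.strip():
            self.out.append(text)


def html_to_markdown(html: str) -> str:
    parser = MarkdownConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


print(html_to_markdown(
    '<h1>Docs</h1><p>Read the <a href="https://example.com/guide">guide</a> first.</p>'
    "<ul><li>One</li><li>Two</li></ul><script>var x = 1;</script>"
))
```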

Explore sub-tags

  • Browser Automation (6 sub-tags): Frameworks and integrations that enable programmatic control of browser instances to execute tasks and capture interactions.
  • Web Crawling (5 sub-tags): Systems designed to systematically discover, navigate, and index web content across domains for large-scale data collection.
  • Web Crawling Infrastructure (1 sub-tag): Foundational software and environment configurations required to support and maintain web data collection operations.
  • Web Scraping (10 sub-tags): Tools and frameworks for extracting structured information and media from websites through defined rules or automated processes.
  • Web Scraping Frameworks (2 sub-tags): Comprehensive toolkits and engines designed to extract structured data from websites by defining navigation rules or using language models.
  • Web Scraping Infrastructure (1 sub-tag): Self-hosted server environments that manage asynchronous job queues and browser resources for web data extraction tasks.
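The Web Crawling sub-tag covers systems that systematically discover and navigate web content. The core loop of most such crawlers is a breadth-first traversal: a frontier queue of URLs to visit, a seen-set to avoid refetching, and link extraction that resolves relative hrefs. A minimal sketch, assuming a caller-supplied `fetch` function and a hypothetical in-memory `SITE` instead of real HTTP; it stays on the start URL's host and caps the number of pages visited.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start: str, fetch: Callable[[str], str], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl restricted to the start URL's host; returns URLs in visit order."""
    host = urlparse(start).netloc
    frontier = deque([start])
    seen = {start}
    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory site used instead of real HTTP requests.
SITE: Dict[str, str] = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/">Home</a> <a href="c">C</a>',
    "https://example.com/b": '<a href="https://other.org/">Elsewhere</a>',
    "https://example.com/c": "",
}

print(crawl("https://example.com/", lambda u: SITE.get(u, "")))
```

Injecting `fetch` keeps the traversal logic testable and lets a real crawler swap in an HTTP client, rate limiting, or a headless browser without changing the loop.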