6 repos

Awesome GitHub Repositories: Web Scraping and Automation

Systems for automating browser interactions and crawling web content at scale.

Explore 6 awesome GitHub repositories matching web development · Web Scraping and Automation.

openclaw/openclaw
211,971 · GitHub
Openclaw is a platform for managing agent execution environments, providing the infrastructure to control agent lifecycles, session state, and workspace persistence. It features a centralized gateway that handles model loops, tool invocation, and streaming events, while supporting multi-agent routing and persistent mem
TypeScript · ai · assistant · crustacean

firecrawl/firecrawl
84,034 · GitHub
Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveragi
TypeScript · ai · ai-agents · ai-crawler

OpenHands/OpenHands
67,974 · GitHub
OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system
Python · agent · artificial-intelligence · chatgpt

unclecode/crawl4ai
60,452 · GitHub
Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs.
Python

scrapy/scrapy
59,824 · GitHub
Scrapy is a comprehensive framework designed for automated web data extraction and large-scale crawling. It operates on an asynchronous, event-driven engine that manages non-blocking network requests and data processing tasks, allowing for the efficient retrieval of structured information from web documents using path-
Python · crawler · crawling · framework

soimort/you-get
56,737 · GitHub
This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It functions by parsing page metadata and utilizing modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media f
Python
