2 repos

Awesome GitHub Repositories: Web Crawling

Systems designed to systematically discover, navigate, and index web content across domains for large-scale data collection.

Explore 2 awesome GitHub repositories matching web development · Web Crawling.

firecrawl/firecrawl
84,034 · GitHub
Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveragi
TypeScript · ai · ai-agents · ai-crawler
unclecode/crawl4ai
60,452 · GitHub
Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs.
Python
