1 repo
Self-hosted environments that manage asynchronous job queues and browser resources for large-scale data collection.
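As a rough illustration of what "managing asynchronous job queues and browser resources" can mean in practice, here is a minimal sketch using only Python's standard library. It is not taken from any listed repository: `fake_render`, `worker`, `crawl`, and the `max_browsers` and `num_workers` parameters are hypothetical names, and the page load is simulated with a short sleep rather than a real headless browser.

```python
import asyncio


async def fake_render(url: str) -> str:
    """Hypothetical stand-in for loading a page in a headless browser."""
    await asyncio.sleep(0.01)  # simulate page load time
    return f"<html>{url}</html>"


async def worker(queue, browser_slots, results, stats):
    """Pull URLs off the job queue and render each one in a limited browser slot."""
    while True:
        url = await queue.get()
        try:
            # The semaphore caps how many "browsers" run at the same time.
            async with browser_slots:
                stats["active"] += 1
                stats["peak"] = max(stats["peak"], stats["active"])
                try:
                    results[url] = await fake_render(url)
                except Exception as exc:  # record the failure, keep the worker alive
                    results[url] = exc
                finally:
                    stats["active"] -= 1
        finally:
            queue.task_done()


async def crawl(urls, max_browsers=2, num_workers=4):
    """Run all URLs through the queue; return results and peak browser usage."""
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)

    browser_slots = asyncio.Semaphore(max_browsers)
    results = {}
    stats = {"active": 0, "peak": 0}

    workers = [
        asyncio.create_task(worker(queue, browser_slots, results, stats))
        for _ in range(num_workers)
    ]
    await queue.join()  # wait until every queued job is marked done
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results, stats["peak"]
```

Calling `asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))` returns the rendered pages keyed by URL, plus the highest number of simulated browsers that were active at once. The queue separates job intake from execution, while the semaphore keeps the more expensive browser resource bounded regardless of how many workers exist.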
One GitHub repository is listed under Web Development · Distributed Crawling Systems.
Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs.