Crawlee-python is a web crawling framework for building scalable scrapers in Python. It extracts structured data from websites using either lightweight HTTP requests or headless browser automation.
The framework is distinguished by its anti-bot evasion capabilities, which include browser fingerprint impersonation and tiered proxy rotation to evade detection systems and pass challenges such as those served by Cloudflare. It also incorporates artificial intelligence for autonomous website navigation and schema-based data extraction, reducing the need for manual selector maintenance.
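The idea behind tiered proxy rotation can be sketched in plain Python. This is a conceptual illustration only, not Crawlee's own API: the class name `TieredProxyPicker`, its methods, and the proxy URLs are all hypothetical. It assumes proxies are grouped into tiers ordered from cheapest to most reliable, that each domain starts on the lowest tier, and that a domain moves up one tier whenever a request to it is blocked.

```python
from collections import defaultdict
from itertools import cycle


class TieredProxyPicker:
    """Hypothetical helper: per-domain proxy choice with tier escalation."""

    def __init__(self, tiers: list[list[str]]) -> None:
        if not tiers or any(not tier for tier in tiers):
            raise ValueError("every tier needs at least one proxy URL")
        # One round-robin iterator per tier, shared by all domains.
        self._tiers = [cycle(tier) for tier in tiers]
        # Maps a domain to the index of the tier it currently uses.
        self._level: defaultdict[str, int] = defaultdict(int)

    def pick(self, domain: str) -> str:
        """Return the next proxy from the domain's current tier."""
        return next(self._tiers[self._level[domain]])

    def report_blocked(self, domain: str) -> None:
        """Move the domain one tier up, capped at the highest tier."""
        self._level[domain] = min(self._level[domain] + 1, len(self._tiers) - 1)

    def report_success(self, domain: str) -> None:
        """Move the domain one tier down, floored at the lowest tier."""
        self._level[domain] = max(self._level[domain] - 1, 0)


picker = TieredProxyPicker(
    [
        ["http://cheap-1.example:8000", "http://cheap-2.example:8000"],
        ["http://premium-1.example:8000"],
    ]
)
first_choice = picker.pick("shop.example")
picker.report_blocked("shop.example")
after_block = picker.pick("shop.example")
```

Stepping down after every success is deliberately naive; a production scheme would more likely probe lower tiers only occasionally so that a domain does not oscillate between tiers.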
The system covers a broad range of capability areas, including headless browser orchestration, recursive crawling workflows, and persistent request queue management. It features automated data extraction using CSS selectors, adaptive concurrency scaling based on system load, and a unified storage interface for managing datasets and key-value stores. Monitoring and observability are handled through resource health tracking, error snapshot capture, and OpenTelemetry-compatible metrics.
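To make the recursive crawling and request queue ideas concrete, here is a minimal standard-library sketch. It is not Crawlee's API: the `PAGES` dictionary is a hypothetical in-memory stand-in for real HTTP responses, and the `crawl` function and `LinkParser` class are illustrative names. The queue here lives in memory, whereas a persistent queue would also survive restarts. The sketch shows the core loop: take a URL from the queue, extract its links, and enqueue only those not seen before.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "https://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "https://example.test/b": "<p>leaf page</p>",
}


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url: str, max_requests: int = 100) -> list[str]:
    """Breadth-first crawl that returns URLs in the order they were visited."""
    queue = deque([start_url])
    seen = {start_url}  # deduplication set: each URL is enqueued at most once
    visited: list[str] = []
    while queue and len(visited) < max_requests:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:
            continue  # unknown page: skip, as a failed request would be
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


visited_urls = crawl("https://example.test/")
```

The `max_requests` cap plays the role of a crawl budget: even on a site with cycles or unbounded link graphs, the loop stops after a fixed number of processed pages.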
A command-line interface bootstraps new projects, and finished crawlers can be deployed with Docker or to cloud environments.