What are the best open-source alternatives to Crawlee Python?

30 open-source projects similar to apify/crawlee-python, ranked by shared features. Top picks: apify/crawlee, mendableai/firecrawl, lorien/web-scraping, NanmiCoder/CrawlerTutorial, browserbase/mcp-server-browserbase, browserbase/stagehand, omkarcloud/botasaurus, camel-ai/camel, ultrafunkamsterdam/nodriver, code4craft/webmagic.

Is apify/crawlee a good alternative to Crawlee Python?

Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rend…

Is mendableai/firecrawl a good alternative to Crawlee Python?

Firecrawl is a headless browser automation tool and web crawling engine designed to extract structured data from the web. It functions as an API that transforms raw website content and documents into clean markdown and JSON formats to serve as context for large language models. The project disting…

Is lorien/web-scraping a good alternative to Crawlee Python?

This project is a comprehensive resource directory for web data extraction, providing a curated collection of tools and libraries for parsing data, automating browsers, and managing network operations. It serves as a guide for extracting structured information from HTML, XML, JSON, and PDF formats.…

Is NanmiCoder/CrawlerTutorial a good alternative to Crawlee Python?

CrawlerTutorial is a comprehensive Python web scraping tutorial and framework designed for extracting data from static and dynamic websites. It functions as a web data extraction pipeline and an HTTP request orchestrator, covering the full lifecycle of scraping applications from initial fetching to…

Is browserbase/mcp-server-browserbase a good alternative to Crawlee Python?

This project is an MCP browser automation server that connects large language models to headless cloud browsers. It functions as an autonomous web workflow engine and an LLM web agent interface, enabling the translation of natural language instructions into browser actions and structured data retri…

Is browserbase/stagehand a good alternative to Crawlee Python?

Stagehand is an AI-native browser automation framework that enables developers to build reliable web automations using a hybrid of natural language instructions and deterministic TypeScript code.

Is omkarcloud/botasaurus a good alternative to Crawlee Python?

Botasaurus is a Python web scraping framework and headless browser automation system used to build scalable data extraction tools. It functions as a web data extraction tool and OCR document parser, converting website content, images, and PDF files into structured formats such as JSON, CSV, and Exc…

Is camel-ai/camel a good alternative to Crawlee Python?

This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models…

Is ultrafunkamsterdam/nodriver a good alternative to Crawlee Python?

nodriver is an asynchronous Chromium browser automation framework that provides headless control and web scraping capabilities. It functions as a Chrome DevTools Protocol client, allowing for granular engine control by attaching directly to the browser's debug port without the need for external dri…

Is code4craft/webmagic a good alternative to Crawlee Python?

Webmagic is a Java web crawling framework designed for building scalable automated crawlers to download and process large volumes of web pages. It functions as a distributed web crawler and dynamic content crawler, utilizing an XPath HTML parser to locate and extract specific data points from page…


Open-source alternatives to Crawlee Python

30 open-source projects similar to apify/crawlee-python, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Crawlee Python alternative.

apify/crawlee
24,002 stars
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob
TypeScript · apify · automation · crawler

mendableai/firecrawl
139,399 stars
Firecrawl is a headless browser automation tool and web crawling engine designed to extract structured data from the web. It functions as an API that transforms raw website content and documents into clean markdown and JSON formats to serve as context for large language models. The project distinguishes itself by using natural language prompts to translate human instructions into targeted data extraction tasks and browser actions. It can execute interactive page navigation, such as clicking and scrolling, and perform automated web research to retrieve structured data without manual interventi
TypeScript

lorien/web-scraping
7,931 stars
This project is a comprehensive resource directory for web data extraction, providing a curated collection of tools and libraries for parsing data, automating browsers, and managing network operations. It serves as a guide for extracting structured information from HTML, XML, JSON, and PDF formats. The toolkit focuses on advanced data collection strategies, including headless browser automation to interact with JavaScript and a suite of network utilities for DNS resolution and WebSocket connections. It specifically covers methods for bypassing bot protections through proxy pool management, us
Makefile

Open-source alternatives to Crawlee Python

apify/crawlee

mendableai/firecrawl

lorien/web-scraping

NanmiCoder/CrawlerTutorial

browserbase/mcp-server-browserbase

browserbase/stagehand

omkarcloud/botasaurus

camel-ai/camel

ultrafunkamsterdam/nodriver

code4craft/webmagic

any4ai/AnyCrawl

h4ckf0r0day/obscura

FlareSolverr/FlareSolverr

oxylabs/how-to-scrape-amazon-product-data

getmaxun/maxun

hickford/MechanicalSoup

codelucas/newspaper

ultrafunkamsterdam/undetected-chromedriver

scrapinghub/splash

go-rod/rod

psf/requests-html

FellouAI/eko

cheeriojs/cheerio

garrytan/gstack

wistbean/learn_python3_spider

itsOwen/CyberScraper-2077

steel-dev/steel-browser

henrylee2cn/pholcus

CloakHQ/CloakBrowser

shengqiangzhang/examples-of-web-crawlers