What are the best open-source alternatives to Scrapy?

30 open-source projects similar to scrapy/scrapy, ranked by shared features. Top picks: apify/crawlee, unclecode/crawl4ai, binux/pyspider, firecrawl/firecrawl, cantino/huginn, crawlab-team/crawlab, browser-use/browser-use, wistbean/learn_python3_spider, encode/httpx, hyperium/hyper.

Is apify/crawlee a good alternative to Scrapy?

Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rend…

Is unclecode/crawl4ai a good alternative to Scrapy?

Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like s…

Is binux/pyspider a good alternative to Scrapy?

PySpider is a Python web crawling framework designed for automated data extraction. It provides a pipeline for periodically fetching web content, processing HTML, and persisting scraped information into database backends. The system features a web-based management interface for editing scraping sc…

Is firecrawl/firecrawl a good alternative to Scrapy?

Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gatheri…

Is cantino/huginn a good alternative to Scrapy?

Huginn is an open-source automation platform that functions as an event-driven task automator and webhook integration engine. It enables the creation of agents that monitor web data and automate tasks across various web services, operating as a self-hosted web scraper and JavaScript workflow orches…

Is crawlab-team/crawlab a good alternative to Scrapy?

Crawlab is a distributed web scraping platform designed to centralize the management, deployment, and execution of large-scale data extraction tasks. It functions as a control plane that orchestrates scraping scripts and automated workflows across multiple nodes, providing a unified environment for…

Is browser-use/browser-use a good alternative to Scrapy?

Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex,…

Is wistbean/learn_python3_spider a good alternative to Scrapy?

This project is a comprehensive educational guide and framework for building web scrapers using Python. It provides a course-based approach to data extraction, combining a Python crawler framework with tutorials on web reverse engineering and network traffic analysis. The project distinguishes its…

Is encode/httpx a good alternative to Scrapy?

This project is a comprehensive Python network request framework designed for both synchronous and asynchronous HTTP communication. It provides a high-performance client capable of executing non-blocking requests within event-driven applications, while also supporting standard blocking calls for si…

Is hyperium/hyper a good alternative to Scrapy?

Hyper is a low-level networking library designed for building high-performance HTTP clients and servers. It provides a foundational toolkit for creating network services that leverage asynchronous execution and memory-safe data handling, supporting both HTTP/1 and HTTP/2 protocols. The library dis…


Open-source alternatives to Scrapy

30 open-source projects similar to scrapy/scrapy, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Scrapy alternative.

apify/crawlee
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob
Language: TypeScript. Stars: 24,002. Topics: apify, automation, crawler.
unclecode/crawl4ai
Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs. By integrating language models directly into the extraction workflow, the system converts raw HTML into clean, structured data or Markdown files optimized for downstream ingestion. The platform distinguishes itself through a distributed, self-hosted infrastructure that manages l
Language: Python. Stars: 68,644.
binux/pyspider
PySpider is a Python web crawling framework designed for automated data extraction. It provides a pipeline for periodically fetching web content, processing HTML, and persisting scraped information into database backends. The system features a web-based management interface for editing scraping scripts, monitoring task progress, and reviewing collected data. It includes a headless browser JavaScript renderer to capture rendered HTML from dynamic web pages and a distributed architecture that uses message queues to scale crawling workloads across multiple nodes. The framework also covers task
Language: Python. Stars: 16,809.

Open-source alternatives to Scrapy

apify/crawlee

unclecode/crawl4ai

binux/pyspider

firecrawl/firecrawl

cantino/huginn

crawlab-team/crawlab

browser-use/browser-use

wistbean/learn_python3_spider

encode/httpx

hyperium/hyper

typicode/json-server

aws/aws-sdk-js

jmcarp/robobrowser

D4Vinci/Scrapling

sympy/sympy

matiasb/demiurge

chineking/cola

ScrapeGraphAI/Scrapegraph-ai

soimort/you-get

python-babel/babel

beeware/briefcase

scrapinghub/portia

psf/requests

python/mypy

h2oai/wave

cookiecutter/cookiecutter

fastapi/fastapi

jupyter/notebook

hickford/MechanicalSoup

pytest-dev/pytest