What are the best open-source alternatives to Goutte?

30 open-source projects similar to friendsofphp/goutte, ranked by shared features. Top picks: apify/crawlee, firecrawl/firecrawl, crawlab-team/crawlab, any4ai/anycrawl, jaypyles/scraperr, hickford/mechanicalsoup, s0md3v/photon, node-fetch/node-fetch, kepano/defuddle, freeok/so-novel.

Is apify/crawlee a good alternative to Goutte?

Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rend…

Is firecrawl/firecrawl a good alternative to Goutte?

Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gatheri…

Is crawlab-team/crawlab a good alternative to Goutte?

Crawlab is a distributed web scraping platform designed to centralize the management, deployment, and execution of large-scale data extraction tasks. It functions as a control plane that orchestrates scraping scripts and automated workflows across multiple nodes, providing a unified environment for…

Is any4ai/anycrawl a good alternative to Goutte?

AnyCrawl is an AI-powered data extractor, automated web crawler, and headless browser orchestrator. It serves as a web content extraction API and a gateway that connects crawling and scraping tools to language models using a standardized API protocol. The project specializes in converting unstruct…

Is jaypyles/scraperr a good alternative to Goutte?

Scraperr is a self-hosted web scraping and crawling platform designed for extracting structured data from websites using XPath selectors. It functions as a containerized system for managing scraping jobs through a queue and analyzing the resulting content using artificial intelligence. The project…

Is hickford/mechanicalsoup a good alternative to Goutte?

MechanicalSoup is a Python web automation library designed to simulate browser behavior. It functions as a toolkit for web scraping and automation, providing an HTML parsing engine and an HTTP session manager to interact with websites programmatically. The library enables headless web interaction…

Is s0md3v/photon a good alternative to Goutte?

Photon is a command-line web crawler designed for security reconnaissance and information gathering. It systematically traverses websites to discover URLs, map domain infrastructure, and identify associated subdomains by retrieving DNS records. The tool distinguishes itself through its ability to…

Is node-fetch/node-fetch a good alternative to Goutte?

node-fetch is a promise-based HTTP client library that provides a lightweight implementation of the Fetch API for the Node.js runtime. It serves as a network interface for performing asynchronous HTTP requests, handling server communication, and managing headers. The library utilizes a promise-bas…

Is kepano/defuddle a good alternative to Goutte?

Defuddle is a command-line web parser and content extractor designed to isolate the primary article body from web pages and convert the result into standardized markdown. It functions as a content cleaner that removes layout clutter, such as sidebars and headers, to retrieve the main text and assoc…

Is freeok/so-novel a good alternative to Goutte?

so-novel is a web novel downloader and scraping engine designed to extract structured text from websites and convert it into electronic book formats. It functions as a multi-interface content extractor, providing a shared backend accessible via a web-based management dashboard, a terminal user inte…


Open-source alternatives to Goutte

30 open-source projects similar to friendsofphp/goutte, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Goutte alternative.

apify/crawlee
24,002 stars
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob…
TypeScript, apify, automation, crawler

firecrawl/firecrawl
133,479 stars
Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveraging headless browser orchestration, the system handles dynamic, JavaScript-heavy pages to ensure comprehensive data capture. The platform distinguishes itself through its focus on agentic workflows, providing a programmatic interface that allows autonomous agents to perform live…
TypeScript, ai, ai-agents, ai-crawler

crawlab-team/crawlab
12,217 stars
Crawlab is a distributed web scraping platform designed to centralize the management, deployment, and execution of large-scale data extraction tasks. It functions as a control plane that orchestrates scraping scripts and automated workflows across multiple nodes, providing a unified environment for managing complex data collection operations. The platform distinguishes itself through a distributed architecture that coordinates worker nodes via a central master, utilizing real-time communication to maintain oversight of all active processes. It ensures operational consistency by isolating task…
Go, crawlab, crawler, crawling-tasks

Open-source alternatives to Goutte

apify/crawlee

firecrawl/firecrawl

crawlab-team/crawlab

any4ai/AnyCrawl

jaypyles/Scraperr

hickford/MechanicalSoup

s0md3v/Photon

node-fetch/node-fetch

kepano/defuddle

freeok/so-novel

php-webdriver/php-webdriver

MechanicalSoup/MechanicalSoup

gosom/google-maps-scraper

projectdiscovery/katana

curl/curl

sparklemotion/mechanize

Kr1s77/awesome-python-login-model

camel-ai/camel

zlzforever/DotnetSpider

asciimoo/colly

REMitchell/python-scraping

code4craft/webmagic

binux/pyspider

bda-research/node-crawler

lining0806/PythonSpiderNotes

lorien/web-scraping

projectdiscovery/subfinder

dotnetcore/DotnetSpider

hongyangAndroid/okhttputils

encode/httpx