What are the best open-source alternatives to Colly?

30 open-source projects similar to gocolly/colly, ranked by shared features. Top picks: apify/crawlee, henrylee2cn/pholcus, hu17889/go_spider, omkarcloud/botasaurus, crawlab-team/crawlab, LMCache/LMCache, pubkey/rxdb, PrefectHQ/fastmcp, NanmiCoder/MediaCrawler, MechanicalSoup/MechanicalSoup.

Is apify/crawlee a good alternative to Colly?

Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rend…

Is henrylee2cn/pholcus a good alternative to Colly?

Pholcus is a distributed web crawler framework written in Go designed for high-concurrency data extraction. It functions as a distributed crawling orchestrator and dynamic data extraction engine, utilizing a server-client architecture to coordinate tasks across multiple nodes. The system integrate…

Is hu17889/go_spider a good alternative to Colly?

Go Spider is a modular framework designed for building concurrent web scrapers and data extraction workflows. It provides a structured engine for orchestrating automated crawling tasks, managing request scheduling, and processing web content through a unified pipeline. The framework distinguishes…

Is omkarcloud/botasaurus a good alternative to Colly?

Botasaurus is a Python web scraping framework and headless browser automation system used to build scalable data extraction tools. It functions as a web data extraction tool and OCR document parser, converting website content, images, and PDF files into structured formats such as JSON, CSV, and Exc…

Is crawlab-team/crawlab a good alternative to Colly?

Crawlab is a distributed web scraping platform designed to centralize the management, deployment, and execution of large-scale data extraction tasks. It functions as a control plane that orchestrates scraping scripts and automated workflows across multiple nodes, providing a unified environment for…

Is LMCache/LMCache a good alternative to Colly?

LMCache is a distributed key-value cache manager and tiering system designed to accelerate large language model inference. It functions as a tiered storage layer that offloads tensors from GPU memory to CPU RAM, local disks, or remote object stores, enabling the reuse of cached prefixes across diff…

Is pubkey/rxdb a good alternative to Colly?

RxDB is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of tru…

Is PrefectHQ/fastmcp a good alternative to Colly?

FastMCP is a Python framework designed for building servers that expose functions, resources, and prompts to AI models using the Model Context Protocol. It simplifies the development process by automatically deriving tool metadata, input schemas, and documentation directly from Python function sign…

Is NanmiCoder/MediaCrawler a good alternative to Colly?

MediaCrawler is an automated web scraping framework designed to extract public posts, comments, and creator metadata from various social media platforms. It functions as a headless browser automator, utilizing real browser instances to render dynamic content and execute the client-side scripts nece…

Is MechanicalSoup/MechanicalSoup a good alternative to Colly?

MechanicalSoup is a Python web automation library and scraping framework designed to simulate browser sessions and navigate websites without requiring JavaScript execution. It functions as an HTML parsing tool and HTTP session manager, allowing for the programmatic retrieval of page content and the…


Open-source alternatives to Colly

30 open-source projects similar to gocolly/colly, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Colly alternative.

apify/crawlee (24,002 stars)
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob
Language: TypeScript. Topics: apify, automation, crawler.

henrylee2cn/pholcus (7,578 stars)
Pholcus is a distributed web crawler framework written in Go designed for high-concurrency data extraction. It functions as a distributed crawling orchestrator and dynamic data extraction engine, utilizing a server-client architecture to coordinate tasks across multiple nodes. The system integrates a headless browser engine to render dynamic content and execute JavaScript, allowing it to extract data from single-page applications. It features a web-based management interface for configuring spider parameters and monitoring execution progress, alongside the ability to update extraction rules v
Language: Go.

hu17889/go_spider (1,821 stars)
Go Spider is a modular framework designed for building concurrent web scrapers and data extraction workflows. It provides a structured engine for orchestrating automated crawling tasks, managing request scheduling, and processing web content through a unified pipeline. The framework distinguishes itself through a highly configurable architecture that allows developers to inject custom logic for downloaders, schedulers, and storage components via interface-driven contracts. It manages network interactions using middleware-based request throttling and URL deduplication, ensuring that crawling o
Language: Go. Topics: crawler, go, pipeline.

Open-source alternatives to Colly

apify/crawlee

henrylee2cn/pholcus

hu17889/go_spider

omkarcloud/botasaurus

crawlab-team/crawlab

LMCache/LMCache

pubkey/rxdb

PrefectHQ/fastmcp

NanmiCoder/MediaCrawler

MechanicalSoup/MechanicalSoup

AngleSharp/AngleSharp

freeok/so-novel

keiyoushi/extensions-source

psf/requests-html

hardkoded/puppeteer-sharp

lapwinglabs/x-ray

go-rod/rod

Kr1s77/awesome-python-login-model

qeeqbox/social-analyzer

NanmiCoder/CrawlerTutorial

REMitchell/python-scraping

hect0x7/JMComic-Crawler-Python

GoogleChrome/puppeteer

mherrmann/helium

speedyapply/JobSpy

constverum/ProxyBroker

qd-today/qd

Panniantong/Agent-Reach

Alibaba-NLP/WebAgent

kennethreitz/grequests