Cocrawler

CoCrawler is a versatile web crawler built using modern tools and concurrency.
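The description stays high level. As a rough illustration of concurrent crawling in Python, the sketch below runs a fixed pool of `asyncio` workers over a shared URL queue and skips URLs it has already seen. It is not CoCrawler's code. The in-memory `FAKE_WEB` link graph and the `fetch_links` and `crawl` functions are hypothetical stand-ins that let the example run without network access.

```python
from __future__ import annotations

import asyncio

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


async def fetch_links(url: str) -> list[str]:
    """Hypothetical stand-in for an HTTP fetch plus link extraction."""
    await asyncio.sleep(0.01)  # simulate network latency
    return FAKE_WEB.get(url, [])


async def crawl(seed: str, num_workers: int = 3) -> set[str]:
    """Crawl from `seed` with `num_workers` concurrent workers; return every URL seen."""
    queue: asyncio.Queue[str] = asyncio.Queue()
    seen = {seed}
    await queue.put(seed)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                for link in await fetch_links(url):
                    if link not in seen:  # dedupe before enqueueing
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()  # lets queue.join() track completion

    workers = [asyncio.create_task(worker()) for _ in range(num_workers)]
    await queue.join()  # wait until every enqueued URL has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("http://example.com/"))))
```

Running it prints the four URLs reachable from the seed. Because all workers share one event loop and there is no `await` between the `seen` check and the `add`, the deduplication needs no lock.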

Features

Python Crawling Frameworks - Versatile crawler built with modern concurrency tools.

codelucas/newspaper

Newspaper is a Python library designed for scraping, parsing, and analyzing web-based information. It functions as a framework for automated news aggregation and large-scale web content extraction, providing tools to download, clean, and structure text, metadata, and media from diverse online sources. The project distinguishes itself through a pipeline-oriented architecture that combines heuristic-based content extraction with natural language processing. It automatically identifies and isolates article bodies from web page boilerplate while simultaneously performing language detection, keywo
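The entry does not spell out Newspaper's extraction heuristics. As a hedged illustration of what heuristic boilerplate removal can look like, the sketch below keeps text blocks that are long and mostly free of links. This generic text-density heuristic is swapped in for illustration and is not Newspaper's algorithm or API; `Block`, `is_content`, `extract_body`, and the thresholds are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    """A chunk of page text, e.g. the contents of one <p> or <div>."""
    text: str
    link_text: str = ""  # portion of `text` that sits inside <a> tags


def link_density(block: Block) -> float:
    """Fraction of the block's characters that belong to links."""
    return len(block.link_text) / max(len(block.text), 1)


def is_content(block: Block, min_words: int = 10, max_link_density: float = 0.3) -> bool:
    """Heuristic: long, mostly link-free blocks are likely article body."""
    return len(block.text.split()) >= min_words and link_density(block) <= max_link_density


def extract_body(blocks: list[Block]) -> str:
    """Join the blocks that pass the heuristic, in page order."""
    return "\n\n".join(b.text for b in blocks if is_content(b))


if __name__ == "__main__":
    page = [
        Block("Home News Sports Contact", link_text="Home News Sports Contact"),
        Block("The city council voted on Tuesday to expand the bike lane network "
              "across three districts, citing rising commuter demand."),
        Block("Share on Twitter", link_text="Share on Twitter"),
    ]
    print(extract_body(page))  # only the news sentence survives
```

Navigation bars and share widgets tend to be short and link-heavy, which is why both signals are combined; production extractors typically weigh many more features.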

douban/brownant

157 GitHub stars

binux/pyspider

16,809 GitHub stars

PySpider is a Python web crawling framework designed for automated data extraction. It provides a pipeline for periodically fetching web content, processing HTML, and persisting scraped information into database backends. The system features a web-based management interface for editing scraping scripts, monitoring task progress, and reviewing collected data. It includes a headless browser JavaScript renderer to capture rendered HTML from dynamic web pages and a distributed architecture that uses message queues to scale crawling workloads across multiple nodes. The framework also covers task
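To make the queue-based pipeline concrete, the sketch below wires three stages (fetch, process, persist) together with queues and shuts them down with a sentinel. It is not PySpider's API or code: in-process `queue.Queue` objects stand in for the message queues that would connect nodes in a distributed setup, periodic scheduling is omitted, and `FAKE_PAGES`, `fetcher`, `processor`, `result_worker`, and `run_pipeline` are hypothetical names.

```python
from __future__ import annotations

import queue
import re
import threading

STOP = object()  # sentinel telling a stage to shut down

# Hypothetical pages standing in for fetched HTML.
FAKE_PAGES = {
    "http://example.com/1": "<html><title>One</title></html>",
    "http://example.com/2": "<html><title>Two</title></html>",
}


def fetcher(tasks: queue.Queue, fetched: queue.Queue) -> None:
    """Stage 1: 'download' each URL and pass (url, html) downstream."""
    while (url := tasks.get()) is not STOP:
        fetched.put((url, FAKE_PAGES.get(url, "")))
    fetched.put(STOP)  # propagate shutdown to the next stage


def processor(fetched: queue.Queue, results: queue.Queue) -> None:
    """Stage 2: parse HTML into a structured record."""
    while (item := fetched.get()) is not STOP:
        url, html = item
        match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
        results.put({"url": url, "title": match.group(1).strip() if match else None})
    results.put(STOP)


def result_worker(results: queue.Queue, store: list[dict]) -> None:
    """Stage 3: persist records (a list here; a database in practice)."""
    while (record := results.get()) is not STOP:
        store.append(record)


def run_pipeline(urls: list[str]) -> list[dict]:
    tasks, fetched, results = queue.Queue(), queue.Queue(), queue.Queue()
    store: list[dict] = []
    threads = [
        threading.Thread(target=fetcher, args=(tasks, fetched)),
        threading.Thread(target=processor, args=(fetched, results)),
        threading.Thread(target=result_worker, args=(results, store)),
    ]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    tasks.put(STOP)
    for t in threads:
        t.join()  # each stage exits after forwarding the sentinel
    return store


if __name__ == "__main__":
    print(run_pipeline(list(FAKE_PAGES)))
```

Decoupling stages through queues is what lets each one scale independently: several fetchers could consume the same task queue while a single writer persists results.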

chineking/cola

1,501 GitHub stars

A high-level distributed crawling framework.

cocrawler/cocrawler

Open-source alternatives to Cocrawler

chineking/cola

codelucas/newspaper

douban/brownant

binux/pyspider
