3 repos

Awesome GitHub Repositories: Web Scraping

Tools and frameworks for extracting structured information and media from websites through defined rules or automated processes.

Explore 3 awesome GitHub repositories matching web development · Web Scraping.

firecrawl/firecrawl
84,034 · GitHub
Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like Markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveragi…
TypeScript · ai · ai-agents · ai-crawler
scrapy/scrapy
59,824 · GitHub
Scrapy is a comprehensive framework designed for automated web data extraction and large-scale crawling. It operates on an asynchronous, event-driven engine that manages non-blocking network requests and data processing tasks, allowing for the efficient retrieval of structured information from web documents using path-…
Python · crawler · crawling · framework
soimort/you-get
56,737 · GitHub
This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It works by parsing page metadata and using modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media f…
Python
