3 repos
We curate 3 GitHub repositories matching data & databases · Data Extraction Tools. Refine with filters or upvote what's useful.
Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs.
Scrapy is a comprehensive framework designed for automated web data extraction and large-scale crawling. It operates on an asynchronous, event-driven engine that manages non-blocking network requests and data processing tasks, allowing for the efficient retrieval of structured information from web documents using path-
This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It functions by parsing page metadata and utilizing modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media f