What are the best GitHub repositories on the Awesome JavaScript Crawling Frameworks list?

Node.js libraries for web scraping, browser automation, and crawling. Explore 16 GitHub repositories from the JavaScript Crawling Frameworks section of an awesome list. Top picks: apify/crawlee, bda-research/node-crawler, lapwinglabs/x-ray, projectdiscovery/naabu, yujiosaka/headless-chrome-crawler, hakluke/hakrawler, GerbenJavado/LinkFinder, rchipka/node-osmosis, IonicaBizau/scrape-it, ruipgil/scraperjs.

Why is apify/crawlee recommended in the JavaScript Crawling Frameworks list?

Reliable browser automation and scraping library.

Why is bda-research/node-crawler recommended in the JavaScript Crawling Frameworks list?

Simple API-driven crawler for Node.js.

Why is lapwinglabs/x-ray recommended in the JavaScript Crawling Frameworks list?

Web scraper with pagination and crawler support.

Why is projectdiscovery/naabu recommended in the JavaScript Crawling Frameworks list?

Port scanner, written in Go, that probes hosts for open ports to identify active services.

Why is yujiosaka/headless-chrome-crawler recommended in the JavaScript Crawling Frameworks list?

Headless Chrome crawler with jQuery support.

Why is hakluke/hakrawler recommended in the JavaScript Crawling Frameworks list?

Extracts JavaScript file locations from web pages to find potential endpoints or hidden functionality.

Why is GerbenJavado/LinkFinder recommended in the JavaScript Crawling Frameworks list?

Extracts URLs and routes from JavaScript code using regular expressions to uncover hidden API endpoints.

Why is rchipka/node-osmosis recommended in the JavaScript Crawling Frameworks list?

HTML/XML parser and scraper for Node.js.

Why is IonicaBizau/scrape-it recommended in the JavaScript Crawling Frameworks list?

Human-friendly scraper for Node.js.

Why is ruipgil/scraperjs recommended in the JavaScript Crawling Frameworks list?

Versatile web scraper for Node.js.

16 repositories

apify/crawlee
24,002 stars
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob
Reliable browser automation and scraping library.
Language: TypeScript · Topics: apify, automation, crawler
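The resource-aware concurrency idea described for Crawlee can be illustrated with a small sketch. This is not Crawlee's API or implementation: `nextConcurrency`, the shape of the `load` object, and the default limits and 0.8 target are hypothetical, chosen only for illustration. The function lowers concurrency by one step when the busier of CPU and memory exceeds the target, and raises it by one step otherwise.

```javascript
// Hypothetical autoscaling step (assumed names and thresholds, not Crawlee's API).
// `load.cpu` and `load.memory` are utilization fractions between 0 and 1.
function nextConcurrency(current, load, { min = 1, max = 50, target = 0.8 } = {}) {
  // React to whichever resource is under the most pressure.
  const busiest = Math.max(load.cpu, load.memory);
  // Back off by one task when overloaded, otherwise probe one task higher.
  const desired = busiest > target ? current - 1 : current + 1;
  // Clamp to the configured bounds.
  return Math.min(max, Math.max(min, desired));
}
```

In a real crawler the `load` values would come from periodic system measurements, and the function would be called on a timer to adjust how many tasks run in parallel.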
bda-research/node-crawler
6,785 stars
node-crawler is a programmable web crawler for Node.js that manages request queues and automates data extraction. It functions as a rate-limited HTTP client and a headless HTML parser, providing the infrastructure to visit large sets of URLs asynchronously while preventing duplicate processing through task deduplication. The project distinguishes itself through a proxy rotation manager that cycles user agents and proxy servers to bypass access restrictions. It utilizes the HTTP/2 protocol to improve request performance and server compatibility during large-scale scraping operations. The syst
Simple API-driven crawler for Node.js.
Language: TypeScript · Topics: cheerio, crawler, extract-data
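Rate limiting combined with task deduplication, as described for node-crawler, can be sketched as follows. This is not node-crawler's API: the `RateLimitedQueue` class, its methods, and the choice to deduplicate by stripping URL fragments are assumptions made for illustration. The queue only computes when each request may start; actually sending requests is left out.

```javascript
// Hypothetical helper (not node-crawler's API): a deduplicating queue that
// enforces a fixed minimum gap between request start times.
class RateLimitedQueue {
  constructor(minGapMs) {
    this.minGapMs = minGapMs;
    this.seen = new Set();   // normalized URLs that were already queued
    this.pending = [];       // URLs waiting to be requested, in FIFO order
    this.nextAllowedAt = 0;  // earliest timestamp (ms) the next request may start
  }

  // Normalize so that trivially different spellings of a URL deduplicate.
  static normalize(rawUrl) {
    const url = new URL(rawUrl);
    url.hash = "";           // fragments are never sent to the server
    return url.href;
  }

  // Returns true if the URL was queued, false if it was a duplicate.
  add(rawUrl) {
    const key = RateLimitedQueue.normalize(rawUrl);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  // Returns { url, startAt } for the next request, or null when the queue is empty.
  next(now) {
    if (this.pending.length === 0) return null;
    const startAt = Math.max(now, this.nextAllowedAt);
    this.nextAllowedAt = startAt + this.minGapMs;
    return { url: this.pending.shift(), startAt };
  }
}
```

A caller would wait until `startAt` before issuing each request, which keeps the request rate at or below one per `minGapMs` milliseconds.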
lapwinglabs/x-ray
5,904 stars
X-Ray is a web scraping framework and asynchronous web crawler designed to extract structured data from websites. It works as an HTML data extractor that transforms raw page content into a defined schema using CSS-style selectors. The project implements a headless browser crawler capable of executing JavaScript to render dynamic content. It handles website content discovery through a breadth-first crawling strategy and automatic pagination discovery to walk multi-page result sets. The framework manages web data pipelines using a request queue with limited concurrency and request rate control to regulate outgoing network calls. Extracted results are handled through stream-based data persistence, so large datasets can be processed without exhausting system memory.
Web scraper with pagination and crawler support.
Language: JavaScript
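The breadth-first crawling strategy mentioned for X-Ray can be sketched in a few lines. This is not X-Ray's API: `crawlBreadthFirst`, the `getLinks` callback, and the `maxDepth` limit are hypothetical, and an in-memory link lookup stands in for real page fetches. The sketch visits pages level by level and skips URLs it has already seen.

```javascript
// Hypothetical breadth-first traversal (assumed names, not X-Ray's API).
// `getLinks(url)` is a caller-supplied function returning the links found on a page.
function crawlBreadthFirst(startUrl, getLinks, maxDepth) {
  const visited = new Set([startUrl]); // URLs already scheduled, to avoid repeats
  const order = [];                    // URLs in the order they were crawled
  let frontier = [startUrl];           // all URLs at the current depth

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      order.push(url);
      for (const link of getLinks(url)) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next; // move one level deeper
  }
  return order;
}
```

Every page at depth 1 is crawled before any page at depth 2, which is the property that distinguishes breadth-first from depth-first traversal.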
projectdiscovery/naabu
5,766 stars
Naabu is a port scanner library and tool that probes hosts for open ports using SYN, CONNECT, and UDP methods to identify active services. It functions as a Go library for embedding port scanning into programs, and as a standalone tool that accepts targets as hostnames, IP addresses, CIDR ranges, or ASN numbers. The tool discovers live hosts before scanning, filters ports by range or top lists, and can integrate with Nmap for service version detection. The project distinguishes itself through its SYN-based port probing approach that sends TCP SYN packets and analyzes responses without complet
Port scanner, written in Go, that probes hosts for open ports to identify active services.
Language: Go · Topics: cdn-exclusion, hacktoberfest, nmap
yujiosaka/headless-chrome-crawler
5,643 stars
This project is a distributed web crawling framework built on headless Chrome for data extraction. It works as a JavaScript rendering engine that uses a headless browser to process dynamic pages, extracting structured data from websites that require JavaScript execution. The system is designed for scalable data collection across multiple nodes, using distributed task synchronization and shared caches to avoid duplicate work. It is distinguished by its ability to emulate specific client environments by configuring user agents and viewport dimensions, while capturing visual evidence such as page screenshots. The framework covers comprehensive crawl management, including request scheduling with a priority queue, depth-first and breadth-first traversal, and compliance with robots.txt and sitemap.xml files. It provides tools for limiting concurrency, monitoring events, and streaming extracted data in CSV or JSON formats.
Headless Chrome crawler with jQuery support.
Language: JavaScript
hakluke/hakrawler
4,993 stars
Hakrawler is a command-line web spider tool designed for security reconnaissance, built to crawl target websites and extract hyperlinks along with JavaScript file references. As a focused reconnaissance utility, it collects every discoverable URL and script source from a given domain, mapping the attack surface for penetration testing and vulnerability assessment. The tool differentiates itself through its concurrent architecture: a fixed-size goroutine pool fetches pages in parallel, while CSS selectors parse HTML to extract anchor and script references. A depth-aware recursion limiter preve
Extracts JavaScript file locations from web pages to find potential endpoints or hidden functionality.
Language: Go · Topics: bugbounty, crawling, hacking
GerbenJavado/LinkFinder
4,390 stars
LinkFinder is a security reconnaissance and static analysis tool designed for discovering endpoints in JavaScript. It extracts absolute and relative URLs and parameters from JavaScript files to map the attack surface of web applications and identify hidden API routes. The tool works through static code analysis and regular-expression pattern matching to find endpoints without executing the source code. It includes a data processor for importing files exported from Burp Suite, allowing batch analysis of multiple JavaScript assets in a single run. The system provides domain-level analysis and domain-specific filtering to focus discovery on specific targets. It also has keyword-detection notifications that alert users when specific strings appear in the results, and it supports exporting discovered data in plain-text or HTML formats.
Extracts URLs and routes from JavaScript code using regular expressions to uncover hidden API endpoints.
Language: Python
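Regex-based endpoint discovery of the kind described for LinkFinder can be sketched as follows. LinkFinder itself is written in Python and uses its own, more thorough pattern; the `ENDPOINT_PATTERN` and `extractEndpoints` names and the simplified expression here are assumptions for illustration only. The sketch scans source text for quoted strings that look like absolute URLs or rooted paths, without executing the code.

```javascript
// Hypothetical, simplified pattern (not LinkFinder's actual regular expression).
// Matches quoted strings that are absolute http(s) URLs or paths starting with "/".
const ENDPOINT_PATTERN =
  /["'`]((?:https?:\/\/[^\s"'`]+)|(?:\/[A-Za-z0-9_\-.\/]+(?:\?[^\s"'`]*)?))["'`]/g;

// Returns the unique endpoint-like strings found in `source`, in order of appearance.
function extractEndpoints(source) {
  const found = new Set();
  for (const match of source.matchAll(ENDPOINT_PATTERN)) {
    found.add(match[1]); // capture group 1 is the string without its quotes
  }
  return [...found];
}
```

Because the approach is purely textual, it can miss endpoints assembled at runtime through string concatenation and can report strings that merely resemble paths.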
rchipka/node-osmosis
4,110 stars
This project is a Node.js web scraping framework designed to automate data extraction through a programmatic workflow of requests, parsing, and document interaction. It works as a headless web crawler, an HTTP request manager, and a DOM parser and extractor. The framework is distinguished by combining a JavaScript execution engine for interacting with dynamic content with a hybrid selection system that uses CSS and XPath selectors. It includes specialized middleware for proxy rotation and cookie-jar session management to maintain authenticated states and manage automated traffic. Its broader capabilities cover recursive link crawling, pagination handling, and web form automation. The tool also provides traffic management features such as request rate limiting through timed delays and custom HTTP header configuration.
HTML/XML parser and scraper for Node.js.
Language: JavaScript
IonicaBizau/scrape-it
4,074 stars
scrape-it is a Node.js web scraper and HTML parser designed to extract structured data from websites and HTML files. It functions as a web data extraction tool that retrieves specific information from DOM elements and converts web content into usable data fields. The tool uses CSS selectors to target specific data points and employs schema-driven data mapping to organize unstructured web text into a consistent format. It supports custom value transformation to convert raw extracted strings into specific data formats. The system provides capabilities for web data extraction and automated cont
Human-friendly scraper for Node.js.
Language: JavaScript · Topics: hacktoberfest, node-scraper, scraper
ruipgil/scraperjs
3,718 stars
Scraperjs is a web scraper module that makes scraping the web an easy job.
Versatile web scraper for Node.js.
Language: JavaScript
cgiffard/node-simplecrawler
2,133 stars
simplecrawler is designed to provide a basic, flexible and robust API for crawling websites. It was written to archive, analyse, and search some very large websites and has happily chewed through hundreds of thousands of pages and written tens of gigabytes to disk without issue.
Event-driven web crawler for Node.js.
Language: JavaScript
martinsbalodis/web-scraper-chrome-extension
1,364 stars
Web Scraper is a Chrome browser extension built for data extraction from web pages. Using this extension you can create a plan (sitemap) for how a website should be traversed and what should be extracted. Using these sitemaps, Web Scraper navigates the site accordingly and extracts all data.…
Browser-based data extraction tool.
Language: JavaScript
zhuyingda/webster
559 stars
Webster is a reliable web crawling and scraping framework written with Node.js, used to crawl websites and extract structured data from their pages.
Framework for scraping AJAX and JavaScript-rendered content.
Language: JavaScript
brendonboshell/supercrawler
381 stars
Supercrawler is a Node.js web crawler. It is designed to be highly configurable and easy to use.
Crawler with custom handlers and rate limiting.
Language: JavaScript
antivanov/js-crawler
257 stars
Node.js crawler supporting HTTP and HTTPS.
Language: TypeScript
n0tan3rd/squidwarc
176 stars
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head.
High-fidelity archival crawler using Chrome.
Language: JavaScript
