What are the best Awesome JavaScript Crawling Frameworks GitHub Repositories?

Node.js libraries for web scraping, browser automation, and crawling. Explore 16 GitHub repositories from the JavaScript Crawling Frameworks section of an awesome list. Refine with filters or upvote what's useful. Top picks: apify/crawlee, bda-research/node-crawler, lapwinglabs/x-ray, projectdiscovery/naabu, yujiosaka/headless-chrome-crawler, hakluke/hakrawler, GerbenJavado/LinkFinder, rchipka/node-osmosis, IonicaBizau/scrape-it, ruipgil/scraperjs.

Why is apify/crawlee a recommended JavaScript Crawling Frameworks repository?

Reliable browser automation and scraping library.

Why is bda-research/node-crawler a recommended JavaScript Crawling Frameworks repository?

Simple API-driven crawler for Node.js.

Why is lapwinglabs/x-ray a recommended JavaScript Crawling Frameworks repository?

Web scraper with pagination and crawler support.

Why is projectdiscovery/naabu a recommended JavaScript Crawling Frameworks repository?

Go port scanner that probes hosts for open ports using SYN, CONNECT, and UDP methods.

Why is yujiosaka/headless-chrome-crawler a recommended JavaScript Crawling Frameworks repository?

Headless Chrome crawler with jQuery support.

Why is hakluke/hakrawler a recommended JavaScript Crawling Frameworks repository?

Extracts JavaScript file locations from web pages to find potential endpoints or hidden functionality.

Why is GerbenJavado/LinkFinder a recommended JavaScript Crawling Frameworks repository?

Extracts URLs and routes from JavaScript code using regular expressions to uncover hidden API endpoints.

Why is rchipka/node-osmosis a recommended JavaScript Crawling Frameworks repository?

HTML/XML parser and scraper for Node.js.

Why is IonicaBizau/scrape-it a recommended JavaScript Crawling Frameworks repository?

Human-friendly scraper for Node.js.

Why is ruipgil/scraperjs a recommended JavaScript Crawling Frameworks repository?

Versatile web scraper for Node.js.

16 repositories


apify/crawlee
24,002 · View on GitHub
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob…
Reliable browser automation and scraping library.
TypeScript · apify · automation · crawler
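
The resource-aware concurrency idea in the Crawlee description can be illustrated with a small sketch. This is not Crawlee's API: the function name `nextConcurrency`, the thresholds, and the halve-on-overload / add-one-on-recovery policy are all assumptions chosen for illustration. In a real process the two ratios might be derived from readings such as `os.loadavg()` and `os.freemem()`.

```javascript
// Illustrative only: names, thresholds, and step sizes are assumptions,
// not Crawlee's actual API or defaults.

/**
 * Decide the next concurrency level from a snapshot of system load.
 * cpuRatio and memRatio are fractions in [0, 1] of capacity in use.
 */
function nextConcurrency(current, snapshot, options = {}) {
  const { min = 1, max = 50, maxCpuRatio = 0.9, maxMemRatio = 0.8 } = options;
  const overloaded =
    snapshot.cpuRatio > maxCpuRatio || snapshot.memRatio > maxMemRatio;
  // Back off quickly when overloaded (halve), recover slowly otherwise (add one).
  const proposed = overloaded ? Math.floor(current / 2) : current + 1;
  // Clamp the result to the configured bounds.
  return Math.min(max, Math.max(min, proposed));
}

// A healthy reading raises the limit by one; an overloaded one halves it.
const afterIdle = nextConcurrency(10, { cpuRatio: 0.35, memRatio: 0.5 });
const afterBusy = nextConcurrency(10, { cpuRatio: 0.97, memRatio: 0.5 });
console.log(afterIdle, afterBusy);
```

Keeping the decision a pure function of a load snapshot makes the policy easy to test separately from whatever samples the CPU and memory figures.
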
bda-research/node-crawler
6,785 · View on GitHub
node-crawler is a programmable web crawler for Node.js that manages request queues and automates data extraction. It functions as a rate-limited HTTP client and a headless HTML parser, providing the infrastructure to visit large sets of URLs asynchronously while preventing duplicate processing through task deduplication. The project distinguishes itself through a proxy rotation manager that cycles user agents and proxy servers to bypass access restrictions. It utilizes the HTTP/2 protocol to improve request performance and server compatibility during large-scale scraping operations. The syst…
Simple API-driven crawler for Node.js.
TypeScript · cheerio · crawler · extract-data
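
Task deduplication and rate limiting, as mentioned in the node-crawler description, can be sketched without any library. The class below is a hypothetical illustration, not node-crawler's API: `DedupQueue`, `enqueue`, and `drain` are invented names, and the normalization rule (lowercased scheme and host, fragment dropped) is one assumption among many possible ones.

```javascript
// Illustrative only: class and method names are assumptions, not node-crawler's API.
class DedupQueue {
  constructor() {
    this.seen = new Set(); // normalized URLs that were ever queued
    this.pending = []; // URLs waiting to be processed
  }

  // Normalize so trivially different spellings of a URL count as one task.
  static normalize(rawUrl) {
    const url = new URL(rawUrl); // lowercases scheme and host
    url.hash = ""; // fragments are never sent to the server
    return url.href;
  }

  // Returns true if the URL was queued, false if it was already seen.
  enqueue(rawUrl) {
    const key = DedupQueue.normalize(rawUrl);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  // Process tasks one at a time, pausing delayMs between consecutive tasks.
  async drain(handler, delayMs = 0) {
    while (this.pending.length > 0) {
      const url = this.pending.shift();
      await handler(url);
      if (delayMs > 0 && this.pending.length > 0) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
}

const queue = new DedupQueue();
queue.enqueue("https://example.com/a");
queue.enqueue("https://EXAMPLE.com/a#section"); // rejected: same page after normalization
queue.enqueue("https://example.com/b");

const visited = [];
queue
  .drain(async (url) => visited.push(url), 10)
  .then(() => console.log(visited));
```

The handler here only records the URL; a real crawler would issue the HTTP request at that point.
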
lapwinglabs/x-ray
5,904 · View on GitHub
X-Ray is a web scraping framework and asynchronous web crawler designed to extract structured data from websites. It works as an HTML data extractor that converts raw page content into a defined schema using CSS-style selectors. The project implements a headless browser crawler capable of executing JavaScript to render dynamic content. It handles site content discovery through a breadth-first crawling strategy and automatic pagination detection to traverse multi-page result sets. The framework manages web data pipelines using a concurrency-limited request queue and request rate control to regulate outgoing network calls. Extracted results are processed through stream-based data persistence so that large datasets do not overload system memory.
Web scraper with pagination and crawler support.
JavaScript
projectdiscovery/naabu
5,766 · View on GitHub
Naabu is a port scanner library and tool that probes hosts for open ports using SYN, CONNECT, and UDP methods to identify active services. It functions as a Go library for embedding port scanning into programs, and as a standalone tool that accepts targets as hostnames, IP addresses, CIDR ranges, or ASN numbers. The tool discovers live hosts before scanning, filters ports by range or top lists, and can integrate with Nmap for service version detection. The project distinguishes itself through its SYN-based port probing approach that sends TCP SYN packets and analyzes responses without complet…
Go port scanner that probes hosts for open ports using SYN, CONNECT, and UDP methods.
Go · cdn-exclusion · hacktoberfest · nmap
yujiosaka/headless-chrome-crawler
5,643 · View on GitHub
This project is a distributed web scraping framework built on headless Chrome. It works as a JavaScript rendering engine that uses a headless browser to process dynamic pages and extract structured data from websites that require JavaScript execution. The system is designed for scalable data collection across multiple nodes, using distributed task synchronization and a shared cache to prevent duplicate work. It can emulate specific client environments by configuring user agents and viewport dimensions, and it captures visual evidence such as page screenshots. The framework covers comprehensive scraping management, including priority-based request scheduling, depth-first and breadth-first traversal, and adherence to robots.txt and sitemap.xml files. It provides tools for limiting concurrency, monitoring events, and streaming extracted data to CSV or JSON formats.
Headless Chrome crawler with jQuery support.
JavaScript
hakluke/hakrawler
4,993 · View on GitHub
Hakrawler is a command-line web spider tool designed for security reconnaissance, built to crawl target websites and extract hyperlinks along with JavaScript file references. As a focused reconnaissance utility, it collects every discoverable URL and script source from a given domain, mapping the attack surface for penetration testing and vulnerability assessment. The tool differentiates itself through its concurrent architecture: a fixed-size goroutine pool fetches pages in parallel, while CSS selectors parse HTML to extract anchor and script references. A depth-aware recursion limiter preve…
Extracts JavaScript file locations from web pages to find potential endpoints or hidden functionality.
Go · bugbounty · crawling · hacking
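
The combination of a fixed-size worker pool and a depth limit, described above for hakrawler, can be sketched in JavaScript as follows. Hakrawler itself is written in Go and this is not its code: `crawl`, `fetchLinks`, `maxDepth`, and `workers` are hypothetical names, and link fetching is replaced by a lookup in an in-memory graph so the example runs without network access.

```javascript
// Illustrative only: names are assumptions; hakrawler itself is written in Go.
// Returns every URL reachable from startUrl within maxDepth link hops.
async function crawl(startUrl, fetchLinks, { maxDepth = 2, workers = 4 } = {}) {
  const seen = new Set([startUrl]);
  let frontier = [startUrl]; // URLs discovered at the current depth

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    let cursor = 0;

    // Each worker claims the next unprocessed URL until the level is exhausted.
    const worker = async () => {
      while (cursor < frontier.length) {
        const url = frontier[cursor++];
        const links = await fetchLinks(url);
        for (const link of links) {
          if (!seen.has(link)) {
            seen.add(link);
            next.push(link);
          }
        }
      }
    };

    // A fixed number of workers bounds how many fetches are in flight at once.
    await Promise.all(Array.from({ length: workers }, worker));
    frontier = next;
  }
  return [...seen];
}

// Stand-in for real HTTP fetching: a small in-memory link graph.
const graph = {
  "/": ["/a", "/b"],
  "/a": ["/b", "/c"],
  "/b": [],
  "/c": ["/d"],
  "/d": [],
};
const fetchLinks = async (url) => graph[url] || [];

crawl("/", fetchLinks, { maxDepth: 2, workers: 2 }).then((urls) =>
  console.log(urls.sort())
);
```

With `maxDepth: 2`, the page `/d` is never reported because it sits three hops from the start page.
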
GerbenJavado/LinkFinder
4,390 · View on GitHub
LinkFinder is a security reconnaissance and static analysis tool designed to discover endpoints in JavaScript. It extracts absolute and relative links and parameters from JavaScript files to map the attack surface of web applications and identify hidden API routes. The tool relies on static code analysis and regular-expression pattern matching to find endpoints without executing the source code. It includes a data handler for importing files exported from Burp Suite, enabling batch analysis of multiple JavaScript assets in a single run. The system provides capabilities for domain-level analysis and domain-specific filtering to focus on chosen targets. It also features keyword detection notifications that alert users when particular strings appear in the results, and it supports exporting discovered data as plain text or HTML.
Extracts URLs and routes from JavaScript code using regular expressions to uncover hidden API endpoints.
Python
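
Regular-expression extraction of URLs and routes from JavaScript source, the approach named in the LinkFinder summary, might look roughly like the sketch below. LinkFinder is a Python tool and its real pattern is more elaborate; the deliberately simplified `ENDPOINT_PATTERN` and the `extractEndpoints` helper here are assumptions made for illustration only.

```javascript
// Illustrative only: a simplified pattern, not LinkFinder's actual regex.
// Matches quoted strings that are either absolute http(s) URLs or
// paths starting with "/", "./" or "../", with an optional query string.
const ENDPOINT_PATTERN =
  /["'`]((?:https?:\/\/[^\s"'`]+)|(?:\.{0,2}\/[\w.\/-]+(?:\?[^\s"'`]*)?))["'`]/g;

// Scan source text statically (nothing is executed) and return unique matches
// in order of first appearance.
function extractEndpoints(source) {
  const found = new Set();
  for (const match of source.matchAll(ENDPOINT_PATTERN)) {
    found.add(match[1]);
  }
  return [...found];
}

const sample = `
  fetch("/api/v1/users?id=" + id);
  const base = 'https://example.com/graphql';
  import helper from "./lib/helper.js";
  const label = "not a path";
`;

console.log(extractEndpoints(sample));
```

A pattern this loose will produce false positives on real code (for example, quoted strings that merely look like paths), which is one reason to treat the output as leads to verify rather than a definitive endpoint list.
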
rchipka/node-osmosis
4,110 · View on GitHub
This project is a Node.js-based web scraping framework designed to automate data extraction through a programmatic workflow of requests, parsing, and document interaction. It works as a headless web crawler, an HTTP request manager, and a DOM parser and extractor. The framework integrates a JavaScript execution engine for interacting with dynamic content and a hybrid selection system that uses both CSS and XPath selectors. It includes specialized middleware for proxy rotation and cookie session management to maintain authenticated states and manage automated traffic. Its broader capabilities cover recursive link crawling, pagination handling, and web form automation. The tool also provides traffic management features such as request rate limiting through time delays and custom HTTP header configuration.
HTML/XML parser and scraper for Node.js.
JavaScript
IonicaBizau/scrape-it
4,074 · View on GitHub
scrape-it is a Node.js web scraper and HTML parser designed to extract structured data from websites and HTML files. It functions as a web data extraction tool that retrieves specific information from DOM elements and converts web content into usable data fields. The tool uses CSS selectors to target specific data points and employs schema-driven data mapping to organize unstructured web text into a consistent format. It supports custom value transformation to convert raw extracted strings into specific data formats. The system provides capabilities for web data extraction and automated cont…
Human-friendly scraper for Node.js.
JavaScript · hacktoberfest · node-scraper · scraper
ruipgil/scraperjs
3,718 · View on GitHub
Scraperjs is a web scraper module that makes scraping the web an easy job.
Versatile web scraper for Node.js.
JavaScript
cgiffard/node-simplecrawler
2,133 · View on GitHub
simplecrawler is designed to provide a basic, flexible and robust API for crawling websites. It was written to archive, analyse, and search some very large websites and has happily chewed through hundreds of thousands of pages and written tens of gigabytes to disk without issue.
Event-driven web crawler for Node.js.
JavaScript
martinsbalodis/web-scraper-chrome-extension
1,364 · View on GitHub
Web Scraper is a Chrome browser extension built for data extraction from web pages. Using this extension you can create a plan (sitemap) for how a website should be traversed and what should be extracted. Using these sitemaps, Web Scraper will navigate the site accordingly and extract all data.…
Browser-based data extraction tool.
JavaScript
zhuyingda/webster
559 · View on GitHub
Webster is a reliable web crawling and scraping framework written with Node.js, used to crawl websites and extract structured data from their pages.
Framework for scraping AJAX and JavaScript-rendered content.
JavaScript
brendonboshell/supercrawler
381 · View on GitHub
Supercrawler is a Node.js web crawler. It is designed to be highly configurable and easy to use.
Crawler with custom handlers and rate limiting.
JavaScript
antivanov/js-crawler
257 · View on GitHub
js-crawler
Node.js crawler supporting HTTP and HTTPS.
TypeScript
n0tan3rd/squidwarc
176 · View on GitHub
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head.
High-fidelity archival crawler using Chrome.
JavaScript
