30 open-source projects similar to browserless/browserless, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Browserless alternative.
gstack is an AI agent framework and development workflow system designed to automate the software development lifecycle. It coordinates specialized AI personas to manage tasks across product design, engineering management, and quality assurance, transforming product intent into technical specifications and final releases. The project is distinguished by its deep integration of headless browser automation and semantic code memory. It utilizes a persistent Chromium daemon for web scraping and visual auditing, and implements a searchable knowledge base that logs architectural decisions and repos
This project is a high-performance headless browser engine designed for scalable web automation, data extraction, and AI agent integration. It provides a specialized environment that allows autonomous agents and testing frameworks to interact with web content through standardized remote control protocols. By executing pages in a lightweight, headless state, the engine minimizes resource consumption while maintaining the ability to perform complex navigation and dynamic content rendering. The platform distinguishes itself through deep integration with AI-centric communication layers and advanc
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob
chromedp is a browser automation framework and driver that controls web browsers via the Chrome DevTools Protocol. It functions as a headless browser automation tool and web browser controller, enabling the programmatic management of browser sessions, targets, and network responses through a remote debugging interface. The project provides specialized capabilities for Chrome DevTools Protocol automation, including headless browser testing, web scraping and data extraction, and mobile device emulation. It also supports browser-based visual regression by capturing precise screenshots of web pag
Puppeteer is a JavaScript library for programmatically controlling Chrome and Firefox through the Chrome DevTools Protocol or the WebDriver BiDi protocol. It launches and manages browser instances—typically without a visible user interface—to automate interactions with web pages, enabling navigation, clicking, typing, and data extraction entirely through code. The library distinguishes itself through deep integration with the Chromium embedding layer, allowing fine-grained process configuration with custom flags, permissions, and sandbox policies. It maintains multiple concurrent command stre
Obscura is a web scraping infrastructure and headless browser server designed for AI agents. It provides a system for AI models to control browser sessions, interact with websites, and extract web data using a WebSocket implementation of the Chrome DevTools Protocol. The project focuses on bot detection evasion by randomizing browser fingerprints, masking native functions, and blocking tracking scripts to mimic human behavior. It further secures identities through a traffic layer that routes network requests via HTTP or SOCKS5 proxies. The system supports large-scale data extraction through
Puppeteer Sharp is a .NET wrapper and automation library used to programmatically drive headless Chrome and Chromium browsers. It functions as a Chrome DevTools Protocol client, providing a framework for web scraping and the automation of web page interactions. The project enables the execution of JavaScript within the browser context and supports attaching to remote browser sessions via WebSocket endpoints. It allows for the manipulation of browser states to perform functional web testing and visual regression analysis. Capability areas include content transformation via HTML injection, pag
Steel is a cloud browser automation platform that provides a REST API for launching and controlling remote Chrome browser sessions. It enables programmatic browsing and web scraping using standard automation tools like Puppeteer, Playwright, and Selenium, connecting to cloud-hosted browser instances via WebSocket and the Chrome DevTools Protocol. The platform supports both headless and headful browser sessions, with language-specific SDKs for TypeScript and Python. The service distinguishes itself through comprehensive anti-detection capabilities, including residential proxy rotation, CAPTCHA
This project is an automation framework that connects large language models to web browsers via the Chrome DevTools Protocol for autonomous task execution. It functions as a bridge between intelligent agents and browser engines, allowing for the direct control of browser sessions and profiles. The framework features a self-healing agent capable of generating and executing custom scripts during runtime to resolve failures and optimize browser tasks. It supports stealthy deployment through the use of integrated proxies and captcha solvers to bypass bot detection and security mitigations. The s
This project is an agentic framework designed to enable autonomous web navigation and browser automation. It functions as a controller that translates natural language instructions into deterministic browser actions, allowing agents to interact with websites, perform data extraction, and manage complex authentication flows. By leveraging accessibility trees and semantic element resolution, the framework mimics human-like navigation, moving beyond brittle DOM selectors to interact reliably with modern web interfaces. The framework distinguishes itself through its focus on secure, scalable exec
This project serves as an agentic browser controller, providing a programmatic bridge that enables autonomous software agents to navigate web pages and interact with document elements. It functions as a browser automation protocol, facilitating headless browser operations and automated web interactions to perform repetitive tasks and end-to-end testing without manual human input. The system distinguishes itself by utilizing the Chrome DevTools Protocol to establish a bidirectional communication channel with the browser engine. This allows for protocol-based remote control, where external appl
Katana is a web crawler and spider designed for security reconnaissance and web application mapping. It functions as a utility for identifying endpoints, forms, and API structures across web targets by combining standard HTTP request traversal with headless browser automation to render dynamic, JavaScript-heavy content. The tool distinguishes itself through its ability to maintain authenticated sessions and handle complex web interactions, such as automated form submission and captcha resolution. It provides granular control over the discovery process, allowing users to define specific crawl
This project is a LinkedIn data scraper and professional profile extractor designed to collect information from professional networking sites. It functions as a headless browser scraper that extracts professional profiles, company details, and job listings using automated browser sessions. The tool includes a session manager that saves and loads authentication cookies to maintain persistent access to protected profiles. It employs configurable browser settings and user-agent mimicry to simulate human activity and bypass bot detection. Data extraction capabilities cover person profiles, compa
Botasaurus is a Python web scraping framework and headless browser automation system used to build scalable data extraction tools. It functions as a web data extraction tool and OCR document parser, converting website content, images, and PDF files into structured formats such as JSON, CSV, and Excel. The framework distinguishes itself by providing a scraper management interface that allows Python functions to be wrapped in a web-based UI or deployed as standalone desktop applications. This enables non-technical users to trigger extraction jobs and manage tasks via a graphical interface or RE
CasperJS is a scripting utility and testing framework for automating web scenarios via headless browsers. It enables the execution of navigation steps and form inputs to automate complex user scenarios, extract web data, and validate the state of remote pages. The project provides specific tooling for PhantomJS and SlimerJS, allowing users to write programmable sequences for web navigation and data extraction. It includes capabilities for capturing visual snapshots of full pages or specific elements to perform user interface regression testing. The framework covers broad automation areas inc
CrawlerTutorial is a comprehensive Python web scraping tutorial and framework designed for extracting data from static and dynamic websites. It functions as a web data extraction pipeline and an HTTP request orchestrator, covering the full lifecycle of scraping applications from initial fetching to final data storage. The project provides specialized guidance on anti-bot bypass techniques and web API reverse engineering. It includes methods for evading browser detection through identity masking and proxy rotation, as well as techniques for identifying hidden API endpoints by analyzing network
This project is a comprehensive resource directory for web data extraction, providing a curated collection of tools and libraries for parsing data, automating browsers, and managing network operations. It serves as a guide for extracting structured information from HTML, XML, JSON, and PDF formats. The toolkit focuses on advanced data collection strategies, including headless browser automation to interact with JavaScript and a suite of network utilities for DNS resolution and WebSocket connections. It specifically covers methods for bypassing bot protections through proxy pool management, us
This project is a comprehensive educational guide and framework for building web scrapers using Python. It provides a course-based approach to data extraction, combining a Python crawler framework with tutorials on web reverse engineering and network traffic analysis. The project distinguishes itself by covering advanced extraction challenges, including the decryption of obfuscated JavaScript and the bypass of anti-scraping measures. It specifically addresses mobile application scraping through the simulation of user interactions and the interception of network traffic. The capability surfac
PhantomJS is a scriptable, headless browser engine based on WebKit that provides a programmatic interface for automating web page interactions. It operates without a graphical user interface, allowing for the execution of JavaScript to navigate pages, manipulate the document object model, and perform functional testing of web applications. The tool distinguishes itself by providing low-level control over the browser rendering lifecycle and network stack. It enables real-time interception and modification of network traffic, alongside the ability to generate visual snapshots and document expor
Firecrawl is a headless browser automation tool and web crawling engine designed to extract structured data from the web. It functions as an API that transforms raw website content and documents into clean markdown and JSON formats to serve as context for large language models. The project distinguishes itself by using natural language prompts to translate human instructions into targeted data extraction tasks and browser actions. It can execute interactive page navigation, such as clicking and scrolling, and perform automated web research to retrieve structured data without manual interventi
Crawlee-python is a web crawling framework for building scalable scrapers using Python. It serves as a comprehensive tool for web scraping automation, providing a system to extract structured data from websites using both lightweight HTTP requests and headless browser automation. The framework is distinguished by its anti-bot evasion capabilities, which include browser fingerprint impersonation and tiered proxy rotation to bypass detection systems and solve challenges such as Cloudflare. It also incorporates artificial intelligence for autonomous website navigation and schema-based data extra
Browsershot is a PHP library that serves as a Puppeteer browser wrapper to convert HTML and URLs into PDFs, images, or strings using a headless Chrome browser. It functions as a tool for transforming web content into visual media or extracting the final rendered DOM state of a page. The library enables the automation of browser rendering to generate PDFs and screenshots from web pages. It can retrieve the final rendered HTML markup after all client-side JavaScript execution is complete and can capture a full audit of network requests triggered during the page load process. The system include
Maxun is an open-source web scraping and automation platform designed to transform dynamic website content into structured data. By leveraging artificial intelligence to interpret natural language prompts, the system identifies page elements and extracts information without requiring manual selector configuration. It serves as a bridge between raw web content and intelligent workflows, providing structured outputs in formats optimized for large language model ingestion and agent-based applications. The platform distinguishes itself through its ability to handle complex, authenticated, and dyn
CyberScraper-2077 is an AI-powered web scraping tool that uses large language models to extract and structure data from websites into organized formats. It functions as an LLM web scraper and AI content parser, transforming unstructured raw web text into specific data schemas. The project distinguishes itself through a suite of anonymity and evasion tools, including proxy rotation, SOCKS-based identity masking, and the ability to route traffic through the Tor network to access hidden onion services. It further includes a bot detection bypass system that employs stealth parameters and custom n
OpenCLI is an AI browser automation framework designed to automate web navigation, data extraction, and repetitive browser tasks. It functions as a browser-based CLI generator that converts website interfaces into command-line interactions by controlling authenticated web browser sessions. The project features a web-to-CLI adapter platform for mapping web elements to programmatic command-line inputs and outputs. It includes a browser profile manager to organize and switch between isolated session profiles to maintain different user identities. The toolkit provides capabilities for web conten
This project is a Python web scraping library and automated data collection suite. It provides tools for extracting structured data from websites, implementing web crawlers to navigate site links, and parsing HTML DOM structures to isolate specific elements and attributes. The toolkit includes a pipeline for processing unstructured text and cleaning raw web content to extract meaningful information. It also features capabilities for image data extraction and the integration of external APIs to retrieve structured data from remote endpoints. The system covers broad capability areas including
This project is a Python web scraping tutorial and framework designed for building automated data extraction tools and web crawlers. It provides a structured approach to navigating websites and persisting scraped data to databases. The project includes a toolset for web API analysis, focusing on reverse engineering obfuscated API requests and inspecting network traffic to extract structured data. It also covers optical character recognition workflows to convert visual text within images into machine-readable strings. The framework covers capabilities for headless browser automation to handle
X-ray is a headless browser web scraper and HTML content crawler designed to extract structured data from websites. It functions as a stream-based data scraper and structured data extractor, using selectors to retrieve text and attributes from HTML as nested objects or arrays. The project includes a request rate controller to manage network traffic through concurrency limits, throttles, and timeouts. It handles dynamic website scraping by rendering JavaScript via a headless browser and performs automated website crawling using breadth-first link following and pagination management. The syste
Chromeless is a serverless deployment of Chrome and a programmable interface for automating headless browser interactions. It functions as a web page rendering engine and browser orchestrator, enabling the execution of automation tasks within an AWS Lambda environment. The project specializes in managing browser state, cookies, and viewport settings across remote Chrome instances. It provides tools for generating screenshots, PDFs, and raw text exports from rendered web pages. The system supports dynamic web interaction, including form filling, element clicking, and the execution of custom J
Streamlink is a command line video stream extractor that retrieves direct stream URLs from online services for use in external media players. It functions as a local media stream pipe, redirecting raw video data from web services into local files or players via standard input or HTTP. The project includes a headless browser stream scraper to intercept network requests and extract media data from script-heavy websites, alongside a dedicated processor for HLS and DASH segmented media streams. The tool utilizes a modular video plugin framework, allowing support for new streaming platforms to be