30 open-source projects similar to autoscrape-labs/pydoll, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Pydoll alternative.
nodriver is an asynchronous Chromium browser automation framework that provides headless control and web scraping capabilities. It functions as a Chrome DevTools Protocol client, allowing for granular engine control by attaching directly to the browser's debug port without the need for external driver binaries. The framework is specifically designed as an anti-bot detection bypass tool. It modifies browser fingerprints and protocol headers to evade automated security systems, handle security warnings, and bypass common obstacles like insecure connection alerts. The system covers a broad rang
gstack is an AI agent framework and development workflow system designed to automate the software development lifecycle. It coordinates specialized AI personas to manage tasks across product design, engineering management, and quality assurance, transforming product intent into technical specifications and final releases. The project is distinguished by its deep integration of headless browser automation and semantic code memory. It utilizes a persistent Chromium daemon for web scraping and visual auditing, and implements a searchable knowledge base that logs architectural decisions and repos
Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture. The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a rob
CefSharp is a .NET binding for the Chromium Embedded Framework that allows developers to embed a full web browser into desktop applications. It provides an embedded browser control for rendering HTML, CSS, and JavaScript content within a native host window. The project features a bidirectional JavaScript bridge interface that enables the execution of scripts and the exposure of native host classes and methods to the browser environment. It also includes a headless browser automation tool for executing web tasks and capturing page screenshots without a graphical user interface. The library co
Taiko is a browser automation framework and web end-to-end testing library used to perform programmatic user actions and verify application behavior. It functions as a headless browser testing tool capable of simulating real interactions and asserting page states in Chromium and Firefox. The project includes a browser interaction recorder that captures live actions and exports them as executable JavaScript automation scripts. It also serves as a web accessibility auditor, analyzing pages to detect accessibility violations and ensure compliance with inclusive design standards. The framework c
This project is an automation framework that connects large language models to web browsers via the Chrome DevTools Protocol for autonomous task execution. It functions as a bridge between intelligent agents and browser engines, allowing for the direct control of browser sessions and profiles. The framework features a self-healing agent capable of generating and executing custom scripts during runtime to resolve failures and optimize browser tasks. It supports stealthy deployment through the use of integrated proxies and captcha solvers to bypass bot detection and security mitigations. The s
Playwright for Python is a browser automation framework designed for end-to-end testing, web scraping, and user interaction simulation. It functions as a headless browser controller that enables programmatic navigation, data extraction, and the execution of complex workflows across multiple rendering engines. The framework distinguishes itself through an actionability-aware interaction engine that automatically verifies element readiness before performing actions, significantly reducing test flakiness. It utilizes isolated browser contexts to maintain separate storage and cookies for parallel
Playwright is a comprehensive browser automation framework designed for end-to-end testing and web workflow automation. It provides a unified API to drive web applications across multiple browser engines, enabling developers to simulate complex user interactions, perform web scraping, and validate application behavior in consistent, isolated environments. The framework distinguishes itself through a web-first testing paradigm that prioritizes stability and resilience. By utilizing an auto-waiting actionability engine and accessibility-tree-based locators, it eliminates common sources of test
BrowserMCP is a browser automation bridge that connects AI tools to a live browser session through a local proxy server. It implements a standardized protocol for sending commands like click, type, and navigate to a real browser instance running on the user's machine, while keeping all browsing data on the device. The project distinguishes itself by preserving user sessions and fingerprints across automation tasks. It attaches to the user's existing browser profile to maintain cookies, logins, and authentication state, and uses the real browser's user agent, viewport, and extension context to
Automa is a browser-based automation platform that enables users to build, schedule, and execute repetitive web tasks through a visual, no-code interface. By operating as a browser extension, it provides a canvas-based environment where users construct workflows by connecting functional blocks to interact with web elements, manage browser state, and process data. The platform distinguishes itself through its deep integration with the browser environment, allowing for complex orchestration such as event-driven triggers, cross-origin request handling, and the ability to package workflows as sta
Portia is a containerized scraping platform and visual web scraper that enables no-code data extraction. It serves as a Scrapy visual scraping tool and spider generator, allowing users to design and deploy web scrapers through a graphical interface instead of writing selector code by hand. The system distinguishes itself by converting visual web page annotations into executable Scrapy spider code and structured JSON specifications. This visual-to-code mapping allows users to define scraping logic and extraction rules through a point-and-click interface, which can then be exported for use in ex
Camoufox is a Firefox-based stealth automation browser designed to evade detection during automated browsing. It combines a fingerprint randomization engine that generates thousands of unique device attributes per session, native-level API interception to spoof WebRTC, WebGL, media, and other fingerprintable properties, and human behavior simulation that moves the cursor along natural, distance-aware trajectories. The browser is compiled from source with build-time stealth patches and runs headlessly via a lightweight virtual display buffer, making it suitable for web scraping, automated testi
This project serves as an agentic browser controller, providing a programmatic bridge that enables autonomous software agents to navigate web pages and interact with document elements. It functions as a browser automation protocol, facilitating headless browser operations and automated web interactions to perform repetitive tasks and end-to-end testing without manual input. The system distinguishes itself by using the Chrome DevTools Protocol to establish a bidirectional communication channel with the browser engine. This allows for protocol-based remote control, where external appl
Chromeless is a serverless deployment of Chrome and a programmable interface for automating headless browser interactions. It functions as a web page rendering engine and browser orchestrator, enabling the execution of automation tasks within an AWS Lambda environment. The project specializes in managing browser state, cookies, and viewport settings across remote Chrome instances. It provides tools for generating screenshots, PDFs, and raw text exports from rendered web pages. The system supports dynamic web interaction, including form filling, element clicking, and the execution of custom J
Puppeteer is a browser automation library that provides a programmatic interface for controlling web browsers to execute tasks, simulate user interactions, and perform end-to-end testing. It functions as a headless browser controller, managing browser lifecycles, isolated session contexts, and remote connections to facilitate stable, automated web-based workflows. The library distinguishes itself through its deep integration with the Chrome DevTools Protocol, utilizing a bidirectional message bus to execute commands and receive real-time event notifications. It supports advanced automation pa
Botasaurus is a Python web scraping framework and headless browser automation system used to build scalable data extraction tools. It functions as a web data extraction tool and OCR document parser, converting website content, images, and PDF files into structured formats such as JSON, CSV, and Excel. The framework distinguishes itself by providing a scraper management interface that allows Python functions to be wrapped in a web-based UI or deployed as standalone desktop applications. This enables non-technical users to trigger extraction jobs and manage tasks via a graphical interface or RE
php-webdriver is a WebDriver PHP client and browser automation framework that implements the W3C WebDriver standard. It serves as a programmatic interface for controlling web browsers, executing JavaScript, and managing browser sessions in both headed and headless environments. The library functions as a Selenium protocol implementation, allowing PHP applications to communicate with browser drivers such as ChromeDriver or GeckoDriver. It provides the ability to automate user actions, navigate pages, and validate DOM elements for web UI testing. Its capabilities cover broad areas of browser i
This project is a Model Context Protocol server that enables Large Language Models to control Playwright browsers for web automation, scraping, and end-to-end testing. It functions as a programmable interface for executing JavaScript, capturing screenshots, and interacting with web elements across multiple browser engines. The server exposes browser automation capabilities as a set of standardized tools that models can discover and invoke. It supports session-based browser isolation to ensure unique contexts for each client connection and provides a transport layer using either standard input
chromedp is a browser automation framework and driver that controls web browsers via the Chrome DevTools Protocol. It functions as a headless browser automation tool and web browser controller, enabling the programmatic management of browser sessions, targets, and network responses through a remote debugging interface. The project provides specialized capabilities for Chrome DevTools Protocol automation, including headless browser testing, web scraping and data extraction, and mobile device emulation. It also supports browser-based visual regression by capturing precise screenshots of web pag
This is a Model Context Protocol server that exposes Windows desktop automation and system administration functions to large language models. It provides programmatic control of mouse, keyboard, windows, and UI elements on Windows through simulated user input, while also enabling LLMs to manage the Windows registry, processes, files, and execute PowerShell commands through a remote interface. The server supports multiple transport protocols including stdio, SSE, and streamable HTTP, allowing flexible integration with different language model clients. It implements OAuth 2.0 with PKCE for secu
Undetected-chromedriver is a framework for automated browser navigation designed to bypass anti-bot security measures. It functions by patching browser drivers at the binary level to obscure automation signals, allowing scripts to interact with protected websites without being flagged or blocked by security services. The project distinguishes itself through its ability to maintain stealth during automated sessions, including those executed in headless mode. It achieves this by injecting custom configurations to mimic human user behavior and by hooking into low-level browser debugging protocol
WebdriverIO is a Node.js test automation framework used for automating functional tests across web browsers and mobile applications. It acts as a WebDriver protocol client that manages remote browser sessions and executes commands against WebDriver and Appium servers to perform end-to-end testing. The framework is distinguished by its ability to control both native and hybrid mobile applications and its support for running automated suites across local machines, remote grids, and cloud device providers. It includes specialized capabilities for coordinating multi-browser interactions and estab
Playwriter is a browser automation framework and remote controller that manages stateful sessions and executes programmatic commands via the Chrome DevTools Protocol. It provides a system for controlling web browsers to interact with pages and extract data through both programmatic APIs and a command-line interface. The project features a visual element selector that generates screenshots with accessibility labels, mapping visual interface elements to programmatic selectors to help agents navigate. It supports remote browser control through WebSocket tunneling, allowing users to manage browse
Consent-O-Matic is a browser extension and cookie consent automation tool designed to automatically interact with privacy notices and cookie banners. It utilizes a DOM interaction engine and a privacy preference manager to map user choices to automated actions on third-party consent management providers. The project features a custom rule engine that allows for the import of external rule lists from user-provided URLs to target specific website behaviors. It employs a targeting system that combines CSS selectors with text, style, and iframe filters to locate and interact with precise web elem
Crawlee-python is a web crawling framework for building scalable scrapers using Python. It serves as a comprehensive tool for web scraping automation, providing a system to extract structured data from websites using both lightweight HTTP requests and headless browser automation. The framework is distinguished by its anti-bot evasion capabilities, which include browser fingerprint impersonation and tiered proxy rotation to bypass detection systems and solve challenges such as Cloudflare. It also incorporates artificial intelligence for autonomous website navigation and schema-based data extra
Codeception is a full-stack testing framework for PHP applications that provides a unified interface for unit, functional, and acceptance testing. It functions as a browser automation tool via the WebDriver protocol, a REST and SOAP API client, and a database testing utility for managing state and asserting records. The framework is distinguished by its support for Behavior-Driven Development, utilizing Gherkin-based parsing to map natural language feature files to executable PHP methods. It employs a multi-layer testing abstraction and a module-based extension system, allowing users to switc
NeoPass is a specialized toolset designed to circumvent proctored exam environments. It functions as an AI-powered exam assistant and automated solver that helps users find answers to multiple-choice and coding questions during monitored tests. The project distinguishes itself through a browser-based bypass system that neutralizes portal restrictions. It disables tab-switching detection, overrides clipboard restrictions to enable copying and pasting, and employs a human-like typing simulator to input code letter by letter to avoid detection by automated monitoring systems. The software provi
This project is an MCP browser automation server that connects large language models to headless cloud browsers. It functions as an autonomous web workflow engine and an LLM web agent interface, enabling the translation of natural language instructions into browser actions and structured data retrieval. The system distinguishes itself through a managed headless browser cloud API that supports concurrent Chromium sessions with integrated stealth modes, CAPTCHA solving, and proxy traffic routing. It utilizes self-healing element selection to maintain automation resilience when page structures c
This project is an agentic framework designed to enable autonomous web navigation and browser automation. It functions as a controller that translates natural language instructions into deterministic browser actions, allowing agents to interact with websites, perform data extraction, and manage complex authentication flows. By leveraging accessibility trees and semantic element resolution, the framework mimics human-like navigation, moving beyond brittle DOM selectors to interact reliably with modern web interfaces. The framework distinguishes itself through its focus on secure, scalable exec