# Headless Browser Automation Libraries

> Search results for `control a headless browser from code for scraping and automation` on awesome-repositories.com. 111 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/control-a-headless-browser-from-code-for-scraping-and-automation

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/control-a-headless-browser-from-code-for-scraping-and-automation).**

## Results

- [autoscrape-labs/pydoll](https://awesome-repositories.com/repository/autoscrape-labs-pydoll.md) (6,919 ⭐) — pydoll is a Chrome DevTools Protocol automation library and headless browser controller used for web data extraction and parallel browser automation. It controls Chromium-based browsers via direct WebSocket connections, allowing it to manage isolated browser contexts and tabs while bypassing the overhead and detection associated with WebDriver.

The project features an anti-bot evasion framework that mimics natural human behavior, including mouse movements generated via Bezier curves and variable typing patterns. It provides specialized stealth capabilities to bypass behavioral analysis and …
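The Bezier-curve mouse movement mentioned for pydoll can be illustrated with a minimal, stdlib-only Python sketch. This is not pydoll's implementation or API: the function name `bezier_mouse_path` and its `jitter` parameter are hypothetical, and placing the two random control points near the one-third and two-thirds marks is an assumption chosen for illustration.

```python
import random


def bezier_mouse_path(start, end, steps=20, jitter=100.0, rng=None):
    """Return steps + 1 points along a cubic Bezier curve from start to end.

    Hypothetical helper: the two inner control points are offset randomly,
    so repeated moves between the same endpoints follow different arcs.
    """
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    # Random control points roughly one third and two thirds of the way along.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-jitter, jitter)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-jitter, jitter)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-jitter, jitter)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-jitter, jitter)

    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Cubic Bernstein form: B(t) = u^3 P0 + 3u^2 t P1 + 3u t^2 P2 + t^3 P3
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((round(x, 2), round(y, 2)))
    return points


path = bezier_mouse_path((0, 0), (300, 200), rng=random.Random(42))
```

Each point could then be sent to the browser as a mouse-move event (in raw CDP, `Input.dispatchMouseEvent` with `type: "mouseMoved"`), ideally with small random pauses between steps.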
- [browser-use/browser-harness](https://awesome-repositories.com/repository/browser-use-browser-harness.md) (15,265 ⭐) — This project is an automation framework that connects large language models to web browsers via the Chrome DevTools Protocol for autonomous task execution. It functions as a bridge between intelligent agents and browser engines, allowing for the direct control of browser sessions and profiles.

The framework features a self-healing agent capable of generating and executing custom scripts during runtime to resolve failures and optimize browser tasks. It supports stealthy deployment through the use of integrated proxies and captcha solvers to bypass bot detection and security mitigations.

- [browser-use/browser-use](https://awesome-repositories.com/repository/browser-use-browser-use.md) (100,229 ⭐) — Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows without relying on brittle selectors. The system functions as a headless browser controller, providing a programmatic interface to manage browser instances and execute granular interactions.

The project distinguishes itself through its ability to translate high-level intent into …
- [sawyerhood/dev-browser](https://awesome-repositories.com/repository/sawyerhood-dev-browser.md) (3,631 ⭐) — Dev-browser is a browser automation framework and headless browser controller that provides a sandboxed script runner for executing web tasks. It functions as a vision-based web automator and a specialized interface for large language models, enabling the navigation and interaction of web pages within isolated execution environments.

The project distinguishes itself by converting complex web pages into simplified representations and coordinate-based maps, allowing AI agents to analyze layouts and perform actions based on pixel locations. It employs a mapping system that assigns unique identifiers …
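The coordinate-map idea described for dev-browser can be sketched as follows: give every interactive element a short numeric ID, render a compact text listing a model can read, and resolve a pixel back to an element. All names here (`Box`, `build_element_map`, `render_map`, `element_at`) are hypothetical, and the sketch assumes the bounding boxes were already collected from the page (for example with `getBoundingClientRect()` in a page script); it is not dev-browser's actual format.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of one element, in CSS pixels."""
    tag: str
    label: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px, py):
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height

    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def build_element_map(boxes):
    """Assign a numeric ID to every element, in document order."""
    return {i: box for i, box in enumerate(boxes, start=1)}


def render_map(element_map):
    """One line per element: ID, tag, label and centre point."""
    lines = []
    for eid, box in element_map.items():
        cx, cy = box.center()
        lines.append(f"[{eid}] <{box.tag}> {box.label!r} @ ({cx:.0f}, {cy:.0f})")
    return "\n".join(lines)


def element_at(element_map, px, py):
    """Return the ID of the topmost element under a pixel, or None."""
    # Later entries are assumed to paint above earlier ones.
    for eid in sorted(element_map, reverse=True):
        if element_map[eid].contains(px, py):
            return eid
    return None
```

An agent that answers "click [1]" can then be mapped to the centre of box 1, and a raw pixel click can be mapped back to an ID for logging.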
- [lightpanda-io/browser](https://awesome-repositories.com/repository/lightpanda-io-browser.md) (31,168 ⭐) — This project is a high-performance headless browser engine designed for scalable web automation, data extraction, and AI agent integration. It provides a specialized environment that allows autonomous agents and testing frameworks to interact with web content through standardized remote control protocols. By executing pages in a lightweight, headless state, the engine minimizes resource consumption while maintaining the ability to perform complex navigation and dynamic content rendering.

The platform distinguishes itself through deep integration with AI-centric communication layers and advanced …
- [garrytan/gstack](https://awesome-repositories.com/repository/garrytan-gstack.md) (110,596 ⭐) — gstack is an AI agent framework and development workflow system designed to automate the software development lifecycle. It coordinates specialized AI personas to manage tasks across product design, engineering management, and quality assurance, transforming product intent into technical specifications and final releases.

The project is distinguished by its deep integration of headless browser automation and semantic code memory. It utilizes a persistent Chromium daemon for web scraping and visual auditing, and implements a searchable knowledge base that logs architectural decisions and …
- [aimeos/aimeos-headless](https://awesome-repositories.com/repository/aimeos-aimeos-headless.md) (2,541 ⭐) — This project is a headless commerce API and a REST-based gateway that exposes e-commerce business logic and product data to decoupled frontend applications. It provides a centralized system for handling online store operations through a set of commerce interfaces.

The platform is designed for large-scale marketplace management, featuring a multi-tenant architecture that isolates data for multiple independent vendors, channels, and warehouses within a single installation. It distinguishes itself with an automated subscription billing system for recurring payment cycles and a tiered pricing engine …
- [checkly/headless-recorder](https://awesome-repositories.com/repository/checkly-headless-recorder.md) (15,292 ⭐) — Chrome extension that records your browser interactions and generates a Playwright or Puppeteer script.
- [ariya/phantomjs](https://awesome-repositories.com/repository/ariya-phantomjs.md) (29,489 ⭐) — PhantomJS is a scriptable, headless browser engine based on WebKit that provides a programmatic interface for automating web page interactions. It operates without a graphical user interface, allowing for the execution of JavaScript to navigate pages, manipulate the document object model, and perform functional testing of web applications.

The tool distinguishes itself by providing low-level control over the browser rendering lifecycle and network stack. It enables real-time interception and modification of network traffic, alongside the ability to generate visual snapshots and document exports …
- [unclecode/crawl4ai](https://awesome-repositories.com/repository/unclecode-crawl4ai.md) (68,644 ⭐) — Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs. By integrating language models directly into the extraction workflow, the system converts raw HTML into clean, structured data or Markdown files optimized for downstream ingestion.

The platform distinguishes itself through a distributed, self-hosted infrastructure that manages …
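The HTML-to-Markdown step described for Crawl4AI can be approximated with Python's standard `html.parser`. This is a deliberately small sketch, not Crawl4AI's converter: the class name `MarkdownConverter` is hypothetical, and it only handles headings, paragraphs, list items and links, while dropping `<script>`, `<style>` and `<noscript>` content.

```python
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Tiny HTML-to-Markdown converter for headings, paragraphs, list items and links."""

    HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.out = []        # finished Markdown blocks
        self.buf = []        # text of the block being built
        self.prefix = ""     # "# ", "- ", ...
        self.href = None     # href of the currently open <a>
        self.skip_depth = 0  # > 0 while inside <script>/<style>/<noscript>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.prefix = self.HEADINGS[tag] + " "
        elif tag == "li":
            self.prefix = "- "
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.buf.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.buf.append(f"]({self.href})" if self.href else "]")
            self.href = None
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self.flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.buf.append(data)

    def flush(self):
        # Collapse runs of whitespace, then emit the block if it has text.
        text = " ".join("".join(self.buf).split())
        if text:
            self.out.append(self.prefix + text)
        self.buf, self.prefix = [], ""

    def convert(self, html):
        self.feed(html)
        self.close()
        self.flush()
        return "\n\n".join(self.out)
```

Text that sits outside any paragraph, heading or list item is merged into the next block, which is acceptable for a sketch; a production pipeline would also handle tables, images and code blocks.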
- [eyalzh/browser-control-mcp](https://awesome-repositories.com/repository/eyalzh-browser-control-mcp.md) (294 ⭐) — MCP server paired with a browser extension that enables AI agents to control the user's browser.
- [avelino/awesome-go](https://awesome-repositories.com/repository/avelino-awesome-go.md) (175,576 ⭐) — This project serves as a comprehensive language ecosystem index, functioning as a centralized, community-curated directory for the Go programming language. It organizes a vast landscape of software components, libraries, and development tools into a structured, navigable hierarchy, enabling developers to efficiently discover resources tailored to specific functional domains.

The repository distinguishes itself through a decentralized contribution model, where community-driven updates ensure the index remains current with the rapidly evolving software landscape. Beyond simple resource listing, …
- [openai/skills](https://awesome-repositories.com/repository/openai-skills.md) (9,043 ⭐) — This project is a framework for packaging and installing standardized capabilities, scripts, and instructions that LLM agents execute to perform complex tasks. It functions as a tool orchestrator and skill framework, bundling instructions and resources into portable formats that agents discover and use for repeatable workflows.

The system distinguishes itself through a manifest-driven discovery process, allowing agents to identify available capabilities and their execution parameters. It supports the deployment of these modular capability sets into isolated runtime environments using remote …
- [roryprimrose/headless](https://awesome-repositories.com/repository/roryprimrose-headless.md) (85 ⭐) — Headless browser support for fast web acceptance testing in .NET.
- [yhat/scrape](https://awesome-repositories.com/repository/yhat-scrape.md) (1,515 ⭐) — A simple, higher level interface for Go web scraping.
- [fingerprintjs/fingerprintjs](https://awesome-repositories.com/repository/fingerprintjs-fingerprintjs.md) (27,334 ⭐) — Fingerprint is a visitor identification and fraud detection platform that generates persistent, unique identifiers by analyzing browser and device attributes. By extracting technical signals from the client environment, it enables reliable user tracking across sessions without relying on traditional cookies.

The platform distinguishes itself through its focus on high-accuracy identification and security-first architecture. It employs edge-side proxying to bypass ad-blockers and privacy restrictions, ensuring consistent data collection. To maintain data integrity, it uses cryptographic payload …
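The core idea behind attribute-based visitor IDs, which is also what the anti-bot tools in this list try to perturb, is that a canonical serialization of browser and device signals hashes to the same identifier every time. The sketch below is only an illustration of that principle and is not FingerprintJS's algorithm or signal set; `visitor_id` and the example attribute names are hypothetical, and SHA-256 is chosen simply because it is in the standard library.

```python
import hashlib
import json


def visitor_id(attributes):
    """Derive a stable identifier from a dict of browser/device attributes.

    Keys are sorted before hashing, so the same attributes always yield the
    same ID regardless of the order in which they were collected.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


signals = {"userAgent": "Mozilla/5.0", "screen": [1920, 1080], "timezone": "Europe/Berlin"}
print(visitor_id(signals))
```

Changing any single signal (for example the timezone) produces a completely different ID, which is why fingerprint-randomizing scrapers vary these values between sessions.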
- [googlechromelabs/carlo](https://awesome-repositories.com/repository/googlechromelabs-carlo.md) (9,259 ⭐) — Carlo is a Node.js web rendering framework and desktop application bundler. It functions as a server-side browser controller and headless automation bridge that uses a local browser instance as the primary user interface for Node.js applications.

The project distinguishes itself by providing a bidirectional bridge for cross-environment JavaScript integration, allowing server-side functions to be exposed to the browser window object and enabling the execution of page-context code from the server. It includes capabilities for packaging applications into standalone desktop executables, complete …
- [cloudflare/moltworker](https://awesome-repositories.com/repository/cloudflare-moltworker.md) (9,909 ⭐) — Moltworker is an AI agent sandbox and model orchestrator designed for the secure execution of untrusted code and shell commands generated by large language models. It functions as a gateway proxy that routes requests to multiple AI providers through a unified interface, integrating a container runtime backed by S3-compatible object storage to persist state across ephemeral lifecycles.

The system distinguishes itself by combining an AI model orchestrator with a headless browser controller for automated web scraping and screenshot capture. It manages the full lifecycle of AI agents, including …
- [scrapy/scrapely](https://awesome-repositories.com/repository/scrapy-scrapely.md) (1,887 ⭐) — Scrapely
- [insin/control-panel-for-twitter](https://awesome-repositories.com/repository/insin-control-panel-for-twitter.md) (2,540 ⭐) — Browser extension that gives you more control over your Twitter timeline and adds missing features and UI improvements, for desktop and mobile.
- [gocolly/colly](https://awesome-repositories.com/repository/gocolly-colly.md) (25,101 ⭐) — Colly is a high-performance web scraping framework designed for the automated extraction of structured data from websites. It provides a programmable toolkit that manages the complexities of large-scale data collection, including concurrent request orchestration, automatic cookie handling, and robots.txt compliance. By utilizing an asynchronous execution model, the engine maintains high throughput while preventing resource exhaustion during recursive or distributed crawling tasks.

The framework is distinguished by its modular, event-driven architecture, which allows developers to hook into …
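Colly is a Go library, but the robots.txt compliance it describes is easy to illustrate with Python's standard `urllib.robotparser`. This sketch shows the general technique, not Colly's API: `make_policy`, `allowed_urls`, the sample robots.txt body and the `my-crawler` user agent are all hypothetical, and fetching the robots.txt file itself is left out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, as if already downloaded from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_policy(robots_txt):
    """Parse an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs the site's robots.txt permits for this user agent."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


policy = make_policy(ROBOTS_TXT)
queue = allowed_urls(policy, "my-crawler", [
    "https://example.com/",
    "https://example.com/private/data",
    "https://example.com/blog",
])
# policy.crawl_delay("my-crawler") gives the delay (seconds) to honour between requests.
```

Filtering the queue before any request is issued keeps the politeness check out of the hot path of the concurrent fetchers.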
- [cloudflare/workers-sdk](https://awesome-repositories.com/repository/cloudflare-workers-sdk.md) (4,186 ⭐) — This project is an edge computing development toolkit and serverless command line interface used to develop, test, and deploy serverless functions to a global edge network. It serves as an edge runtime bundler and resource orchestrator, managing the entire lifecycle of edge projects from local development to worldwide distribution.

The toolkit distinguishes itself through distributed workflow management, coordinating stateful instances and the durable execution of long-running processes across the edge. It also provides specialized integrations for edge AI, including the management of vector …
- [letta-ai/letta](https://awesome-repositories.com/repository/letta-ai-letta.md) (21,168 ⭐) — Letta is a framework for building, deploying, and managing autonomous AI agents that maintain persistent state across long-term interactions. It provides a comprehensive suite of primitives for defining agents with configurable personas, modular memory blocks, and tool-use capabilities, enabling them to retain user preferences and conversation history over extended sessions.

The platform distinguishes itself through its advanced memory management and orchestration capabilities. It allows agents to autonomously update their own memory, perform retrieval-augmented generation, and coordinate …
- [browsh-org/browsh](https://awesome-repositories.com/repository/browsh-org-browsh.md) (18,884 ⭐) — Browsh is a text-based web browser and headless browser frontend that renders modern websites and web applications within a terminal emulator. It functions as a TTY web browser, allowing users to view and interact with complex web content directly from a command line interface.

The project enables web navigation in environments where a graphical user interface is unavailable, such as when accessing a remote server via SSH or operating in low-bandwidth conditions. It translates browser pixels and colors into ANSI escape codes to simulate a graphical interface using text characters.

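Translating pixels into ANSI escape codes, as described for Browsh, can be sketched with 24-bit ("truecolor") SGR sequences and the upper-half-block character `▀`, which lets one terminal cell show two vertically stacked pixels. This is one common technique and an assumption here; Browsh's own renderer may work differently, and `pixels_to_ansi` is a hypothetical name.

```python
RESET = "\x1b[0m"


def pixels_to_ansi(pixels):
    """Render a grid of (r, g, b) rows as terminal text.

    Each character cell shows two vertically stacked pixels: the upper half
    block takes the top pixel as its foreground colour and the bottom pixel
    as its background colour.
    """
    if len(pixels) % 2:
        # Pad with a black row so every cell has a bottom pixel.
        pixels = pixels + [[(0, 0, 0)] * len(pixels[-1])]
    rows = []
    for top_row, bottom_row in zip(pixels[0::2], pixels[1::2]):
        cells = []
        for (r1, g1, b1), (r2, g2, b2) in zip(top_row, bottom_row):
            # 38;2 sets the foreground colour, 48;2 the background colour.
            cells.append(f"\x1b[38;2;{r1};{g1};{b1}m\x1b[48;2;{r2};{g2};{b2}m\u2580")
        rows.append("".join(cells) + RESET)
    return "\n".join(rows)


print(pixels_to_ansi([[(255, 0, 0), (0, 0, 255)], [(0, 0, 255), (255, 0, 0)]]))
```

Packing two pixel rows per text row roughly doubles the vertical resolution compared with one coloured cell per pixel.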
- [nanmicoder/mediacrawler](https://awesome-repositories.com/repository/nanmicoder-mediacrawler.md) (51,294 ⭐) — MediaCrawler is an automated web scraping framework designed to extract public posts, comments, and creator metadata from various social media platforms. It functions as a headless browser automator, utilizing real browser instances to render dynamic content and execute the client-side scripts necessary for interacting with modern web interfaces.

The system distinguishes itself through a focus on session persistence and network flexibility. It supports remote debugging to reuse active browser sessions and cookies, which helps minimize the risk of triggering platform security challenges. …
- [apify/crawlee](https://awesome-repositories.com/repository/apify-crawlee.md) (24,002 ⭐) — Crawlee is a web scraping framework designed for building scalable, reliable, and distributed data extraction pipelines. It provides a unified interface for managing headless browser automation and lightweight HTTP requests, allowing developers to handle complex web navigation, dynamic content rendering, and large-scale data collection within a single, modular architecture.

The project distinguishes itself through its resource-aware concurrency controller, which dynamically scales task execution based on real-time CPU and memory usage to prevent host machine exhaustion. It also features a …
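A resource-aware concurrency controller of the kind described for Crawlee boils down to a feedback rule: shrink parallelism when the host is overloaded, grow it slowly otherwise. The sketch below illustrates that rule only; Crawlee's real controller uses its own options and more signals, and `Snapshot`, `next_concurrency` and every threshold value here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    cpu: float     # fraction of CPU in use, 0.0-1.0
    memory: float  # fraction of the memory budget in use, 0.0-1.0


def next_concurrency(current, snapshot, *, minimum=1, maximum=50,
                     max_cpu=0.9, max_memory=0.7, step=0.05):
    """Decide how many tasks may run in parallel during the next interval.

    Scale down as soon as either resource is over its limit; otherwise scale
    up by `step` of the current value (at least one task). The result is
    clamped to [minimum, maximum].
    """
    overloaded = snapshot.cpu > max_cpu or snapshot.memory > max_memory
    delta = max(1, round(current * step))
    proposed = current - delta if overloaded else current + delta
    return max(minimum, min(maximum, proposed))


# A scheduler loop would sample CPU/memory every few seconds and then call:
concurrency = next_concurrency(10, Snapshot(cpu=0.5, memory=0.4))
```

Shrinking immediately but growing gradually is a deliberate asymmetry: overshooting memory can crash a browser pool, while undershooting only costs some throughput.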
- [any4ai/anycrawl](https://awesome-repositories.com/repository/any4ai-anycrawl.md) (2,742 ⭐) — AnyCrawl is an AI-powered data extractor, automated web crawler, and headless browser orchestrator. It serves as a web content extraction API and a gateway that connects crawling and scraping tools to language models using a standardized API protocol.

The project specializes in converting unstructured website content into structured JSON or markdown optimized for AI assistants. It utilizes language models and JSON schemas to pull specific information into validated formats and provides capabilities for AI page summarization and LLM-optimized content extraction.

The system manages comprehensive …
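Validating model-extracted records against a JSON schema, as described for AnyCrawl, means rejecting output whose shape does not match before it reaches downstream code. The sketch below implements only a tiny subset of JSON Schema (`type`, `properties`, `required`, `items`) to show the idea; it is not AnyCrawl's implementation, `validate` is a hypothetical name, and a real pipeline would more likely use a complete validator such as the `jsonschema` package.

```python
def validate(value, schema, path="$"):
    """Check `value` against a small subset of JSON Schema.

    Supports "type" (object, array, string, number, integer, boolean),
    "properties", "required" and "items". Returns a list of error strings;
    an empty list means the value is valid.
    """
    types = {
        "object": dict, "array": list, "string": str,
        "number": (int, float), "integer": int, "boolean": bool,
    }
    expected = schema.get("type")
    if expected:
        ok = isinstance(value, types[expected])
        # bool is a subclass of int in Python; do not accept it as a number.
        if expected in ("number", "integer") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required property {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


PRODUCT = {
    "type": "object",
    "required": ["title", "price"],
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
```

The error paths (for example `$.tags[1]`) can be fed back to the model in a retry prompt so it can correct the specific field.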
- [remitchell/python-scraping](https://awesome-repositories.com/repository/remitchell-python-scraping.md) (4,714 ⭐) — Code samples for the book *Web Scraping with Python*, 2nd Edition.
- [code-and-comment/code-and-comment](https://awesome-repositories.com/repository/code-and-comment-code-and-comment.md) (17 ⭐) — PWA for adding comments to GitHub files.
- [segmentio/nightmare](https://awesome-repositories.com/repository/segmentio-nightmare.md) (19,775 ⭐) — Nightmare is an Electron-based browser automation library and headless browser controller. It provides the infrastructure to programmatically navigate web pages, interact with DOM elements, and execute JavaScript within a background browser instance.

The project distinguishes itself by integrating a full Chromium instance within an Electron shell, allowing for the management of browser sessions, network proxy settings, and persistent storage partitions. It enables the capture of page states as PNG screenshots, PDF documents, or HTML files.

The tool covers a broad range of capabilities including …
- [lharries/whatsapp-mcp](https://awesome-repositories.com/repository/lharries-whatsapp-mcp.md) (5,339 ⭐) — This project is a Model Context Protocol server that acts as a programmatic bridge between large language models and private messaging accounts. It provides an automation interface for interacting with WhatsApp by exposing messaging and data retrieval capabilities as tools for AI assistants.

The system utilizes browser automation to control the web application interface, allowing for stateful session management to maintain authentication. It enables the transmission of various content types, including plain text, documents, and audio files formatted as voice messages.

The server covers …
- [google-gemini/gemini-cli](https://awesome-repositories.com/repository/google-gemini-gemini-cli.md) (105,341 ⭐) — This project provides a command-line interface for managing autonomous agent workflows, task orchestration, and system-level automation. It includes a comprehensive framework for defining agent skills, managing persistent memory, and delegating tasks to specialized subagents. Users can configure complex planning modes, execute shell commands with safety constraints, and integrate external tools through standardized protocols.

The platform supports non-interactive execution via a headless mode and provides an event-driven hook framework for custom lifecycle automation. It features centralized …
- [lorien/web-scraping](https://awesome-repositories.com/repository/lorien-web-scraping.md) (7,931 ⭐) — This project is a comprehensive resource directory for web data extraction, providing a curated collection of tools and libraries for parsing data, automating browsers, and managing network operations. It serves as a guide for extracting structured information from HTML, XML, JSON, and PDF formats.

The toolkit focuses on advanced data collection strategies, including headless browser automation to interact with JavaScript-rendered pages and a suite of network utilities for DNS resolution and WebSocket connections. It specifically covers methods for bypassing bot protections through proxy pool management, …
- [liquidgalaxylab/lg-gesture-and-voice-control](https://awesome-repositories.com/repository/liquidgalaxylab-lg-gesture-and-voice-control.md) (0 ⭐) — LG Gesture and Voice Control: an app that provides gesture and voice control for Liquid Galaxy.
- [g1879/drissionpage](https://awesome-repositories.com/repository/g1879-drissionpage.md) (12,102 ⭐) — DrissionPage is a Python library designed for web automation, data scraping, and testing. It functions as a browser automation framework that communicates directly with the browser engine via the Chrome DevTools Protocol, allowing for precise control over browser instances and page states.

The library distinguishes itself by providing a unified interface that combines full browser automation with raw HTTP request capabilities. This hybrid approach allows users to switch between lightweight network requests and heavy browser-based interactions within a single workflow. By wrapping asynchronous …
- [netsoss/headless-burp](https://awesome-repositories.com/repository/netsoss-headless-burp.md) (235 ⭐) — Headless Burp
- [executeautomation/mcp-playwright](https://awesome-repositories.com/repository/executeautomation-mcp-playwright.md) (5,237 ⭐) — This project is a Model Context Protocol server that enables Large Language Models to control Playwright browsers for web automation, scraping, and end-to-end testing. It functions as a programmable interface for executing JavaScript, capturing screenshots, and interacting with web elements across multiple browser engines.

The server exposes browser automation capabilities as a set of standardized tools that models can discover and invoke. It supports session-based browser isolation to ensure unique contexts for each client connection and provides a transport layer using either standard input …
- [getmaxun/maxun](https://awesome-repositories.com/repository/getmaxun-maxun.md) (15,049 ⭐) — Maxun is an open-source web scraping and automation platform designed to transform dynamic website content into structured data. By leveraging artificial intelligence to interpret natural language prompts, the system identifies page elements and extracts information without requiring manual selector configuration. It serves as a bridge between raw web content and intelligent workflows, providing structured outputs in formats optimized for large language model ingestion and agent-based applications.

The platform distinguishes itself through its ability to handle complex, authenticated, and dynamic …
- [anorov/cloudflare-scrape](https://awesome-repositories.com/repository/anorov-cloudflare-scrape.md) (3,526 ⭐) — cloudflare-scrape
- [pestphp/pest](https://awesome-repositories.com/repository/pestphp-pest.md) (11,537 ⭐) — Pest is a testing framework for PHP that provides a comprehensive suite for executing unit, integration, and end-to-end tests. It functions as an automated testing tool that prioritizes developer experience and readability through a concise, expressive syntax for defining test suites. By wrapping an established testing foundation, it maintains compatibility with existing ecosystem tools while offering a specialized interface for writing and organizing automated tests.

The framework distinguishes itself through integrated support for parallel test execution, which distributes suites across multiple …
- [h4ckf0r0day/obscura](https://awesome-repositories.com/repository/h4ckf0r0day-obscura.md) (16,110 ⭐) — Obscura is a web scraping infrastructure and headless browser server designed for AI agents. It provides a system for AI models to control browser sessions, interact with websites, and extract web data using a WebSocket implementation of the Chrome DevTools Protocol.

The project focuses on bot detection evasion by randomizing browser fingerprints, masking native functions, and blocking tracking scripts to mimic human behavior. It further secures identities through a traffic layer that routes network requests via HTTP or SOCKS5 proxies.

The system supports large-scale data extraction through …
- [formbricks/formbricks](https://awesome-repositories.com/repository/formbricks-formbricks.md) (12,391 ⭐) — Formbricks is an open-source survey and feedback platform designed to help teams capture and analyze user insights through targeted, in-app, and website-based interactions. It functions as a comprehensive customer experience analytics system that allows organizations to maintain full control over their data, user attributes, and survey workflows.

The platform distinguishes itself through its event-driven architecture, which enables precise behavioral targeting by triggering surveys based on specific user actions or application events. It supports deep integration with external ecosystems by …
- [stefanbuck/awesome-browser-extensions-for-github](https://awesome-repositories.com/repository/stefanbuck-awesome-browser-extensions-for-github.md) (3,268 ⭐) — A collection of awesome browser extensions for GitHub.
- [ultrafunkamsterdam/nodriver](https://awesome-repositories.com/repository/ultrafunkamsterdam-nodriver.md) (3,578 ⭐) — nodriver is an asynchronous Chromium browser automation framework that provides headless control and web scraping capabilities. It functions as a Chrome DevTools Protocol client, allowing for granular engine control by attaching directly to the browser's debug port without the need for external driver binaries.

The framework is specifically designed as an anti-bot detection bypass tool. It modifies browser fingerprints and protocol headers to evade automated security systems, handle security warnings, and bypass common obstacles like insecure connection alerts.

The system covers a broad range …
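CDP clients such as nodriver, pydoll and Obscura all speak the same wire format over the browser's WebSocket: each command is a JSON object with an `id`, a `method` and `params`; replies echo the `id`, and unsolicited events carry a `method` but no `id`. The sketch below shows only that message bookkeeping, with the WebSocket transport left out. It is not nodriver's API, and the `CDPSession` class is hypothetical; `Page.navigate` and `Page.loadEventFired` are real CDP names. When Chrome is started with `--remote-debugging-port`, the WebSocket URL to connect to is reported as `webSocketDebuggerUrl` by `http://localhost:<port>/json/version`.

```python
import itertools
import json


class CDPSession:
    """Build Chrome DevTools Protocol commands and match replies to them.

    `encode` returns the text frame to send over the WebSocket; `handle`
    consumes each text frame received from the browser.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}  # command id -> method name, awaiting a reply
        self.results = {}  # command id -> "result" payload
        self.events = []   # (method, params) for unsolicited events

    def encode(self, method, params=None):
        command_id = next(self._ids)
        self.pending[command_id] = method
        return json.dumps({"id": command_id, "method": method, "params": params or {}})

    def handle(self, frame):
        message = json.loads(frame)
        if "id" in message:
            # A reply: it carries the id of the command it answers.
            method = self.pending.pop(message["id"])
            if "error" in message:
                raise RuntimeError(f"{method} failed: {message['error'].get('message')}")
            self.results[message["id"]] = message.get("result", {})
        else:
            # An event such as "Page.loadEventFired": has "method" but no "id".
            self.events.append((message["method"], message.get("params", {})))


session = CDPSession()
frame = session.encode("Page.navigate", {"url": "https://example.com"})
```

An async client would typically replace the `results` dict with futures keyed by command id, so callers can `await` each reply.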
- [hummingbot/hummingbot](https://awesome-repositories.com/repository/hummingbot-hummingbot.md) (18,907 ⭐) — Hummingbot is an open-source framework designed for building, backtesting, and deploying autonomous trading agents and algorithmic strategies across centralized and decentralized cryptocurrency exchanges. It provides a modular environment where users can orchestrate containerized bots to execute complex market-making, grid trading, and arbitrage operations.

The platform distinguishes itself through a skill-based architecture that integrates large language models, enabling users to monitor market conditions and control trading operations via natural language commands. It features a unified …
- [oxylabs/oxylabs-ai-studio-py](https://awesome-repositories.com/repository/oxylabs-oxylabs-ai-studio-py.md) (2,468 ⭐)
- [pgssoft/automate](https://awesome-repositories.com/repository/pgssoft-automate.md) (291 ⭐) — Swift framework containing a set of helpful XCTest extensions for writing UI automation tests
- [huggingface/smolagents](https://awesome-repositories.com/repository/huggingface-smolagents.md) (27,885 ⭐) — This framework provides a development toolkit for building autonomous agents that utilize language models to solve complex, non-deterministic tasks. Its core design centers on a code-executing architecture where agents generate and run Python code snippets to perform logic, data manipulation, and tool interactions. By moving beyond structured data formats, the system enables agents to manage program flow and object state through iterative reasoning cycles.

The project distinguishes itself through its focus on code-based agent implementation and secure execution environments. Developers can …
- [makieorg/makie.jl](https://awesome-repositories.com/repository/makieorg-makie-jl.md) (2,778 ⭐) — Makie.jl is a high-performance Julia data visualization library and hardware-accelerated plotting engine used to create interactive 2D and 3D visualizations. It functions as a reactive visualization framework where plots update automatically via observables and compute graphs, and as a vector graphics generator for high-resolution academic output.

The system is distinguished by its backend-agnostic rendering pipeline, which supports OpenGL, WebGL, and ray-traced scenes. It employs a grammar-of-graphics approach to map variables to aesthetic attributes and utilizes a hierarchical scene graph …
- [googlechrome/puppeteer](https://awesome-repositories.com/repository/googlechrome-puppeteer.md) (94,974 ⭐) — Puppeteer is a JavaScript library for programmatically controlling Chrome and Firefox through the Chrome DevTools Protocol or the WebDriver BiDi protocol. It launches and manages browser instances—typically without a visible user interface—to automate interactions with web pages, enabling navigation, clicking, typing, and data extraction entirely through code.

The library distinguishes itself through deep integration with the Chromium embedding layer, allowing fine-grained process configuration with custom flags, permissions, and sandbox policies. It maintains multiple concurrent command streams …
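The process configuration mentioned for Puppeteer ultimately comes down to the command-line switches a controller passes when it launches Chromium. Puppeteer does this through its own JavaScript `launch()` options; the Python sketch below only illustrates the kind of command line such tools assemble. The switches used (`--remote-debugging-port`, `--user-data-dir`, `--headless=new`, `--no-sandbox`, `--no-first-run`, `--no-default-browser-check`) are real Chromium flags, but `chrome_command` is a hypothetical helper and this is not the exact set Puppeteer passes.

```python
import tempfile


def chrome_command(executable, *, port=9222, user_data_dir=None,
                   headless=True, sandbox=True, extra_flags=()):
    """Assemble a Chromium command line for a CDP-controlled session."""
    # A fresh temporary profile keeps sessions isolated from each other.
    user_data_dir = user_data_dir or tempfile.mkdtemp(prefix="chrome-profile-")
    argv = [
        executable,
        f"--remote-debugging-port={port}",   # exposes the DevTools endpoint
        f"--user-data-dir={user_data_dir}",  # isolated profile per session
        "--no-first-run",
        "--no-default-browser-check",
    ]
    if headless:
        argv.append("--headless=new")
    if not sandbox:
        # Weakens process isolation; usually only done inside an already-sandboxed container.
        argv.append("--no-sandbox")
    argv.extend(extra_flags)
    argv.append("about:blank")
    return argv


argv = chrome_command("chromium", port=9333, user_data_dir="/tmp/p", sandbox=False)
# subprocess.Popen(argv) would start the browser; the executable path is an assumption.
```

Keeping the sandbox enabled by default and requiring an explicit opt-out mirrors the security trade-off the flag represents.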
