30 open-source projects similar to alibaba/page-agent, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Page Agent alternative.
Steel is a cloud browser automation platform that provides a REST API for launching and controlling remote Chrome browser sessions. It enables programmatic browsing and web scraping using standard automation tools like Puppeteer, Playwright, and Selenium, connecting to cloud-hosted browser instances via WebSocket and the Chrome DevTools Protocol. The platform supports both headless and headful browser sessions, with language-specific SDKs for TypeScript and Python. The service distinguishes itself through comprehensive anti-detection capabilities, including residential proxy rotation, CAPTCHA
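Steel's description lists residential proxy rotation among its anti-detection features. The sketch below shows only the general technique: round-robin rotation that skips blocked proxies. It is not Steel's API, which handles proxies server-side, and the `ProxyRotator` class and proxy URLs are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class ProxyRotator:
    """Hands out proxies round-robin, skipping any marked as blocked (hypothetical helper)."""
    proxies: List[str]
    blocked: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self._cycle: Iterator[str] = itertools.cycle(self.proxies)

    def next_proxy(self) -> str:
        # Look at each proxy at most once per call before giving up.
        for _ in range(len(self.proxies)):
            proxy = next(self._cycle)
            if proxy not in self.blocked:
                return proxy
        raise RuntimeError("all proxies are blocked")

    def mark_blocked(self, proxy: str) -> None:
        # e.g. after a CAPTCHA wall or an HTTP 403 seen through this proxy
        self.blocked.add(proxy)
```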
AutoAgent is a multi-agent orchestrator and natural language workflow builder designed to connect multiple large language models with external API tools. It provides a framework for designing multi-step agent interactions and reasoning processes using plain text instead of manual code. The platform functions as a tool integration gateway, linking agents to third-party platforms and authenticated browser sessions. It enables the execution of complex analytical tasks and deep research by distributing work across collaborative agent frameworks and importing browser cookies to access restricted w
LaVague is an LLM web agent framework and large action model designed to translate natural language instructions into executable browser automation scripts. It functions as a multi-modal orchestrator that reasons over web page states and HTML content to automate multi-step tasks via a Selenium-based automation engine. The framework features a modular model provider layer, allowing users to swap between different language and vision models from providers such as Anthropic, Google Gemini, and Azure OpenAI. It employs a multi-modal world model to process screenshots and HTML structures, utilizing retri
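The retrieval step in LaVague's description is cut off, but the general idea of narrowing a page's HTML down to the parts relevant to an instruction can be sketched as below. This is a toy under stated assumptions: it swaps in plain token-overlap scoring where a real system would typically use embeddings, and `top_k_chunks` is a hypothetical helper, not LaVague's API.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    # Lowercase word tokens; tag and attribute names count as words too.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_chunks(instruction: str, html_chunks: List[str], k: int = 2) -> List[str]:
    """Return the k HTML chunks sharing the most tokens with the instruction."""
    query = tokenize(instruction)
    scored = [(len(query & tokenize(chunk)), i, chunk) for i, chunk in enumerate(html_chunks)]
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [chunk for _, _, chunk in scored[:k]]
```

Only the selected chunks would then be placed in the model's prompt, which keeps the context small on large pages.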
escrcpy is an Android device mirroring tool and ADB device manager that enables the display and control of Android screens on a computer via USB or network connections. It functions as a multi-device screen orchestrator, providing a visual interface to arrange and control several mirrored device windows in parallel layouts. The project distinguishes itself as an automation controller that utilizes large language models to translate natural language instructions into actionable device commands. It further differentiates its capabilities by acting as a reverse tethering client, allowing a compu
This project is a browser automation system that connects Google's Gemini API to a web browser, enabling an AI agent to perform tasks on a user's behalf by interpreting natural language instructions. At its core, it operates through a continuous screenshot-based action loop, where the agent captures the browser's current state, sends the image to the Gemini model, and executes the model's returned commands to click, type, and navigate. The system distinguishes itself through a dual browser backend abstraction, supporting both local Playwright-controlled browsers and remote Browserbase cloud i
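The screenshot-based action loop described above is an observe-decide-act cycle. In this sketch the `capture`, `decide`, and `execute` callables are hypothetical stand-ins for a Playwright or Browserbase screenshot, a Gemini call, and a browser command; no real SDK is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    kind: str                    # e.g. "click", "type", "navigate", or "done"
    target: Optional[str] = None
    text: Optional[str] = None


def run_agent(
    goal: str,
    capture: Callable[[], bytes],                           # screenshot as image bytes
    decide: Callable[[str, bytes, List[Action]], Action],   # model call (stubbed here)
    execute: Callable[[Action], None],                      # drives the browser
    max_steps: int = 10,
) -> List[Action]:
    """Capture, ask the model, act; stop on "done" or when the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()
        action = decide(goal, screenshot, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history
```

The step budget matters in practice: a model that never answers "done" would otherwise loop forever.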
This project is a Model Context Protocol server that enables Large Language Models to control Playwright browsers for web automation, scraping, and end-to-end testing. It functions as a programmable interface for executing JavaScript, capturing screenshots, and interacting with web elements across multiple browser engines. The server exposes browser automation capabilities as a set of standardized tools that models can discover and invoke. It supports session-based browser isolation to ensure unique contexts for each client connection and provides a transport layer using either standard input
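Model Context Protocol messages are JSON-RPC 2.0, and tool discovery and invocation use the `tools/list` and `tools/call` methods. The sketch below answers those two methods for a single hypothetical `navigate` tool. It omits the `initialize` handshake, the transport layer, and any real Playwright calls, so treat it as an illustration of the message shapes rather than this server's code.

```python
import json
from typing import Any, Dict

# Hypothetical registry: tool name -> description, JSON Schema for arguments, handler.
TOOLS: Dict[str, Dict[str, Any]] = {
    "navigate": {
        "description": "Open a URL in the browser",
        "inputSchema": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
        # Stub; a real server would drive a Playwright page here.
        "handler": lambda args: f"navigated to {args['url']}",
    },
}


def error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}})


def handle_message(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request for tools/list or tools/call."""
    req = json.loads(raw)
    method, req_id = req.get("method"), req.get("id")
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = req.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return error(req_id, -32602, "unknown tool")  # JSON-RPC "invalid params"
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error(req_id, -32601, "method not found")
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```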
gstack is an AI agent framework and development workflow system designed to automate the software development lifecycle. It coordinates specialized AI personas to manage tasks across product design, engineering management, and quality assurance, transforming product intent into technical specifications and final releases. The project is distinguished by its deep integration of headless browser automation and semantic code memory. It utilizes a persistent Chromium daemon for web scraping and visual auditing, and implements a searchable knowledge base that logs architectural decisions and repos
This project is a framework for integrating Large Language Models into the Feishu messaging platform to create automated assistants. It functions as a self-hosted AI assistant and a chatbot gateway that routes messages between chat platforms and remote AI cloud providers. The system features a multi-channel messaging bridge and provider-agnostic model routing, allowing for orchestration between different AI models with automatic failover management. It includes a browser automation agent capable of programmatically controlling web browsers and capturing page snapshots to extend the assistant'
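Automatic failover between providers usually amounts to trying them in priority order until one succeeds. A minimal sketch, assuming each provider is wrapped as a plain callable; the provider names are hypothetical and this is not the project's routing code.

```python
from typing import Callable, List, Tuple

# (name, call) where call(prompt) returns the reply text or raises on failure.
Provider = Tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: List[Provider]) -> Tuple[str, str]:
    """Return (provider_name, reply) from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # timeouts, rate limits, outages, ...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```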
Kilocode is an autonomous engineering platform designed to orchestrate AI agents for complex software development tasks. It functions as a comprehensive system for automating coding, testing, and repository management by integrating directly with a developer's codebase and terminal. The platform provides a unified gateway for model orchestration, allowing for the management of agentic workflows, event-driven automation, and persistent session state across distributed development environments. The platform distinguishes itself through its federated task management and policy-based access control, which
Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention. The framework distinguishes itself through its focus on observability and secure, isolated execut
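Durable workflow orchestration generally means that each completed step is persisted, so a restarted run resumes instead of starting over. Mastra itself is a TypeScript framework; this Python sketch only illustrates the idea with a JSON checkpoint file, and `run_durable` and its state layout are hypothetical.

```python
import json
import os
from typing import Callable, Dict, List, Tuple

# (step name, function taking the accumulated data and returning new fields)
Step = Tuple[str, Callable[[Dict], Dict]]


def run_durable(steps: List[Step], state_path: str) -> Dict:
    """Run steps in order, checkpointing after each so a restarted run skips finished steps."""
    state: Dict = {"completed": [], "data": {}}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    for name, fn in steps:
        if name in state["completed"]:
            continue  # finished in an earlier run
        state["data"].update(fn(state["data"]))
        state["completed"].append(name)
        # Write to a temp file, then rename, so a crash mid-write
        # leaves the previous checkpoint intact.
        tmp = state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, state_path)
    return state["data"]
```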
Stagehand is an AI-native browser automation framework that enables developers to build reliable web automations using a hybrid of natural language instructions and deterministic TypeScript code.
Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows without relying on brittle selectors. The system functions as a headless browser controller, providing a programmatic interface to manage browser instances and execute granular interactions. The project distinguishes itself through its ability to translate high-level intent into
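One common way to avoid brittle selectors is to number the interactive elements on a page and let the model refer to them by index. The sketch below illustrates that idea over a hypothetical flat element list; it is not browser-use's actual DOM extraction.

```python
from typing import Dict, List

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


def index_elements(elements: List[Dict]) -> Dict[int, Dict]:
    """Number interactive elements so a model can answer "click 2" instead of writing a selector."""
    interactive = [e for e in elements if e["tag"] in INTERACTIVE_TAGS]
    return {i: e for i, e in enumerate(interactive, start=1)}


def render_for_prompt(index: Dict[int, Dict]) -> str:
    # One line per element, e.g. "[2] <button> Add to cart"
    return "\n".join(f"[{i}] <{e['tag']}> {e.get('text', '')}".rstrip() for i, e in index.items())
```

The model sees only the rendered list, and its chosen index is mapped back to the element before acting.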
Droidrun is a mobile device automation framework that uses large language models to translate natural language commands into executable actions on mobile operating systems. It functions as an agent orchestrator and UI automation engine, providing a reasoning engine that decomposes complex mobile tasks into smaller, manageable steps. The system distinguishes itself through a hierarchical action translation process and the ability to analyze accessibility trees and screenshots to determine the visual layout and current status of mobile applications. It supports execution across both physical ha
kubectl-ai is a natural language cluster operator and AI command assistant that translates plain-text prompts into executable Kubernetes commands. It serves as an interface between large language models and the Kubernetes API to enable cluster management through conversational text. The project implements a Model Context Protocol server to expose cluster operations as standardized tools for external AI clients. It uses a provider-agnostic model interface to support both cloud-based and local AI backends. The system covers natural language infrastructure control and AI-assisted DevOps through
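Before running a model-generated command, a tool of this kind may want to separate read-only requests from mutating ones. The check below is a hypothetical safety gate, not kubectl-ai's logic. The verb list is illustrative, and the check deliberately fails closed, for example when a flag such as `-n prod` precedes the verb.

```python
import shlex

# Illustrative, non-exhaustive set of kubectl verbs that only read cluster state.
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "explain", "api-resources", "version"}
SHELL_METACHARS = set("|&;<>`$")


def is_safe_to_autorun(command: str) -> bool:
    """Allow auto-execution only for a single kubectl command with a read-only verb."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # e.g. unbalanced quotes
    if not argv or argv[0] != "kubectl":
        return False
    # Refuse anything resembling piping, chaining, redirection or expansion.
    if any(ch in SHELL_METACHARS for tok in argv for ch in tok):
        return False
    # First non-flag argument is taken as the verb, so "-n prod get" fails closed.
    verb = next((tok for tok in argv[1:] if not tok.startswith("-")), None)
    return verb in READ_ONLY_VERBS
```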
chromedp is a browser automation framework and driver that controls web browsers via the Chrome DevTools Protocol. It functions as a headless browser automation tool and web browser controller, enabling the programmatic management of browser sessions, targets, and network responses through a remote debugging interface. The project provides specialized capabilities for Chrome DevTools Protocol automation, including headless browser testing, web scraping and data extraction, and mobile device emulation. It also supports browser-based visual regression by capturing precise screenshots of web pag
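On the wire, a Chrome DevTools Protocol command is a JSON object with an `id`, a `method` such as `Page.navigate`, and `params`; the browser replies with the same `id`, while events carry a `method` but no `id`. chromedp wraps this in Go. To keep one language here, the sketch is Python and leaves out the WebSocket transport; `CdpSession` is a hypothetical name.

```python
import itertools
import json
from typing import Any, Dict, List, Optional, Tuple


class CdpSession:
    """Frames CDP commands and matches replies to them by id (transport not included).

    send() returns the JSON text to write to the browser's WebSocket;
    on_message() consumes text read from it.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.responses: Dict[int, Dict[str, Any]] = {}  # id -> full reply
        self.events: List[Dict[str, Any]] = []           # messages without an id

    def send(self, method: str, params: Optional[Dict[str, Any]] = None) -> Tuple[int, str]:
        msg_id = next(self._ids)
        frame = json.dumps({"id": msg_id, "method": method, "params": params or {}})
        return msg_id, frame

    def on_message(self, raw: str) -> None:
        msg = json.loads(raw)
        if "id" in msg:
            self.responses[msg["id"]] = msg  # carries "result" or "error"
        else:
            self.events.append(msg)          # e.g. Page.loadEventFired
```

For example, the reply to `Page.captureScreenshot` carries the image as base64 text in `result.data`.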
how2 is a terminal-based tool that translates plain-English questions into shell commands using AI and Stack Overflow data. It functions as a command-line interface where users describe what they want to do in natural language, and the tool returns the appropriate Unix shell or PowerShell command, with support for generating multi-line Bash scripts from natural language prompts. The tool distinguishes itself through its interactive answer browsing mode, which lets users select and copy from multiple Stack Overflow answers directly in the terminal. It includes a fallback search mechanism that qu
This project is a high-performance headless browser engine designed for scalable web automation, data extraction, and AI agent integration. It provides a specialized environment that allows autonomous agents and testing frameworks to interact with web content through standardized remote control protocols. By executing pages in a lightweight, headless state, the engine minimizes resource consumption while maintaining the ability to perform complex navigation and dynamic content rendering. The platform distinguishes itself through deep integration with AI-centric communication layers and advanc
This project is an agentic framework designed to enable autonomous web navigation and browser automation. It functions as a controller that translates natural language instructions into deterministic browser actions, allowing agents to interact with websites, perform data extraction, and manage complex authentication flows. By leveraging accessibility trees and semantic element resolution, the framework mimics human-like navigation, moving beyond brittle DOM selectors to interact reliably with modern web interfaces. The framework distinguishes itself through its focus on secure, scalable exec
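Resolving an element by accessibility role and name, rather than by a CSS path such as `div:nth-child(3) > button`, can be sketched as a depth-first search over the accessibility tree. The tree shape (`role`, `name`, `children`) is an assumption for illustration, not this framework's data model.

```python
from typing import Any, Dict, List, Optional

Node = Dict[str, Any]  # {"role": str, "name": str, "children": [Node, ...]}


def find_by_role_and_name(tree: Node, role: str, name: str) -> Optional[List[int]]:
    """Depth-first search by role and case-insensitive accessible name.

    Returns the path of child indices from the root, or None if nothing matches.
    """
    wanted = name.strip().lower()

    def walk(node: Node, path: List[int]) -> Optional[List[int]]:
        if node.get("role") == role and node.get("name", "").strip().lower() == wanted:
            return path
        for i, child in enumerate(node.get("children", [])):
            found = walk(child, path + [i])
            if found is not None:
                return found
        return None

    return walk(tree, [])
```

A "Sign in" button still matches after a redesign moves it into a different container, which is where a positional selector would break.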
This project is a Python-based framework that functions as a generative AI agent for programmatic data analysis. It enables users to interact with structured data sources through natural language prompts, translating these requests into executable code to perform analysis, data cleaning, and visualization. By maintaining conversational context across multi-turn interactions, the system allows for iterative exploration and the building of complex data narratives. The framework distinguishes itself through a robust semantic layer and secure execution model. It maps raw datasets to descriptive m
The Gemini Cookbook is a comprehensive collection of implementation patterns, code samples, and development guides designed for building applications with Google Gemini models. It serves as a central resource for developers to integrate multimodal generative artificial intelligence into their software, providing the necessary frameworks to manage model interactions, stateful workflows, and structured data extraction. The repository distinguishes itself by offering specialized toolkits for autonomous agent orchestration, enabling the construction of agents that can execute code, browse the web
UFO is a multi-device task orchestrator and LLM agent orchestration framework designed to decompose natural language requests into executable task graphs. It functions as a cross-platform UI automation tool capable of performing interactions on Windows and mobile devices while routing tasks to distributed agents based on their hardware and software capabilities. The system is distinguished by its RAG-enhanced agent architecture, which integrates external documentation and previous execution traces to improve decision-making. It employs a hybrid UI detection approach that combines computer vis
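Decomposing a request into a task graph and routing each task by capability can be sketched with Python's `graphlib` module (3.9+). The task names, capability labels, and first-match device selection are hypothetical simplifications, not UFO's scheduler.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


def schedule(
    deps: Dict[str, Set[str]],     # task -> tasks it depends on
    needs: Dict[str, Set[str]],    # task -> capabilities it requires
    devices: Dict[str, Set[str]],  # device -> capabilities it offers
) -> List[Tuple[str, str]]:
    """Order tasks so dependencies come first; assign each to the first capable device."""
    plan = []
    for task in TopologicalSorter(deps).static_order():
        required = needs.get(task, set())
        device = next((d for d, caps in devices.items() if required <= caps), None)
        if device is None:
            raise ValueError(f"no device can run {task!r}")
        plan.append((task, device))
    return plan
```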
Bytebot is an LLM desktop automation framework and virtual Linux desktop environment. It enables AI agents to plan and execute mouse and keyboard actions on a virtual computer using natural language, allowing for autonomous desktop automation and the integration of legacy systems that lack native APIs. The system operates as an LLM API gateway and a Model Context Protocol server, routing requests across multiple language model providers with integrated load balancing and rate limiting. It provides isolated, containerized environments where agents use visual reasoning to interpret screenshots
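Per-provider rate limiting is often done with a token bucket, and a simple load balancer can then send each request to the first provider with spare capacity. This sketch works under those assumptions with hypothetical provider names; it does not reflect Bytebot's actual gateway code.

```python
import time
from typing import Callable, Dict, Optional


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def pick_provider(buckets: Dict[str, TokenBucket]) -> Optional[str]:
    """Return the first provider (in priority order) whose bucket has a token, else None."""
    for name, bucket in buckets.items():
        if bucket.try_acquire():
            return name
    return None
```

The injectable `clock` keeps the limiter deterministic in tests; callers would normally leave it at `time.monotonic`.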
This project is an automation framework that connects large language models to web browsers via the Chrome DevTools Protocol for autonomous task execution. It functions as a bridge between intelligent agents and browser engines, allowing for the direct control of browser sessions and profiles. The framework features a self-healing agent capable of generating and executing custom scripts during runtime to resolve failures and optimize browser tasks. It supports stealthy deployment through the use of integrated proxies and captcha solvers to bypass bot detection and security mitigations. The s
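A self-healing loop of this kind can be sketched as: generate a script, run it, and on failure regenerate it with the error message as feedback. Here a "script" is just a Python callable and `generate` stands in for a model call. Both are hypothetical, and real generated code would need to run in a sandbox.

```python
from typing import Callable, Optional

Script = Callable[[], str]
# generate(task, last_error) -> new script; last_error is None on the first attempt
ScriptGenerator = Callable[[str, Optional[str]], Script]


def run_self_healing(task: str, generate: ScriptGenerator, max_attempts: int = 3) -> str:
    """Run generated scripts until one succeeds, feeding each failure back to the generator."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        script = generate(task, last_error)
        try:
            return script()
        except Exception as exc:
            last_error = f"attempt {attempt}: {type(exc).__name__}: {exc}"
    raise RuntimeError(f"gave up after {max_attempts} attempts; last error: {last_error}")
```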
This project is a platform that orchestrates multiple AI agents to automate data science workflows, covering data loading, cleaning, feature engineering, modeling, and querying. It also functions as a natural language database query interface, converting plain English questions into SQL, and as a visual data pipeline builder. Custom agents are generated on demand by filling prompt templates for tasks like data cleaning and feature engineering. Pipelines incorporate human-in-the-loop checkpoints that pause execution for review and approval. Intermediate results are saved as versioned files, ena
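Human-in-the-loop checkpoints and versioned intermediate results can be combined in a small pipeline runner. A minimal sketch, assuming JSON files named `<step>_v<N>.json` and an `approve` callback standing in for the reviewer (both hypothetical, not the project's storage format):

```python
import json
import os
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[list], list]]


def run_pipeline(data: list, steps: List[Step], approve: Callable[[str, list], bool], out_dir: str) -> list:
    """Run steps in order, saving each result as a new version and pausing for approval."""
    for name, fn in steps:
        result = fn(data)
        # Never overwrite earlier results: use the next free version number.
        version = 1
        while os.path.exists(os.path.join(out_dir, f"{name}_v{version}.json")):
            version += 1
        with open(os.path.join(out_dir, f"{name}_v{version}.json"), "w") as f:
            json.dump(result, f)
        # Checkpoint: the reviewer can stop the pipeline before the next step runs.
        if not approve(name, result):
            raise RuntimeError(f"step {name!r} rejected by reviewer")
        data = result
    return data
```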
Firecrawl is a headless browser automation tool and web crawling engine designed to extract structured data from the web. It functions as an API that transforms raw website content and documents into clean markdown and JSON formats to serve as context for large language models. The project distinguishes itself by using natural language prompts to translate human instructions into targeted data extraction tasks and browser actions. It can execute interactive page navigation, such as clicking and scrolling, and perform automated web research to retrieve structured data without manual interventi
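Converting HTML into Markdown for language-model context can be illustrated with the standard library's `html.parser`. This toy handles only headings, paragraphs, list items, and links, and it drops `<script>` and `<style>` text. Firecrawl's own conversion is far more thorough, and `html_to_markdown` is a hypothetical name.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional


class MarkdownConverter(HTMLParser):
    """Tiny HTML-to-Markdown converter for headings, paragraphs, list items and links."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []      # finished Markdown blocks
        self.current = ""                # text of the block being built
        self.href: Optional[str] = None  # target of the open <a>, if any
        self.skip = 0                    # nesting depth inside <script>/<style>

    def end_block(self) -> None:
        # Collapse repeated spaces; keep the block only if it has text.
        text = re.sub(r" +", " ", self.current).strip()
        if text:
            self.blocks.append(text)
        self.current = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self.end_block()
            self.current = self.HEADINGS.get(tag, "- " if tag == "li" else "")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.current += "["

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip -= 1
        elif tag == "a":
            self.current += f"]({self.href})" if self.href else "]"
            self.href = None
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self.end_block()

    def handle_data(self, data):
        if not self.skip:
            self.current += re.sub(r"\s+", " ", data)  # HTML whitespace collapses to one space


def html_to_markdown(html: str) -> str:
    parser = MarkdownConverter()
    parser.feed(html)
    parser.close()
    parser.end_block()  # emit trailing text that was not inside a closed block
    return "\n\n".join(parser.blocks)
```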
OpenBrowser is an AI web agent toolkit and automation framework designed to translate natural language instructions into executable browser workflows. It functions as a headless browser controller and orchestrator, enabling the creation of autonomous agents that navigate websites, interact with elements, and extract data using plain English commands. The system features a sandboxed execution environment that utilizes domain whitelists and memory limits to ensure secure web interaction. It distinguishes itself through a command-line interface for triggering autonomous tasks with configurable m
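A domain whitelist check needs to compare parsed hostnames, not raw strings, so that `evil-example.com` or `example.com.evil.io` cannot pass as `example.com`. A minimal sketch, assuming an allowlist of bare domains that also admits their subdomains (the policy and function name are assumptions, not OpenBrowser's implementation):

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """True if the URL is http(s) and its host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower().rstrip(".")  # .hostname drops any user@ prefix and port
    for domain in allowed_domains:
        domain = domain.lower().lstrip(".")
        # Exact match, or a true subdomain boundary at a dot.
        if host == domain or host.endswith("." + domain):
            return True
    return False
```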
BrowserMCP is a browser automation bridge that connects AI tools to a live browser session through a local proxy server. It implements a standardized protocol for sending commands like click, type, and navigate to a real browser instance running on the user's machine, while keeping all browsing data on the device. The project distinguishes itself by preserving user sessions and fingerprints across automation tasks. It attaches to the user's existing browser profile to maintain cookies, logins, and authentication state, and uses the real browser's user agent, viewport, and extension context to
Jasper Client is a voice computing client and extensible speech framework designed to translate natural language speech into hardware actions and service requests. It functions as a voice command interface that manages the end-to-end process of audio capture, transcription, and action execution. The system features a modular architecture that allows for the integration of custom plugins, various speech recognition engines, and synthesis providers. This plugin-based approach supports the addition of new speakers and regional language capabilities without altering the core logic. The client in
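A plugin architecture like the one described can be sketched as a registry that offers each transcript to plugins in priority order. The `Plugin` fields below are a hypothetical interface, not Jasper's module API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Plugin:
    name: str
    priority: int
    is_valid: Callable[[str], bool]  # does this plugin handle the transcript?
    handle: Callable[[str], str]     # perform the action, return the spoken reply


class Dispatcher:
    def __init__(self) -> None:
        self.plugins: List[Plugin] = []

    def register(self, plugin: Plugin) -> None:
        self.plugins.append(plugin)
        # Highest priority first, so specific plugins win over catch-alls.
        self.plugins.sort(key=lambda p: p.priority, reverse=True)

    def dispatch(self, transcript: str) -> Optional[str]:
        for plugin in self.plugins:
            if plugin.is_valid(transcript):
                return plugin.handle(transcript)
        return None
```

New capabilities are added by registering another `Plugin`, leaving the dispatch loop unchanged.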
Obscura is a web scraping infrastructure and headless browser server designed for AI agents. It provides a system for AI models to control browser sessions, interact with websites, and extract web data using a WebSocket implementation of the Chrome DevTools Protocol. The project focuses on bot detection evasion by randomizing browser fingerprints, masking native functions, and blocking tracking scripts to mimic human behavior. It further secures identities through a traffic layer that routes network requests via HTTP or SOCKS5 proxies. The system supports large-scale data extraction through
Magentic-UI is an agentic UI toolkit and framework that enables large language models to interface with real-time browser environments, operating systems, and virtual machines. It provides a sandbox environment where models can execute instructions to manage local files and run shell commands. The project functions as a web interaction orchestrator and browser automation framework, allowing for the execution of end-to-end web workflows and form completions. It coordinates these actions through a system that translates natural language goals into executable sequences. The toolkit covers sever