# alibaba/page-agent

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/alibaba-page-agent).**

19,138 stars · 1,651 forks · TypeScript · MIT

## Links

- GitHub: https://github.com/alibaba/page-agent
- Homepage: https://alibaba.github.io/page-agent/
- awesome-repositories: https://awesome-repositories.com/repository/alibaba-page-agent.md

## Topics

`agent` `ai` `ai-agents` `browser-automation` `javascript` `mcp` `typescript` `web`

## Description

Page-agent is an LLM-driven browser automation agent and in-page JavaScript GUI controller. It translates natural language instructions into direct browser interface actions, automating web-based tasks and manipulating page elements through a programmable interface.
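The step from an instruction to an interface action can be pictured as validating a structured action and dispatching it to a page. The following is a minimal sketch only: the `Action` vocabulary, `PageLike`, `parseAction`, and `execute` are hypothetical names chosen for illustration and are not page-agent's actual schema or API. A recording stub stands in for the browser so the example runs on its own.

```typescript
// Hypothetical action vocabulary (an assumption, not page-agent's schema).
type Action =
  | { kind: "click"; target: number }
  | { kind: "type"; target: number; text: string }
  | { kind: "scroll"; deltaY: number };

// Minimal stand-in for a page so the sketch runs without a browser.
interface PageLike {
  click(target: number): void;
  type(target: number, text: string): void;
  scroll(deltaY: number): void;
}

// Validate untrusted JSON (for example, model output) before acting on it.
function parseAction(raw: string): Action {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("action must be a JSON object");
  }
  const a = data as Record<string, unknown>;
  switch (a.kind) {
    case "click":
      if (typeof a.target === "number") return { kind: "click", target: a.target };
      break;
    case "type":
      if (typeof a.target === "number" && typeof a.text === "string") {
        return { kind: "type", target: a.target, text: a.text };
      }
      break;
    case "scroll":
      if (typeof a.deltaY === "number") return { kind: "scroll", deltaY: a.deltaY };
      break;
  }
  throw new Error(`unsupported or malformed action: ${raw}`);
}

// Dispatch a validated action to the matching page operation.
function execute(action: Action, page: PageLike): void {
  switch (action.kind) {
    case "click": page.click(action.target); break;
    case "type": page.type(action.target, action.text); break;
    case "scroll": page.scroll(action.deltaY); break;
  }
}

// Usage: run a two-step plan against a stub page that records each call.
const log: string[] = [];
const fakePage: PageLike = {
  click: (t) => { log.push(`click #${t}`); },
  type: (t, s) => { log.push(`type #${t} "${s}"`); },
  scroll: (d) => { log.push(`scroll ${d}`); },
};
const plan = [
  '{"kind":"click","target":3}',
  '{"kind":"type","target":4,"text":"hello"}',
];
for (const raw of plan) execute(parseAction(raw), fakePage);
```

Validating before dispatch means a malformed or unknown action fails loudly instead of reaching the page.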

The system coordinates complex sequences of actions across multiple browser tabs and different websites. It functions as a remote browser control server, providing an interface that allows external clients to operate a browser and manage page interactions.
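As a rough illustration of how an external client might drive such a server, the sketch below routes JSON messages to named handlers that share a small tab registry. Everything here is an assumption made for the example: the message shape, the method names `tabs.open` and `tabs.list`, and the `TabRegistry` and `createDispatcher` helpers are hypothetical and do not describe page-agent's real protocol. The transport is left out; the dispatcher simply maps a request string to a response string.

```typescript
// Hypothetical message shapes (assumptions, not page-agent's wire format).
interface RpcRequest { id: number; method: string; params?: unknown; }
interface RpcResponse { id: number | null; result?: unknown; error?: string; }
type Handler = (params: unknown) => unknown;

// Shared state that outlives any single command: the set of open tabs.
class TabRegistry {
  private tabs: Array<{ id: number; url: string }> = [];
  open(url: string): number {
    const id = this.tabs.length + 1;
    this.tabs.push({ id, url });
    return id;
  }
  list(): Array<{ id: number; url: string }> {
    return this.tabs.slice();
  }
}

// Build a transport-agnostic dispatcher: request string in, response string out.
function createDispatcher(handlers: Record<string, Handler>): (raw: string) => string {
  return (raw) => {
    let id: number | null = null;
    let response: RpcResponse;
    try {
      const req = JSON.parse(raw) as Partial<RpcRequest>;
      if (typeof req.id === "number") id = req.id;
      const method = req.method;
      // Own-property check so inherited names such as "toString" are not callable.
      const handler =
        typeof method === "string" && Object.prototype.hasOwnProperty.call(handlers, method)
          ? handlers[method]
          : undefined;
      if (!handler) throw new Error(`unknown method: ${String(method)}`);
      response = { id, result: handler(req.params) };
    } catch (err) {
      response = { id, error: err instanceof Error ? err.message : String(err) };
    }
    return JSON.stringify(response);
  };
}

// Usage: two commands that share state through the registry.
const tabs = new TabRegistry();
const dispatch = createDispatcher({
  "tabs.open": (params) => {
    const url = (params as { url?: unknown } | null | undefined)?.url;
    if (typeof url !== "string") throw new Error("tabs.open requires a string url");
    return { tabId: tabs.open(url) };
  },
  "tabs.list": () => tabs.list(),
});
const opened = dispatch('{"id":1,"method":"tabs.open","params":{"url":"https://example.com"}}');
const listed = dispatch('{"id":2,"method":"tabs.list"}');
```

Keeping the dispatcher independent of the transport means the same handlers could sit behind a WebSocket, an HTTP endpoint, or a message port.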

Its capabilities include natural language intent decoding and action mapping, DOM-tree accessibility parsing, and cross-tab state management. The project also includes a JavaScript injection runtime and a remote procedure call (RPC) interface through which external clients send commands and receive browser state updates.
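To make the idea of parsing a page hierarchy for interactable targets concrete, here is a minimal sketch that walks a simplified element tree and assigns each interactive node a stable index. It is an illustration under stated assumptions: `UiNode`, `collectTargets`, and the tag and role lists are hypothetical, a plain object tree stands in for the real DOM so the code runs without a browser, and none of it reflects how page-agent actually implements this step.

```typescript
// Simplified stand-in for a DOM element (an assumption for this sketch).
interface UiNode {
  tag: string;
  role?: string;
  label?: string;
  hidden?: boolean;
  children?: UiNode[];
}

interface Target { index: number; tag: string; label: string; }

// Illustrative, non-exhaustive lists of interactive tags and ARIA roles.
const INTERACTIVE_TAGS = ["a", "button", "input", "select", "textarea"];
const INTERACTIVE_ROLES = ["button", "link", "checkbox", "textbox"];

function isInteractive(node: UiNode): boolean {
  return (
    INTERACTIVE_TAGS.indexOf(node.tag) !== -1 ||
    (node.role !== undefined && INTERACTIVE_ROLES.indexOf(node.role) !== -1)
  );
}

// Depth-first walk in document order; hidden subtrees are skipped entirely.
function collectTargets(root: UiNode): Target[] {
  const targets: Target[] = [];
  const visit = (node: UiNode): void => {
    if (node.hidden) return;
    if (isInteractive(node)) {
      targets.push({ index: targets.length, tag: node.tag, label: node.label ?? "" });
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return targets;
}

// Usage: a small page with a hidden link, a text input, and a div acting as a button.
const samplePage: UiNode = {
  tag: "body",
  children: [
    { tag: "nav", hidden: true, children: [{ tag: "a", label: "Skip to content" }] },
    {
      tag: "form",
      children: [
        { tag: "input", label: "Email" },
        { tag: "div", role: "button", label: "Submit" },
        { tag: "p", label: "We never share your address." },
      ],
    },
  ],
};
const sampleTargets = collectTargets(samplePage);
```

Numbering targets in document order gives later steps a compact way to refer to an element, such as "click target 1", without passing selectors around.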

## Tags

### Artificial Intelligence & ML

- [Autonomous Browser Agents](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/agent-orchestration-multi-agent/autonomous-agents/autonomous-browser-agents.md) — An intelligent agent that interprets natural language to navigate and interact with web interfaces.
- [Browser Automation Agents](https://awesome-repositories.com/f/artificial-intelligence-ml/browser-automation-agents.md) — An LLM-powered agent that translates natural language into direct browser interface actions.
- [Intent-to-UI Action Mappings](https://awesome-repositories.com/f/artificial-intelligence-ml/intent-to-ui-action-mappings.md) — Maps natural language instructions to a sequence of executable browser operations and interface elements.
- [Natural Language Command Translation](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-translation-integrations/natural-language-command-translation.md) — Translates natural language instructions into executable browser-level commands via LLMs. ([source](https://alibaba.github.io/page-agent/))
- [Natural Language Query Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-query-interfaces.md) — Processes user text inputs to determine specific goals and parameters for web interface interaction.
- [Natural Language Workflow Builders](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-workflow-builders.md) — Converts plain text instructions into executable sequences of browser agent actions. ([source](https://cdn.jsdelivr.net/gh/alibaba/page-agent@main/README.md))
- [Multi-Model Workflow Coordinators](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-model-workflow-coordinators.md) — Coordinates automated sequences that complete multi-step processes across various web interfaces.

### Part of an Awesome List

- [Accessibility Tree Extractions](https://awesome-repositories.com/f/awesome-lists/data/html-and-xml-parsing/xml-parsing/ui-hierarchy-parsing/accessibility-tree-extractions.md) — Extracts the structural and semantic hierarchy of web page elements to identify interactable targets.
- [Natural Language Automation](https://awesome-repositories.com/f/awesome-lists/productivity/task-automation/natural-language-automation.md) — Controls web interfaces using text commands to automate repetitive browser workflows.

### Development Tools & Productivity

- [In-Page GUI Controllers](https://awesome-repositories.com/f/development-tools-productivity/in-page-script-execution/in-page-gui-controllers.md) — Provides a JavaScript-based system for manipulating web page elements through a programmable interface.
- [Multi-Page Browser Workflow Automators](https://awesome-repositories.com/f/development-tools-productivity/workflow-automations/multi-page-browser-workflow-automators.md) — Coordinates complex sequences of actions across multiple browser tabs and different websites.

### User Interface & Experience

- [JavaScript Injections](https://awesome-repositories.com/f/user-interface-experience/webview-interface-customizations/javascript-injections.md) — Injects JavaScript directly into the browser context to manipulate the DOM in real time.

### Web Development

- [Remote Browser Controllers](https://awesome-repositories.com/f/web-development/remote-browser-controllers.md) — Provides a server interface for external agents to remotely operate browsers via API calls. ([source](https://cdn.jsdelivr.net/gh/alibaba/page-agent@main/README.md))
- [Cross-Tab State Coordination](https://awesome-repositories.com/f/web-development/browser-integration-utilities/browser-session-management/tab-management/cross-tab-state-coordination.md) — Coordinates interaction sequences across multiple browser windows to execute multi-domain workflows.

### Networking & Communication

- [Remote Procedure Call Interfaces](https://awesome-repositories.com/f/networking-communication/remote-procedure-call-interfaces.md) — Exposes server endpoints for external clients to send commands and receive browser state updates.

### Software Engineering & Architecture

- [Cross-Page Workflow Orchestration](https://awesome-repositories.com/f/software-engineering-architecture/system-internals/centralization-patterns/workflow-execution-managers/complex-workflow-coordination/cross-page-workflow-orchestration.md) — Coordinates complex sequences of browser actions across multiple tabs and different websites. ([source](https://cdn.jsdelivr.net/gh/alibaba/page-agent@main/README.md))
