MobileAgent

MobileAgent is an LLM-powered mobile automation agent and framework that navigates mobile user interfaces and executes multi-step tasks. It maps semantic commands to screen coordinates and performs the corresponding input events across mobile operating systems.
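The mapping from a semantic command to an input event can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `UIElement` type, the command dictionary shape, and the choice of Android's `adb shell input` as the event backend are not taken from the MobileAgent codebase, which may use a different mechanism. The function only builds the argument list and does not run it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    """Hypothetical on-screen element with pixel bounds (left, top, right, bottom)."""
    label: str
    bounds: tuple[int, int, int, int]

    def center(self) -> tuple[int, int]:
        left, top, right, bottom = self.bounds
        return ((left + right) // 2, (top + bottom) // 2)


def to_input_event(command: dict, elements: list[UIElement]) -> list[str]:
    """Translate one semantic command into an `adb shell input` argument list."""
    action = command["action"]
    if action == "type":
        # `input text` takes no coordinates; spaces are conventionally sent as %s.
        text = command["text"].replace(" ", "%s")
        return ["adb", "shell", "input", "text", text]

    # The remaining actions need a target element, matched here by label.
    wanted = command["target"].lower()
    matches = [e for e in elements if e.label.lower() == wanted]
    if not matches:
        raise LookupError(f"no element labelled {command['target']!r}")
    x, y = matches[0].center()

    if action == "tap":
        return ["adb", "shell", "input", "tap", str(x), str(y)]
    if action == "swipe_up":
        # Swipe from the element's center to a point 400 px higher, over 300 ms.
        end_y = max(y - 400, 0)
        return ["adb", "shell", "input", "swipe",
                str(x), str(y), str(x), str(end_y), "300"]
    raise ValueError(f"unsupported action {action!r}")
```

Returning an argument list rather than executing it keeps the translation step easy to inspect; a caller could pass the list to `subprocess.run` once a device is attached.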

The project operates as a cross-app workflow orchestrator, switching between native on-screen interface actions and external API tools to complete complex operations. Its visual grounding system analyzes screenshots and interface metadata to identify elements, and a feedback loop validates whether each action succeeded.
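One way to picture the switch between on-screen actions and API tools is a small dispatcher that routes each step by its kind. This is a hedged sketch, not the project's implementation: the step format (`kind`, `name`, `args`) and the handler names are hypothetical.

```python
from typing import Callable

Handler = Callable[..., str]


def make_dispatcher(ui_actions: dict[str, Handler],
                    api_tools: dict[str, Handler]) -> Callable[[dict], str]:
    """Build a function that routes a step to a UI action or an API tool."""
    registries = {"ui": ui_actions, "api": api_tools}

    def dispatch(step: dict) -> str:
        kind = step["kind"]
        if kind not in registries:
            raise ValueError(f"unknown step kind {kind!r}")
        handlers = registries[kind]
        name = step["name"]
        if name not in handlers:
            raise KeyError(f"no {kind} handler named {name!r}")
        # Arguments are passed through as keyword arguments.
        return handlers[name](**step.get("args", {}))

    return dispatch


# Stub handlers standing in for real device input and a real web API.
dispatch = make_dispatcher(
    ui_actions={"tap": lambda target: f"tapped {target}"},
    api_tools={"weather": lambda city: f"forecast for {city}"},
)
```

A workflow is then a list of steps, for example `dispatch({"kind": "api", "name": "weather", "args": {"city": "Oslo"}})` followed by `dispatch({"kind": "ui", "name": "tap", "args": {"target": "Share"}})`.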

As a long-horizon task planner, the agent decomposes complex high-level goals into sequential executable steps. This process is supported by hierarchical state tracking and memory to maintain progress across multi-step automation workflows.
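The decomposition and progress tracking described above can be sketched as a two-level plan structure. The class names (`Plan`, `SubTask`, `Step`) are hypothetical, and the plan here is written by hand; in the agent, an LLM would produce the decomposition.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class SubTask:
    name: str
    steps: list[Step] = field(default_factory=list)

    @property
    def done(self) -> bool:
        return all(step.done for step in self.steps)


@dataclass
class Plan:
    goal: str
    subtasks: list[SubTask] = field(default_factory=list)

    def next_step(self) -> Optional[tuple[SubTask, Step]]:
        """Return the first unfinished step with its parent subtask, or None."""
        for subtask in self.subtasks:
            for step in subtask.steps:
                if not step.done:
                    return subtask, step
        return None

    def progress(self) -> str:
        """Summarize completion at both levels of the hierarchy."""
        total = sum(len(s.steps) for s in self.subtasks)
        finished = sum(step.done for s in self.subtasks for step in s.steps)
        subtasks_done = sum(s.done for s in self.subtasks)
        return (f"{finished}/{total} steps, "
                f"{subtasks_done}/{len(self.subtasks)} subtasks")


plan = Plan("Send today's photo to Alex", [
    SubTask("Find photo", [Step("Open gallery"), Step("Select newest photo")]),
    SubTask("Share photo", [Step("Tap share and pick Alex")]),
])
```

Keeping step state under its parent subtask lets the agent report progress at either level and resume from `next_step()` after an interruption.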

Features

Agentic LLM Frameworks - Provides a comprehensive framework for building autonomous agents that use LLMs to navigate and interact with mobile user interfaces.

GUI Task Automation - Executes end-to-end operations across mobile devices by identifying interface elements and performing grounding actions.

Agentic Workflow Automation - Uses intelligent agents to coordinate tasks and automate complex, stateful workflows.

Cross-Application Workflow Automation - Executes goals that require moving between different mobile applications and external tools.

Interface Grounding - Maps high-level instructions to specific screen coordinates and interface components for mobile interaction.

Semantic Action Mappings - Translates semantic LLM commands into precise mobile device input events.

Agentic Goal Decomposition - Uses LLMs to recursively break high-level objectives into actionable sub-tasks for mobile automation.

Task Planners - Implements logic to decompose high-level automation goals into a sequence of executable steps with progress tracking.

Task Planning Systems - Provides a framework for decomposing complex automation goals into actionable steps and coordinating execution.

Visual Grounding - Maps AI-generated intents to specific screen coordinates by analyzing screenshots and interface metadata.

Hierarchical State Tracking - Provides hierarchical memory to track progress across complex, multi-step automation workflows.

Device Interface Automation Frameworks - Provides a system for identifying on-screen elements and performing grounding actions to automate mobile operations.

Mobile Device Automation - Performs complex tasks on mobile devices by automatically identifying and interacting with UI elements.

Native Mobile Automation - Enables the execution of semantic commands as input events within native mobile application environments.

Cross-App Workflow Orchestration - Coordinates transitions between native on-screen actions and external API tools to complete complex operations.

Visual State - Validates action success by comparing resulting screenshots with expected outcomes in a feedback loop.

Hybrid Tool Orchestration - Coordinates the use of native mobile interface interactions and external API calls.

Tool-Use Orchestration - Interleaves reasoning with a choice between native interface actions and external tool execution.

Tool Orchestration - Dynamically invokes native interface actions and external software within automated workflows.

Planning - Maintains explicit plans and progress lists for long-running autonomous mobile automation tasks.

Hybrid Tool Dispatching - Dynamically routes requests between native on-screen actions and external API tools.

Agent Frameworks - Provides a family of assistants for operating mobile devices.

Agent Lifecycle Management - Supports a self-evolving mobile assistant for complex task automation.

GUI Agents - Provides an autonomous multimodal agent for mobile device interaction.

GUI and Computer Agents - Includes fundamental agents and critics for GUI automation and error diagnosis.
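The feedback loop mentioned under Visual State in the list above can be sketched as an act, capture, and check cycle with a retry limit. This is a simplified illustration under stated assumptions: the three callables are hypothetical stand-ins, and a real check would compare screenshots or interface metadata rather than the toy values used here.

```python
from typing import Callable, TypeVar

Screen = TypeVar("Screen")


def run_with_validation(act: Callable[[], None],
                        capture: Callable[[], Screen],
                        is_expected: Callable[[Screen], bool],
                        max_attempts: int = 3) -> tuple[bool, int]:
    """Perform an action, then check the resulting screen; retry on mismatch.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        act()
        if is_expected(capture()):
            return True, attempt
    return False, max_attempts


# Toy device: the "screen" is just the number of taps received so far,
# and the expected state is reached only after the second tap.
taps: list[str] = []
result = run_with_validation(
    act=lambda: taps.append("tap"),
    capture=lambda: len(taps),
    is_expected=lambda screen: screen == 2,
)
```

Separating `capture` from `is_expected` keeps the comparison strategy swappable, so a pixel diff, a metadata check, or a model-based critic could fill the same slot.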

X-PLUG/MobileAgent
