This project is a Model Context Protocol (MCP) server and automation framework for controlling and automating iOS and Android devices. It exposes a single API that behaves the same way whether the target is a physical device or a simulator or emulator, on either operating system, so it works as a cross-platform device bridge.
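To make the bridge idea concrete, here is a minimal sketch of how one platform-neutral interface can sit in front of platform-specific backends. The `Device` and `AndroidAdbDevice` names are hypothetical and are not this project's API. The Android backend only builds `adb shell input` argument lists (a real command set), leaving execution to the caller. An iOS backend would implement the same interface with its own tooling, which is not shown here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Device(ABC):
    """Hypothetical platform-neutral device interface (illustrative only)."""

    @abstractmethod
    def tap(self, x: int, y: int) -> list[str]: ...

    @abstractmethod
    def swipe(self, x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300) -> list[str]: ...

    def long_press(self, x: int, y: int, duration_ms: int = 1000) -> list[str]:
        # Default long press: a swipe that starts and ends at the same point,
        # held for duration_ms milliseconds.
        return self.swipe(x, y, x, y, duration_ms)


class AndroidAdbDevice(Device):
    """Hypothetical Android backend that returns `adb` argv lists.

    Running them (for example with subprocess.run) is left to the caller.
    """

    def __init__(self, serial: str) -> None:
        self.serial = serial

    def _adb_shell(self, *args: str) -> list[str]:
        # `-s <serial>` selects the target when several devices are connected.
        return ["adb", "-s", self.serial, "shell", *args]

    def tap(self, x: int, y: int) -> list[str]:
        return self._adb_shell("input", "tap", str(x), str(y))

    def swipe(self, x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300) -> list[str]:
        return self._adb_shell(
            "input", "swipe", str(x1), str(y1), str(x2), str(y2), str(duration_ms)
        )


if __name__ == "__main__":
    device: Device = AndroidAdbDevice("emulator-5554")
    print(" ".join(device.long_press(540, 1200)))
```

Running the script prints `adb -s emulator-5554 shell input swipe 540 1200 540 1200 1000`. Code written against `Device` never needs to know which backend it is talking to, which is the point of the bridge.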
What sets the system apart is its visual UI automation toolkit. Instead of relying on element selectors, it works from screenshots and drives the UI with coordinate-based gestures such as taps, swipes, and long presses. For remote use, it can run as an HTTP server that communicates over Server-Sent Events (SSE), optionally protected by bearer-token authorization.
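The sketch below illustrates two pieces of the remote transport: checking an `Authorization: Bearer <token>` header and serializing a single Server-Sent Events frame. The function names, the default `message` event name, and the way the token is configured are assumptions for illustration, not this project's implementation, and MCP-specific transport details are not modeled.

```python
from __future__ import annotations

import hmac
import json


def is_authorized(authorization_header: str | None, expected_token: str) -> bool:
    """Hypothetical check of an HTTP Authorization header against a bearer token."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    # The auth scheme name is case-insensitive in HTTP.
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(token.encode(), expected_token.encode())


def sse_frame(payload: dict, event: str = "message") -> str:
    """Serialize one SSE event; a blank line terminates the event.

    json.dumps without indent emits a single line, so the payload fits
    in one `data:` field.
    """
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"
```

A server would call `is_authorized` on each incoming request and reject it (typically with HTTP 401) before opening the event stream.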
The framework also covers application lifecycle management, device discovery, and hardware button simulation, along with tools for recording screen state, inspecting UI accessibility information, and solving CAPTCHA challenges through visual analysis.
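For the Android side, device discovery and hardware buttons can be approximated with standard `adb` commands: `adb devices` lists connected devices with their state, and `adb shell input keyevent <KEYCODE>` simulates a key press. The sketch below parses that listing and maps a few button names to Android key codes. The helper names and the button vocabulary are hypothetical, and iOS equivalents are not covered.

```python
from __future__ import annotations

# Hypothetical button names mapped to Android KeyEvent key code names.
ANDROID_KEYEVENTS = {
    "HOME": "KEYCODE_HOME",
    "BACK": "KEYCODE_BACK",
    "VOLUME_UP": "KEYCODE_VOLUME_UP",
    "VOLUME_DOWN": "KEYCODE_VOLUME_DOWN",
    "POWER": "KEYCODE_POWER",
}


def parse_adb_devices(output: str) -> list[str]:
    """Return serials of devices in the ready ("device") state."""
    serials = []
    for line in output.splitlines():
        # Skip the header and daemon start-up messages such as "* daemon started".
        if not line.strip() or line.startswith("*") or line.startswith("List of devices"):
            continue
        parts = line.split()
        # Other states include "offline" and "unauthorized".
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def press_button_command(serial: str, button: str) -> list[str]:
    """Build the adb argv that simulates a hardware button press."""
    keycode = ANDROID_KEYEVENTS.get(button.upper())
    if keycode is None:
        raise ValueError(f"unsupported button: {button}")
    return ["adb", "-s", serial, "shell", "input", "keyevent", keycode]
```

Filtering on the `device` state skips devices that are connected but not usable yet, for example one waiting for the user to accept the USB debugging prompt.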