Droidrun is a mobile device automation framework that uses large language models (LLMs) to translate natural language commands into executable actions on mobile operating systems. It combines an agent orchestrator with a UI automation engine, and its reasoning layer decomposes complex mobile tasks into smaller, manageable steps.
The system distinguishes itself through a hierarchical action translation process and by analyzing accessibility trees and screenshots to determine the visual layout and current state of mobile applications. Tasks can run on physical hardware or in managed remote cloud environments; the cloud option removes the need for local hardware setup.
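To make the accessibility-tree analysis concrete, the sketch below parses a small XML hierarchy in the style of an Android `uiautomator dump` and lists the clickable elements with their tap coordinates. It is an illustration only, not Droidrun's internal implementation: the sample XML, the `clickable_elements` helper, and the output fields are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of an Android `uiautomator dump`.
SAMPLE_XML = """<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" text="" clickable="false" bounds="[0,0][1080,1920]">
    <node class="android.widget.TextView" text="Settings" clickable="false" bounds="[40,100][400,180]" />
    <node class="android.widget.Button" text="Wi-Fi" clickable="true" bounds="[40,200][1040,320]" />
    <node class="android.widget.Button" text="Bluetooth" clickable="true" bounds="[40,340][1040,460]" />
  </node>
</hierarchy>"""

# Bounds are encoded as "[left,top][right,bottom]" in screen pixels.
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def clickable_elements(xml_text):
    """Return a compact list of clickable nodes (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    elements = []
    for node in root.iter("node"):
        if node.get("clickable") != "true":
            continue
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # skip nodes with missing or malformed bounds
        x1, y1, x2, y2 = map(int, match.groups())
        elements.append({
            "index": len(elements),
            "text": node.get("text", ""),
            "class": node.get("class", ""),
            # The centre of the bounding box is a reasonable tap target.
            "center": ((x1 + x2) // 2, (y1 + y2) // 2),
        })
    return elements


if __name__ == "__main__":
    for element in clickable_elements(SAMPLE_XML):
        print(element)
```

A compact list like this is the kind of summary an agent could hand to a language model, so the model can refer to an element by index instead of reasoning over raw XML.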
The framework covers several capability areas: structured data extraction from native mobile applications through natural language queries, telemetry-driven execution tracing that visualizes agent decision paths, and extension of agent functionality through custom tools and specialized guidance.
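The idea of extending an agent with custom tools can be sketched with a small registry: plain functions are registered by name, described to the model, and dispatched when the model picks one. This is a generic pattern, not Droidrun's actual extension API; the `tool` decorator, `TOOLS` registry, `describe_tools`, `call_tool`, and the example `normalize_phone` tool are all hypothetical names introduced for illustration.

```python
import inspect
import re
from typing import Callable, Dict

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable] = {}


def tool(func: Callable) -> Callable:
    """Register func under its own name so an agent can call it by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def normalize_phone(number: str) -> str:
    """Strip every non-digit character from a phone number."""
    return re.sub(r"\D", "", number)


def describe_tools() -> str:
    """Build a text listing of tools that could be placed in a prompt."""
    lines = []
    for name, func in TOOLS.items():
        lines.append(f"{name}{inspect.signature(func)}: {inspect.getdoc(func)}")
    return "\n".join(lines)


def call_tool(name: str, **kwargs):
    """Dispatch a tool call chosen by the model (name plus keyword arguments)."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


if __name__ == "__main__":
    print(describe_tools())
    print(call_tool("normalize_phone", number="+1 (555) 010-9999"))
```

Using the function signature and docstring as the tool description keeps the definition in one place, so the text shown to the model cannot drift from what the code accepts.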