Dev-browser is a browser automation framework and headless browser controller that provides a sandboxed script runner for executing web tasks. It also serves as a vision-based web automator and an interface for large language models, letting them navigate and interact with web pages inside isolated execution environments.
The project distinguishes itself by converting complex web pages into simplified representations and coordinate-based maps, so that AI agents can analyze layouts and act on pixel locations. Its mapping system assigns a unique identifier to each DOM element, which decouples interaction logic from volatile page selectors.
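The identifier mapping described above can be illustrated with a minimal sketch. It is not dev-browser's actual implementation: `ElementInfo`, `RefMap`, the `e1`, `e2` ref format, and the snapshot line layout are all assumptions made for illustration. The idea is that the agent only ever sees short refs and coordinates, while the volatile selector stays inside the map.

```typescript
// Hypothetical shapes: none of these names come from dev-browser itself.
interface ElementInfo {
  role: string;     // e.g. "button", "link", "textbox"
  name: string;     // accessible name or visible text
  selector: string; // volatile selector, never shown to the model
  x: number;        // centre of the bounding box, in CSS pixels
  y: number;
}

class RefMap {
  private readonly byRef: Record<string, ElementInfo> = {};

  // Assign sequential refs (e1, e2, ...) in the order elements are given.
  constructor(elements: ElementInfo[]) {
    elements.forEach((el, i) => {
      this.byRef[`e${i + 1}`] = el;
    });
  }

  // Render a compact, one-line-per-element snapshot for the model to read.
  snapshot(): string {
    return Object.keys(this.byRef)
      .map((ref) => {
        const el = this.byRef[ref];
        return `[${ref}] ${el.role} "${el.name}" @(${el.x},${el.y})`;
      })
      .join("\n");
  }

  // Translate a ref chosen by the model back into a concrete target.
  resolve(ref: string): ElementInfo {
    if (!Object.prototype.hasOwnProperty.call(this.byRef, ref)) {
      throw new Error(`Unknown element ref: ${ref}`);
    }
    return this.byRef[ref];
  }
}

// Example data is made up; a real map would be built from the live page.
const refs = new RefMap([
  { role: "textbox", name: "Search", selector: "#q-7f3a", x: 320, y: 48 },
  { role: "button", name: "Go", selector: "div.hdr > button:nth-child(2)", x: 610, y: 48 },
]);
console.log(refs.snapshot());
```

If the page is re-rendered and its selectors change, only the map needs rebuilding; an instruction such as "click e2" stays meaningful to the agent as long as the new snapshot is the one it reads.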
The system covers a broad range of automation capabilities: persistent session and page management that maintains state across script executions, headless browser lifecycle control, and AI-friendly page snapshots for state analysis. It also includes security primitives that restrict script access to the host filesystem and network.
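One way to picture persistent page management is a registry that hands back the same page whenever a script asks for it by name. The sketch below is an assumption-laden illustration, not the project's API: `PageRegistry`, `Closable`, and the page names are hypothetical, and a stand-in page factory replaces the real browser so the example runs on its own.

```typescript
// Hypothetical sketch: PageRegistry and its methods are illustrative only.
interface Closable {
  close(): Promise<void>;
}

class PageRegistry<P extends Closable> {
  // Promises are stored so that concurrent get() calls share one page.
  private pages: Record<string, Promise<P>> = {};
  private readonly openPage: () => Promise<P>;

  // `openPage` creates a fresh page; in a real setup it would wrap the
  // browser library's page-creation call.
  constructor(openPage: () => Promise<P>) {
    this.openPage = openPage;
  }

  private has(name: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.pages, name);
  }

  // Return the page registered under `name`, creating it on first use.
  get(name: string): Promise<P> {
    if (!this.has(name)) {
      this.pages[name] = this.openPage();
    }
    return this.pages[name];
  }

  // Names of all pages currently kept alive.
  list(): string[] {
    return Object.keys(this.pages);
  }

  // Forget the page first, then close it; unknown names are ignored.
  async close(name: string): Promise<void> {
    if (!this.has(name)) return;
    const pending = this.pages[name];
    delete this.pages[name];
    await (await pending).close();
  }
}

// Stand-in page factory so the sketch runs without a browser.
let opened = 0;
const registry = new PageRegistry(async () => {
  opened += 1;
  return { close: async () => {} };
});

const first = registry.get("checkout"); // opens a page
const again = registry.get("checkout"); // reuses the same page
console.log(first === again, registry.list());
```

Keeping the registry alive in a long-running process, separate from the short-lived scripts that call `get`, is what would let state such as cookies or a half-filled form survive between script executions.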
The framework is implemented in TypeScript and uses Playwright as its programmable browser interface.