Steel is a cloud browser automation platform that provides a REST API for launching and controlling remote Chrome browser sessions. It enables programmatic browsing and web scraping using standard automation tools like Puppeteer, Playwright, and Selenium, connecting to cloud-hosted browser instances via WebSocket and the Chrome DevTools Protocol. The platform supports both headless and headful browser sessions, with language-specific SDKs for TypeScript and Python. The service distinguishes itself through comprehensive anti-detection capabilities, including residential proxy rotation, CAPTCHA
LaVague is an LLM web agent framework and large action model designed to translate natural language instructions into executable browser automation scripts. It functions as a multi-modal orchestrator that reasons over web page states and HTML content to automate multi-step tasks via a Selenium-based automation engine. The framework features a modular model provider layer, allowing users to swap between different language and vision models from providers such as Anthropic, Gemini, and Azure OpenAI. It employs a multi-modal world model to process screenshots and HTML structures, utilizing retri
AutoAgent is a multi-agent orchestrator and natural language workflow builder designed to connect multiple large language models with external API tools. It provides a framework for designing multi-step agent interactions and reasoning processes using plain text instead of manual code. The platform functions as a tool integration gateway, linking agents to third-party platforms and authenticated browser sessions. It enables the execution of complex analytical tasks and deep research by distributing work across collaborative agent frameworks and importing browser cookies to access restricted w
This project is a browser automation system that connects Google's Gemini API to a web browser, enabling an AI agent to perform tasks on a user's behalf by interpreting natural language instructions. At its core, it operates through a continuous screenshot-based action loop, where the agent captures the browser's current state, sends the image to the Gemini model, and executes the model's returned commands to click, type, and navigate. The system distinguishes itself through a dual browser backend abstraction, supporting both local Playwright-controlled browsers and remote Browserbase cloud i