Fara

FARA is a visual computer-use agent model that controls a browser by predicting screen coordinates for clicking, typing, and scrolling, without relying on DOM or accessibility trees. It is designed to automate multi-step web tasks such as searching, form filling, booking, and shopping by reasoning over visual state and decomposing tasks into sequential actions.

The model uses a compact 7-billion-parameter decoder-only transformer that can run on consumer GPUs for low-latency on-device inference, or be deployed as a managed endpoint on Azure Foundry for cloud-based inference without local infrastructure. It also supports self-hosted serving via vLLM, LM Studio, or Ollama, giving users full control over the inference environment.

FARA includes a reproducible evaluation framework that runs agent benchmarks on 609 real, live web-browsing tasks with automatic retry handling for time-sensitive and error-prone scenarios. The framework provides standardized scoring rubrics to compare agent performance across different task descriptions and versions.

Features

Browser and Web Agents - Controls a browser to complete multi-step web tasks through visual perception and coordinate-based actions.

Visual Web Task Agents - Provides a visual computer-use agent that automates multi-step web tasks through pixel-level interface control.

Task Decomposition - Breaks multi-step web tasks into sequential actions by reasoning over visual state.

Visual Grounding Execution - Predicts click coordinates and scroll targets directly from pixel-level screen analysis without DOM reliance.

Computer Use Agents - Controls a computer visually by predicting screen coordinates for clicking, typing, and scrolling.

Compact Parameter Models - Uses a compact 7-billion-parameter transformer that fits on consumer GPUs for low-latency inference.

On-Device GUI Agents - Runs a compact 7-billion-parameter model locally on consumer hardware for computer-use tasks with low latency.

Visual Computer-Use Agents - Ships a visual computer-use agent model deployable via vLLM, LM Studio, or Ollama for full local control.

Web Task Automations - Automates multi-step web tasks like searching, form filling, booking, and shopping via visual interface control.

Visual Computer Controllers - Perceives webpages and performs scrolling, typing, and clicking on predicted coordinates without accessibility trees.

Live-Website Agent Benchmarks - Includes a reproducible evaluation framework that runs agent benchmarks on 609 real, live web-browsing tasks.

On-Device Deployments - Runs locally on consumer hardware with a compact 7-billion-parameter size for low latency and privacy.

Local Model Inference Servers - Supports self-hosted serving via vLLM, LM Studio, or Ollama for full inference control.

Agent Benchmarks - Provides a reproducible evaluation system for testing web-browsing agents across 609 tasks with live websites.

Managed Model Endpoints - Ships a managed endpoint deployment option on Azure Foundry for API-based model inference.

Managed Agent Endpoints - Deploys a computer-use model via Azure Foundry endpoint without managing infrastructure or downloading weights.

Managed AI Endpoints - Deploys the model on Azure Foundry without downloading weights or managing GPU infrastructure.

Self-Hosted Agent Inference - Runs the model on a GPU machine using vLLM, LM Studio, or Ollama for full inference control.

Self-Hosted Inference Servers - Runs the model on a GPU machine using vLLM, LM Studio, or Ollama for full inference control.

Live Website Agent Benchmarks - Provides a reproducible evaluation framework running agent benchmarks on 609 real, live web-browsing tasks.

Agent Performance Benchmarks - Runs a benchmark of 609 web-browsing tasks comparing agent performance across scoring rubrics.

microsoftfara

Features

Star history