Online-Mind2Web - browseruse-bench

Online-Mind2Web is an online evaluation benchmark based on the Mind2Web dataset, testing agents’ navigation and interaction capabilities on real websites.

Overview

Attribute	Value
Source	Mind2Web dataset
Task Type	Web navigation and interaction
Target Websites	Real-world English websites
Evaluation	WebJudge semantic matching

Features

Real Websites

Agents operate on live websites rather than in simulated environments

Multi-step Tasks

Requires multiple sequential steps to complete complex goals

Semantic Evaluation

Uses WebJudge for semantic matching evaluation

No Login Required

All tasks can be completed without logging in

Quick Start

Run Tasks

# Run first 3 tasks
bubench run \
  --agent browser-use \
  --data Online-Mind2Web \
  --mode first_n \
  --count 3

# Run all tasks, skipping those already completed
bubench run \
  --agent Agent-TARS \
  --data Online-Mind2Web \
  --mode all \
  --skip-completed
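
When the same run has to be repeated for several agents, it can help to build the command line programmatically. The following is a minimal sketch, not part of browseruse-bench: the helper name build_run_command is hypothetical, and it uses only the flags shown in the commands above. It prints the commands rather than executing them, and it does not check which flag combinations bubench accepts.

```python
import shlex


def build_run_command(agent, data, mode, count=None, skip_completed=False):
    """Assemble a `bubench run` argument list from the flags shown above.

    Hypothetical helper for illustration only; it does not validate
    which combinations of flags the CLI actually accepts.
    """
    cmd = ["bubench", "run", "--agent", agent, "--data", data, "--mode", mode]
    if count is not None:
        cmd += ["--count", str(count)]
    if skip_completed:
        cmd.append("--skip-completed")
    return cmd


# Print (rather than run) the first-3-tasks command for two agents.
for agent in ("browser-use", "Agent-TARS"):
    args = build_run_command(agent, "Online-Mind2Web", "first_n", count=3)
    print(shlex.join(args))
```

The returned list can be passed to subprocess.run once the printed commands look right.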

Evaluate Results

bubench eval --agent browser-use --data Online-Mind2Web --model-id bu-2-0

Evaluation Metrics

Metric	Description
Task Success Rate	Percentage of tasks completed successfully
Action Accuracy	Accuracy of individual actions
Element Accuracy	Accuracy of element targeting
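
As an illustration of the first metric, Task Success Rate is the share of tasks judged successful, expressed as a percentage. The sketch below assumes a hypothetical list of per-task pass/fail verdicts; it is not the code used by bubench eval, and the actual result format may differ.

```python
def task_success_rate(verdicts):
    """Return the percentage of tasks judged successful.

    `verdicts` is a hypothetical list of booleans, one per task,
    where True means the judge marked the task as successful.
    """
    if not verdicts:
        return 0.0  # avoid division by zero when no tasks were run
    return 100.0 * sum(1 for v in verdicts if v) / len(verdicts)


# Example: 3 of 4 tasks succeeded.
print(task_success_rate([True, False, True, True]))
```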

Data Format

Task data is stored in benchmarks/Online-Mind2Web/data/. Each task is a JSON object of the following form:

{
  "task_id": "b7258ee05d75e6c50673a59914db412e_110325",
  "confirmed_task": "Find the store location and hours of the closest Trader Joe's to zip code 90028 and set it as my home store.",
  "website": "https://www.traderjoes.com/",
  "reference_length": 6,
  "level": "medium"
}
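
A record of this shape can be checked before a run. The following is a minimal sketch that parses the sample record above and verifies that the five fields it shows are present with the expected types. The field list is inferred from that single example, and validate_task is a hypothetical helper, not part of browseruse-bench.

```python
import json

# Sample record copied from the example above.
SAMPLE = """
{
  "task_id": "b7258ee05d75e6c50673a59914db412e_110325",
  "confirmed_task": "Find the store location and hours of the closest Trader Joe's to zip code 90028 and set it as my home store.",
  "website": "https://www.traderjoes.com/",
  "reference_length": 6,
  "level": "medium"
}
"""

# Assumption: field names and types are inferred from the single sample above.
EXPECTED_FIELDS = {
    "task_id": str,
    "confirmed_task": str,
    "website": str,
    "reference_length": int,
    "level": str,
}


def validate_task(task):
    """Return a list of problems found in one task record (empty if none)."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in task:
            problems.append(f"missing field: {name}")
        elif not isinstance(task[name], expected_type):
            problems.append(f"wrong type for {name}: {type(task[name]).__name__}")
    return problems


task = json.loads(SAMPLE)
print(validate_task(task))
print(task["website"], task["level"], task["reference_length"])
```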
