Benchmarks Overview

browseruse-bench integrates three mainstream browser-agent evaluation benchmarks, covering a range of web interaction tasks.

Supported Benchmarks

LexBench-Browser

Recommended. A real-world browser-agent benchmark with 210 tasks across 107 distinct Chinese and English websites. No login is required.

Online-Mind2Web

Online evaluation based on the Mind2Web dataset, testing agents’ navigation and interaction capabilities on real websites.

BrowseComp

Competition-style browser tasks that evaluate agents’ overall browser operation capabilities.

Feature Comparison

Benchmark	Tasks	Language	Evaluation	Login Required
LexBench-Browser	210	zh/en	LLM (visual)	No
Online-Mind2Web	300	English	WebJudge	No
BrowseComp	1266	English	Grader	No

Quick Comparison Run

# LexBench-Browser (recommended; no login required)
bubench run --agent browser-use --data LexBench-Browser --mode first_n --count 5

# Online-Mind2Web
bubench run --agent browser-use --data Online-Mind2Web --mode first_n --count 5

# BrowseComp
bubench run --agent browser-use --data BrowseComp --mode first_n --count 5
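
The three commands above differ only in the `--data` value, so they can be driven from a loop. The following is a minimal sketch, not part of browseruse-bench itself. `DRY_RUN` and `run_smoke` are hypothetical names introduced for illustration; the `bubench` flags are exactly those shown above. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
set -eu

# Benchmark names as listed in the Feature Comparison table.
benchmarks=("LexBench-Browser" "Online-Mind2Web" "BrowseComp")

# DRY_RUN is a helper variable for this sketch only, not a bubench option.
# DRY_RUN=1 (the default here) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run_smoke() {
  local name
  for name in "${benchmarks[@]}"; do
    # Same invocation as the quick-run commands: first 5 tasks per benchmark.
    local cmd=(bubench run --agent browser-use --data "$name" --mode first_n --count 5)
    if [ "$DRY_RUN" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

run_smoke
```

Running the script with `DRY_RUN=0` executes the three quick runs back to back, which assumes `bubench` is installed and on your `PATH`.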

Data Location

All benchmark data is stored in the benchmarks/ directory:

Benchmark	Data File Path
LexBench-Browser	`benchmarks/LexBench-Browser/data/`
Online-Mind2Web	`benchmarks/Online-Mind2Web/data/`
BrowseComp	`benchmarks/BrowseComp/data/`

For more details on data loading configuration (including HuggingFace support and private datasets), please refer to the Data Loading guide.
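
Before a run, it can help to confirm that the directories listed in the table exist locally. The sketch below assumes the data lives on disk under the repository root, which may not hold if you load datasets from HuggingFace instead. `check_data_dirs` is a hypothetical helper, not a browseruse-bench command.

```shell
#!/usr/bin/env bash

# Check the data directories from the Data Location table.
# Argument 1: repository root (defaults to the current directory).
# Returns 0 if all directories exist, 1 if any is missing.
check_data_dirs() {
  local root="${1:-.}" missing=0 name dir
  for name in LexBench-Browser Online-Mind2Web BrowseComp; do
    dir="$root/benchmarks/$name/data"
    if [ -d "$dir" ]; then
      echo "ok       $dir"
    else
      echo "missing  $dir"
      missing=1
    fi
  done
  return "$missing"
}

check_data_dirs "${1:-.}" || echo "Some benchmark data directories were not found."
```

Run it from the repository root, or pass the root path as the first argument.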

Planned Support

Support for more benchmarks is planned.

If you’d like to add a new benchmark, please refer to the Custom Benchmark guide.
