browseruse-bench integrates three mainstream browser-agent evaluation benchmarks that cover different types of web interaction tasks.

Supported Benchmarks

LexBench-Browser

Recommended - Real-world browser-agent benchmark with 210 tasks across 107 distinct Chinese and English websites. No login required.

Online-Mind2Web

Online evaluation based on the Mind2Web dataset, testing agents’ navigation and interaction capabilities on real websites.

BrowseComp

Browsing-competition tasks that evaluate agents' overall browser operation capabilities.

Feature Comparison

| Benchmark | Tasks | Language | Evaluation | Login Required |
| --- | --- | --- | --- | --- |
| LexBench-Browser | 210 | zh/en | LLM (visual) | No |
| Online-Mind2Web | 300 | English | WebJudge | No |
| BrowseComp | 1266 | English | Grader | No |

Quick Comparison Run

# LexBench-Browser (recommended; no login required)
bubench run --agent browser-use --data LexBench-Browser --mode first_n --count 5

# Online-Mind2Web
bubench run --agent browser-use --data Online-Mind2Web --mode first_n --count 5

# BrowseComp
bubench run --agent browser-use --data BrowseComp --mode first_n --count 5
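
The three commands above differ only in the `--data` value, so they can be wrapped in a small loop. The following is a minimal sketch, not part of browseruse-bench itself: `run_benchmark`, `COUNT`, and `DRY_RUN` are hypothetical names introduced here for illustration, and only the flags already shown on this page are used. By default the script only prints the commands; set `DRY_RUN=0` to actually invoke `bubench`.

```shell
#!/bin/sh
set -eu

# Hypothetical knobs for this sketch (not bubench options):
COUNT="${COUNT:-5}"      # value passed to --count
DRY_RUN="${DRY_RUN:-1}"  # 1 = print the command only, 0 = execute it

# Run (or print) the quick comparison command for one benchmark.
# $1 = benchmark name as listed on this page
run_benchmark() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "bubench run --agent browser-use --data $1 --mode first_n --count $COUNT"
  else
    bubench run --agent browser-use --data "$1" --mode first_n --count "$COUNT"
  fi
}

for benchmark in LexBench-Browser Online-Mind2Web BrowseComp; do
  run_benchmark "$benchmark"
done
```

Printing first makes it easy to confirm the exact commands before spending time on a real run.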

Data Location

All benchmark data is stored in the benchmarks/ directory:
| Benchmark | Data File Path |
| --- | --- |
| LexBench-Browser | benchmarks/LexBench-Browser/data/ |
| Online-Mind2Web | benchmarks/Online-Mind2Web/data/ |
| BrowseComp | benchmarks/BrowseComp/data/ |
For more details on data loading configuration (including HuggingFace support and private datasets), please refer to the Data Loading guide.
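
Before a run, it can help to confirm that the data directories listed above are present locally. The sketch below assumes it is executed from the repository root (or that `ROOT` points to it); `ROOT` and `data_dir` are hypothetical names used only for this illustration, and the check says nothing about whether the files inside each directory are complete.

```shell
#!/bin/sh
set -eu

# Assumption: ROOT is the repository root that contains benchmarks/.
ROOT="${ROOT:-.}"

# Print the data path for one benchmark, following the layout on this page.
# $1 = benchmark name
data_dir() {
  echo "benchmarks/$1/data/"
}

for benchmark in LexBench-Browser Online-Mind2Web BrowseComp; do
  dir="$ROOT/$(data_dir "$benchmark")"
  if [ -d "$dir" ]; then
    echo "found:   $dir"
  else
    echo "missing: $dir"
  fi
done
```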

Planned Support

  • More benchmarks
If you’d like to add a new benchmark, please refer to the Custom Benchmark guide.