Overview
| Attribute | Value |
|---|---|
| Source | Mind2Web dataset |
| Task Type | Web navigation and interaction |
| Target Websites | Real-world English websites |
| Evaluation | WebJudge semantic matching |
Features
Real Websites
Tests agents on live websites rather than simulated environments
Multi-step Tasks
Requires multiple sequential steps to complete complex goals
Semantic Evaluation
Scores results with WebJudge, which uses semantic matching
No Login Required
All tasks can be completed without logging in
Quick Start
Run Tasks
Evaluate Results
Evaluation Metrics
| Metric | Description |
|---|---|
| Task Success Rate | Percentage of tasks completed successfully |
| Action Accuracy | Accuracy of individual actions |
| Element Accuracy | Accuracy of element targeting |
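As a rough illustration of how the three metrics above relate to each other, the sketch below aggregates them from per-task records. The `TaskResult` fields and the `summarize` helper are hypothetical names chosen for this example. They are not the benchmark's actual result schema or API, and the real evaluator may define or weight these metrics differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskResult:
    # Hypothetical record layout, assumed for illustration only.
    task_id: str
    success: bool          # whether the judge marked the whole task as successful
    actions_total: int     # number of actions taken in the task
    actions_correct: int   # actions judged correct
    elements_correct: int  # actions that targeted the correct page element


def summarize(results: List[TaskResult]) -> Dict[str, float]:
    """Aggregate per-task records into percentage metrics."""
    total_tasks = len(results)
    total_actions = sum(r.actions_total for r in results)

    def pct(numerator: int, denominator: int) -> float:
        # Avoid division by zero when there are no tasks or no actions.
        return 100.0 * numerator / denominator if denominator else 0.0

    return {
        "task_success_rate": pct(sum(r.success for r in results), total_tasks),
        "action_accuracy": pct(sum(r.actions_correct for r in results), total_actions),
        "element_accuracy": pct(sum(r.elements_correct for r in results), total_actions),
    }


if __name__ == "__main__":
    demo = [
        TaskResult("task-a", True, 4, 4, 4),
        TaskResult("task-b", False, 6, 3, 4),
    ]
    print(summarize(demo))
```

Note that this sketch pools actions across all tasks (micro-averaging). Averaging per-task accuracies instead would weight short and long tasks equally, so the two choices can give different numbers.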
Data Format
Task data is stored in `benchmarks/Online-Mind2Web/data/`: