## Prerequisites

- Python 3.11+
- Node.js 18+ (only for Agent-TARS)
- `uv` (recommended Python package manager)
## Installation

### Install Python dependencies
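A typical way to install the Python dependencies, assuming the project ships a standard `pyproject.toml` (the exact commands are an assumption; check the repository for the authoritative install step):

```bash
# With uv (recommended) -- assumed standard project layout
uv sync

# Or with plain pip
pip install -e .
```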
After installation, make sure `bubench` is on PATH. `bubench run` will create the agent venv defined in `config.yaml` (built-in defaults:
`.venvs/browser_use`, `.venvs/skyvern`, `.venvs/agent_tars`) and install the matching
dependencies on first use. Agent venvs must be configured explicitly (no fallback to `.venv`).
If `uv` is not available, creation/install falls back to `python -m venv` and `pip`.

### Configure environment (.env)
`.env` — `.env.example` is the source of truth; the keys below are the common ones.
`$VAR` placeholders in `config.yaml` are resolved from this file at runtime.

Lexmount credentials: apply for `LEXMOUNT_API_KEY` and `LEXMOUNT_PROJECT_ID` at browser.lexmount.cn (mainland China) or browser.lexmount.com (international). See Lexmount Cloud Browser for the full setup.
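A minimal `.env` might look like the sketch below. The Lexmount variable names are documented above; the model-API key name is an assumption — copy the real key names from `.env.example`:

```bash
# Lexmount cloud browser credentials (from browser.lexmount.cn / browser.lexmount.com)
LEXMOUNT_API_KEY=your-api-key
LEXMOUNT_PROJECT_ID=your-project-id

# Model API key -- variable name is an assumption; see .env.example
OPENAI_API_KEY=your-model-api-key
```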
Configure config.yaml
config.yaml is the canonical runtime config. $VAR placeholders are
resolved from .env at runtime. Three parts to set up:1. Agent (model) — pick the active model for each agent under agents.<agent>:browser_id under agents.<agent>.browser. Only one
backend is active at a time; fill in its required keys and leave others commented:bubench eval for LLM-as-judge scoring:Per-agent config files under
configs/agents/<agent>/config.yaml are a legacy path
and may be removed in a future release. Prefer the root config.yaml.Quick Run
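Before the first run, the three config parts described above might look like this trimmed sketch — every key name other than `agents.<agent>`, `browser`, and `browser_id` is an assumption; the shipped `config.yaml` is the source of truth:

```yaml
agents:
  browser_use:
    model: gpt-4o               # assumed key: the active model for this agent
    api_key: $OPENAI_API_KEY    # $VAR placeholders resolve from .env at runtime
    browser:
      browser_id: lexmount      # only one browser backend active at a time
      # other backends stay commented out

judge:
  model: gpt-4o                 # assumed key: model used by `bubench eval` scoring
```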
### Run your first benchmark
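A first run might look like the following sketch. `--benchmark`, `--split`, and `--mode` are the documented selectors; the `--agent` flag and the placeholder values are assumptions:

```bash
# Hypothetical invocation -- replace the placeholders with real values
bubench run --agent browser-use --benchmark <benchmark> --split <split> --mode single
```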
### Smoke test (recommended)
Add `--dry-run` to validate config loading and task resolution without executing tasks.

`--dry-run` checks that `config.yaml` / `.env` parse cleanly and that at least one
task matches your `--benchmark` / `--split` / `--mode`. It does not create the
agent venv, call model APIs, or open a browser — those only happen on a real run.

### Evaluate results
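Scoring a finished run might look like the sketch below. `bubench eval` is the documented entry point, but the exact flags shown are assumptions:

```bash
# Hypothetical invocation -- flags are assumptions
bubench eval --benchmark <benchmark> --split <split> --timestamp <YYYYMMDD_HHmmss>
```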
Logs: Script execution logs are saved in `output/logs/`:

- `run.py` → `output/logs/run/`
- `eval.py` → `output/logs/eval/`
- leaderboard → `output/logs/leaderboard/`
### Generate leaderboard
## Run Modes
| Mode | Description | Example |
|---|---|---|
| `single` | Run the first task (sanity check) | `--mode single` |
| `first_n` | Run the first N tasks | `--mode first_n --count 5` |
| `sample_n` | Randomly sample N tasks | `--mode sample_n --count 10` |
| `specific` | Run specified task IDs | `--mode specific --task-ids id1 id2` |
| `by_id` | Run one task by numeric ID field | `--mode by_id --id 123` |
| `all` | Run all tasks | `--mode all` |
Note: `--task-ids` expects a space-separated list.
## Common Parameters
- `--data-source`: `local` or `huggingface`.
- `--force-download`: force re-download in HuggingFace mode.
- `--agent-config`: optional external agent config YAML path. By default the runtime config is loaded from the root `config.yaml`.
- `--timestamp`: resume or run in a specific directory (`YYYYMMDD_HHmmss`).
- `--timeout`: overrides `TIMEOUT` in the agent config.
## Resume an Interrupted Run
If a run is interrupted, use `--timestamp` to point to the same output directory and `--skip-completed` to skip tasks that already have results.

Tip: Find your timestamp under `experiments/{benchmark}/{split}/{agent}/`.
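A resume invocation might look like this sketch. `--timestamp` and `--skip-completed` are the documented flags; the `--agent` flag and the placeholders are assumptions:

```bash
# Hypothetical invocation -- reuse the timestamp of the interrupted run
bubench run --agent browser-use --benchmark <benchmark> --split <split> --mode all \
  --timestamp <YYYYMMDD_HHmmss> --skip-completed
```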
## Running Multiple Agents in Parallel

`bubench run` uses the venv specified by the agent entry in `config.yaml` and will auto-create it and install dependencies
on first use. By default each built-in agent has a dedicated venv:

- browser-use → `.venvs/browser_use`
- skyvern → `.venvs/skyvern`
- Agent-TARS → `.venvs/agent_tars`

If an agent entry does not specify a venv, `bubench run` exits with an error instead of falling back to `.venv`.
If you need to run conflicting agents at the same time, open two terminals and run each agent with its own venv.
## Parallel Task Execution (Split by Task IDs)
To speed up a large benchmark, split tasks across multiple terminals using `--mode specific --task-ids`. Use the same
`--timestamp` in both terminals to write results to the same output directory.
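A two-terminal split might look like this sketch. `--mode specific --task-ids` and `--timestamp` are the documented flags; the `--agent` flag, task IDs, and placeholders are assumptions:

```bash
# Terminal 1 -- hypothetical task IDs
bubench run --agent browser-use --benchmark <benchmark> --split <split> \
  --mode specific --task-ids id1 id2 --timestamp <YYYYMMDD_HHmmss>

# Terminal 2 -- same timestamp so results land in the same output directory
bubench run --agent browser-use --benchmark <benchmark> --split <split> \
  --mode specific --task-ids id3 id4 --timestamp <YYYYMMDD_HHmmss>
```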
## Node.js Agents (No Conflicts)

Agent-TARS runs via a Node.js CLI and does not share Python dependencies with other agents. You can run it in any terminal after installing the CLI.

## Next Steps
- **Supported Agents**: explore available browser agents
- **Benchmarks**: learn about each benchmark
- **Cloud Browser Setup**: configure Lexmount cloud browser
- **View Leaderboard**: compare agent performance