II. Benchmark overview
Reference · livebenchmark:agentbench
AgentBench overview
Multi-domain agent benchmark spanning OS, DB, KG, web shopping, web browsing, and other environments.
Attributes
displayName: AgentBench
homepageUrl: (no value listed)
kind: tool-use
targetsKind: AgentVersion
description: Multi-domain agent benchmark spanning OS, DB, KG, web shopping, web browsing, and other environments.
Outgoing edges
applies_to (2)
- domain:software-engineering · Domain · Software Engineering
- domain:data-science · Domain · Data Science
Incoming edges
belongs_to_benchmark (1)
- test-set:agentbench-environments · TestSet · AgentBench multi-environment suite
bounds_subject (1)
- scope-boundary:agentbench.scope · ScopeBoundary
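For readers who want to work with this entry programmatically, the attributes and edges listed above could be represented roughly as follows. This is a minimal sketch, not the schema of the underlying system: the `Node` class, its field names, and the `edge_counts` helper are hypothetical and introduced only for illustration. The identifiers and values are copied from the entry itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical container for one reference entry (not the real schema)."""
    id: str
    attributes: Dict[str, Optional[str]]
    # Edge label -> list of node ids on the other end of the edge.
    outgoing: Dict[str, List[str]] = field(default_factory=dict)
    incoming: Dict[str, List[str]] = field(default_factory=dict)


agentbench = Node(
    id="livebenchmark:agentbench",
    attributes={
        "displayName": "AgentBench",
        "homepageUrl": None,  # the entry lists no value for this attribute
        "kind": "tool-use",
        "targetsKind": "AgentVersion",
        "description": (
            "Multi-domain agent benchmark spanning OS, DB, KG, web shopping, "
            "web browsing, and other environments."
        ),
    },
    outgoing={
        "applies_to": [
            "domain:software-engineering",
            "domain:data-science",
        ],
    },
    incoming={
        "belongs_to_benchmark": ["test-set:agentbench-environments"],
        "bounds_subject": ["scope-boundary:agentbench.scope"],
    },
)


def edge_counts(node: Node) -> Dict[str, int]:
    """Return the number of targets per edge label, outgoing and incoming."""
    counts: Dict[str, int] = {}
    for edges in (node.outgoing, node.incoming):
        for label, targets in edges.items():
            counts[label] = counts.get(label, 0) + len(targets)
    return counts
```

Calling `edge_counts(agentbench)` reproduces the per-label counts shown next to each edge heading in the entry.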