II. TestSet overview
Reference · live · test-set:agentbench-environments
AgentBench multi-environment suite overview
Canonical AgentBench artifact for broad LLM-as-agent evaluation.
Attributes
displayName: AgentBench multi-environment suite
benchmarkId:
environmentCount: 8
releasedAt: 2023-08-07
composition: Multi-environment LLM-as-agent benchmark covering eight interactive environments for reasoning, decision-making, tool use, and instruction following.
homepageUrl:
description: Canonical AgentBench artifact for broad LLM-as-agent evaluation.
Outgoing edges
belongs_to_benchmark (1)
- benchmark:agentbench · Benchmark · AgentBench
Incoming edges
None.
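
For readers who want to work with this entry programmatically, the record can be sketched as a small Python data structure. This is only an illustrative model, not the catalog's actual schema: the class names BenchmarkTestSet and Edge, the field names, and the targets helper are all hypothetical. The record identifier assumes that the reference line reads as a status badge followed by the id test-set:agentbench-environments. The benchmarkId and homepageUrl attributes show no value in the entry, so they are left as None rather than guessed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Edge:
    """A directed, labeled link to another record (hypothetical type)."""
    label: str      # relation name, e.g. "belongs_to_benchmark"
    target_id: str  # identifier of the record the edge points to


@dataclass
class BenchmarkTestSet:
    """Illustrative model of the TestSet entry; field names are assumptions."""
    record_id: str
    display_name: str
    environment_count: int
    released_at: date
    composition: str
    description: str
    benchmark_id: Optional[str] = None  # no value shown in the entry
    homepage_url: Optional[str] = None  # no value shown in the entry
    outgoing: List[Edge] = field(default_factory=list)
    incoming: List[Edge] = field(default_factory=list)

    def targets(self, label: str) -> List[str]:
        """Return the ids reached by outgoing edges with the given label."""
        return [e.target_id for e in self.outgoing if e.label == label]


agentbench_environments = BenchmarkTestSet(
    # Assumed reading of the reference line; see the note above.
    record_id="test-set:agentbench-environments",
    display_name="AgentBench multi-environment suite",
    environment_count=8,
    released_at=date(2023, 8, 7),
    composition=(
        "Multi-environment LLM-as-agent benchmark covering eight interactive "
        "environments for reasoning, decision-making, tool use, and "
        "instruction following."
    ),
    description="Canonical AgentBench artifact for broad LLM-as-agent evaluation.",
    outgoing=[Edge(label="belongs_to_benchmark", target_id="benchmark:agentbench")],
    # The entry lists no incoming edges, so the default empty list is kept.
)
```

Calling agentbench_environments.targets("belongs_to_benchmark") walks the single outgoing edge to the parent benchmark record. Keeping edges as separate objects, rather than as plain attributes, mirrors the entry's own split between attributes and edges.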