Agentic AI Atlas

TestSet overview

test-set:agentbench-environments

Reference · live

AgentBench multi-environment suite overview

Canonical AgentBench artifact for broad LLM-as-agent evaluation.

TestSet · Outgoing 1 · Incoming 0

displayName

AgentBench multi-environment suite

benchmarkId

environmentCount

8

releasedAt

2023-08-07

composition

Multi-environment LLM-as-agent benchmark covering eight interactive environments for reasoning, decision-making, tool use, and instruction following.

homepageUrl

description

Canonical AgentBench artifact for broad LLM-as-agent evaluation.

belongs_to_benchmark · 1

None.
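
The fields listed above can be read as a small typed record. The following is a minimal sketch of that idea, not the Atlas's actual schema: the class name TestSetRecord, the snake_case attribute names, and the missing_fields helper are hypothetical and introduced only for illustration. Fields that show no value on this page are left as None rather than guessed, and the environment count is taken from the "eight interactive environments" wording in the composition text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TestSetRecord:
    """Hypothetical container mirroring the fields listed on this page."""

    id: str
    display_name: str
    released_at: date
    composition: str
    description: str
    benchmark_id: Optional[str] = None       # no value shown on the page
    environment_count: Optional[int] = None
    homepage_url: Optional[str] = None       # no value shown on the page

    def missing_fields(self) -> List[str]:
        """Return the names of fields that carry no value."""
        return [name for name, value in vars(self).items() if value is None]


record = TestSetRecord(
    id="test-set:agentbench-environments",
    display_name="AgentBench multi-environment suite",
    released_at=date.fromisoformat("2023-08-07"),
    composition=(
        "Multi-environment LLM-as-agent benchmark covering eight interactive "
        "environments for reasoning, decision-making, tool use, and "
        "instruction following."
    ),
    description="Canonical AgentBench artifact for broad LLM-as-agent evaluation.",
    environment_count=8,  # from "eight interactive environments" in composition
)

print(record.missing_fields())
```

Keeping absent values as None makes it easy to list which fields still need to be filled in before the record is treated as complete.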