Agentic AI Atlas

Benchmark overview

benchmark:berkeley-function-calling

Reference · live

Berkeley Function Calling Leaderboard (BFCL) overview

BFCL (Berkeley Function Calling Leaderboard, from the Gorilla project) is the canonical public leaderboard for LLM function-calling and tool-use accuracy across simple, parallel, multiple, and live function-calling categories. Versions v1, v2 (live), and v3 (multi-turn / multi-step) have been released.

Benchmark

Outgoing · 1

Incoming · 4

Attributes

displayName

Berkeley Function Calling Leaderboard (BFCL)

homepageUrl

https://gorilla.cs.berkeley.edu/leaderboard.html

kind

model-only

targetsKind

ModelVersion

description

Outgoing edges

covers · 1

skill-area:tool-use · SkillArea · LLM Tool Use

Incoming edges

for_benchmark · 2

eval-run:bfcl.claude-sonnet-4-5.2025-09 · EvalRun
eval-run:bfcl.gpt-5.2025-08 · EvalRun

scored_against · 2

eval-result:bfcl.claude-sonnet-4-5.001 · EvalResult
eval-result:bfcl.gpt-5.001 · EvalResult
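
For readers who want to work with this record programmatically, the attributes and edges listed above could be held in a small graph-node structure. The sketch below is only an illustration: the `Node` class, its field names, and the `edge_count` helper are hypothetical and are not part of any Atlas API. The identifiers and values are copied from this page.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical container for one Atlas entity (not an official schema)."""
    node_id: str
    kind: str
    attributes: dict = field(default_factory=dict)
    # Each mapping goes from an edge label to the list of node ids on the other end.
    outgoing: dict = field(default_factory=dict)
    incoming: dict = field(default_factory=dict)


def edge_count(edges: dict) -> int:
    """Total number of edges across all labels."""
    return sum(len(targets) for targets in edges.values())


bfcl = Node(
    node_id="benchmark:berkeley-function-calling",
    kind="Benchmark",
    attributes={
        "displayName": "Berkeley Function Calling Leaderboard (BFCL)",
        "homepageUrl": "https://gorilla.cs.berkeley.edu/leaderboard.html",
        "kind": "model-only",
        "targetsKind": "ModelVersion",
    },
    outgoing={"covers": ["skill-area:tool-use"]},
    incoming={
        "for_benchmark": [
            "eval-run:bfcl.claude-sonnet-4-5.2025-09",
            "eval-run:bfcl.gpt-5.2025-08",
        ],
        "scored_against": [
            "eval-result:bfcl.claude-sonnet-4-5.001",
            "eval-result:bfcl.gpt-5.001",
        ],
    },
)

# Matches the "Outgoing · 1" and "Incoming · 4" counts shown on this page.
print(edge_count(bfcl.outgoing), edge_count(bfcl.incoming))
```

Grouping edges by label keeps the per-label counts (covers, for_benchmark, scored_against) easy to check against the totals in the page header.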