Agentic AI Atlas by a5c.ai
Agentic AI Atlas · EvalPlus
benchmark:bigcode-evalplus
Benchmark overview



EvalPlus overview

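EvalPlus extends HumanEval and MBPP with many more high-quality tests per task (about 80x more for HumanEval+ and 35x more for MBPP+). The extra tests expose LLM-generated code that passes the original tests but is actually incorrect. The results are published as the HumanEval+ and MBPP+ leaderboards.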
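The sketch below shows why extra tests change scores. It assumes a toy task and hand-written tests; it does not use EvalPlus's harness or test data. A buggy candidate passes a small "base" test set but fails once edge-case tests are added. It therefore counts as correct under the base tests and incorrect under the extended set. The names `BASE_TESTS`, `EXTRA_TESTS`, `buggy_max`, `correct_max`, `passes` and `pass_rate` are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical toy task: return the largest element of a non-empty list.
Test = Tuple[List[int], int]
Solution = Callable[[List[int]], int]

# Small "base" test set, in the spirit of the original benchmark (illustrative only).
BASE_TESTS: List[Test] = [
    ([1, 2, 3], 3),
    ([5, 4], 5),
]

# Extra edge-case tests, in the spirit of EvalPlus's added tests (illustrative only).
EXTRA_TESTS: List[Test] = [
    ([-3, -1, -2], -1),
    ([7], 7),
]


def buggy_max(xs: List[int]) -> int:
    # Bug: starts from 0, so it returns 0 when every element is negative.
    best = 0
    for x in xs:
        if x > best:
            best = x
    return best


def correct_max(xs: List[int]) -> int:
    # Starts from the first element, so negative-only inputs are handled.
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


def passes(fn: Solution, tests: List[Test]) -> bool:
    # A solution counts as correct only if it passes every test.
    return all(fn(list(inp)) == expected for inp, expected in tests)


def pass_rate(solutions: List[Solution], tests: List[Test]) -> float:
    # Fraction of candidate solutions judged correct on the given tests.
    return sum(passes(fn, tests) for fn in solutions) / len(solutions)


candidates: List[Solution] = [buggy_max, correct_max]
base_rate = pass_rate(candidates, BASE_TESTS)
plus_rate = pass_rate(candidates, BASE_TESTS + EXTRA_TESTS)
print(base_rate, plus_rate)  # prints: 1.0 0.5
```

The buggy solution inflates the base score. The extended test set catches it, which is the effect the HumanEval+ and MBPP+ leaderboards are designed to measure.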

Benchmark · Outgoing: 0 · Incoming: 8

Attributes

displayName: EvalPlus
homepageUrl: https://evalplus.github.io/
kind: code-functional-correctness
targetsKind: ModelVersion
description: EvalPlus extends HumanEval and MBPP with many more high-quality tests per task (about 80x more for HumanEval+ and 35x more for MBPP+). The extra tests expose LLM-generated code that passes the original tests but is actually incorrect. The results are published as the HumanEval+ and MBPP+ leaderboards.

Outgoing edges

None.

Incoming edges

belongs_to_benchmark (1)
  • test-set:bigcode-evalplus · TestSet · BigCode EvalPlus
bounds_subject (1)
  • scope-boundary:bigcode-evalplus.scope · ScopeBoundary
for_benchmark (3)
  • eval-run:human-eval-plus.claude-sonnet-4-5.2025-09 · EvalRun
  • eval-run:human-eval-plus.gpt-5.2025-08 · EvalRun
  • eval-run:evalplus.gpt-5.2025-08 · EvalRun
scored_against (3)
  • eval-result:human-eval-plus.claude-sonnet-4-5.001 · EvalResult
  • eval-result:human-eval-plus.gpt-5.001 · EvalResult
  • eval-result:evalplus.gpt-5.001 · EvalResult
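The incoming-edge counts above can be checked with a small sketch. Each edge is represented here as an (edge type, source record id) pair copied from the list. This representation is an assumption for illustration only and is not the Atlas's actual JSON schema. The record kind is read from the id prefix before the first colon.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: (edge type, source record id) pairs copied from the list above.
INCOMING: List[Tuple[str, str]] = [
    ("belongs_to_benchmark", "test-set:bigcode-evalplus"),
    ("bounds_subject", "scope-boundary:bigcode-evalplus.scope"),
    ("for_benchmark", "eval-run:human-eval-plus.claude-sonnet-4-5.2025-09"),
    ("for_benchmark", "eval-run:human-eval-plus.gpt-5.2025-08"),
    ("for_benchmark", "eval-run:evalplus.gpt-5.2025-08"),
    ("scored_against", "eval-result:human-eval-plus.claude-sonnet-4-5.001"),
    ("scored_against", "eval-result:human-eval-plus.gpt-5.001"),
    ("scored_against", "eval-result:evalplus.gpt-5.001"),
]

# Count edges per relation; the total should match the "Incoming: 8" summary.
per_type = Counter(edge_type for edge_type, _ in INCOMING)
total = sum(per_type.values())

# Count source records by kind, taken from the id prefix (e.g. "eval-run").
per_kind = Counter(source.split(":", 1)[0] for _, source in INCOMING)

print(dict(per_type), total)
print(dict(per_kind))
```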

Related pages

No related wiki pages for this record.
