Agentic AI Atlas by a5c.ai

Agentic AI Atlas · HumanEval

Benchmark overview

benchmark:human-eval

Reference · live

HumanEval overview

Hand-written programming problems for evaluating code generation.

Benchmark · Outgoing: 1 · Incoming: 20

Attributes

displayName: HumanEval

homepageUrl: https://github.com/openai/human-eval

kind: function-completion

targetsKind: ModelVersion

description: Hand-written programming problems for evaluating code generation.

Outgoing edges

covers (1)

skill-area:python-implementation · SkillArea · Python Function Implementation

Incoming edges

belongs_to_benchmark (1)

test-set:humaneval-original · TestSet · HumanEval original problem set

bounds_subject (1)

scope-boundary:human-eval.scope · ScopeBoundary

for_benchmark (9)

eval-run:human-eval.qwen-2-5-72b.2024-09 · EvalRun
eval-run:human-eval.qwen-2-5-coder-32b.2024-11 · EvalRun
eval-run:human-eval.claude-sonnet-4-6.2025-11 · EvalRun
eval-run:human-eval.deepseek-v3.2024-12 · EvalRun
eval-run:human-eval.llama-3-1-405b.2024-07 · EvalRun
eval-run:human-eval.llama-3-3-70b.2024-12 · EvalRun
eval-run:human-eval.mistral-large-2.2024-07 · EvalRun
eval-run:human-eval.codestral-25-01.2025-01 · EvalRun
eval-run:human-eval.gpt-5.2025-08 · EvalRun

scored_against (9)

eval-result:human-eval.qwen-2-5-72b.001 · EvalResult
eval-result:human-eval.qwen-2-5-coder-32b.001 · EvalResult
eval-result:human-eval.claude-sonnet-4-6.001 · EvalResult
eval-result:human-eval.deepseek-v3.001 · EvalResult
eval-result:human-eval.llama-3-1-405b.001 · EvalResult
eval-result:human-eval.llama-3-3-70b.001 · EvalResult
eval-result:human-eval.mistral-large-2.001 · EvalResult
eval-result:human-eval.codestral-25-01.001 · EvalResult
eval-result:human-eval.gpt-5.001 · EvalResult
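
The eval-run and eval-result identifiers listed above appear to follow a `kind:benchmark.subject.suffix` pattern, where the suffix is a year-month for runs and a sequence number for results. The sketch below splits such an identifier into its parts. The pattern is inferred from the listed ids only, not from any documented Atlas schema, and `NodeId` and `parse_node_id` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class NodeId(NamedTuple):
    """Parts of an id such as 'eval-run:human-eval.gpt-5.2025-08' (assumed layout)."""
    kind: str       # node kind prefix, e.g. "eval-run"
    benchmark: str  # benchmark slug, e.g. "human-eval"
    subject: str    # model slug, e.g. "gpt-5"
    suffix: str     # "2025-08" for runs, "001" for results


def parse_node_id(node_id: str) -> NodeId:
    """Split an id on the first ':' and then on the first two '.' characters.

    Raises ValueError for ids without three dot-separated parts after the
    kind prefix (for example 'test-set:humaneval-original').
    """
    kind, _, rest = node_id.partition(":")
    benchmark, subject, suffix = rest.split(".", 2)
    return NodeId(kind, benchmark, subject, suffix)


run = parse_node_id("eval-run:human-eval.gpt-5.2025-08")
result = parse_node_id("eval-result:human-eval.gpt-5.001")
# A run and a result for the same model share the benchmark and subject parts.
same_model = (run.benchmark, run.subject) == (result.benchmark, result.subject)
```

Matching on the `benchmark` and `subject` parts is one way to pair each of the nine runs with its corresponding result. Whether the Atlas itself links them this way is not stated in the record.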
