II. Benchmark overview
Reference · livebenchmark:human-eval
HumanEval overview
Hand-written programming problems for evaluating code generation.
Attributes
displayName: HumanEval
homepageUrl:
kind: function-completion
targetsKind: ModelVersion
description: Hand-written programming problems for evaluating code generation.
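To illustrate what the function-completion kind means in practice, the following is a minimal sketch of how one such problem could be scored. The problem shown (`running_max`), the `passes` helper, and the `PROMPT`, `COMPLETION`, and `TEST` names are all hypothetical and are not taken from the HumanEval dataset or any official harness. The sketch assumes a problem consists of a function signature with a docstring, that the model supplies the body, and that the result is checked by unit tests.

```python
# Hypothetical problem in the function-completion style.
# It is NOT taken from the HumanEval dataset.
PROMPT = '''def running_max(values):
    """Return a list where element i is the maximum of values[:i + 1]."""
'''

# A candidate completion, as a model might return it (function body only).
COMPLETION = '''    result = []
    best = None
    for v in values:
        best = v if best is None or v > best else best
        result.append(best)
    return result
'''

# Unit tests for the problem (also hypothetical).
TEST = '''def check(candidate):
    assert candidate([1, 3, 2, 5, 4]) == [1, 3, 3, 5, 5]
    assert candidate([]) == []
    assert candidate([-2, -5]) == [-2, -2]
'''


def passes(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion defines a function that passes check()."""
    namespace: dict = {}
    try:
        exec(prompt + completion, namespace)  # define the candidate function
        exec(test, namespace)                 # define check()
        namespace["check"](namespace[entry_point])
    except Exception:
        # Any syntax error, runtime error, or failed assertion counts as a fail.
        return False
    return True


if __name__ == "__main__":
    print(passes(PROMPT, COMPLETION, TEST, "running_max"))
    print(passes(PROMPT, "    return values\n", TEST, "running_max"))
```

This sketch runs the candidate with `exec` in the current process, which is only acceptable for trusted code. A real evaluation would be expected to isolate model-generated code in a sandbox and apply a timeout.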
Outgoing edges
covers (1)
- skill-area:python-implementation · SkillArea · Python Function Implementation
Incoming edges
belongs_to_benchmark (1)
- test-set:humaneval-original · TestSet · HumanEval original problem set
bounds_subject (1)
- scope-boundary:human-eval.scope · ScopeBoundary
for_benchmark (9)
- eval-run:human-eval.qwen-2-5-72b.2024-09 · EvalRun
- eval-run:human-eval.qwen-2-5-coder-32b.2024-11 · EvalRun
- eval-run:human-eval.claude-sonnet-4-6.2025-11 · EvalRun
- eval-run:human-eval.deepseek-v3.2024-12 · EvalRun
- eval-run:human-eval.llama-3-1-405b.2024-07 · EvalRun
- eval-run:human-eval.llama-3-3-70b.2024-12 · EvalRun
- eval-run:human-eval.mistral-large-2.2024-07 · EvalRun
- eval-run:human-eval.codestral-25-01.2025-01 · EvalRun
- eval-run:human-eval.gpt-5.2025-08 · EvalRun
scored_against (9)
- eval-result:human-eval.qwen-2-5-72b.001 · EvalResult
- eval-result:human-eval.qwen-2-5-coder-32b.001 · EvalResult
- eval-result:human-eval.claude-sonnet-4-6.001 · EvalResult
- eval-result:human-eval.deepseek-v3.001 · EvalResult
- eval-result:human-eval.llama-3-1-405b.001 · EvalResult
- eval-result:human-eval.llama-3-3-70b.001 · EvalResult
- eval-result:human-eval.mistral-large-2.001 · EvalResult
- eval-result:human-eval.codestral-25-01.001 · EvalResult
- eval-result:human-eval.gpt-5.001 · EvalResult