II.
Benchmark overview
Reference · live · benchmark:hellaswag
HellaSwag overview
HellaSwag is a commonsense-reasoning benchmark: multiple-choice sentence completion drawn from ActivityNet Captions and WikiHow, adversarially filtered against ELMo and BERT.
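To make "multiple-choice sentence completion" concrete, here is a minimal sketch of how one such item might be scored. It assumes a model has already assigned a score to each candidate ending. The `CompletionItem` schema, the helper names, the toy item, and the scores are all hypothetical illustrations, not benchmark data or an official harness.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CompletionItem:
    """One multiple-choice item: a context plus candidate endings (hypothetical schema)."""
    context: str
    endings: List[str]
    label: int  # index of the correct ending


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring ending (the first one wins ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(items: Sequence[CompletionItem],
             all_scores: Sequence[Sequence[float]]) -> float:
    """Fraction of items whose top-scoring ending matches the label."""
    if not items:
        return 0.0
    correct = sum(predict(s) == item.label for item, s in zip(items, all_scores))
    return correct / len(items)


# Toy item and made-up scores, for illustration only (not benchmark data).
item = CompletionItem(
    context="A person fills a kettle with water and",
    endings=[
        "switches it on to boil.",
        "throws it into the sea.",
        "paints it green.",
        "reads it a story.",
    ],
    label=0,
)
toy_scores = [[-1.2, -4.5, -5.0, -6.1]]
print(accuracy([item], toy_scores))
```

In practice the per-ending scores would typically come from a model's likelihood of each ending given the context; that step is left out here because it depends on the model under evaluation.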
Attributes
displayName: HellaSwag
homepageUrl: (empty)
kind: reasoning
targetsKind: ModelVersion
description: HellaSwag is a commonsense-reasoning benchmark: multiple-choice sentence completion drawn from ActivityNet Captions and WikiHow, adversarially filtered against ELMo and BERT.
Outgoing edges
None.
Incoming edges
belongs_to_benchmark (1)
- test-set:hellaswag-validation · TestSet · HellaSwag validation
bounds_subject (1)
- scope-boundary:hellaswag.scope · ScopeBoundary
for_benchmark (1)
scored_against (1)
- eval-result:hellaswag.claude-opus-4-5.001 · EvalResult
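The record above can be sketched as a small data structure: scalar attributes plus incoming edges grouped by edge type. The `BenchmarkNode` class and its `sources_of_kind` helper are hypothetical names chosen for illustration; only the attribute values and edge identifiers come from this page. The `for_benchmark` list is left empty because the page reports one such edge but does not list its source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BenchmarkNode:
    """Hypothetical in-memory form of a benchmark reference record."""
    display_name: str
    kind: str
    targets_kind: str
    homepage_url: Optional[str] = None  # no value is shown on the page
    # Maps edge type -> identifiers of the nodes that point at this benchmark.
    incoming: Dict[str, List[str]] = field(default_factory=dict)

    def sources_of_kind(self, prefix: str) -> List[str]:
        """Return incoming source ids whose type prefix (the text before ':') matches."""
        return [
            src
            for sources in self.incoming.values()
            for src in sources
            if src.startswith(prefix + ":")
        ]


hellaswag = BenchmarkNode(
    display_name="HellaSwag",
    kind="reasoning",
    targets_kind="ModelVersion",
    incoming={
        "belongs_to_benchmark": ["test-set:hellaswag-validation"],
        "bounds_subject": ["scope-boundary:hellaswag.scope"],
        "for_benchmark": [],  # one edge reported, source not listed on the page
        "scored_against": ["eval-result:hellaswag.claude-opus-4-5.001"],
    },
)

print(hellaswag.sources_of_kind("eval-result"))
```

Grouping by edge type mirrors the page layout, so a lookup such as "all evaluation results scored against this benchmark" becomes a single filter over the grouped identifiers.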