Agentic AI Atlas by a5c.ai
Benchmark overview

benchmark:gpqa

Reference · live

GPQA overview

GPQA (Graduate-Level Google-Proof Q&A), introduced by Rein et al. (2023), is a 448-question multiple-choice benchmark covering biology, chemistry, and physics, written and validated by domain-expert PhDs. It is designed to be "Google-proof": non-experts with web access score ~34%, while in-domain PhDs score ~65%. The Diamond subset (198 questions) is the hardest tier, and its score is the figure most often reported in vendor announcements.
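
As a rough illustration of the figures quoted above, the following Python sketch derives the Diamond subset's share of the full question set and the gap between expert and non-expert accuracy. The random-guess baseline assumes four answer options per question, which this record does not state, and all names are illustrative.

```python
# Figures taken from the benchmark description.
FULL_SET_SIZE = 448
DIAMOND_SIZE = 198
NON_EXPERT_ACC = 0.34  # non-experts with web access (approximate)
EXPERT_ACC = 0.65      # in-domain PhDs (approximate)

# Assumption: four answer options per question (not stated in this record).
NUM_CHOICES = 4
RANDOM_BASELINE = 1 / NUM_CHOICES


def diamond_share() -> float:
    """Fraction of the full question set that belongs to the Diamond subset."""
    return DIAMOND_SIZE / FULL_SET_SIZE


def expert_gap_points() -> float:
    """Expert minus non-expert accuracy, in percentage points."""
    return round((EXPERT_ACC - NON_EXPERT_ACC) * 100, 1)


def margin_over_random(accuracy: float) -> float:
    """How far an accuracy sits above random guessing, in percentage points."""
    return round((accuracy - RANDOM_BASELINE) * 100, 1)


print(f"Diamond share of full set: {diamond_share():.1%}")
print(f"Expert vs non-expert gap: {expert_gap_points()} points")
print(f"Non-expert margin over random: {margin_over_random(NON_EXPERT_ACC)} points")
```

Under the four-option assumption, the non-expert score sits only a few points above chance, which is the sense in which the benchmark resists web search.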

Benchmark · Outgoing · 3 · Incoming · 20

Attributes

displayName: GPQA
homepageUrl: https://github.com/idavidrein/gpqa
kind: model-only
targetsKind: ModelVersion

Outgoing edges

targets · 2
  • model:claude-opus-4-7@current·ModelVersion·Claude 4.7 Opus
  • model:claude-opus-4-6@current·ModelVersion·Claude 4.6 Opus
uses_test_set · 1
  • test-set:gpqa-diamond·TestSet·GPQA Diamond
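
One way to hold this record's outgoing edges in memory is a mapping from edge label to target node IDs, which makes the outgoing tally of 3 easy to check. This is only a sketch: the `OUTGOING` name and the dictionary layout are assumptions for illustration, not the Atlas's actual JSON schema.

```python
# Outgoing edges of benchmark:gpqa, keyed by edge label.
# The layout is hypothetical; only the labels and node IDs come from the record.
OUTGOING = {
    "targets": [
        "model:claude-opus-4-7@current",
        "model:claude-opus-4-6@current",
    ],
    "uses_test_set": [
        "test-set:gpqa-diamond",
    ],
}


def edge_count(edges: dict[str, list[str]]) -> int:
    """Total number of edges across all labels."""
    return sum(len(targets) for targets in edges.values())


print(edge_count(OUTGOING))
```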

Incoming edges

for_benchmark · 12
  • eval-run:gpqa.claude-haiku-4-5.2025-10·EvalRun
  • eval-run:gpqa-diamond.claude-opus-4-5.2025-09·EvalRun
  • eval-run:gpqa.deepseek-r1.2025-01·EvalRun
  • eval-run:gpqa.gemini-2-5-pro.2025-06·EvalRun
  • eval-run:gpqa-diamond.gemini-2-5-pro.2025-06·EvalRun
  • eval-run:gpqa-diamond.gemini-3-1-pro.2026-02-19·EvalRun
  • eval-run:gpqa-diamond.gemini-3-pro.2025-11-18·EvalRun
  • eval-run:gpqa.gpt-5.2025-08·EvalRun
  • eval-run:gpqa-diamond.gpt-5.2025-08·EvalRun
  • eval-run:gpqa-diamond.gpt-5-4.2026-03-17·EvalRun
  • eval-run:gpqa-diamond.gpt-5-4-mini.2026-03-17·EvalRun
  • eval-run:gpqa.claude-sonnet-4-5.2025-09·EvalRun
scored_against · 8
  • eval-result:gpqa-diamond.claude-opus-4-5.001·EvalResult
  • eval-result:gpqa.deepseek-r1.001·EvalResult
  • eval-result:gpqa-diamond.gemini-2-5-pro.001·EvalResult
  • eval-result:gpqa-diamond.gemini-3-1-pro.2026-02-19.accuracy·EvalResult
  • eval-result:gpqa-diamond.gemini-3-pro.2025-11-18.accuracy·EvalResult
  • eval-result:gpqa-diamond.gpt-5.001·EvalResult
  • eval-result:gpqa-diamond.gpt-5-4.2026-03-17.accuracy·EvalResult
  • eval-result:gpqa-diamond.gpt-5-4-mini.2026-03-17.accuracy·EvalResult
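
The node IDs listed above appear to follow a `kind:suite.model.suffix` pattern, where the suffix is a date, a sequence number, or a metric name. That pattern is inferred from the IDs shown here rather than from a documented schema, so the sketch below is tentative; `NodeId` and `parse_node_id` are hypothetical helpers. It splits each ID and counts how many eval runs belong to the full set versus the Diamond subset.

```python
from collections import Counter
from typing import NamedTuple


class NodeId(NamedTuple):
    kind: str    # e.g. "eval-run"
    suite: str   # e.g. "gpqa" or "gpqa-diamond"
    model: str   # e.g. "gpt-5-4"
    suffix: str  # remaining dotted segments, e.g. a date or "001"


def parse_node_id(node_id: str) -> NodeId:
    """Split an ID of the assumed form kind:suite.model.suffix."""
    kind, _, name = node_id.partition(":")
    suite, model, *rest = name.split(".")
    return NodeId(kind, suite, model, ".".join(rest))


# The twelve for_benchmark edges from this record.
EVAL_RUNS = [
    "eval-run:gpqa.claude-haiku-4-5.2025-10",
    "eval-run:gpqa-diamond.claude-opus-4-5.2025-09",
    "eval-run:gpqa.deepseek-r1.2025-01",
    "eval-run:gpqa.gemini-2-5-pro.2025-06",
    "eval-run:gpqa-diamond.gemini-2-5-pro.2025-06",
    "eval-run:gpqa-diamond.gemini-3-1-pro.2026-02-19",
    "eval-run:gpqa-diamond.gemini-3-pro.2025-11-18",
    "eval-run:gpqa.gpt-5.2025-08",
    "eval-run:gpqa-diamond.gpt-5.2025-08",
    "eval-run:gpqa-diamond.gpt-5-4.2026-03-17",
    "eval-run:gpqa-diamond.gpt-5-4-mini.2026-03-17",
    "eval-run:gpqa.claude-sonnet-4-5.2025-09",
]

runs_per_suite = Counter(parse_node_id(run).suite for run in EVAL_RUNS)
print(runs_per_suite)
```

Grouping by suite keeps full-set and Diamond runs apart, which matters because scores on the two are not directly comparable.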
