II.
Benchmark overview
Reference · livebenchmark:android-world
AndroidWorld overview
AndroidWorld (Google Research, 2024) is a dynamic Android environment benchmark of 116 tasks across 20 real apps, used to evaluate autonomous mobile-UI agents on natural-language goals with stochastic real-app state.
Attributes
displayName
AndroidWorld
homepageUrl
kind
full-stack
targetsKind
AgentVersion
description
AndroidWorld (Google Research, 2024) is a dynamic Android
environment benchmark of 116 tasks across 20 real apps, used to
evaluate autonomous mobile-UI agents on natural-language goals
with stochastic real-app state.
Outgoing edges
covers (1)
- skill-area:android-native · SkillArea · Android Native Development
Incoming edges
belongs_to_benchmark (1)
- test-set:androidworld-programmatic-tasks · TestSet · AndroidWorld programmatic task suite
for_benchmark (1)
scored_against (1)
- eval-result:android-world.gemini-2-5-pro.001 · EvalResult
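The attributes and edges listed above can be read as one node in a labelled graph. The following is a minimal sketch of that reading in Python. The class names `Edge` and `BenchmarkNode`, their field names, and the `outgoing`/`incoming` helpers are hypothetical and chosen for illustration only; they are not the schema of the system that produced this page. The `homepageUrl` value is left unset because the page shows none, and the `for_benchmark` edge is omitted because its entry is not listed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Edge:
    """A directed, labelled link between two node IDs (hypothetical type)."""
    label: str
    source: str
    target: str


@dataclass
class BenchmarkNode:
    """Hypothetical container for the attributes shown on this page."""
    node_id: str
    display_name: str
    kind: str
    targets_kind: str
    description: str
    homepage_url: Optional[str] = None  # no value is shown on the page
    edges: List[Edge] = field(default_factory=list)

    def outgoing(self) -> List[Edge]:
        """Edges that start at this node."""
        return [e for e in self.edges if e.source == self.node_id]

    def incoming(self) -> List[Edge]:
        """Edges that end at this node."""
        return [e for e in self.edges if e.target == self.node_id]


NODE_ID = "livebenchmark:android-world"

android_world = BenchmarkNode(
    node_id=NODE_ID,
    display_name="AndroidWorld",
    kind="full-stack",
    targets_kind="AgentVersion",
    description=(
        "AndroidWorld (Google Research, 2024) is a dynamic Android "
        "environment benchmark of 116 tasks across 20 real apps, used to "
        "evaluate autonomous mobile-UI agents on natural-language goals "
        "with stochastic real-app state."
    ),
    edges=[
        # Outgoing edge listed on the page.
        Edge("covers", NODE_ID, "skill-area:android-native"),
        # Incoming edges listed on the page; the for_benchmark entry
        # is not shown in the source, so it is not modelled here.
        Edge("belongs_to_benchmark",
             "test-set:androidworld-programmatic-tasks", NODE_ID),
        Edge("scored_against",
             "eval-result:android-world.gemini-2-5-pro.001", NODE_ID),
    ],
)

outgoing_labels = [e.label for e in android_world.outgoing()]
incoming_labels = [e.label for e in android_world.incoming()]
```

Storing every edge once with an explicit source and target, and deriving the outgoing and incoming views by filtering, keeps the two lists on the page consistent with each other by construction.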