PuSH - Publication Server of Helmholtz Zentrum München

Mecattaf, M.G.* ; Slater, B.* ; Tešić, M.* ; Prunty, J.* ; Voudouris, K. ; Cheke, L.G.*

A Little Less Conversation, a Little More Action, Please: Investigating the Physical Common-Sense of LLMs in a 3D Embodied Environment.

In: PRICAI 2025: Trends in Artificial Intelligence. Berlin [et al.]: Springer, 2026. pp. 272-287 (Lect. Notes Comput. Sc. ; 16453 LNAI)
Large Language Models (LLMs) are increasingly used to reason about everyday physical environments and control the actions of agentic systems. The vast majority of research into how capable LLMs are at reasoning in physical environments has used static text- or image-based benchmarks, which do not capture the complexity and nuance of real-life physical processes. To address this issue, we present LLM-AAI, a framework allowing direct comparison between LLMs and other embodied agents, and use it to perform the first embodied and cognitively meaningful evaluation of physical common-sense reasoning in LLMs. Our framework employs the Animal-AI environment, a simulated 3D virtual laboratory, and we compare LLMs to the entrants of the 2019 Animal-AI Olympics competition and to human children. Our results show that LLMs are currently outperformed by human children on tasks from the competition. We argue that this approach allows the study of physical reasoning using ecologically valid experiments drawn directly from cognitive science, improving the predictability and reliability of LLMs. Additional supporting materials can be found at: https://github.com/Kinds-of-Intelligence-CFI/llm-aai-supporting-materials. Our full code can be found at: https://github.com/Kinds-of-Intelligence-CFI/llm-aai.
Publication type Article: Conference contribution
Keywords Animal Cognition ; Cognitive Science ; Evaluation ; LLM Agents
ISSN (print) / ISBN 0302-9743
e-ISSN 1611-3349
Conference title PRICAI 2025: Trends in Artificial Intelligence
Source Volume: 16453 LNAI, Pages: 272-287
Publisher Springer
Place of publication Berlin [et al.]