A Little Less Conversation, a Little More Action, Please: Investigating the Physical Common-Sense of LLMs in a 3D Embodied Environment.
In: PRICAI 2025: Trends in Artificial Intelligence. Berlin [et al.]: Springer, 2026. pp. 272-287 (Lect. Notes Comput. Sc.; 16453 LNAI)
Large Language Models (LLMs) are increasingly used to reason about everyday physical environments and control the actions of agentic systems. The vast majority of research into how capable LLMs are at reasoning in physical environments has used static text- or image-based benchmarks, which do not capture the complexity and nuance of real-life physical processes. To address this issue, we present LLM-AAI, a framework allowing direct comparison between LLMs and other embodied agents, and use it to perform the first embodied and cognitively meaningful evaluation of physical common-sense reasoning in LLMs. Our framework employs the Animal-AI environment, a simulated 3D virtual laboratory, and we compare LLMs to the entrants of the 2019 Animal-AI Olympics competition and to human children. Our results show that LLMs are currently outperformed by human children on tasks from the competition. We argue that this approach allows the study of physical reasoning using ecologically valid experiments drawn directly from cognitive science, improving the predictability and reliability of LLMs.
Additional supporting materials can be found at: https://github.com/Kinds-of-Intelligence-CFI/llm-aai-supporting-materials. Our full code can be found at: https://github.com/Kinds-of-Intelligence-CFI/llm-aai.
Publication type
Article: Conference contribution
Keywords
Animal Cognition ; Cognitive Science ; Evaluation ; LLM Agents
ISSN (print) / ISBN
0302-9743
e-ISSN
1611-3349
Conference title
PRICAI 2025: Trends in Artificial Intelligence
Journal
Lecture Notes in Computer Science
Source details
Volume: 16453 LNAI,
Pages: 272-287
Publisher
Springer
Place of publication
Berlin [et al.]
Institute(s)
Institute of AI for Health (AIH)