PuSH - Publication Server of Helmholtz Zentrum München: A Little Less Conversation, a Little More Action, Please: Investigating the Physical Common-Sense of LLMs in a 3D Embodied Environment.

Mecattaf, M.G.* ; Slater, B.* ; Tešić, M.* ; Prunty, J.* ; Voudouris, K. ; Cheke, L.G.*

A Little Less Conversation, a Little More Action, Please: Investigating the Physical Common-Sense of LLMs in a 3D Embodied Environment.

In: (PRICAI 2025: Trends in Artificial Intelligence). Berlin [etc.]: Springer, 2026. 272-287 (Lect. Notes Comput. Sc. ; 16453 LNAI)

Abstract

Large Language Models (LLMs) are increasingly used to reason about everyday physical environments and control the actions of agentic systems. The vast majority of research into how capable LLMs are at reasoning in physical environments has used static text- or image-based benchmarks, which do not capture the complexity and nuance of real-life physical processes. To address this issue, we present LLM-AAI, a framework allowing direct comparison between LLMs and other embodied agents, and use it to perform the first embodied and cognitively meaningful evaluation of physical common-sense reasoning in LLMs. Our framework employs the Animal-AI environment, a simulated 3D virtual laboratory, and we compare LLMs to the entrants of the 2019 Animal-AI Olympics competition and to human children. Our results show that LLMs are currently outperformed by human children on tasks from the competition. We argue that this approach allows the study of physical reasoning using ecologically valid experiments drawn directly from cognitive science, improving the predictability and reliability of LLMs.

Additional supporting materials can be found at: https://github.com/Kinds-of-Intelligence-CFI/llm-aai-supporting-materials. Our full code can be found at: https://github.com/Kinds-of-Intelligence-CFI/llm-aai.

Publication type: Article: Conference contribution

Keywords: Animal Cognition ; Cognitive Science ; Evaluation ; LLM Agents

ISSN (print) / ISBN: 0302-9743

e-ISSN: 1611-3349

Conference title: PRICAI 2025: Trends in Artificial Intelligence

Journal: Lecture Notes in Computer Science

Source details: Volume 16453 LNAI, pages 272-287

Publisher: Springer

Place of publication: Berlin [etc.]

Institute(s): Institute of AI for Health (AIH)