PuSH - Publication Server of Helmholtz Zentrum München: Testing the Limits of Fine-Tuning for Improving Visual Cognition in Vision Language Models.

Schulze Buschoff, L.M. ; Voudouris, K. ; Akata, E. ; Bethge, M.* ; Tenenbaum, J.B.* ; Schulz, E.

Testing the Limits of Fine-Tuning for Improving Visual Cognition in Vision Language Models.

In: 42nd International Conference on Machine Learning (ICML 2025), 13-19 July 2025, Vancouver. 2025, pp. 53645-53662 (Proceedings of Machine Learning Research; 267)

Publisher's version: Open Access Hybrid

Abstract

Pre-trained vision language models still fall short of human visual cognition. In an effort to improve visual cognition and align models with human behavior, we introduce visual stimuli and human judgments on visual cognition tasks, allowing us to systematically evaluate performance across cognitive domains under a consistent environment. We fine-tune models on ground truth data for intuitive physics and causal reasoning and find that this improves model performance in the respective fine-tuning domain. Furthermore, it can improve model alignment with human behavior. However, we find that task-specific fine-tuning does not contribute to robust human-like generalization to data with other visual characteristics or to tasks in other cognitive domains.

Publication type: Article: Conference contribution

Conference title: 42nd International Conference on Machine Learning, ICML 2025

Conference date: 13-19 July 2025

Conference location: Vancouver

Source: Volume 267, pages 53645-53662

Institute(s): Institute of AI for Health (AIH)