PuSH - Publication Server of Helmholtz Zentrum München

Li, J.* ; Kim, S.H.* ; Müller, P.* ; Felsner, F.* ; Rueckert, D.* ; Wiestler, B.* ; Schnabel, J.A. ; Bercea, C.-I.

Language models meet anomaly detection for better interpretability and generalizability.

In: (Medical Image Computing and Computer Assisted Intervention – MICCAI 2024). Berlin [etc.]: Springer, 2025. 113-123 (Lect. Notes Comput. Sc. ; 15401 LNCS)
Open Access Green
This research explores the integration of language models and unsupervised anomaly detection in medical imaging, addressing two key questions: (1) Can language models enhance the interpretability of anomaly detection maps? and (2) Can anomaly maps improve the generalizability of language models in open-set anomaly detection tasks? To investigate these questions, we introduce a new dataset for multi-image visual question-answering on brain magnetic resonance images encompassing multiple conditions. We propose KQ-Former (Knowledge Querying Transformer), which is designed to optimally align visual and textual information in limited-sample contexts. Our model achieves 60.81% accuracy on closed questions, covering disease classification and severity across 15 different classes. For open questions, KQ-Former demonstrates a 70% improvement over the baseline with a BLEU-4 score of 0.41, and achieves the highest entailment ratios (up to 71.9%) and lowest contradiction ratios (down to 10.0%) among various natural language inference models. Furthermore, integrating anomaly maps results in an 18% accuracy increase in detecting open-set anomalies, thereby enhancing the language model’s generalizability to previously unseen medical conditions. The code and dataset are available at: https://github.com/compai-lab/miccai-2024-junli?tab=readme-ov-file.
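As a rough illustration of the kinds of metrics the abstract reports, the sketch below shows how closed-question accuracy and entailment/contradiction ratios could be computed from per-answer labels. The function names, label strings, and toy data are hypothetical and chosen only for illustration; they are not taken from the paper's code, dataset, or results.

```python
from collections import Counter


def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("need at least one example")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def nli_ratios(labels):
    """Share of each NLI label among a list of per-answer labels."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    total = len(labels)
    return {
        name: counts.get(name, 0) / total
        for name in ("entailment", "neutral", "contradiction")
    }


# Hypothetical closed-question answers (made-up class names).
closed_pred = ["class_a", "mild", "class_b", "severe"]
closed_ref = ["class_a", "mild", "class_c", "severe"]
closed_accuracy = accuracy(closed_pred, closed_ref)

# Hypothetical NLI labels, assumed to come from an NLI model that compares
# each generated open answer with its reference answer.
nli_labels = ["entailment", "entailment", "neutral", "contradiction", "entailment"]
ratios = nli_ratios(nli_labels)

print(f"closed-question accuracy: {closed_accuracy:.2%}")
print(f"entailment ratio: {ratios['entailment']:.2%}")
print(f"contradiction ratio: {ratios['contradiction']:.2%}")
```

A higher entailment ratio and a lower contradiction ratio both indicate generated answers that agree more closely with the references, which is the direction the abstract reports for KQ-Former.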
Publication type Article: Conference contribution
Keywords Multimodal Learning ; Vision-language Models ; VQA
Language English
Publication Year 2025
HGF-reported in Year 2025
ISSN (print) / ISBN 0302-9743
e-ISSN 1611-3349
Conference Title Medical Image Computing and Computer Assisted Intervention – MICCAI 2024
Source details Volume: 15401 LNCS, Pages: 113-123
Publisher Springer
Publishing Place Berlin [etc.]
Institute(s) Institute for Machine Learning in Biomed Imaging (IML)
POF-Topic(s) 30205 - Bioengineering and Digital Health
Research field(s) Enabling and Novel Technologies
PSP Element(s) G-507100-001
Scopus ID 105003862554
Date of entry 2025-05-22