PuSH - Publication Server of Helmholtz Zentrum München

Lim, H. ; Choi, J.* ; Choo, J.* ; Schneider, S.

Sparse autoencoders reveal selective remapping of visual concepts during adaptation.

In: 13th International Conference on Learning Representations (ICLR 2025), 24-28 April 2025, Singapore. 2025, pp. 46012-46037
Postprint
Adapting foundation models for specific purposes has become a standard approach to build machine learning systems for downstream applications. Yet, it is an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g., shape, color, or semantics of an object) and their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While activations of concepts slightly change between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained with the existing concepts already present in the non-adapted foundation model. This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms.
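The abstract does not spell out the PatchSAE architecture, so the following is only a minimal sketch of a generic sparse autoencoder applied to vision transformer patch tokens, not the authors' implementation. It assumes a single-layer ReLU encoder with an L1 sparsity penalty (a common SAE formulation), random untrained weights, a class token at position 0, and toy dimensions; all function names (`sae_forward`, `sae_loss`, `patch_attribution`) are hypothetical.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode tokens into sparse latents and reconstruct them.

    x: (n_tokens, d_model) token activations from one transformer layer.
    Returns latents (n_tokens, d_sae) and reconstruction (n_tokens, d_model).
    """
    # ReLU keeps latents non-negative; sparsity comes from the L1 penalty in training.
    latents = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    recon = latents @ W_dec + b_dec
    return latents, recon


def sae_loss(x, latents, recon, l1_coeff=1e-3):
    """Reconstruction error plus an L1 sparsity penalty (assumed objective)."""
    mse = np.mean(np.sum((recon - x) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(latents), axis=-1))
    return mse + l1_coeff * l1


def patch_attribution(latents, latent_idx, grid):
    """Spatial map of one latent: drop the class token, reshape patches to a grid."""
    return latents[1:, latent_idx].reshape(grid, grid)


# Toy setup (hypothetical sizes): 4x4 patches plus one class token.
rng = np.random.default_rng(0)
grid, d_model, d_sae = 4, 16, 64
tokens = rng.normal(size=(1 + grid * grid, d_model))
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

latents, recon = sae_forward(tokens, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(tokens, latents, recon)
heatmap = patch_attribution(latents, latent_idx=0, grid=grid)
```

Under these assumptions, `heatmap` gives a per-patch activation map for one latent, which is one way to read the "patch-wise spatial attributions" mentioned in the abstract.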
Publication type: Article: Conference contribution
Language: English
Publication year: 2025
HGF reporting year: 2025
ISSN (print) / ISBN: 9798331320850
Conference title: 13th International Conference on Learning Representations (ICLR 2025)
Conference date: 24-28 April 2025
Conference location: Singapore
Source details: Pages: 46012-46037
POF Topic(s): 30205 - Bioengineering and Digital Health
Research field(s): Enabling and Novel Technologies
PSP element(s): G-503800-001
Scopus ID: 105010241693
Date of entry: 2025-07-17