Sparse autoencoders reveal selective remapping of visual concepts during adaptation.
In: 13th International Conference on Learning Representations (ICLR 2025), 24-28 April 2025, Singapore. 2025. Pages 46012-46037.
Adapting foundation models for specific purposes has become a standard approach for building machine learning systems for downstream applications. Yet it remains an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g., the shape, color, or semantics of an object) together with their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While concept activations change slightly between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained by concepts already present in the non-adapted foundation model. This work provides a concrete framework for training and using SAEs for Vision Transformers and offers insights into adaptation mechanisms.
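For a concrete picture of the technique the abstract describes, below is a minimal sketch in PyTorch of a patch-level sparse autoencoder: an overcomplete encoder/decoder with a ReLU activation and an L1 sparsity penalty, applied to ViT patch-token activations. The dimensions, layer choice, and sparsity coefficient are illustrative assumptions, not the paper's exact PatchSAE configuration.

import torch
import torch.nn as nn

class PatchSparseAutoencoder(nn.Module):
    # Assumed hyperparameters: CLIP ViT-B hidden width 768 and an overcomplete
    # dictionary size; both are placeholders, not the published settings.
    def __init__(self, d_model: int = 768, n_concepts: int = 49152):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_concepts)  # each latent unit is a candidate concept
        self.decoder = nn.Linear(n_concepts, d_model)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor):
        # x: (batch, n_patches, d_model) patch-token activations from a ViT layer
        z = self.relu(self.encoder(x))  # sparse per-patch concept activations
        x_hat = self.decoder(z)         # reconstruction of the original activations
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 term encouraging each patch to activate few concepts.
    return (x - x_hat).pow(2).mean() + l1_coeff * z.abs().mean()

Because z is computed per patch token, aggregating or thresholding it over the patch grid yields the patch-wise spatial attribution map for each concept that the abstract refers to.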
Annotations
Special Publication
Publication type
Article: Conference contribution
Language
English
Publication Year
2025
HGF Reporting Year
2025
ISSN (print) / ISBN
9798331320850
Conference Title
13th International Conference on Learning Representations (ICLR 2025)
Conference Date
24-28 April 2025
Conference Location
Singapore
Source details
Pages: 46012-46037
Institute(s)
Institute of Computational Biology (ICB)
POF-Topic(s)
30205 - Bioengineering and Digital Health
Research field(s)
Enabling and Novel Technologies
PSP Element(s)
G-503800-001
Scopus ID
105010241693
Entry date
2025-07-17