PuSH - Publication Server of Helmholtz Zentrum München

Yu, Z.* ; Zhang, S.* ; Qiao, N.* ; Zhao, Y.* ; Yu, L.* ; Peng, T. ; Zhang, X.Y.*

FM2: Fusing multiple foundation models for pathology image analysis via disentangled consensus-divergence representation.

Inf. Fusion 127:103840 (2026)
Open Access Green as soon as Postprint is submitted to ZB.
Foundation models (FMs) have achieved strong performance on numerous downstream tasks. However, different FMs, such as CLIP, DINOv2, and SAM, are trained on diverse datasets with different methodologies; as a result, each exhibits model-specific characteristics and encodes scenario-specific knowledge. Efforts to unify the strengths of these FMs through knowledge distillation show promise but remain challenging because of inconsistencies in their feature distributions, which can lead to suboptimal convergence and reduced generalizability. In this paper, we propose a novel aggregation framework, FM2 (Fusing Multiple Foundation Models), which leverages disentangled representation learning to address these challenges. Specifically, our approach disentangles consensus and divergence features from multiple expert FMs and then aligns them into a unified and robust representation. Extensive experiments on datasets comprising over 1,000,000 pathology images across various tasks, including zero-shot and few-shot classification, cross-modal retrieval, and survival analysis, demonstrate that our method consistently outperforms state-of-the-art models, delivering superior accuracy and reliability across diverse clinical scenarios. Additionally, visualizations offer insight into the model's ability to harmonize knowledge across different FMs, highlighting its potential for enhancing diagnostic precision in medical imaging. These results underscore the promise of effectively aligning FMs and suggest that the approach could be extended beyond pathology to other medical imaging domains.
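The abstract does not specify FM2's architecture or loss functions. As a rough illustration of the general idea of separating what several teacher models agree on (consensus) from what each contributes uniquely (divergence), the following minimal sketch makes several assumptions that are not from the paper: teacher features are already projected to a common dimension, consensus is the element-wise mean across teachers, divergence is each teacher's residual from that mean, and a student is scored with a weighted sum of mean-squared errors. The function names and the weight `div_weight` are hypothetical; this is not the authors' method.

```python
from typing import List, Tuple

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def split_consensus_divergence(teacher_feats: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """Hypothetical split (assumption, not FM2's definition):
    consensus = mean over teachers; divergence_k = teacher_k - consensus."""
    consensus = mean_vector(teacher_feats)
    divergence = [[t - c for t, c in zip(feat, consensus)] for feat in teacher_feats]
    return consensus, divergence


def mse(a: Vector, b: Vector) -> float:
    """Mean-squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_consensus: Vector,
    student_divergence: List[Vector],
    teacher_feats: List[Vector],
    div_weight: float = 0.5,  # hypothetical weighting of the divergence term
) -> float:
    """Student matches the shared consensus plus each teacher-specific residual."""
    consensus, divergence = split_consensus_divergence(teacher_feats)
    loss_cons = mse(student_consensus, consensus)
    loss_div = sum(mse(s, d) for s, d in zip(student_divergence, divergence)) / len(divergence)
    return loss_cons + div_weight * loss_div


if __name__ == "__main__":
    # Three toy "teacher" feature vectors (e.g. stand-ins for projected CLIP-,
    # DINOv2- and SAM-like features; the values are invented for illustration).
    teachers = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
    consensus, divergence = split_consensus_divergence(teachers)
    print(consensus)   # [2.0, 3.0]
    print(divergence)  # [[-1.0, -1.0], [1.0, -1.0], [0.0, 2.0]]
    print(distillation_loss(consensus, divergence, teachers))  # 0.0
```

Under this mean-based split the divergence vectors always sum to zero across teachers, so consensus and divergence carry non-overlapping information about the teachers' features; a learned disentanglement, as the abstract describes, would replace these fixed arithmetic definitions with trained encoders.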
Publication type Article: Journal article
Document type Scientific Article
Keywords Disentangled Representation ; Foundation Model ; Knowledge Distillation ; Pathology Image ; Teacher-student Network
ISSN (print) / ISBN 1566-2535
e-ISSN 1872-6305
Source details Volume: 127, Article Number: 103840
Publisher Elsevier
Publishing Place Radarweg 29, 1043 NX Amsterdam, Netherlands
Reviewing status Peer reviewed
Grants China Postdoctoral Science Foundation ; Natural Science Foundation of Shanghai ; Shanghai Key Laboratory of Child Brain and Development ; National Natural Science Foundation of China