PuSH - Publication Server of Helmholtz Zentrum München

Huang, Y. ; Thede, L. ; Mancini, M.* ; Xu, W.* ; Akata, Z.

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study.

In: Pattern Recognition. Berlin [et al.]: Springer, 2026. 320 - 336 (Lect. Notes Comput. Sci.; 16125 LNCS)
Green Open Access possible as soon as the postprint has been submitted to the ZB.
While Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities, their substantial computational and memory requirements pose significant barriers to practical deployment. Current parameter reduction techniques primarily involve training MLLMs from Small Language Models (SLMs), but these methods offer limited flexibility and remain computationally intensive. To address this gap, we propose to directly compress existing MLLMs through structural pruning combined with efficient recovery training. Specifically, we investigate two structural pruning paradigms, layerwise and widthwise pruning, applied to the language model backbone of MLLMs, alongside supervised finetuning and knowledge distillation. Additionally, we assess the feasibility of conducting recovery training with only a small fraction of the available data. Our results show that widthwise pruning generally maintains better performance in low-resource scenarios with limited computational resources or insufficient finetuning data. As for recovery training, finetuning only the multimodal projector is sufficient at small compression levels (<20%). Furthermore, a combination of supervised finetuning and hidden-state distillation yields optimal recovery across various pruning levels. Notably, effective recovery can be achieved with as little as 5% of the original training data, while retaining over 95% of the original performance. Through an empirical study on two representative MLLMs, i.e., LLaVA-v1.5-7B and Bunny-v1.0-3B, this work offers actionable insights for practitioners aiming to compress MLLMs effectively without extensive computational resources or abundant data.
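The two pruning paradigms contrasted in the abstract can be illustrated with a minimal, dependency-free sketch (hypothetical helper names, not the authors' code): widthwise pruning ranks a layer's output neurons by an importance score, here simply the L2 norm of each weight row, and drops the lowest-scoring ones, while layerwise pruning removes whole transformer blocks.

```python
import math

def width_prune(weight, keep_ratio):
    """Widthwise structural pruning sketch: rank the output neurons of a
    linear layer by the L2 norm of their weight rows and keep the top
    fraction. `weight` is a list of rows, one per output neuron."""
    scores = [math.sqrt(sum(w * w for w in row)) for row in weight]
    k = max(1, int(len(weight) * keep_ratio))
    keep = sorted(range(len(weight)), key=lambda i: -scores[i])[:k]
    keep.sort()  # preserve the original neuron order
    return [weight[i] for i in keep], keep

def layer_prune(layers, keep_ratio):
    """Layerwise pruning sketch: drop entire blocks; here, naively,
    the last (1 - keep_ratio) fraction of the layer stack."""
    k = max(1, int(len(layers) * keep_ratio))
    return layers[:k]

# Toy example: a 4-neuron layer pruned to 50% width.
W = [[0.1, 0.1], [2.0, 1.0], [0.0, 0.1], [1.5, 1.5]]
pruned, kept = width_prune(W, 0.5)
print(kept)  # indices of the retained neurons → [1, 3]
```

In practice the importance score, the grouping of pruned units across attention heads and MLP channels, and the choice of which blocks to drop are what the paper studies empirically; this sketch only fixes the mechanics of the two paradigms.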
Publication type Article: Conference paper
Keywords Model Compression ; Multimodal LLMs ; Pruning
ISSN (print) / ISBN 0302-9743
e-ISSN 1611-3349
Conference title Pattern Recognition
Source details Volume: 16125 LNCS, Pages: 320 - 336
Publisher Springer
Place of publication Berlin [et al.]