PuSH - Publication Server of Helmholtz Zentrum München: A systematic study of In-the-wild model merging for large language models.


Hitit, O.K.*; Girrbach, L.*; Akata, Z.

A systematic study of In-the-wild model merging for large language models.

Trans. Machine Learn. Res. 2026-March, accepted (2026)

Postprint

Abstract

Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive way to reuse models and efficiently improve performance. However, it remains unclear whether the advantages reported for settings where all merged experts have distinct roles and are tuned on clearly separated tasks also hold in settings where the merged experts do not have clearly distinct roles but are trained on overlapping or even conflicting objectives. To evaluate this setting, we present a large-scale, systematic evaluation of "in-the-wild" model merging of heterogeneous experts that may have been trained on overlapping or conflicting objectives. Concretely, we evaluate six state-of-the-art merging methods, including recent subspace methods, across four open-weight LLMs, twelve fine-tuned checkpoints per base model, and sixteen standard LLM benchmarks. Using these standardized benchmarks, we measure both the probability that a model merged from a heterogeneous set of experts outperforms the base model and its relative gain over the best individual checkpoint. Our results show that the oldest and simplest method, Task Arithmetic, is the only approach that reliably yields performance gains on LLMs in this "in-the-wild" setting. The other, interference-aware and subspace, merging methods typically yield no notable improvement over the base model. Our findings indicate that current merging techniques mostly fail to extract useful weight updates from heterogeneous and potentially conflicting fine-tuned versions. This motivates the design of LLM-specific merging algorithms and merging-aware fine-tuning methods. Code is available at https://github.com/kaganhitit11/mergeval.
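To make the method and the two evaluation quantities above concrete, here is a minimal, framework-free sketch. It is not the code from the mergeval repository. It assumes the standard Task Arithmetic formulation (merged = base + lam * sum of task vectors, where each task vector is expert minus base), with checkpoints represented as plain dicts of float lists instead of real tensors. The scaling factor `lam` and all function names are hypothetical. The two metric helpers are also assumptions: "probability of outperforming the base model" is estimated as an empirical win rate over paired scores, and "relative gain" is taken as (merged − best) / best. The paper's exact definitions may differ.

```python
from typing import Dict, List

# Hypothetical checkpoint representation: parameter name -> flat list of floats.
# Real LLM checkpoints would hold framework tensors instead (assumption for illustration).
Params = Dict[str, List[float]]


def task_arithmetic_merge(base: Params, experts: List[Params], lam: float = 1.0) -> Params:
    """Merge fine-tuned experts into the base model via Task Arithmetic.

    Task vector of expert i: (expert_i - base). Merged parameters:
    base + lam * sum_i (expert_i - base). `lam` is a hypothetical scaling knob.
    """
    merged: Params = {}
    for name, base_values in base.items():
        summed = [0.0] * len(base_values)
        for expert in experts:
            expert_values = expert[name]
            if len(expert_values) != len(base_values):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for j, (e, b) in enumerate(zip(expert_values, base_values)):
                summed[j] += e - b  # accumulate this expert's task vector
        merged[name] = [b + lam * s for b, s in zip(base_values, summed)]
    return merged


def win_rate_over_base(merged_scores: List[float], base_scores: List[float]) -> float:
    """Assumed metric: fraction of paired evaluations where the merged model beats the base."""
    if len(merged_scores) != len(base_scores) or not merged_scores:
        raise ValueError("need two non-empty score lists of equal length")
    wins = sum(m > b for m, b in zip(merged_scores, base_scores))
    return wins / len(merged_scores)


def relative_gain_over_best(merged_score: float, expert_scores: List[float]) -> float:
    """Assumed metric: relative improvement of the merged model over the best single checkpoint."""
    best = max(expert_scores)
    return (merged_score - best) / best


if __name__ == "__main__":
    base = {"w": [1.0, 2.0]}
    experts = [{"w": [1.5, 2.0]}, {"w": [1.0, 1.0]}]
    print(task_arithmetic_merge(base, experts, lam=0.5))  # {'w': [1.25, 1.5]}
```

With `lam=0.5`, the two task vectors `[0.5, 0.0]` and `[0.0, -1.0]` sum to `[0.5, -1.0]`, so the merged weights are `[1.25, 1.5]`. Because every expert's update is simply added, overlapping or conflicting updates can reinforce or cancel each other. This is the "in-the-wild" situation the study examines.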


Publication type Article: Journal article

Document type Scientific Article

ISSN (print) / ISBN 2835-8856

e-ISSN 2835-8856

Journal Transactions on Machine Learning Research

Source details Volume: 2026-March

Publisher Journal of Machine Learning Research Inc.

Reviewing status Peer reviewed

Institute(s) Helmholtz Artificial Intelligence Cooperation Unit (HAICU)