PuSH - Publication Server of Helmholtz Zentrum München

von Kleist, H. ; Wendland, J.R.* ; Shpitser, I.* ; Marr, C.

Feature Importance Metrics in the Presence of Missing Data.

In: (42nd International Conference on Machine Learning, ICML 2025, 13-19 July 2025, Vancouver). 2025. 61769-61789 (Proceedings of Machine Learning Research ; 267)
Publisher's version
Open Access Hybrid
Feature importance metrics are critical for interpreting machine learning models and understanding the relevance of individual features. However, real-world data often exhibit missingness, thereby complicating how feature importance should be evaluated. We introduce the distinction between two evaluation frameworks under missing data: (1) feature importance under the full data, as if every feature had been fully measured, and (2) feature importance under the observed data, where missingness is governed by the current measurement policy. While the full data perspective offers insights into the data generating process, it often relies on unrealistic assumptions and cannot guide decisions when missingness persists at model deployment. Since neither framework directly informs improvements in data collection, we additionally introduce the feature measurement importance gradient (FMIG), a novel, model-agnostic metric that identifies features that should be measured more frequently to enhance predictive performance. Using synthetic data, we illustrate key differences between these metrics and the risks of conflating them.
Publication type Article: Conference contribution
Conference title 42nd International Conference on Machine Learning, ICML 2025
Conference date 13-19 July 2025
Conference location Vancouver
Source details Volume: 267, Pages: 61769-61789