PuSH - Publication Server of Helmholtz Zentrum München

Wahl, S. ; Boulesteix, A.L.* ; Zierer, A. ; Thorand, B. ; van de Wiel, M.A.*

Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation.

BMC Med. Res. Methodol. 16:144 (2016)
Open Access Gold
Creative Commons License Agreement
BACKGROUND: Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation.

METHODS: In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI, internal validation followed by MI on the training and test parts separately; MI-Val, MI on the full data set followed by internal validation; and MI(-y)-Val, MI on the full data set omitting the outcome, followed by internal validation. Different validation strategies, including bootstrap and cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adapt a strategy for confidence interval construction to incomplete data.

RESULTS: Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias that grows with increasing true effect size and number of covariates and with decreasing sample size. In Val-MI, the accuracy of the estimate is improved more strongly by increasing the number of bootstrap draws than by increasing the number of imputations. With a simple integrated approach, valid confidence intervals for performance estimates can be obtained.

CONCLUSIONS: When prognostic models are developed on incomplete data, Val-MI represents a valid strategy to obtain estimates of predictive performance measures.
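The Val-MI ordering summarized above (resample first, then impute the training and test parts separately, then fit, evaluate and pool) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation and rests on several assumptions: a hypothetical `impute_once` that draws missing values from a normal distribution around the observed column mean stands in for proper multiple imputation (such as the mice package listed in the keywords); a hypothetical difference-of-means linear score stands in for the prognostic model; the AUC is the Mann-Whitney estimate; and the 0.632+ combination applies Efron and Tibshirani's formula to the loss 1 - AUC with an assumed no-information AUC of 0.5. Taking the apparent AUC as the mean in-sample AUC on the imputed bootstrap training sets, and the out-of-bag AUC as the mean over bootstrap test sets, are further simplifying choices.

```python
import random
from statistics import mean, pstdev


def auc(scores, labels):
    """Mann-Whitney AUC: share of (case, control) pairs where the case scores
    higher, ties counting one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined without both classes; fall back to chance level
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def impute_once(rows, rng):
    """Hypothetical stand-in for one MI draw: replace each None with a normal
    draw around the observed column mean (SD = observed column SD). Only the
    rows passed in are used, so training and test parts are imputed separately."""
    filled_cols = []
    for col in zip(*rows):
        obs = [v for v in col if v is not None]
        mu = mean(obs) if obs else 0.0
        sd = pstdev(obs) if len(obs) > 1 else 0.0
        filled_cols.append([v if v is not None else rng.gauss(mu, sd) for v in col])
    return [list(r) for r in zip(*filled_cols)]


def fit(X, y):
    """Hypothetical toy model: weight_j = mean of x_j in cases minus in controls."""
    cases = [x for x, t in zip(X, y) if t == 1]
    ctrls = [x for x, t in zip(X, y) if t == 0]
    return [mean(c) - mean(d) for c, d in zip(zip(*cases), zip(*ctrls))]


def predict(w, X):
    """Linear score of each row under the weights returned by fit()."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]


def auc_632plus(auc_app, auc_oob, auc_noinfo=0.5):
    """0.632+ estimator (Efron and Tibshirani) on the loss 1 - AUC."""
    err_app, gamma = 1 - auc_app, 1 - auc_noinfo
    err_oob = min(1 - auc_oob, gamma)  # cap out-of-bag loss at no-information loss
    if err_oob > err_app and gamma > err_app:
        r = (err_oob - err_app) / (gamma - err_app)  # relative overfitting rate
    else:
        r = 0.0
    w = 0.632 / (1 - 0.368 * r)
    return 1 - ((1 - w) * err_app + w * err_oob)


def val_mi_632plus(X, y, n_boot=50, n_imp=5, seed=1):
    """Val-MI sketch: bootstrap split first, then impute each part separately,
    fit and evaluate per imputation, and average over imputations and draws."""
    rng = random.Random(seed)
    n = len(y)
    app, oob = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        out = sorted(set(range(n)) - set(idx))  # out-of-bag rows form the test part
        if not out:
            continue
        Xtr, ytr = [X[i] for i in idx], [y[i] for i in idx]
        Xte, yte = [X[i] for i in out], [y[i] for i in out]
        for _ in range(n_imp):
            Xtr_m = impute_once(Xtr, rng)  # test rows never inform this imputation
            Xte_m = impute_once(Xte, rng)
            w = fit(Xtr_m, ytr)
            app.append(auc(predict(w, Xtr_m), ytr))
            oob.append(auc(predict(w, Xte_m), yte))
    return auc_632plus(mean(app), mean(oob))
```

The design point the sketch is meant to show is the order of operations: because the bootstrap split happens before imputation, the imputed test part never borrows information from the training part (or vice versa), which is what distinguishes Val-MI from MI-Val, where imputation on the full data set precedes the split.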
Publication type Article: Journal article
Document type Scientific Article
Keywords Bootstrap; Cross-validation; Incomplete Data; Internal Validation; Mice; Missing Values; Multiple Imputation; Prediction Model; Predictive Performance; Resampling; Imputed Data; Data Sets; Microarray Classification; Regression-models; Prognostic Models; Risk Prediction; Roc Curves; Error; Reclassification
e-ISSN 1471-2288
Source details Volume: 16, Article Number: 144
Publisher BioMed Central
Publishing Place London
Non-patent literature Publications
Reviewing status Peer reviewed