PuSH - Publication Server of Helmholtz Zentrum München

Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies.

Metabolomics 14:128 (2018)
Open Access Green
BACKGROUND: Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values and strategies to handle these data has received little attention. Missing data can occur systematically, e.g. from run day-dependent effects due to limits of detection (LOD); or it can be random as, for instance, a consequence of sample preparation.

METHODS: We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci.

RESULTS: Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable.

CONCLUSION: Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend that KNN-based imputation is performed on observations with variable pre-selection since it showed robust results in all evaluation schemes.
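The recommended approach, KNN imputation on observations with variable pre-selection, can be illustrated with a minimal, hypothetical sketch in plain Python. It is not the authors' implementation, and several details are assumptions not stated in this record: missing values are encoded as `None`; pre-selection keeps the `n_vars` metabolites with the highest absolute Pearson correlation to the metabolite being imputed; distances between samples are computed on z-scored values of those pre-selected metabolites only; and each missing value is replaced by the unweighted mean of its `k` nearest donor samples. The names `knn_impute_obs_sel`, `k` and `n_vars` are illustrative.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation over pairs where both values are observed (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:
        return 0.0
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def zscore(col):
    """Standardise a column using its observed values; missing entries stay None."""
    obs = [v for v in col if v is not None]
    if not obs:
        return list(col)
    m = mean(obs)
    s = pstdev(obs) or 1.0  # avoid division by zero for constant columns
    return [None if v is None else (v - m) / s for v in col]


def knn_impute_obs_sel(data, k=3, n_vars=2):
    """Hypothetical KNN imputation with variable pre-selection.

    data: list of rows (samples) x columns (metabolites), None marks a missing value.
    Returns a new matrix; the input is not modified.
    """
    n_rows, n_cols = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(n_cols)]
    zcols = [zscore(c) for c in cols]
    result = [list(row) for row in data]

    for j in range(n_cols):
        missing = [i for i in range(n_rows) if cols[j][i] is None]
        donors = [i for i in range(n_rows) if cols[j][i] is not None]
        if not missing or not donors:
            continue

        # Variable pre-selection: the n_vars metabolites most correlated with column j.
        others = [c for c in range(n_cols) if c != j]
        others.sort(key=lambda c: abs(pearson(cols[j], cols[c])), reverse=True)
        selected = others[:n_vars]

        for i in missing:
            # Distance to each donor sample, using only pre-selected variables
            # that are observed in both samples (root mean squared difference).
            dists = []
            for d in donors:
                shared = [c for c in selected
                          if zcols[c][i] is not None and zcols[c][d] is not None]
                if not shared:
                    continue
                sq = sum((zcols[c][i] - zcols[c][d]) ** 2 for c in shared)
                dists.append((math.sqrt(sq / len(shared)), d))
            dists.sort()
            nearest = [d for _, d in dists[:k]]

            if nearest:
                # Impute with the unweighted mean of the k nearest donors (original values).
                result[i][j] = mean(cols[j][d] for d in nearest)
            else:
                # Fallback when no donor shares any pre-selected variable: column mean.
                result[i][j] = mean(cols[j][d] for d in donors)
    return result
```

On a toy matrix in which column 1 is exactly twice column 0, `knn_impute_obs_sel([[1.0, 2.0, 10.0], [2.0, 4.0, 20.0], [3.0, 6.0, 30.0], [4.0, None, 40.0], [5.0, 10.0, 50.0]], k=2, n_vars=1)` fills the missing entry with 8.0, the mean of its two nearest donors (6.0 and 10.0). Distance-weighted means or a correlation threshold for pre-selection would be equally plausible variants of this sketch.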
Impact Factor: 3.511
Scopus SNIP: 0.841
Web of Science Times Cited: 42
Scopus Cited By: 77
Publication type: Article: Journal article
Document type: Scientific article
Keywords: Batch Effects; K-nearest Neighbor; Limit Of Detection; MICE; Mass Spectrometry; Missing Values Imputation; Untargeted Metabolomics; Multiple Imputation; Human Blood; Networks; Limit
Language: English
Year of publication: 2018
HGF reporting year: 2018
ISSN (print) / ISBN: 1573-3882
e-ISSN: 1573-3890
Journal: Metabolomics
Source details: Volume: 14, Issue: 10, Article number: 128
Publisher: Springer
Place of publication: New York, NY
Review status: Peer reviewed
POF Topic(s): 30205 - Bioengineering and Digital Health
30505 - New Technologies for Biomedical Discoveries
30202 - Environmental Health
30201 - Metabolic Health
30501 - Systemic Analysis of Genetic and Environmental Factors that Impact Health
90000 - German Center for Diabetes Research
Research field(s): Enabling and Novel Technologies
Genetics and Epidemiology
PSP element(s): G-554100-001
G-503700-001
G-503800-001
G-504091-002
G-500600-001
G-504100-001
G-504000-001
G-501900-402
G-504090-001
Scopus ID: 85053638868
PubMed ID: 30830398
Date recorded: 2018-09-27