PuSH - Publication Server of Helmholtz Zentrum München

Faquih, T.* ; van Smeden, M.* ; Luo, J.* ; le Cessie, S.* ; Kastenmüller, G. ; Krumsiek, J.* ; Noordam, R.* ; van Heemst, D.* ; Rosendaal, F.R.* ; van Hylckama Vlieg, A.* ; Willems van Dijk, K.* ; Mook-Kanamori, D.O.*

A workflow for missing values imputation of untargeted metabolomics data.

Metabolites 10, 1-23:E486 (2020)
Open Access Gold
Creative Commons license
Metabolomics studies have grown steadily with the development and implementation of affordable, high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally imputed, can bias study results. We provided a publicly available, user-friendly R script to streamline the imputation of missing endogenous, unannotated, and xenobiotic metabolites. We evaluated the multivariate imputation by chained equations (MICE) and k-nearest neighbors (kNN) methods implemented in our script through simulations using measured metabolite data from the Netherlands Epidemiology of Obesity (NEO) study (n = 599). We simulated missing values in four unique metabolites from different pathways with different correlation structures, at three sample sizes (599, 150, 50), with three missing percentages (15%, 30%, 60%), and under two missingness mechanisms (missing completely at random and missing not at random). Based on the simulations, we found that for MICE, a larger sample size was the primary factor decreasing bias and error. For kNN, the primary factor reducing bias and error was the correlation of the metabolite with its predictor metabolites. MICE consistently provided higher performance measures, particularly for larger datasets (n > 50). In conclusion, we presented an imputation workflow in a publicly available R script to impute untargeted metabolomics data. Our simulations provided insight into the effects of sample size, percentage missing, and correlation structure on the accuracy of the two imputation methods.
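The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R script and does not use NEO data: the data are synthetic, the function names (`simulate_mcar`, `simulate_mnar`, `knn_impute`, `rmse`) are hypothetical, the "not at random" mechanism is assumed here to remove the lowest values (as a detection limit would), and the kNN step is a simplified single-predictor version. MICE is not shown.

```python
import math
import random


def simulate_mcar(values, fraction, rng):
    """Mask a random `fraction` of entries (missing completely at random)."""
    n_missing = round(fraction * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def simulate_mnar(values, fraction):
    """Mask the lowest `fraction` of entries (assumed not-at-random mechanism)."""
    n_missing = round(fraction * len(values))
    order = sorted(range(len(values)), key=lambda i: values[i])
    missing_idx = set(order[:n_missing])
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def knn_impute(target, predictors, k=5):
    """Fill each missing target value with the mean of the target over the
    k observed samples closest in predictor space (Euclidean distance)."""
    observed = [i for i, v in enumerate(target) if v is not None]
    imputed = list(target)
    for i, v in enumerate(target):
        if v is None:
            nearest = sorted(
                observed, key=lambda j: math.dist(predictors[i], predictors[j])
            )[:k]
            imputed[i] = sum(target[j] for j in nearest) / len(nearest)
    return imputed


def rmse(truth, imputed, masked):
    """Root mean squared error over the entries that were masked."""
    idx = [i for i, v in enumerate(masked) if v is None]
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in idx) / len(idx))


# Synthetic stand-in for one metabolite and one correlated predictor.
rng = random.Random(0)
n = 150
predictors = [(rng.gauss(0, 1),) for _ in range(n)]
truth = [0.8 * p[0] + rng.gauss(0, 0.6) for p in predictors]

for fraction in (0.15, 0.30, 0.60):
    mcar = simulate_mcar(truth, fraction, rng)
    mnar = simulate_mnar(truth, fraction)
    err_mcar = rmse(truth, knn_impute(mcar, predictors), mcar)
    err_mnar = rmse(truth, knn_impute(mnar, predictors), mnar)
    print(f"{fraction:.0%} missing: RMSE MCAR={err_mcar:.3f}, MNAR={err_mnar:.3f}")
```

Changing `n`, the missing fraction, or the strength of the predictor relationship (the 0.8 coefficient) mirrors the three factors varied in the study's simulations.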
Impact Factor 4.097
Scopus SNIP 0.990
Web of Science Times Cited 2
Scopus Cited By 5
Publication type Article: Journal article
Document type Scientific Article
Keywords Imputation ; K-nearest Neighbors ; Metabolon ; Multiple Imputation Using Chained Equations ; Simulation ; Untargeted Metabolomics ; Workflow
Language English
Publication Year 2020
HGF-reported in Year 2020
ISSN (print) / ISBN 2218-1989
e-ISSN 2218-1989
Journal Metabolites
Source details Volume: 10, Issue: 12, Pages: 1-23, Article Number: E486
Publisher MDPI
Publishing Place St. Alban-Anlage 66, CH-4052 Basel, Switzerland
Reviewing status Peer reviewed
POF-Topic(s) 30505 - New Technologies for Biomedical Discoveries
Research field(s) Enabling and Novel Technologies
PSP Element(s) G-503700-001
Grants King Faisal Specialist Hospital & Research Center
King Abdullah Scholarship Program
China Scholarship Council
VELUX Stiftung
VENI grant (ZonMW-VENI Grant)
Leiden University, Research Profile Area 'Vascular and Regenerative Medicine'
Division and the Board of Directors of the Leiden University Medical Centre
Scopus ID 85098159619
PubMed ID 33256233
Date of entry 2020-12-14