Han, S. ; Huang, J. ; Foppiano, F. ; Prehn, C. ; Adamski, J.* ; Suhre, K.* ; Li, Y.* ; Matullo, G.* ; Schliess, F.* ; Gieger, C. ; Peters, A. ; Wang-Sattler, R.
TIGER: Technical variation elimination for metabolomics data using ensemble learning architecture.
Brief. Bioinform. 23:bbab535 (2022)
Large metabolomics datasets inevitably contain unwanted technical variations, which can obscure meaningful biological signals and affect how this information is applied to personalized healthcare. Many methods have been developed to handle unwanted variations. However, the underlying assumptions of many existing methods only hold for a few specific scenarios. Some tools remove technical variations with models trained on quality control (QC) samples, which may not generalize well to subject samples. Additionally, almost none of the existing methods supports datasets with multiple types of QC samples, which greatly limits their performance and flexibility. To address these issues, a non-parametric method, TIGER (Technical variation elImination with ensemble learninG architEctuRe), is developed in this study and released as an R package (https://CRAN.R-project.org/package=TIGERr). TIGER integrates the random forest algorithm into an adaptable ensemble learning architecture. Evaluation results show that TIGER outperforms four popular methods with respect to robustness and reliability on three human cohort datasets constructed with targeted or untargeted metabolomics data. Additionally, a case study aiming to identify age-associated metabolites is performed to illustrate how TIGER can be used for cross-kit adjustment in a longitudinal analysis with experimental data from three time points generated by different analytical kits. A dynamic website is developed to help evaluate the performance of TIGER and examine the patterns revealed in the longitudinal analysis (https://han-siyu.github.io/TIGER_web/). Overall, TIGER is expected to be a powerful tool for metabolomics data analysis.
Publication type
Article: Journal article
Document type
Scientific Article
Keywords
Ensemble Learning ; Longitudinal Analysis ; Machine Learning ; Metabolomics ; Predictive Modelling
Language
English
Publication Year
2022
HGF-reported in Year
2022
ISSN (print) / ISBN
1467-5463
e-ISSN
1477-4054
Source details
Volume: 23
Issue: 2
Article Number: bbab535
Publisher
Oxford University Press
Reviewing status
Peer reviewed
POF-Topic(s)
30202 - Environmental Health
30205 - Bioengineering and Digital Health
30505 - New Technologies for Biomedical Discoveries
Research field(s)
Genetics and Epidemiology
Enabling and Novel Technologies
PSP Element(s)
G-504091-003
G-506700-001
A-630710-001
G-504091-004
G-504000-010
Grants
Ministry of Education
Date of record entry
2022-02-08