Robust Huber-LASSO for improved prediction of protein, metabolite and gene expression levels relying on individual genotype data.
Brief. Bioinform. 22:bbaa230 (2021)
Least absolute shrinkage and selection operator (LASSO) regression is often applied to select the most promising set of single nucleotide polymorphisms (SNPs) associated with a molecular phenotype of interest. While the penalization parameter λ restricts the number of selected SNPs and the potential model overfitting, the least-squares loss function of standard LASSO regression translates into a strong dependence of statistical results on a small number of individuals with phenotypes or genotypes divergent from the majority of the study population, typically comprised of outliers and high-leverage observations. Robust methods have been developed to constrain the influence of divergent observations and generate statistical results that apply to the bulk of study data, but they have rarely been applied to genetic association studies. In this article, we review, for newcomers to the field of robust statistics, a novel version of standard LASSO that utilizes the Huber loss function. We conduct comprehensive simulations and analyze real protein, metabolite, mRNA expression and genotype data to compare the stability of penalization, the cross-iteration concordance of the model, the false-positive and true-positive rates and the prediction accuracy of standard and robust Huber-LASSO. Although the two methods showed controlled false-positive rates ≤2.1% and similar true-positive rates, robust Huber-LASSO outperformed standard LASSO in the accuracy of predicted protein, metabolite and gene expression levels using individual SNP data. The conducted simulations and real-data analyses show that robust Huber-LASSO represents a valuable alternative to standard LASSO in genetic studies of molecular phenotypes.
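To illustrate the idea described in the abstract, the following is a minimal sketch of an L1-penalized regression in which the least-squares loss is replaced by the Huber loss. It is not the authors' implementation: the function names, the proximal-gradient (ISTA) solver, the fixed iteration count and the threshold `delta=1.345` (a common default that presumes residuals on a roughly unit scale) are all assumptions made for illustration.

```python
import numpy as np


def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))


def huber_grad(r, delta=1.345):
    """Derivative of the Huber loss with respect to the residual.

    This is the residual clipped to [-delta, delta], which is what bounds
    the influence of any single divergent observation.
    """
    return np.clip(r, -delta, delta)


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1 (element-wise shrinkage towards 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def huber_lasso(X, y, lam, delta=1.345, n_iter=2000):
    """Illustrative Huber-LASSO fit by proximal gradient descent.

    Minimises  mean(huber_loss(y - b0 - X @ beta)) + lam * ||beta||_1,
    leaving the intercept b0 unpenalised.
    X : (n, p) array, e.g. SNP genotypes coded 0/1/2 (assumed coding).
    y : (n,) array holding the molecular phenotype.
    """
    n, p = X.shape
    beta = np.zeros(p)
    b0 = np.median(y)  # robust starting value for the intercept
    # The Huber gradient is 1-Lipschitz in the residual, so the squared
    # spectral norm of [1, X] divided by n bounds the curvature of the loss.
    Xa = np.column_stack([np.ones(n), X])
    step = n / np.linalg.norm(Xa, 2) ** 2
    for _ in range(n_iter):
        g = huber_grad(y - b0 - X @ beta, delta)
        b0 += step * g.mean()
        beta = soft_threshold(beta + step * (X.T @ g) / n, step * lam)
    return b0, beta
```

Calling `huber_lasso(X, y, lam=0.05)` on a genotype matrix and a phenotype vector returns the intercept and a sparse coefficient vector; in practice the penalty `lam` would be chosen by cross-validation, which this sketch omits.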
Publication type
Article: Journal article
Document type
Scientific article
Keywords
Huber Loss Function; Lasso; Genetic Prediction; Molecular Data; Robust Statistics; Regression; Association; Selection; Atlas
Language
English
Publication year
2021
Prepublished in year
2020
HGF reporting year
2020
ISSN (print) / ISBN
1467-5463
e-ISSN
1477-4054
Source details
Volume: 22
Issue: 4
Article number: bbaa230
Publisher
Oxford University Press
Place of publication
Great Clarendon St, Oxford OX2 6DP, England
Review status
Peer reviewed
POF Topic(s)
30202 - Environmental Health
30205 - Bioengineering and Digital Health
Research field(s)
Genetics and Epidemiology
Enabling and Novel Technologies
PSP element(s)
G-504091-001
G-503891-001
Funding
National Institute on Aging
Biomedical Research Program at Weill Cornell Medicine in Qatar, a program funded by the Qatar Foundation
European Union's Horizon 2020 research and innovation programme
Federal Ministry of Education and Research Germany (Bundesministerium für Bildung und Forschung, BMBF)
Date of entry
2021-02-08