Type 2 diabetes mellitus (T2D) presents a major health and economic burden that could be alleviated by improved early prediction and intervention. While standard risk factors already show good predictive performance, we show that adding blood-based DNA methylation information significantly improves the prediction of 10-year T2D incidence risk. Previous studies have been largely constrained by linear assumptions, by the use of cytosine-guanine (CpG) pairs one at a time and by binary outcomes. We present a flexible approach (implemented in an R package, MethylPipeR) based on a range of linear and tree-ensemble models that incorporate time-to-event data for prediction. Using the Generation Scotland cohort (training set: 374 cases, 9,461 controls; test set: 252 cases, 4,526 controls), our best-performing model (area under the receiver operating characteristic curve (AUC) = 0.872; area under the precision-recall curve (PRAUC) = 0.302) showed a notable improvement in 10-year onset prediction over standard risk factors alone (AUC = 0.839; PRAUC = 0.227). Replication was observed in the German-based KORA study (n = 1,451; 142 cases; P = 1.6 × 10⁻⁵).
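The two discrimination metrics quoted above can be made concrete with a short sketch. The Python below is a minimal, illustrative implementation written for this summary, not code from MethylPipeR, and the function names `roc_auc` and `average_precision` are hypothetical. `roc_auc` computes ROC AUC as the Mann-Whitney probability that a randomly chosen case receives a higher risk score than a randomly chosen control; `average_precision` computes a step-wise PRAUC. Because a random ranking has an expected PRAUC close to the case prevalence (in the test set, 252/4,778 ≈ 0.053), the PRAUC values should be read against that baseline rather than against 0.5. The sketch assumes each individual already has a binary 10-year onset label and does not handle censoring.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    random case (label 1) outscores a random control (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise PR AUC (average precision): rank individuals by descending
    score and average the precision observed at the rank of each case.
    Assumes no tied scores (ties are broken by input order here)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one case")
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1               # one more case found in the top `rank`
            total += tp / rank    # precision at this case's rank
    return total / n_pos


if __name__ == "__main__":
    # Toy data: 2 incident cases among 5 individuals (not study data).
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.1]
    print(round(roc_auc(labels, scores), 4))
    print(round(average_precision(labels, scores), 4))
```

With a real cohort one would more likely call an established implementation (for example scikit-learn's `roc_auc_score` and `average_precision_score`); the double loop above is quadratic and is meant only to show what the numbers measure.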
Grants: Chief Scientist Office of the Scottish Government Health Directorates; Scottish Funding Council; Medical Research Council UK; Brain & Behavior Research Foundation; Royal College of Physicians of Edinburgh; University of Edinburgh; University of Helsinki joint PhD program in Human Genomics; Alzheimer's Research UK; Wellcome Trust; United Kingdom Research and Innovation; UKRI Centre for Doctoral Training in Biomedical AI at the University of Edinburgh; NHS Research Scotland; Chief Scientist Office of the Scottish Government; Helmholtz Zentrum München (German Research Center for Environmental Health); German Federal Ministry of Education and Research; State of Bavaria; Munich Center of Health Sciences; German Centre for Cardiovascular Research; Bavarian State Ministry of Health and Care; Alzheimer's Society.