TY - JOUR AB - The pattern graph framework solves a wide range of missing data problems with nonignorable mechanisms. However, it faces two challenges of assessability and interpretability, particularly important in safety-critical problems such as clinical diagnosis: (i) How can one assess the validity of the framework's a priori assumption and make necessary adjustments to accommodate known information about the problem? (ii) How can one interpret the process of exponential tilting used for sensitivity analysis in the pattern graph framework and choose the tilt perturbations based on meaningful real-world quantities? In this paper, we introduce Informed Sensitivity Analysis, an extension of the pattern graph framework that enables us to incorporate substantive knowledge about the missingness mechanism into the pattern graph framework. Our extension allows us to examine the validity of assumptions underlying pattern graphs and interpret sensitivity analysis results in terms of realistic problem characteristics. We apply our method to a prevalent nonignorable missing data scenario in clinical research. We validate and compare the results of our method with a number of widely used missing data methods, including Unweighted CCA, KNN Imputer, MICE, and MissForest. The validation is done using both bootstrapped simulated experiments and real-world clinical observations in the MIMIC-III public dataset. AU - Zamanian, A.* AU - Ahmidi, N. AU - Drton, M.* C1 - 68199 C2 - 54831 CY - 111 River St, Hoboken 07030-5774, NJ USA SP - 5419-5450 TI - Assessable and interpretable sensitivity analysis in the pattern graph framework for nonignorable missingness mechanisms. JO - Stat. Med. VL - 42 IS - 29 PB - Wiley PY - 2023 SN - 0277-6715 ER - TY - JOUR AB - The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host-microbiome relationships. 
While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dynamics, robust generalizable associations between specific host-associated factors and specific microbial taxa have remained largely elusive. Here, we propose factor regression models that allow the estimation of structured parsimonious associations between host-related features and amplicon-derived microbial taxa. To account for the overdispersed nature of the amplicon sequencing count data, we propose negative binomial reduced rank regression (NB-RRR) and negative binomial co-sparse factor regression (NB-FAR). While NB-RRR encodes the underlying dependency among the microbial abundances as outcomes and the host-associated features as predictors through a rank-constrained coefficient matrix, NB-FAR uses a sparse singular value decomposition of the coefficient matrix. The latter approach avoids the notoriously difficult joint parameter estimation by extracting sparse unit-rank components of the coefficient matrix sequentially, effectively delivering interpretable bi-clusters of taxa and host-associated factors. To solve the nonconvex optimization problems associated with these factor regression models, we present a novel iterative block-wise majorization procedure. Extensive simulation studies and an application to the microbial abundance data from the American Gut Project (AGP) demonstrate the efficacy of the proposed procedure. In the AGP data, we identify several factors that strongly link dietary habits and host lifestyle to specific microbial families. AU - Mishra, A.K.* AU - Müller, C.L. C1 - 64867 C2 - 52569 SP - 2786-2803 TI - Negative binomial factor regression with application to microbiome data analysis. JO - Stat. Med. 
VL - 41 IS - 15 PY - 2022 SN - 0277-6715 ER - TY - JOUR AB - Analyzing epidemiological data with simplified mathematical models of disease development provides a link between the time-course of incidence and the underlying biological processes. Here we point out that considerable modeling flexibility is gained if the model is solved by simulation only. To this end, a model of atherosclerosis is proposed: a Markov Chain with continuous state space which represents the coronary artery intimal surface area involved with atherosclerotic lesions of increasing severity. Myocardial infarction rates are assumed to be proportional to the area of most severe lesions. The model can be fitted simultaneously to infarction incidence rates observed in the KORA registry, and to the age-dependent prevalence and extent of atherosclerotic lesions in the PDAY study. Moreover, the simulation approach allows for non-linear transition rates and for considering randomness and inter-individual heterogeneity at the same time. Interestingly, the fit revealed significant age dependence of parameters in females around the age of menopause, qualitatively reproducing the known vascular effects of female sex hormones. For males, the incidence curve flattens for higher ages. According to the model, frailty explains this flattening only partially, and saturation of the disease process also plays an important role. This study shows the feasibility of simulating subclinical and epidemiological data with the same mathematical model. The approach is very general and may be extended to investigate the effects of risk factors or interventions. Moreover, it offers an interface to integrate quantitative individual health data as assessed, for example, by imaging. AU - Simonetto, C. AU - Rospleszcz, S. AU - Heier, M. AU - Meisinger, C. AU - Peters, A. AU - Kaiser, J.C. 
C1 - 62093 C2 - 50645 CY - 111 River St, Hoboken 07030-5774, NJ USA SP - 3299-3312 TI - Simulating the dynamics of atherosclerosis to the incidence of myocardial infarction, applied to the KORA population. JO - Stat. Med. VL - 40 IS - 14 PB - Wiley PY - 2021 SN - 0277-6715 ER - TY - JOUR AB - When addressing environmental health-related questions, most often, only observational data are collected for ethical or practical reasons. However, the lack of randomized exposure often prevents the comparison of similar groups of exposed and unexposed units. This design barrier leads the environmental epidemiology field to mainly estimate associations between environmental exposures and health outcomes. A causal inference pipeline was recently developed to guide researchers interested in estimating the effects of plausible hypothetical interventions for policy recommendations. This article illustrates how this multistaged pipeline can help environmental epidemiologists reconstruct and analyze hypothetical randomized experiments by investigating whether an air pollution reduction intervention decreases the risk of multiple sclerosis relapses in the Alsace region, France. The epidemiology literature reports conflicting findings on the relationship between air pollution and multiple sclerosis. Some studies found significant associations, whereas others did not. Two case-crossover studies reported significant associations between the risk of multiple sclerosis relapses and the exposure to air pollutants in the Alsace region. We use the same study population as these epidemiological studies to illustrate how appealing this causal inference approach is to estimate the effects of hypothetical, but plausible, environmental interventions. AU - Sommer, A. 
AU - Leray, E.* AU - Lee, Y.* AU - Bind, M.A.C.* C1 - 60819 C2 - 49666 CY - 111 River St, Hoboken 07030-5774, NJ USA SP - 1321-1335 TI - Assessing environmental epidemiology questions in practice with a causal inference pipeline: An investigation of the air pollution-multiple sclerosis relapses relationship. JO - Stat. Med. VL - 40 IS - 6 PB - Wiley PY - 2021 SN - 0277-6715 ER - TY - JOUR AB - Inpatient care is a large share of total health care spending, making analysis of inpatient utilization patterns an important part of understanding what drives health care spending growth. Common features of inpatient utilization measures such as length of stay and spending include zero inflation, overdispersion, and skewness, all of which complicate statistical modeling. Moreover, latent subgroups of patients may have distinct patterns of utilization and relationships between that utilization and observed covariates. In this work, we apply and compare likelihood-based and parametric Bayesian mixtures of negative binomial and zero-inflated negative binomial regression models. In a simulation, we find that the Bayesian approach finds the true number of mixture components more accurately than using information criteria to select among likelihood-based finite mixture models. When we apply the models to data on hospital lengths of stay for patients with lung cancer, we find distinct subgroups of patients with different means and variances of hospital days, health and treatment covariates, and relationships between covariates and length of stay. AU - Kurz, C.F. AU - Hatfield, L.A.* C1 - 56572 C2 - 47151 CY - 111 River St, Hoboken 07030-5774, NJ USA SP - 4423-4435 TI - Identifying and interpreting subgroups in health care utilization data with count mixture regression models. JO - Stat. Med. 
VL - 38 IS - 22 PB - Wiley PY - 2019 SN - 0277-6715 ER - TY - JOUR AB - Joint models of longitudinal and survival data have become an important tool for modeling associations between longitudinal biomarkers and event processes. The association between marker and log hazard is assumed to be linear in existing shared random effects models, with this assumption usually remaining unchecked. We present an extended framework of flexible additive joint models that allows the estimation of nonlinear covariate-specific associations by making use of Bayesian P-splines. Our joint models are estimated in a Bayesian framework using structured additive predictors for all model components, allowing for great flexibility in the specification of smooth nonlinear, time-varying, and random effects terms for the longitudinal submodel, the survival submodel, and their association. The ability to capture truly linear and nonlinear associations is assessed in simulations and illustrated on the widely studied biomedical data on the rare fatal liver disease primary biliary cirrhosis. All methods are implemented in the R package bamlss to facilitate the application of this flexible joint model in practice. AU - Köhler, M. AU - Umlauf, N.* AU - Greven, S.* C1 - 54567 C2 - 45665 CY - 111 River St, Hoboken 07030-5774, NJ USA SP - 4771-4788 TI - Nonlinear association structures in flexible Bayesian additive joint models. JO - Stat. Med. VL - 37 IS - 30 PB - Wiley PY - 2018 SN - 0277-6715 ER - TY - JOUR AB - We describe a flexible family of tests for evaluating the goodness of fit (calibration) of a pre-specified personal risk model to the outcomes observed in a longitudinal cohort. Such evaluation involves using the risk model to assign each subject an absolute risk of developing the outcome within a given time from cohort entry and comparing subjects' assigned risks with their observed outcomes. This comparison involves several issues. 
For example, subjects followed only for part of the risk period have unknown outcomes. Moreover, existing tests do not reveal the reasons for poor model fit when it occurs, which can reflect misspecification of the model's hazards for the competing risks of outcome development and death. To address these issues, we extend the model-specified hazards for outcome and death, and use score statistics to test the null hypothesis that the extensions are unnecessary. Simulated cohort data applied to risk models whose outcome and mortality hazards agreed and disagreed with those generating the data show that the tests are sensitive to poor model fit, provide insight into the reasons for poor fit, and accommodate a wide range of model misspecification. We illustrate the methods by examining the calibration of two breast cancer risk models as applied to a cohort of participants in the Breast Cancer Family Registry. The methods can be implemented using the Risk Model Assessment Program, an R package freely available at http://stanford.edu/~ggong/rmap/. AU - Gong, G.* AU - Quante, A.S. AU - Terry, M.B.* AU - Whittemore, A.S.* C1 - 31225 C2 - 34228 CY - Hoboken SP - 3179-3190 TI - Assessing the goodness of fit of personal risk models. JO - Stat. Med. VL - 33 IS - 18 PB - Wiley-Blackwell PY - 2014 SN - 0277-6715 ER - TY - JOUR AB - Multi-state transition models are widely applied tools to analyze individual event histories in the medical or social sciences. In this paper, we propose the use of (discrete-time) competing-risks duration models to analyze multi-transition data. Unlike conventional Markov transition models, these models allow the estimated transition probabilities to depend on the time spent in the current state. Moreover, the models can be readily extended to allow for correlated transition probabilities. A further virtue of these models is that they can be estimated using conventional regression tools for discrete-response data, such as the multinomial logit model. 
The latter is implemented in many statistical software packages and can be readily applied by empirical researchers. Moreover, model estimation is feasible, even when dealing with very large data sets, and simultaneously allowing for a flexible form of duration dependence and correlation between transition probabilities. We derive the likelihood function for a model with three competing target states and discuss a feasible and readily applicable estimation method. We also present the results from a simulation study, which indicate adequate performance of the proposed approach. In an empirical application, we analyze dementia patients' transition probabilities from the domestic setting, taking into account several, partly duration-dependent covariates. AU - Hess, W.* AU - Schwarzkopf, L. AU - Hunger, M. AU - Holle, R. C1 - 31354 C2 - 34534 CY - Hoboken SP - 3919-3931 TI - Competing-risks duration models with correlated random effects: An application to dementia patients' transition histories. JO - Stat. Med. VL - 33 IS - 22 PB - Wiley-Blackwell PY - 2014 SN - 0277-6715 ER - TY - JOUR AB - New prognostic models are traditionally evaluated using measures of discrimination and risk reclassification, but these do not take full account of the clinical and health economic context. We propose a framework for comparing prognostic models by quantifying the public health impact (net benefit) of the treatment decisions they support, assuming a set of predetermined clinical treatment guidelines. The change in net benefit is more clinically interpretable than changes in traditional measures and can be used in full health economic evaluations of prognostic models used for screening and allocating risk reduction interventions. 
We extend previous work in this area by quantifying net benefits in life years, thus linking prognostic performance to health economic measures; by taking full account of the occurrence of events over time; and by considering estimation and cross-validation in a multiple-study setting. The method is illustrated in the context of cardiovascular disease risk prediction using an individual participant data meta-analysis. We estimate the number of cardiovascular-disease-free life years gained when statin treatment is allocated based on a risk prediction model with five established risk factors instead of a model with just age, gender and region. We explore methodological issues associated with the multistudy design and show that cost-effectiveness comparisons based on the proposed methodology are robust against a range of modelling assumptions, including adjusting for competing risks. AU - Rapsomaniki, E.* AU - White, I.R.* AU - Wood, A.M.* AU - Thompson, S.G.* AU - Emerging Risk Factors Collaboration (Döring, A. AU - Meisinger, C.) C1 - 7317 C2 - 29679 SP - 114-130 TI - A framework for quantifying net benefits of alternative prognostic models. JO - Stat. Med. VL - 31 IS - 2 PB - Wiley-Blackwell PY - 2012 SN - 0277-6715 ER - TY - JOUR AB - We study the link between two quality measures of SNP (single nucleotide polymorphism) data in genome-wide association (GWA) studies, that is, per SNP call rates (CR) and p-values for testing Hardy-Weinberg equilibrium (HWE). The aim is to improve these measures by applying methods based on realized randomized p-values, the false discovery rate and estimates for the proportion of false hypotheses. While exact non-randomized conditional p-values for testing HWE cannot be recommended for estimating the proportion of false hypotheses, their realized randomized counterparts should be used. 
P-values corresponding to the asymptotic unconditional chi-square test lead to reasonable estimates only if SNPs with low minor allele frequency are excluded. We provide an algorithm to compute the probability that SNPs violate HWE given the observed CR, which yields an improved measure of data quality. The proposed methods are applied to SNP data from the KORA (Cooperative Health Research in the Region of Augsburg, Southern Germany) 500 K project, a GWA study in a population-based sample genotyped by Affymetrix GeneChip 500 K arrays using the calling algorithm BRLMM 1.4.0. We show that all SNPs with CR = 100 per cent are in nearly perfect HWE, which suggests that the population meets the conditions required for HWE, at least for these SNPs. Moreover, we show that the proportion of SNPs not being in HWE increases with decreasing CR. We conclude that using a single threshold for judging HWE p-values without taking the CR into account is problematic. Instead we recommend a stratified analysis with respect to CR. AU - Finner, H.* AU - Strassburger, K.* AU - Heid, I.M. AU - Herder, C.* AU - Rathmann, W.* AU - Giani, G.* AU - Dickhaus, T.* AU - Lichtner, P. AU - Meitinger, T. AU - Wichmann, H.-E. AU - Illig, T. AU - Gieger, C. C1 - 5644 C2 - 27422 SP - 2347-2358 TI - How to link call rate and p-values for Hardy-Weinberg equilibrium as measures of genome-wide SNP data quality. JO - Stat. Med. VL - 29 IS - 22 PB - Wiley-Blackwell PY - 2010 SN - 0277-6715 ER - TY - JOUR AU - Barnett, A.G.* AU - Dobson, A.J.* AU - WHO MONICA Project (Löwel, H. AU - Hörmann, A. AU - Gostomzyk, J.G. AU - Bolte, H.-D.) C1 - 2994 C2 - 22408 SP - 3505-3523 TI - Estimating trends and seasonality in coronary heart disease. JO - Stat. Med. VL - 23 PB - Wiley PY - 2004 SN - 0277-6715 ER - TY - JOUR AU - Kaiser, J.C. AU - Heidenreich, W.F. C1 - 2976 C2 - 22162 SP - 3333-3350 TI - Comparing regression methods for the two-stage clonal expansion model of carcinogenesis. JO - Stat. Med. 
VL - 23 PB - Wiley PY - 2004 SN - 0277-6715 ER - TY - JOUR AU - Heid, I.M. AU - Küchenhoff, H.* AU - Wellmann, J. AU - Gerken, M. AU - Kreienbrock, L.* AU - Wichmann, H.-E. C1 - 9854 C2 - 20668 SP - 3261-3278 TI - On the potential of measurement error to induce differential bias on odds ratio estimates: An example from radon epidemiology. JO - Stat. Med. VL - 21 PB - Wiley PY - 2002 SN - 0277-6715 ER - TY - JOUR AB - The two-step clonal expansion (TSCE) model is applied to large case-control studies, frequency matched for age, which allow estimation of the RR of lung tumour risk caused by smoking. For estimating background hazard rates, mortality data from the study areas are used to supplement the case-control data. Two approaches are used to analyse the data, based on the unconditional and the conditional likelihoods. They are demonstrated to give nearly identical results. Some model diagnostics are performed and demonstrate a good model fit. Our results indicate that smoking acts on the promotion and transformation parameters, but not on the initiation parameter of the TSCE model. The fitted relative risk of current smokers peaks between ages 50 and 60 years. The relative risk of male ex-smokers decreases strongly with time since end of exposure, but does not reach the risk of non-smokers, and does not decrease as much as for female ex-smokers. AU - Heidenreich, W.F. AU - Wellmann, J. AU - Jacob, P. AU - Wichmann, H.-E. C1 - 9853 C2 - 20339 SP - 3055-3070 TI - Mechanistic modelling in large case-control studies of lung cancer risk from smoking. JO - Stat. Med. VL - 21 PB - Wiley PY - 2002 SN - 0277-6715 ER - TY - JOUR AU - Hauptmann, M. AU - Lubin, J.H.* AU - Rosenberg, P.* AU - Wellmann, J.* AU - Kreienbrock, L.* C1 - 21579 C2 - 19705 SP - 2185-2194 TI - The use of sliding time windows for the exploratory analysis of temporal effects of smoking histories on lung cancer risk. JO - Stat. Med. 
VL - 19 PB - Wiley PY - 2000 SN - 0277-6715 ER - TY - JOUR AU - Reitmeir, P. AU - Wassmer, G.* C1 - 21224 C2 - 19323 SP - 3453-3462 TI - Resampling-based methods for the analysis of multiple endpoints in clinical trials. JO - Stat. Med. VL - 18 PY - 1999 SN - 0277-6715 ER - TY - JOUR AU - Lehmacher, W. C1 - 18832 C2 - 11952 TI - Analysis of the Crossover Design in the Presence of Residual Effects. JO - Stat. Med. VL - 10 PY - 1991 SN - 0277-6715 ER - TY - JOUR AB - When a residual effect is suspected in a two-period crossover trial, an analysis of the first-period data is often chosen instead of the potentially biased crossover analysis. This paper indicates how the usual crossover test has to be interpreted correctly and that its bias has two different consequences, namely a conservative or a liberal test decision if a positive (carryover) or a negative (withdrawal) residual effect exists. A multiple testing procedure is presented allowing for simultaneous crossover and first-period analysis controlling the experimental error rate. This procedure together with the correct interpretation of the crossover test enables many useful applications of crossover designs. AU - Lehmacher, W. C1 - 40799 C2 - 38883 SP - 891-899 TI - Analysis of the crossover design in the presence of residual effects. JO - Stat. Med. VL - 10 IS - 6 PY - 1991 SN - 0277-6715 ER -