TY  - JOUR
AB  - The pattern graph framework solves a wide range of missing data problems with nonignorable mechanisms. However, it faces two challenges of assessability and interpretability, particularly important in safety-critical problems such as clinical diagnosis: (i) How can one assess the validity of the framework's a priori assumption and make necessary adjustments to accommodate known information about the problem? (ii) How can one interpret the process of exponential tilting used for sensitivity analysis in the pattern graph framework and choose the tilt perturbations based on meaningful real-world quantities? In this paper, we introduce Informed Sensitivity Analysis, an extension of the pattern graph framework that enables us to incorporate substantive knowledge about the missingness mechanism into the pattern graph framework. Our extension allows us to examine the validity of assumptions underlying pattern graphs and interpret sensitivity analysis results in terms of realistic problem characteristics. We apply our method to a prevalent nonignorable missing data scenario in clinical research. We validate and compare the results of our method with a number of widely-used missing data methods, including Unweighted CCA, KNN Imputer, MICE, and MissForest. The validation is done using both bootstrapped simulated experiments and real-world clinical observations in the MIMIC-III public dataset.
AU  - Zamanian, A.*
AU  - Ahmidi, N.
AU  - Drton, M.*
C1  - 68199
C2  - 54831
CY  - 111 River St, Hoboken 07030-5774, NJ USA
SP  - 5419-5450
TI  - Assessable and interpretable sensitivity analysis in the pattern graph framework for nonignorable missingness mechanisms.
JO  - Stat. Med.
VL  - 42
IS  - 29
PB  - Wiley
PY  - 2023
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host-microbiome relationships. While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dynamics, robust generalizable associations between specific host-associated factors and specific microbial taxa have remained largely elusive. Here, we propose factor regression models that allow the estimation of structured parsimonious associations between host-related features and amplicon-derived microbial taxa. To account for the overdispersed nature of the amplicon sequencing count data, we propose negative binomial reduced rank regression (NB-RRR) and negative binomial co-sparse factor regression (NB-FAR). While NB-RRR encodes the underlying dependency among the microbial abundances as outcomes and the host-associated features as predictors through a rank-constrained coefficient matrix, NB-FAR uses a sparse singular value decomposition of the coefficient matrix. The latter approach avoids the notoriously difficult joint parameter estimation by extracting sparse unit-rank components of the coefficient matrix sequentially, effectively delivering interpretable bi-clusters of taxa and host-associated factors. To solve the nonconvex optimization problems associated with these factor regression models, we present a novel iterative block-wise majorization procedure. Extensive simulation studies and an application to the microbial abundance data from the American Gut Project (AGP) demonstrate the efficacy of the proposed procedure. In the AGP data, we identify several factors that strongly link dietary habits and host life style to specific microbial families.
AU  - Mishra, A.K.*
AU  - Müller, C.L.
C1  - 64867
C2  - 52569
SP  - 2786-2803
TI  - Negative binomial factor regression with application to microbiome data analysis.
JO  - Stat. Med.
VL  - 41
IS  - 15
PY  - 2022
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - Analyzing epidemiological data with simplified mathematical models of disease development provides a link between the time-course of incidence and the underlying biological processes. Here we point out that considerable modeling flexibility is gained if the model is solved by simulation only. To this aim, a model of atherosclerosis is proposed: a Markov Chain with continuous state space which represents the coronary artery intimal surface area involved with atherosclerotic lesions of increasing severity. Myocardial infarction rates are assumed to be proportional to the area of most severe lesions. The model can be fitted simultaneously to infarction incidence rates observed in the KORA registry, and to the age-dependent prevalence and extent of atherosclerotic lesions in the PDAY study. Moreover, the simulation approach allows for non-linear transition rates, and to consider at the same time randomness and inter-individual heterogeneity. Interestingly, the fit revealed significant age dependence of parameters in females around the age of menopause, qualitatively reproducing the known vascular effects of female sex hormones. For males, the incidence curve flattens for higher ages. According to the model, frailty explains this flattening only partially, and saturation of the disease process plays also an important role. This study shows the feasibility of simulating subclinical and epidemiological data with the same mathematical model. The approach is very general and may be extended to investigate the effects of risk factors or interventions. Moreover, it offers an interface to integrate quantitative individual health data as assessed, for example, by imaging.
AU  - Simonetto, C.
AU  - Rospleszcz, S.
AU  - Heier, M.
AU  - Meisinger, C.
AU  - Peters, A.
AU  - Kaiser, J.C.
C1  - 62093
C2  - 50645
CY  - 111 River St, Hoboken 07030-5774, NJ USA
SP  - 3299-3312
TI  - Simulating the dynamics of atherosclerosis to the incidence of myocardial infarction, applied to the KORA population.
JO  - Stat. Med.
VL  - 40
IS  - 14
PB  - Wiley
PY  - 2021
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - When addressing environmental health-related questions, most often, only observational data are collected for ethical or practical reasons. However, the lack of randomized exposure often prevents the comparison of similar groups of exposed and unexposed units. This design barrier leads the environmental epidemiology field to mainly estimate associations between environmental exposures and health outcomes. A causal inference pipeline was recently developed to guide researchers interested in estimating the effects of plausible hypothetical interventions for policy recommendations. This article illustrates how this multistaged pipeline can help environmental epidemiologists reconstruct and analyze hypothetical randomized experiments by investigating whether an air pollution reduction intervention decreases the risk of multiple sclerosis relapses in Alsace region, France. The epidemiology literature reports conflicting findings on the relationship between air pollution and multiple sclerosis. Some studies found significant associations, whereas others did not. Two case-crossover studies reported significant associations between the risk of multiple sclerosis relapses and the exposure to air pollutants in the Alsace region. We use the same study population as these epidemiological studies to illustrate how appealing this causal inference approach is to estimate the effects of hypothetical, but plausible, environmental interventions.
AU  - Sommer, A.
AU  - Leray, E.*
AU  - Lee, Y.*
AU  - Bind, M.A.C.*
C1  - 60819
C2  - 49666
CY  - 111 River St, Hoboken 07030-5774, NJ USA
SP  - 1321-1335
TI  - Assessing environmental epidemiology questions in practice with a causal inference pipeline: An investigation of the air pollution-multiple sclerosis relapses relationship.
JO  - Stat. Med.
VL  - 40
IS  - 6
PB  - Wiley
PY  - 2021
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - Inpatient care is a large share of total health care spending, making analysis of inpatient utilization patterns an important part of understanding what drives health care spending growth. Common features of inpatient utilization measures such as length of stay and spending include zero inflation, overdispersion, and skewness, all of which complicate statistical modeling. Moreover, latent subgroups of patients may have distinct patterns of utilization and relationships between that utilization and observed covariates. In this work, we apply and compare likelihood-based and parametric Bayesian mixtures of negative binomial and zero-inflated negative binomial regression models. In a simulation, we find that the Bayesian approach finds the true number of mixture components more accurately than using information criteria to select among likelihood-based finite mixture models. When we apply the models to data on hospital lengths of stay for patients with lung cancer, we find distinct subgroups of patients with different means and variances of hospital days, health and treatment covariates, and relationships between covariates and length of stay.
AU  - Kurz, C.F.
AU  - Hatfield, L.A.*
C1  - 56572
C2  - 47151
CY  - 111 River St, Hoboken 07030-5774, NJ USA
SP  - 4423-4435
TI  - Identifying and interpreting subgroups in health care utilization data with count mixture regression models.
JO  - Stat. Med.
VL  - 38
IS  - 22
PB  - Wiley
PY  - 2019
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - Joint models of longitudinal and survival data have become an important tool for modeling associations between longitudinal biomarkers and event processes. The association between marker and log hazard is assumed to be linear in existing shared random effects models, with this assumption usually remaining unchecked. We present an extended framework of flexible additive joint models that allows the estimation of nonlinear covariate specific associations by making use of Bayesian P-splines. Our joint models are estimated in a Bayesian framework using structured additive predictors for all model components, allowing for great flexibility in the specification of smooth nonlinear, time-varying, and random effects terms for longitudinal submodel, survival submodel, and their association. The ability to capture truly linear and nonlinear associations is assessed in simulations and illustrated on the widely studied biomedical data on the rare fatal liver disease primary biliary cirrhosis. All methods are implemented in the R package bamlss to facilitate the application of this flexible joint model in practice.
AU  - Köhler, M.
AU  - Umlauf, N.*
AU  - Greven, S.*
C1  - 54567
C2  - 45665
CY  - 111 River St, Hoboken 07030-5774, NJ USA
SP  - 4771-4788
TI  - Nonlinear association structures in flexible Bayesian additive joint models.
JO  - Stat. Med.
VL  - 37
IS  - 30
PB  - Wiley
PY  - 2018
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - We describe a flexible family of tests for evaluating the goodness of fit (calibration) of a pre-specified personal risk model to the outcomes observed in a longitudinal cohort. Such evaluation involves using the risk model to assign each subject an absolute risk of developing the outcome within a given time from cohort entry and comparing subjects' assigned risks with their observed outcomes. This comparison involves several issues. For example, subjects followed only for part of the risk period have unknown outcomes. Moreover, existing tests do not reveal the reasons for poor model fit when it occurs, which can reflect misspecification of the model's hazards for the competing risks of outcome development and death. To address these issues, we extend the model-specified hazards for outcome and death, and use score statistics to test the null hypothesis that the extensions are unnecessary. Simulated cohort data applied to risk models whose outcome and mortality hazards agreed and disagreed with those generating the data show that the tests are sensitive to poor model fit, provide insight into the reasons for poor fit, and accommodate a wide range of model misspecification. We illustrate the methods by examining the calibration of two breast cancer risk models as applied to a cohort of participants in the Breast Cancer Family Registry. The methods can be implemented using the Risk Model Assessment Program, an R package freely available at http://stanford.edu/~ggong/rmap/.
AU  - Gong, G.*
AU  - Quante, A.S.
AU  - Terry, M.B.*
AU  - Whittemore, A.S.*
C1  - 31225
C2  - 34228
CY  - Hoboken
SP  - 3179-3190
TI  - Assessing the goodness of fit of personal risk models.
JO  - Stat. Med.
VL  - 33
IS  - 18
PB  - Wiley-Blackwell
PY  - 2014
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - Multi-state transition models are widely applied tools to analyze individual event histories in the medical or social sciences. In this paper, we propose the use of (discrete-time) competing-risks duration models to analyze multi-transition data. Unlike conventional Markov transition models, these models allow the estimated transition probabilities to depend on the time spent in the current state. Moreover, the models can be readily extended to allow for correlated transition probabilities. A further virtue of these models is that they can be estimated using conventional regression tools for discrete-response data, such as the multinomial logit model. The latter is implemented in many statistical software packages and can be readily applied by empirical researchers. Moreover, model estimation is feasible, even when dealing with very large data sets, and simultaneously allowing for a flexible form of duration dependence and correlation between transition probabilities. We derive the likelihood function for a model with three competing target states and discuss a feasible and readily applicable estimation method. We also present the results from a simulation study, which indicate adequate performance of the proposed approach. In an empirical application, we analyze dementia patients' transition probabilities from the domestic setting, taking into account several, partly duration-dependent covariates.
AU  - Hess, W.*
AU  - Schwarzkopf, L.
AU  - Hunger, M.
AU  - Holle, R.
C1  - 31354
C2  - 34534
CY  - Hoboken
SP  - 3919-3931
TI  - Competing-risks duration models with correlated random effects: An application to dementia patients' transition histories.
JO  - Stat. Med.
VL  - 33
IS  - 22
PB  - Wiley-Blackwell
PY  - 2014
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - New prognostic models are traditionally evaluated using measures of discrimination and risk reclassification, but these do not take full account of the clinical and health economic context. We propose a framework for comparing prognostic models by quantifying the public health impact (net benefit) of the treatment decisions they support, assuming a set of predetermined clinical treatment guidelines. The change in net benefit is more clinically interpretable than changes in traditional measures and can be used in full health economic evaluations of prognostic models used for screening and allocating risk reduction interventions. We extend previous work in this area by quantifying net benefits in life years, thus linking prognostic performance to health economic measures; by taking full account of the occurrence of events over time; and by considering estimation and cross-validation in a multiple-study setting. The method is illustrated in the context of cardiovascular disease risk prediction using an individual participant data meta-analysis. We estimate the number of cardiovascular-disease-free life years gained when statin treatment is allocated based on a risk prediction model with five established risk factors instead of a model with just age, gender and region. We explore methodological issues associated with the multistudy design and show that cost-effectiveness comparisons based on the proposed methodology are robust against a range of modelling assumptions, including adjusting for competing risks.
AU  - Rapsomaniki, E.*
AU  - White, I.R.*
AU  - Wood, A.M.*
AU  - Thompson, S.G.*
AU  - Emerging Risk Factors Collaboration (Döring, A.
AU  - Meisinger, C.)
C1  - 7317
C2  - 29679
SP  - 114-130
TI  - A framework for quantifying net benefits of alternative prognostic models.
JO  - Stat. Med.
VL  - 31
IS  - 2
PB  - Wiley-Blackwell
PY  - 2012
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - We study the link between two quality measures of SNP (single nucleotide polymorphism) data in genome-wide association (GWA) studies, that is, per SNP call rates (CR) and p-values for testing Hardy-Weinberg equilibrium (HWE). The aim is to improve these measures by applying methods based on realized randomized p-values, the false discovery rate and estimates for the proportion of false hypotheses. While exact non-randomized conditional p-values for testing HWE cannot be recommended for estimating the proportion of false hypotheses, their realized randomized counterparts should be used. P-values corresponding to the asymptotic unconditional chi-square test lead to reasonable estimates only if SNPs with low minor allele frequency are excluded. We provide an algorithm to compute the probability that SNPs violate HWE given the observed CR, which yields an improved measure of data quality. The proposed methods are applied to SNP data from the KORA (Cooperative Health Research in the Region of Augsburg, Southern Germany) 500 K project, a GWA study in a population-based sample genotyped by Affymetrix GeneChip 500 K arrays using the calling algorithm BRLMM 1.4.0. We show that all SNPs with CR = 100 per cent are nearly in perfect HWE which militates in favor of the population to meet the conditions required for HWE at least for these SNPs. Moreover, we show that the proportion of SNPs not being in HWE increases with decreasing CR. We conclude that using a single threshold for judging HWE p-values without taking the CR into account is problematic. Instead we recommend a stratified analysis with respect to CR.
AU  - Finner, H.*
AU  - Strassburger, K.*
AU  - Heid, I.M.
AU  - Herder, C.*
AU  - Rathmann, W.*
AU  - Giani, G.*
AU  - Dickhaus, T.*
AU  - Lichtner, P.
AU  - Meitinger, T.
AU  - Wichmann, H.-E.
AU  - Illig, T.
AU  - Gieger, C.
C1  - 5644
C2  - 27422
SP  - 2347-2358
TI  - How to link call rate and p-values for Hardy-Weinberg equilibrium as measures of genome-wide SNP data quality.
JO  - Stat. Med.
VL  - 29
IS  - 22
PB  - Wiley-Blackwell
PY  - 2010
SN  - 0277-6715
ER  - 

TY  - JOUR
AU  - Barnett, A.G.*
AU  - Dobson, A.J.*
AU  - WHO MONICA Project (Löwel, H.
AU  - Hörmann, A.
AU  - Gostomzyk, J.G.
AU  - Bolte, H.-D.)
C1  - 2994
C2  - 22408
SP  - 3505-3523
TI  - Estimating trends and seasonality in coronary heart disease.
JO  - Stat. Med.
VL  - 23
PB  - Wiley
PY  - 2004
SN  - 0277-6715
ER  - 

TY  - JOUR
AU  - Kaiser, J.C.
AU  - Heidenreich, W.F.
C1  - 2976
C2  - 22162
SP  - 3333-3350
TI  - Comparing regression methods for the two-stage clonal expansion model of carcinogenesis.
JO  - Stat. Med.
VL  - 23
PB  - Wiley
PY  - 2004
SN  - 0277-6715
ER  - 

TY  - JOUR
AU  - Heid, I.M.
AU  - Küchenhoff, H.*
AU  - Wellmann, J.
AU  - Gerken, M.
AU  - Kreienbrock, L.*
AU  - Wichmann, H.-E.
C1  - 9854
C2  - 20668
SP  - 3261-3278
TI  - On the potential of measurement error to induce differential bias on odds ratio estimates: An example from radon epidemiology.
JO  - Stat. Med.
VL  - 21
PB  - Wiley
PY  - 2002
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - The two-step clonal expansion (TSCE) model is applied to large case-control studies, frequency matched for age, which allow estimation of the RR of lung tumour risk caused by smoking. For estimating background hazard rates, mortality data from the study areas are used to supplement the case-control data. Two approaches are used to analyse the data, based on the unconditional and the conditional likelihoods. They are demonstrated to give nearly identical results. Some model diagnostics are performed and demonstrate a good model fit. Our results indicate that smoking acts on the promotion and transformation parameters, but not on the initiation parameter of the TSCE model. The fitted relative risk of current smokers peaks between ages 50 and 60 years. The relative risk of male ex-smokers decreases strongly with time since end of exposure, but does not reach the risk of non-smokers, and does not decrease as much as for female ex-smokers.
AU  - Heidenreich, W.F.
AU  - Wellmann, J.
AU  - Jacob, P.
AU  - Wichmann, H.-E.
C1  - 9853
C2  - 20339
SP  - 3055-3070
TI  - Mechanistic modelling in large case-control studies of lung cancer risk from smoking.
JO  - Stat. Med.
VL  - 21
PB  - Wiley
PY  - 2002
SN  - 0277-6715
ER  - 

TY  - JOUR
AU  - Hauptmann, M.
AU  - Lubin, J.H.*
AU  - Rosenberg, P.*
AU  - Wellmann, J.*
AU  - Kreienbrock, L.*
C1  - 21579
C2  - 19705
SP  - 2185-2194
TI  - The use of sliding time windows for the exploratory analysis of temporal effects of smoking histories on lung cancer risk.
JO  - Stat. Med.
VL  - 19
PB  - Wiley
PY  - 2000
SN  - 0277-6715
ER  - 

TY  - JOUR
AU  - Reitmeir, P.
AU  - Wassmer, G.*
C1  - 21224
C2  - 19323
SP  - 3453-3462
TI  - Resampling-based methods for the analysis of multiple endpoints in clinical trials.
JO  - Stat. Med.
VL  - 18
PY  - 1999
SN  - 0277-6715
ER  - 

TY  - JOUR
AU  - Lehmacher, W.
C1  - 18832
C2  - 11952
TI  - Analysis of the Crossover Design in the Presence of Residual Effects.
JO  - Stat. Med.
VL  - 10
PY  - 1991
SN  - 0277-6715
ER  - 

TY  - JOUR
AB  - When a residual effect is suspected in a two-period crossover trial, an analysis of the first-period data is often chosen instead of the potentially biased crossover analysis. This paper indicates how the usual crossover test has to be interpreted correctly and that its bias has two different consequences, namely a conservative or a liberal test decision if a positive (carryover) or a negative (withdrawal) residual effect exists. A multiple testing procedure is presented allowing for simultaneous crossover and first-period analysis controlling the experimental error rate. This procedure together with the correct interpretation of the crossover test enables many useful applications of crossover designs.
AU  - Lehmacher, W.
C1  - 40799
C2  - 38883
SP  - 891-899
TI  - Analysis of the crossover design in the presence of residual effects.
JO  - Stat. Med.
VL  - 10
IS  - 6
PY  - 1991
SN  - 0277-6715
ER  -