TY - JOUR AB - Inflation in genome-wide association studies (GWAS) summary statistics represents a major challenge, for which correction methods have been developed. These include the genomic control (GC) method, which uses the λ-value to correct summary statistics, and the linkage disequilibrium score regression (LDSR) method, which uses the LDSR intercept. By using type 2 diabetes (T2D) as an exemplar, we explore factors influencing λ-values and the impact of these corrections on association signals. We find that larger sample sizes increase λ-values due to increased captured polygenicity, while including lower frequency variants decreases λ-values due to reduced power. Comparing T2D genetic associations described in overlapping GWAS meta-analyses of increasing sample size, we find that GC correction reduces the false positive rate and leads to the loss of robust associations. In one of the largest meta-analyses, GC correction results in a 39.7% loss of independent loci, substantially reducing the number of detected associations. In comparison, the LDSR intercept correction leads to a loss of up to 25.2% of the independent loci, being therefore less conservative than the GC correction. We conclude that in large, well-powered GWAS meta-analyses of polygenic traits, both GC and LDSR intercept corrections lead to power loss, highlighting the need for improved genomic inflation correction methods. AU - Singh, A. AU - Southam, L. AU - Hatzikotoulas, K. AU - Rayner, N.W. AU - Suzuki, K.* AU - Taylor, H.J.* AU - Yin, X.* AU - Mandla, R.* AU - Huerta-Chagoya, A.* AU - Morris, A.P. AU - Zeggini, E. AU - Bocher, O. C1 - 75311 C2 - 57914 CY - 111 River St, Hoboken 07030-5774, Nj Usa TI - Correcting for genomic inflation leads to loss of power in large-scale Genome-wide association study meta-analysis. JO - Genet. Epidemiol. 
VL - 49 IS - 6 PB - Wiley PY - 2025 SN - 0741-0395 ER - TY - JOUR AB - The introduction of Next-Generation Sequencing technologies in the clinics has improved rare disease diagnosis. Nonetheless, for very heterogeneous or very rare diseases, more than half of cases still lack molecular diagnosis. Novel strategies are needed to prioritize variants within a single individual. The Population Sampling Probability (PSAP) method was developed to meet this aim but only for coding variants in exome data. Here, we propose an extension of the PSAP method to the non-coding genome called PSAP-genomic-regions. In this extension, instead of considering genes as testing units (PSAP-genes strategy), we use genomic regions defined over the whole genome that pinpoint potential functional constraints. We conceived an evaluation protocol for our method using artificially generated disease exomes and genomes, by inserting coding and non-coding pathogenic ClinVar variants in large data sets of exomes and genomes from the general population. PSAP-genomic-regions significantly improves the ranking of these variants compared to using a pathogenicity score alone. Using PSAP-genomic-regions, more than 50% of non-coding ClinVar variants were among the top 10 variants of the genome. On real sequencing data from six patients with Cerebral Small Vessel Disease and nine patients with male infertility, all causal variants were ranked in the top 100 variants with PSAP-genomic-regions. By revisiting the testing units used in the PSAP method to include non-coding variants, we have developed PSAP-genomic-regions, an efficient whole-genome prioritization tool which offers promising results for the diagnosis of unresolved rare diseases. AU - Ogloblinsky, M.C.* AU - Bocher, O. 
AU - Aloui, C.* AU - Leutenegger, A.L.* AU - Ozisik, O.* AU - Baudot, A.* AU - Tournier-Lasserve, E.* AU - Castillo-Madeen, H.* AU - Lewinsohn, D.* AU - Conrad, D.F.* AU - Genin, E.* AU - Marenne, G.* C1 - 71835 C2 - 56450 CY - 111 River St, Hoboken 07030-5774, Nj Usa TI - PSAP-Genomic-Regions: A method leveraging population data to prioritize coding and non-coding variants in whole genome sequencing for rare Disease diagnosis. JO - Genet. Epidemiol. PB - Wiley PY - 2024 SN - 0741-0395 ER - TY - JOUR AB - Current software packages for the analysis and the simulations of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run in the whole genome thanks to C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases who can be stratified into several subgroups and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on the CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on Github at https://github.com/genostats/Ravages. AU - Bocher, O. AU - Marenne, G.* AU - Genin, E.* AU - Perdry, H.* C1 - 67818 C2 - 54296 CY - 111 River St, Hoboken 07030-5774, Nj Usa SP - 450-460 TI - Ravages: An R package for the simulation and analysis of rare variants in multicategory phenotypes. JO - Genet. Epidemiol. 
VL - 47 IS - 6 PB - Wiley PY - 2023 SN - 0741-0395 ER - TY - JOUR AB - It is still unclear how genetic information, provided as single-nucleotide polymorphisms (SNPs), can be most effectively integrated into risk prediction models for coronary heart disease (CHD) to add significant predictive value beyond clinical risk models. For the present study, a population-based case-cohort was used as a training set (451 incident cases, 1488 noncases) and an independent cohort as a test set (160 incident cases, 2749 noncases). The following strategies to quantify genetic information were compared: A weighted genetic risk score including Metabochip SNPs associated with CHD in the literature (GRSMetabo); selection of the most predictive SNPs among these literature-confirmed variants using priority-Lasso (PLMetabo); validation of two comprehensive polygenic risk scores: GRSGola based on Metabochip data, and GRSKhera (available in the test set only) based on cross-validated genome-wide genotyping data. We used Cox regression to assess associations with incident CHD. C-index, category-free net reclassification index (cfNRI) and relative integrated discrimination improvement (IDIrel) were used to quantify the predictive performance of genetic information beyond Framingham risk score variables. In contrast to GRSMetabo and PLMetabo, GRSGola significantly improved the prediction (delta C-index [95% confidence interval]: 0.0087 [0.0044, 0.0130]; IDIrel: 0.0509 [0.0131, 0.0894]; cfNRI improved only in cases: 0.1761 [0.0253, 0.3219]). GRSKhera yielded slightly worse prediction results than GRSGola. AU - Bauer, A. AU - Zierer, A. AU - Gieger, C. AU - Büyüközkan, M. AU - Müller-Nurasyid, M. AU - Grallert, H. AU - Meisinger, C. AU - Strauch, K. AU - Prokisch, H. AU - Roden, M.* AU - Peters, A. AU - Krumsiek, J. AU - Herder, C. AU - Koenig, W.* AU - Thorand, B. AU - Huth, C. 
C1 - 62200 C2 - 50722 CY - 111 River St, Hoboken 07030-5774, Nj Usa SP - 633-650 TI - Comparison of genetic risk prediction models to improve prediction of coronary heart disease in two large cohorts of the MONICA/KORA study. JO - Genet. Epidemiol. VL - 45 IS - 6 PB - Wiley PY - 2021 SN - 0741-0395 ER - TY - JOUR AB - Clinical trial results have recently demonstrated that inhibiting inflammation by targeting the interleukin-1 beta pathway can offer a significant reduction in lung cancer incidence and mortality, highlighting a pressing and unmet need to understand the benefits of inflammation-focused lung cancer therapies at the genetic level. While numerous genome-wide association studies (GWAS) have explored the genetic etiology of lung cancer, there remains a large gap between the type of information that may be gleaned from an association study and the depth of understanding necessary to explain and drive translational findings. Thus, in this study we jointly model and integrate extensive multiomics data sources, utilizing a total of 40 genome-wide functional annotations that augment previously published results from the International Lung Cancer Consortium (ILCCO) GWAS, to prioritize and characterize single nucleotide polymorphisms (SNPs) that increase risk of squamous cell lung cancer through the inflammatory and immune responses. Our work bridges the gap between correlative analysis and translational follow-up research, refining GWAS association measures in an interpretable and systematic manner. In particular, reanalysis of the ILCCO data highlights the impact of highly associated SNPs from nuclear factor-kappa B signaling pathway genes as well as major histocompatibility complex mediated variation in immune responses. One consequence of prioritizing likely functional SNPs is the pruning of variants that might be selected for follow-up work by over an order of magnitude, from potentially tens of thousands to hundreds. 
The strategies we introduce provide informative and interpretable approaches for incorporating extensive genome-wide annotation data in analysis of genetic association studies. AU - Sun, R.* AU - Xu, M.* AU - Li, X.* AU - Gaynor, S.* AU - Zhou, H.* AU - Li, Z.* AU - Bossé, Y.* AU - Lam, S.* AU - Tsao, M.S.* AU - Tardón, A.* AU - Chen, C.* AU - Doherty, J.* AU - Goodman, G.* AU - Bojesen, S.E.* AU - Landi, M.T.* AU - Johansson, M.* AU - Field, J.K.* AU - Bickeböller, H.* AU - Wichmann, H.-E. AU - Risch, A.* AU - Rennert, G.* AU - Arnold, S.* AU - Wu, X.* AU - Melander, O.* AU - Brunnström, H.* AU - Le Marchand, L.* AU - Liu, G.* AU - Andrew, A.* AU - Duell, E.* AU - Kiemeney, L.A.* AU - Shen, H.* AU - Haugen, A.* AU - Grankvist, K.* AU - Caporaso, N.* AU - Woll, P.* AU - Dawn Teare, M.* AU - Scelo, G.* AU - Hong, Y.C.* AU - Yuan, J.M.* AU - Lazarus, P.* AU - Schabath, M.B.* AU - Aldrich, M.C.* AU - Albanes, D.* AU - Mak, R.* AU - Barbie, D.* AU - Brennan, P.* AU - Hung, R.J.* AU - Amos, C.I.* AU - Christiani, D.C.* AU - Lin, X.* C1 - 60081 C2 - 49227 CY - 111 River St, Hoboken 07030-5774, Nj Usa SP - 99-114 TI - Integration of multiomic annotation data to prioritize and characterize inflammation and immune-related risk variants in squamous cell lung cancer. JO - Genet. Epidemiol. VL - 45 IS - 1 PB - Wiley PY - 2021 SN - 0741-0395 ER - TY - JOUR AB - Copy number variants (CNVs) play an important role in a number of human diseases, but the accurate calling of CNVs remains challenging. Most current approaches to CNV detection use raw read alignments, which are computationally intensive to process. We use a regression tree-based approach to call germline CNVs from whole-genome sequencing (WGS, >18x) variant call sets in 6,898 samples across four European cohorts, and describe a rich large variation landscape comprising 1,320 CNVs. Eighty-one percent of detected events have been previously reported in the Database of Genomic Variants. 
Twenty-three percent of high-quality deletions affect entire genes, and we recapitulate known events such as the GSTM1 and RHD gene deletions. We test for association between the detected deletions and 275 protein levels in 1,457 individuals to assess the potential clinical impact of the detected CNVs. We describe complex CNV patterns underlying an association with levels of the CCL3 protein (MAF = 0.15, p = 3.6 × 10⁻¹²) at the CCL3L3 locus, and a novel cis-association between a low-frequency NOMO1 deletion and NOMO1 protein levels (MAF = 0.02, p = 2.2 × 10⁻⁷). This study demonstrates that existing population-wide WGS call sets can be mined for germline CNVs with minimal computational overhead, delivering insight into a less well-studied, yet potentially impactful class of genetic variant. AU - Png, G. AU - Suveges, D.* AU - Park, Y.C.* AU - Walter, K.* AU - Kundu, K.* AU - Ntalla, I.* AU - Tsafantakis, E.* AU - Karaleftheri, M.* AU - Dedoussis, G.* AU - Zeggini, E. AU - Gilly, A. C1 - 56904 C2 - 47266 SP - 79-89 TI - Population-wide copy number variation calling using variant call format files from 6,898 individuals. JO - Genet. Epidemiol. VL - 44 IS - 1 PY - 2020 SN - 0741-0395 ER - TY - JOUR AU - Müller-Nurasyid, M. AU - Schramm, K. AU - Heier, M. AU - Pietzner, M.* AU - Budde, K.* AU - Adamski, J. AU - Gieger, C. AU - Suhre, K.* AU - Kastenmüller, G. AU - Strauch, K. C1 - 54618 C2 - 45711 CY - 111 River St, Hoboken 07030-5774, Nj Usa SP - 719-720 TI - Pharmacogenetic effects in population-based metabolic profiles. JO - Genet. Epidemiol. VL - 42 IS - 7 PB - Wiley PY - 2018 SN - 0741-0395 ER - TY - JOUR AB - Myopia is the largest cause of uncorrected visual impairments globally and its recent dramatic increase in the population has made it a major public health problem. In observational studies, educational attainment has been consistently reported to be correlated to myopia. Nonetheless, correlation does not imply causation. 
Observational studies do not tell us if education causes myopia or if instead there are confounding factors underlying the association. In this work, we use a two-step least squares instrumental-variable (IV) approach to estimate the causal effect of education on refractive error, specifically myopia. We used the results from the educational attainment GWAS from the Social Science Genetic Association Consortium to define a polygenic risk score (PGRS) in three cohorts of late middle age and elderly Caucasian individuals (N = 5,649). In a meta-analysis of the three cohorts, using the PGRS as an IV, we estimated that each z-score increase in education (approximately 2 years of education) results in a reduction of 0.92 ± 0.29 diopters (P = 1.04 × 10⁻³). Our estimate of the effect of education on myopia was higher (P = 0.01) than the observed estimate (0.25 ± 0.03 diopters reduction per education z-score [∼2 years] increase). This suggests that observational studies may actually underestimate the true effect. Our Mendelian Randomization (MR) analysis provides new evidence for a causal role of educational attainment on refractive error. AU - Cuellar-Partida, G.* AU - Lu, Y.* AU - Kho, P.F.* AU - Hewitt, A.W.* AU - Wichmann, H.-E. AU - Yazar, S.* AU - Stambolian, D.* AU - Bailey-Wilson, J.E.* AU - Wojciechowski, R.* AU - Wang, J.J.* AU - Mitchell, P.* AU - Mackey, D.A.* AU - MacGregor, S.* C1 - 47566 C2 - 39444 SP - 66-72 TI - Assessing the genetic predisposition of education on myopia: A mendelian randomization study. JO - Genet. Epidemiol. VL - 40 IS - 1 PY - 2016 SN - 0741-0395 ER - TY - JOUR AB - Diseases often cooccur in individuals more often than expected by chance, and may be explained by shared underlying genetic etiology. A common approach to genetic overlap analyses is to use summary genome-wide association study data to identify single-nucleotide polymorphisms (SNPs) that are associated with multiple traits at a selected P-value threshold. 
However, P-values do not account for differences in power, whereas Bayes' factors (BFs) do, and may be approximated using summary statistics. We use simulation studies to compare the power of frequentist and Bayesian approaches with overlap analyses, and to decide on appropriate thresholds for comparison between the two methods. It is empirically illustrated that BFs have the advantage over P-values of a decreasing type I error rate as study size increases for single-disease associations. Consequently, the overlap analysis of traits from different-sized studies encounters issues in fair P-value threshold selection, whereas BFs are adjusted automatically. Extensive simulations show that Bayesian overlap analyses tend to have higher power than those that assess association strength with P-values, particularly in low-power scenarios. Calibration tables between BFs and P-values are provided for a range of sample sizes, as well as an approximation approach for sample sizes that are not in the calibration table. Although P-values are sometimes thought more intuitive, these tables assist in removing the opaqueness of Bayesian thresholds and may also be used in the selection of a BF threshold to meet a certain type I error rate. An application of our methods is used to identify variants associated with both obesity and osteoarthritis. AU - Asimit, J.L.* AU - Panoutsopoulou, K.* AU - Wheeler, E.* AU - Berndt, S.I.* AU - GIANT Consortium (Albrecht, E. AU - Gieger, C. AU - Grallert, H. AU - Heid, I.M. AU - Illig, T. AU - Müller-Nurasyid, M. AU - Peters, A. AU - Thorand, B. AU - Wichmann, H.-E.) AU - ArcOGEN Consortium (*) AU - Cordell, H.J.* AU - Morris, A.P.* AU - Zeggini, E.* AU - Barroso, I.* C1 - 47640 C2 - 41225 SP - 624-634 TI - A Bayesian approach to the overlap analysis of epidemiologically linked traits. JO - Genet. Epidemiol. 
VL - 39 IS - 8 PY - 2015 SN - 0741-0395 ER - TY - JOUR AB - Genome-wide association studies (GWAS) successfully identified various chromosomal regions to be associated with multiple sclerosis (MS). The primary aim of this study was to replicate reported associations from GWAS using an exome array in a large German study. German MS cases (n = 4,476) and German controls (n = 5,714) were genotyped using the Illumina HumanExome v1-Chip. Genotype calling was performed with the Illumina Genome Studio(TM) Genotyping Module, followed by zCall. Single-nucleotide polymorphisms (SNPs) in seven regions outside the human leukocyte antigen (HLA) region showed genome-wide significant associations with MS (P values < 5 × 10⁻⁸). These associations have been reported previously. In addition, SNPs in three previously reported regions outside the HLA region yielded P values < 10⁻⁵. The effect of nine SNPs in the HLA region remained (P < 10⁻⁵) after adjustment for other significant SNPs in the HLA region. All of these findings have been reported before or are driven by known risk loci. In summary, findings from previous GWAS for MS could be successfully replicated. We conclude that the regions identified in previous GWAS are also associated in the German population. This reinforces the need for detailed investigations of the functional mechanisms underlying the replicated associations. AU - Dankowski, T.* AU - Buck, D.* AU - Andlauer, T.F.* AU - Antony, G.* AU - Bayas, A.* AU - Bechmann, L.* AU - Berthele, A.* AU - Bettecken, T.* AU - Chan, A.* AU - Franke, A.* AU - Gold, R.* AU - Graetz, C.* AU - Haas, J.* AU - Hecker, M.* AU - Herms, S.* AU - Infante-Duarte, C.* AU - Jöckel, K.-H.* AU - Kieseier, B.C.* AU - Knier, B.* AU - Knop, M.* AU - Kümpfel, T.* AU - Lichtner, P. 
AU - Lieb, W.* AU - Lill, C.M.* AU - Limmroth, V.* AU - Linker, R.A.* AU - Loleit, V.* AU - Meuth, S.G.* AU - Moebus, S.* AU - Müller-Myhsok, B.* AU - Nischwitz, S.* AU - Nöthen, M.M.* AU - Paul, F.* AU - Pütz, M.* AU - Ruck, T.* AU - Salmen, A.* AU - Stangel, M.* AU - Stellmann, J.P.* AU - Strauch, K. AU - Stürner, K.H.* AU - Tackenberg, B.* AU - Then Bergh, F.* AU - Tumani, H.* AU - Waldenberger, M. AU - Weber, F.* AU - Wiendl, H.* AU - Wildemann, B.* AU - Zettl, U.K.* AU - Ziemann, U.* AU - Zipp, F.* AU - Hemmer, B.* AU - Ziegler, A.* C1 - 47441 C2 - 39331 SP - 601-608 TI - Successful replication of GWAS hits for multiple sclerosis in 10,000 Germans using the exome array. JO - Genet. Epidemiol. VL - 39 IS - 8 PY - 2015 SN - 0741-0395 ER - TY - JOUR AU - Friedrichs, S.* AU - Amos, C.I.* AU - Brennan, P.* AU - Christiani, D.C.* AU - Hung, R.J.* AU - Risch, A.* AU - Brüske, I. AU - Caporaso, N.* AU - Landi, M.T.* AU - Rafnar, T.* AU - Bickeboeller, H.* C1 - 47283 C2 - 39227 SP - 549 TI - Kernel-based pathway meta-analysis in ILCCO / TRICL genome-wide association studies. JO - Genet. Epidemiol. VL - 39 IS - 7 PY - 2015 SN - 0741-0395 ER - TY - JOUR AB - Although genome-wide association studies (GWAS) have identified thousands of trait-associated genetic variants, there are relatively few findings on the X chromosome. For analysis of low-frequency variants (minor allele frequency <5%), investigators can use region- or gene-based tests where multiple variants are analyzed jointly to increase power. To date, there are no gene-based tests designed for association testing of low-frequency variants on the X chromosome. Here we propose three gene-based tests for the X chromosome: burden, sequence kernel association test (SKAT), and optimal unified SKAT (SKAT-O). 
Using simulated case-control and quantitative trait (QT) data, we evaluate the calibration and power of these tests as a function of (1) male:female sample size ratio; and (2) coding of haploid male genotypes for variants under X-inactivation. For case-control studies, all three tests are reasonably well-calibrated for all scenarios we evaluated. As expected, power for gene-based tests depends on the underlying genetic architecture of the genomic region analyzed. Studies with more (haploid) males are generally less powerful due to decreased number of chromosomes. Power generally is slightly greater when the coding scheme for male genotypes matches the true underlying model, but the power loss for misspecifying the (generally unknown) model is small. For QT studies, type I error and power results largely mirror those for binary traits. We demonstrate the use of these three gene-based tests for X-chromosome association analysis in simulated data and sequencing data from the Genetics of Type 2 Diabetes (GoT2D) study. AU - Ma, C.* AU - Boehnke, M.* AU - Lee, S.* AU - GoT2D Consortium (Gieger, C. AU - Grallert, H. AU - Hrabě de Angelis, M. AU - Huth, C. AU - Kriebel, J. AU - Meisinger, C. AU - Meitinger, T. AU - Müller-Nurasyid, M. AU - Peters, A. AU - Ried, J.S. AU - Strauch, K. AU - Strom, T.M.) C1 - 47437 C2 - 39334 SP - 499-508 TI - Evaluating the calibration and power of three gene-based association tests of rare variants for the X chromosome. JO - Genet. Epidemiol. VL - 39 IS - 7 PY - 2015 SN - 0741-0395 ER - TY - JOUR AU - Müller-Nurasyid, M. AU - Roselli, C.* AU - Greven, S.* AU - Sinner, M.F.* AU - Waldenberger, M. AU - Peters, A. AU - Strauch, K. AU - Kääb, S. C1 - 47284 C2 - 39226 SP - 571 TI - Genome-wide association analysis with multivariate ECG traits. JO - Genet. Epidemiol. VL - 39 IS - 7 PY - 2015 SN - 0741-0395 ER - TY - JOUR AB - Lung cancer is the leading cause of cancer death worldwide. 
Although several genetic variants associated with lung cancer have been identified in the past, stringent selection criteria of genome-wide association studies (GWAS) can lead to missed variants. The objective of this study was to uncover missed variants by using the known association between lung cancer and first-degree family history of lung cancer to enrich the variant prioritization for lung cancer susceptibility regions. In this two-stage GWAS study, we first selected a list of variants associated with both lung cancer and family history of lung cancer in four GWAS (3,953 cases, 4,730 controls), then replicated our findings for 30 variants in a meta-analysis of four additional studies (7,510 cases, 7,476 controls). The top ranked genetic variant rs12415204 in chr10q23.33 encoding FFAR4 in the Discovery set was validated in the Replication set with an overall OR of 1.09 (95% CI = 1.04, 1.14, P = 1.63 × 10⁻⁴). When combining the two stages of the study, the strongest association was found in rs1158970 at chr4p15.2 encoding KCNIP4 with an OR of 0.89 (95% CI = 0.85, 0.94, P = 9.64 × 10⁻⁶). We performed a stratified analysis of rs12415204 and rs1158970 across all eight studies by age, gender, smoking status, and histology, and found consistent results across strata. Four of the 30 replicated variants act as expression quantitative trait loci (eQTL) sites in 1,111 nontumor lung tissues and meet the genome-wide 10% FDR threshold. AU - Poirier, J.G.* AU - Brennan, P.J.* AU - McKay, J.D.* AU - Spitz, M.R.* AU - Bickeböller, H.* AU - Risch, A.* AU - Liu, G.* AU - Le Marchand, L.* AU - Tworoger, S.S.* AU - McLaughlin, J.R.* AU - Rosenberger, A.* AU - Heinrich, J. AU - Brüske, I. 
AU - Muley, T.R.* AU - Henderson, B.* AU - Wilkens, L.R.* AU - Zong, X.* AU - Li, Y.* AU - Hao, K.* AU - Timens, W.* AU - Bossé, Y.* AU - Sin, D.* AU - Obeidat, M.A.* AU - Amos, C.I.* AU - Hung, R.* C1 - 43198 C2 - 36313 CY - Hoboken SP - 197-206 TI - Informed genome-wide association analysis with family history as a secondary phenotype identifies novel loci of lung cancer. JO - Genet. Epidemiol. VL - 39 IS - 3 PB - Wiley-Blackwell PY - 2015 SN - 0741-0395 ER - TY - JOUR AU - Rosenberger, A.* AU - Friedrichs, S.* AU - Amos, C.I.* AU - Brennan, P.* AU - Fehringer, G.* AU - Brüske, I. AU - Hung, R.J.* AU - Müller-Nurasyid, M. AU - Risch, A.* AU - Bickeboeller, H.* C1 - 47285 C2 - 39225 SP - 576 TI - Meta-analysis of gene-set analyses based on genome wide association studies, method development and application within ILCCO/TRICL consortia. JO - Genet. Epidemiol. VL - 39 IS - 7 PY - 2015 SN - 0741-0395 ER - TY - JOUR AB - Genome-wide association studies are usually accompanied by imputation techniques to complement genome-wide SNP chip genotypes. Current imputation approaches separate the phasing of study data from imputing, which makes the phasing independent from the reference data. The two-step approach allows for updating the imputation for a new reference panel without repeating the tedious phasing step. This advantage, however, no longer holds when the build of the study data differs from the build of the reference data. In this case, the current approach is to harmonize the study data annotation with the reference data (prephasing lift-over), requiring rephasing and re-imputing. As a novel approach, we propose to harmonize study haplotypes with reference haplotypes (postphasing lift-over). This allows for updating imputed study data for new reference panels without requiring rephasing. With continuously updated reference panels, our approach can save considerable computing time of up to 1 month per re-imputation. 
We evaluated the rephasing and postphasing lift-over approaches by using data from 1,644 unrelated individuals imputed by both approaches and comparing them with directly typed genotypes. On average, both approaches perform equally well, with mean concordances of 93% between imputed and typed genotypes. Also, imputation qualities are similar (mean difference in RSQ < 0.1%). We demonstrate that our novel postphasing lift-over approach is a practical and time-saving alternative to the prephasing lift-over. This might encourage study partners to accommodate updated reference builds and ultimately improve the information content of study data. Our novel approach is implemented in the software PhaseLift. AU - Gorski, M.* AU - Winkler, T.W.* AU - Stark, K.* AU - Müller-Nurasyid, M. AU - Ried, J.S. AU - Grallert, H. AU - Weber, B.H.F.* AU - Heid, I.M. C1 - 31741 C2 - 34696 CY - Hoboken SP - 381-388 TI - Harmonization of study and reference data by PhaseLift: Saving time when imputing study data. JO - Genet. Epidemiol. VL - 38 IS - 5 PB - Wiley-Blackwell PY - 2014 SN - 0741-0395 ER - TY - JOUR AB - In genome-wide association studies of binary traits, investigators typically use logistic regression to test common variants for disease association within studies, and combine association results across studies using meta-analysis. For common variants, logistic regression tests are well calibrated, and meta-analysis of study-specific association results is only slightly less powerful than joint analysis of the combined individual-level data. In recent sequencing and dense chip based association studies, investigators increasingly test low-frequency variants for disease association. In this paper, we seek to (1) identify the association test with maximal power among tests with well controlled type I error rate and (2) compare the relative power of joint and meta-analysis tests. 
We use analytic calculation and simulation to compare the empirical type I error rate and power of four logistic regression based tests: Wald, score, likelihood ratio, and Firth bias-corrected. We demonstrate for low-count variants (roughly minor allele count [MAC] < 400) that: (1) for joint analysis, the Firth test has the best combination of type I error and power; (2) for meta-analysis of balanced studies (equal numbers of cases and controls), the score test is best, but is less powerful than Firth test based joint analysis; and (3) for meta-analysis of sufficiently unbalanced studies, all four tests can be anti-conservative, particularly the score test. We also establish MAC as the key parameter determining test calibration for joint and meta-analysis. AU - Ma, C.* AU - Blackwell, T.* AU - Boehnke, M.* AU - Scott, L.J.* AU - GoT2D Consortium (Gieger, C. AU - Grallert, H. AU - Hrabě de Angelis, M. AU - Huth, C. AU - Kriebel, J. AU - Meisinger, C. AU - Meitinger, T. AU - Müller-Nurasyid, M. AU - Peters, A. AU - Rathmann, W. AU - Ried, J.S. AU - Strauch, K. AU - Strom, T.M.) C1 - 43143 C2 - 36018 SP - 539-550 TI - Recommended joint and meta-analysis strategies for case-control association testing of single low-count variants. JO - Genet. Epidemiol. VL - 37 IS - 6 PY - 2013 SN - 0741-0395 ER - TY - JOUR AB - Biological plausibility and other prior information could help select genome-wide association (GWA) findings for further follow-up, but there is no consensus on which types of knowledge should be considered or how to weight them. We used experts' opinions and empirical evidence to estimate the relative importance of 15 types of information at the single-nucleotide polymorphism (SNP) and gene levels. Opinions were elicited from 10 experts using a two-round Delphi survey. 
Empirical evidence was obtained by comparing the frequency of each type of characteristic in SNPs established as being associated with seven disease traits through GWA meta-analysis and independent replication, with the corresponding frequency in a randomly selected set of SNPs. SNP and gene characteristics were retrieved using a specially developed bioinformatics tool. Both the expert and the empirical evidence rated previous association in a meta-analysis or more than one study as conferring the highest relative probability of true association, whereas previous association in a single study ranked much lower. High relative probabilities were also observed for location in a functional protein domain, although location in a region evolutionarily conserved in vertebrates was ranked high by the data but not by the experts. Our empirical evidence did not support the importance attributed by the experts to whether the gene encodes a protein in a pathway or shows interactions relevant to the trait. Our findings provide insight into the selection and weighting of different types of knowledge in SNP or gene prioritization, and point to areas requiring further research. AU - Minelli, C.* AU - de Grandi, A.* AU - Weichenberger, C.X.* AU - Gögele, M.* AU - Modenese, M.* AU - Attia, J.* AU - Barrett, J.H.* AU - Boehnke, M.* AU - Borsani, G.* AU - Casari, G.* AU - Fox, C.S.* AU - Freina, T.* AU - Hicks, A.A.* AU - Marroni, F.* AU - Parmigiani, G.* AU - Pastore, A.* AU - Pattaro, C.* AU - Pfeufer, A. AU - Ruggeri, F.* AU - Schwienbacher, C.* AU - Taliun, D.* AU - Pramstaller, P.P.* AU - Domingues, F.S.* AU - Thompson, J.R.* C1 - 22655 C2 - 30938 SP - 205-213 TI - Importance of different types of prior knowledge in selecting genome-wide findings for follow-up. JO - Genet. Epidemiol. 
VL - 37 IS - 2 PB - Wiley-Blackwell PY - 2013 SN - 0741-0395 ER - TY - JOUR AB - The analysis of gene-environment (G × E) interactions remains one of the greatest challenges in the postgenome-wide association studies (GWASs) era. Recent methods constitute a compromise between the robust but underpowered case-control and powerful case-only methods. Inferences of the latter are biased when the assumption of gene-environment (G-E) independence in controls fails. We propose a novel empirical hierarchical Bayes approach to G × E interaction (EHB-GE), which benefits from greater rank power while accounting for population-based G-E correlation. Building on Lewinger et al.'s ([2007] Genet Epidemiol 31:871-882) hierarchical Bayes prioritization approach, the method first obtains posterior G-E correlation estimates in controls for each marker, borrowing strength from G-E information across the genome. These posterior estimates are then subtracted from the corresponding case-only G × E estimates. We compared EHB-GE with rival methods using simulation. EHB-GE has similar or greater rank power to detect G × E interactions in the presence of large numbers of G-E correlations with weak to strong effects or only a low number of such correlations with large effect. When there are no or only a few weak G-E correlations, Murcray et al.'s method ([2009] Am J Epidemiol 169:219-226) identifies markers with low G × E interaction effects better. We applied EHB-GE and competing methods to four lung cancer case-control GWAS from the Interdisciplinary Research in Cancer of the Lung/International Lung Cancer Consortium with smoking as environmental factor. A number of genes worth investigating were identified by the EHB-GE approach. AU - Sohns, M.* AU - Viktorova, E.* AU - Amos, C.I.* AU - Brennan, P.* AU - Fehringer, G.* AU - Gaborieau, V.* AU - Han, Y.* AU - Heinrich, J. AU - Chang-Claude, J.* AU - Hung, R.J.* AU - Müller-Nurasyid, M. 
AU - Risch, A.* AU - Lewinger, J.P.* AU - Thomas, D.C.* AU - Bickeböller, H.* C1 - 26322 C2 - 32173 SP - 551-559 TI - Empirical hierarchical Bayes approach to gene-environment interactions: Development and application to genome-wide association studies of lung cancer in TRICL. JO - Genet. Epidemiol. VL - 37 IS - 6 PB - Wiley-Blackwell PY - 2013 SN - 0741-0395 ER - TY - JOUR AB - Prioritization is the process whereby a set of possible candidate genes or SNPs is ranked so that the most promising can be taken forward into further studies. In a genome-wide association study, prioritization is usually based on the P-values alone, but researchers sometimes take account of external annotation information about the SNPs such as whether the SNP lies close to a good candidate gene. Using external information in this way is inherently subjective and is often not formalized, making the analysis difficult to reproduce. Building on previous work that has identified 14 important types of external information, we present an approximate Bayesian analysis that produces an estimate of the probability of association. The calculation combines four sources of information: the genome-wide data, SNP information derived from bioinformatics databases, empirical SNP weights, and the researchers' subjective prior opinions. The calculation is fast enough that it can be applied to millions of SNPs and although it does rely on subjective judgments, those judgments are made explicit so that the final SNP selection can be reproduced. We show that the resulting probability of association is intuitively more appealing than the P-value because it is easier to interpret and it makes allowance for the power of the study. We illustrate the use of the probability of association for SNP prioritization by applying it to a meta-analysis of kidney function genome-wide association studies and demonstrate that SNP selection performs better using the probability of association compared with P-values alone. 
AU - Thompson, J.R.* AU - Gögele, M.* AU - Weichenberger, C.X.* AU - Modenese, M.* AU - Attia, J.* AU - Barrett, J.H.* AU - Boehnke, M.* AU - de Grandi, A.* AU - Domingues, F.S.* AU - Hicks, A.A.* AU - Marroni, F.* AU - Pattaro, C.* AU - Ruggeri, F.* AU - Borsani, G.* AU - Casari, G.* AU - Parmigiani, G.* AU - Pastore, A.* AU - Pfeufer, A. AU - Schwienbacher, C.* AU - Taliun, D.* AU - CKDGen Consortium (*) AU - Fox, C.S.* AU - Pramstaller, P.P.* AU - Minelli, C.* C1 - 22653 C2 - 30940 SP - 214-221 TI - SNP prioritization using a Bayesian probability of association. JO - Genet. Epidemiol. VL - 37 IS - 2 PB - Wiley-Blackwell PY - 2013 SN - 0741-0395 ER - TY - JOUR AB - Meta-analyses of genome-wide association studies require numerous study partners to conduct pre-defined analyses and thus simple but efficient analyses plans. Potential differences between strata (e.g. men and women) are usually ignored, but often the question arises whether stratified analyses help to unravel the genetics of a phenotype or if they unnecessarily increase the burden of analyses. To decide whether to stratify or not to stratify, we compare general analytical power computations for the overall analysis with those of stratified analyses considering quantitative trait analyses and two strata. We also relate the stratification problem to interaction modeling and exemplify theoretical considerations on obesity and renal function genetics. We demonstrate that the overall analyses have better power compared to stratified analyses as long as the signals are pronounced in both strata with consistent effect direction. Stratified analyses are advantageous in the case of signals with zero (or very small) effect in one stratum and for signals with opposite effect direction in the two strata. Applying the joint test for a main SNP effect and SNP-stratum interaction beats both overall and stratified analyses regarding power, but involves more complex models. 
In summary, we recommend to employ stratified analyses or the joint test to better understand the potential of strata-specific signals with opposite effect direction. Only after systematic genome-wide searches for opposite effect direction loci have been conducted, we will know if such signals exist and to what extent stratified analyses can depict loci that otherwise are missed. AU - Behrens, G.* AU - Winkler, T.W.* AU - Gorski, M. AU - Leitzmann, M.F.* AU - Heid, I.M. C1 - 6896 C2 - 29447 CY - New York, N.Y. SP - 867-879 TI - To stratify or not to stratify: Power considerations for population-based genome-wide association studies of quantitative traits. JO - Genet. Epidemiol. VL - 35 IS - 8 PB - Wiley-Blackwell PY - 2012 SN - 0741-0395 ER - TY - JOUR AB - Most genome-wide association studies (GWAS) are restricted to one phenotype, even if multiple related or unrelated phenotypes are available. However, an integrated analysis of multiple phenotypes can provide insight into their shared genetic basis and may improve the power of association studies. We present a new method, called "phenotype set enrichment analysis" (PSEA), which uses ideas of gene set enrichment analysis for the investigation of phenotype sets. PSEA combines statistics of univariate phenotype analyses and tests by permutation. It does not only allow analyzing predefined phenotype sets, but also to identify new phenotype sets. Apart from the application to situations where phenotypes and genotypes are available for each person, the method was adjusted to the analysis of GWAS summary statistics. PSEA was applied to data from the population-based cohort KORA F4 (N = 1,814) using iron-related and blood count traits. By confirming associations previously found in large meta-analyses on these traits, PSEA was shown to be a reliable tool. Many of these associations were not detectable by GWAS on single phenotypes in KORA F4. 
Therefore, the results suggest that PSEA can be more powerful than a single phenotype GWAS for the identification of association with multiple phenotypes. PSEA is a valuable method for analysis of multiple phenotypes, which can help to understand phenotype networks. Its flexible design enables both the use of prior knowledge and the generation of new knowledge on connection of multiple phenotypes. A software program for PSEA based on GWAS results is available upon request. AU - Ried, J.S. AU - Döring, A. AU - Oexle, K.* AU - Meisinger, C. AU - Winkelmann, J. AU - Klopp, N. AU - Meitinger, T. AU - Peters, A. AU - Suhre, K. AU - Wichmann, H.-E. AU - Gieger, C. C1 - 7769 C2 - 29806 SP - 244-252 TI - PSEA: Phenotype Set Enrichment Analysis - a new method for analysis of multiple phenotypes. JO - Genet. Epidemiol. VL - 36 IS - 3 PB - Wiley-Blackwell PY - 2012 SN - 0741-0395 ER - TY - JOUR AB - The objective of this study is to investigate the prevalence of Down syndrome (DS) associated with Chernobyl fallout. Maternal age-adjusted DS data and corresponding live birth data from the following seven European countries or regions were analyzed: Bavaria and West Berlin in Germany, Belarus, Hungary, the Lothian Region of Scotland, North West England, and Sweden from 1981 to 1992. To assess the underlying time trends in the DS occurrence, and to investigate whether there have been significant changes in the trend functions after Chernobyl, we applied logistic regression allowing for peaks and jumps from January 1987 onward. The majority of the trisomy 21 cases of the previously reported, highly significant January 1987 clusters in Belarus and West Berlin were conceived when the radioactive clouds with significant amounts of radionuclides with short physical half-lives, especially ¹³¹iodine, passed over these regions. Apart from this, we also observed a significant longer lasting effect in both areas. 
Moreover, evidence for long-term changes in the DS prevalence in several other European regions is presented and explained by exposure, especially to ¹³⁷Cs. In many areas, ¹³⁷Cs uptake reached its maximum one year after the Chernobyl accident. Thus, the highest increase in trisomy 21 should be observed in 1987/1988, which is indeed the case. Based on the fact that maternal meiosis is an error prone process, the assumption of a causal relationship between low-dose irradiation and nondisjunction is the most likely explanation for the observed increase in DS after the Chernobyl reactor accident. AU - Sperling, K.* AU - Neitzel, H.* AU - Scherb, H. C1 - 7180 C2 - 29525 SP - 48-55 TI - Evidence for an increase in trisomy 21 (Down syndrome) in Europe after the Chernobyl reactor accident. JO - Genet. Epidemiol. VL - 36 IS - 1 PB - Wiley-Blackwell PY - 2012 SN - 0741-0395 ER - TY - JOUR AB - Current approaches for analysis of longitudinal genetic epidemiological data of quantitative traits are typically restricted to normality assumptions of the trait. We introduce the longitudinal nonparametric test (LNPT) for cohorts with quantitative follow-up data to test for overall main effects of genes and for gene-gene and gene-time interactions. The LNPT is a rank procedure and does not depend on normality assumptions of the trait. We demonstrate by simulations that the LNPT is powerful, keeps the type-1 error level, and has very good small sample size behavior. For phenotypes with normal residuals, loss of power compared to parametric approaches (linear mixed models) was small for the quite general scenarios, which we simulated. For phenotypes with non-normal residuals, gain in power by the LNPT can be substantial. In contrast to parametric approaches, the LNPT is invariant with respect to monotone transformations of the trait. It is mathematically valid for arbitrary trait distribution. AU - Malzahn, D.* AU - Schillert, A.* AU - Müller, M. 
AU - Bickeböller, H.* C1 - 1804 C2 - 27380 SP - 469-478 TI - The longitudinal nonparametric test as a new tool to explore gene-gene and gene-time effects in cohorts. JO - Genet. Epidemiol. VL - 34 IS - 5 PB - Wiley-Blackwell PY - 2010 SN - 0741-0395 ER - TY - JOUR AU - Malzahn, D.* AU - Schillert, A.* AU - Müller-Nurasyid, M. AU - Heid, I.M. AU - Wichmann, H.-E. AU - Bickeboeller, H.* C1 - 60088 C2 - 0 CY - Div John Wiley & Sons Inc, 111 River St, Hoboken, NJ 07030 USA SP - 706-706 TI - Gene-gene and gene-time effects in cohorts: Simulation study of a nonparametric longitudinal approach and results on real data. JO - Genet. Epidemiol. VL - 32 IS - 7 PB - Wiley-Liss PY - 2008 SN - 0741-0395 ER - TY - JOUR AB - Particularly in studies based on population representative samples, it is of major interest what impact a genetic variant has on the phenotype of interest, which cannot be answered by mere association estimates alone. One possible measure for quantifying the phenotype's variance explained by the genetic variant is R². However, for survival outcomes, no clear definition of R² is available in the presence of censored observations. We selected three criteria proposed for this purpose in the literature and compared their performance for single nucleotide polymorphism (SNP) data through simulation studies and for mortality data with candidate SNPs in the general population-based KORA cohort. The evaluated criteria were based on: (1) the difference of deviance residuals, (2) the variation of individual survival curves, and (3) the variation of Schoenfeld residuals. Our simulation studies included various censoring and genetic scenarios. The simulation studies revealed that the deviance residuals' criterion had a high dependence on the censoring percentage, was generally not limited to the range [0; 1] and therefore lacked interpretation as a percentage of explained variation. The second criterion (variation of survival curves) hardly reached values above 60%. 
Our requirements were best fulfilled by the criterion based on Schoenfeld residuals. Our mortality data analysis also supported the findings in simulation studies. With the criterion based on Schoenfeld residuals, we recommend a powerful and flexible tool for genetic epidemiological studies to refine genetic association studies by judging the contribution of genetic variants to survival phenotype. AU - Müller, M. AU - Döring, A. AU - Küchenhoff, H.* AU - Lamina, C. AU - Malzahn, D.* AU - Bickeböller, H.* AU - Vollmert, C. AU - Klopp, N. AU - Meisinger, C. AU - Heinrich, J. AU - Kronenberg, F.* AU - Wichmann, H.-E. AU - Heid, I.M. C1 - 1824 C2 - 25570 SP - 574-585 TI - Quantifying the contribution of genetic variants for survival phenotypes. JO - Genet. Epidemiol. VL - 32 IS - 6 PB - Wiley-Blackwell PY - 2008 SN - 0741-0395 ER - TY - JOUR AU - Loesgen, S. AU - Scholz, M.* AU - Schmidt, S.* AU - Bickeböller, H.* C1 - 21328 C2 - 19443 SP - 235-240 TI - Incorporating Larger Families in Identity-by-Descent Based Linkage Analysis. JO - Genet. Epidemiol. VL - 17 (Suppl.1) PY - 1999 SN - 0741-0395 ER - TY - JOUR AU - Schmidt, S.* AU - Scholz, M.* AU - Loesgen, S. AU - Bickeböller, H.* C1 - 21327 C2 - 19442 SP - 709-714 TI - Systematic Search for Susceptibility Genes in Different Populations. JO - Genet. Epidemiol. VL - 17 (Suppl.1) PY - 1999 SN - 0741-0395 ER - TY - JOUR AB - Principal component analysis was used to construct quantitative phenotypes for alcoholism. These were analyzed for linkage to genomic regions with a variance components approach. The four phenotypes considered were a factor describing medical symptoms of alcohol dependency, a factor describing a psychological profile correlated with susceptibility to alcoholism, monoamine oxidase B (MAOB) activity and an average measurement of the P3 component of event-related potentials (ERP) at the Fp electrode placements. 
One region (around marker GATA123C09 on chromosome 3) with suggestive evidence for linkage was detected for the P3 (Fp) measurement. For three of the four distinct phenotypes, modest evidence for linkage to a similar region (around marker ADH3 on chromosome 4) was found. AU - Scholz, M.* AU - Schmidt, S.* AU - Loesgen, S. AU - Bickeböller, H.* C1 - 21326 C2 - 19441 SP - 313-318 TI - Analysis of Principal Component Based Quantitative Phenotypes for Alcoholism. JO - Genet. Epidemiol. VL - 17 (Suppl.1) PY - 1999 SN - 0741-0395 ER -