TY - JOUR AB - Integration of single-cell RNA-sequencing (scRNA-seq) datasets is standard in scRNA-seq analysis. Nevertheless, current computational methods struggle to harmonize datasets across systems such as species, organoids and primary tissue, or different scRNA-seq protocols, including single-cell and single-nuclei. Conditional variational autoencoders (cVAE) are a popular integration method; however, existing strategies for stronger batch correction have limitations. Increasing the Kullback-Leibler divergence regularization does not improve integration and adversarial learning removes biological signals. Here, we propose sysVI, a cVAE-based method employing VampPrior and cycle-consistency constraints. We show that sysVI integrates across systems and improves biological signals for downstream interpretation of cell states and conditions. AU - Hrovatin, K. AU - Moinfar, A.A. AU - Zappia, L. AU - Parikh, S. AU - Tejada Lapuerta, A. AU - Lengerich, B.* AU - Kellis, M.* AU - Theis, F.J. C1 - 75927 C2 - 58203 CY - Campus, 4 Crinan St, London N1 9xw, England TI - Integrating single-cell RNA-seq datasets with substantial batch effects. JO - BMC Genomics VL - 26 IS - 1 PB - Bmc PY - 2025 SN - 1471-2164 ER - TY - JOUR AB - PURPOSE: Corneal dysmorphologies (CDs) are typically classified as either regressive degenerative corneal dystrophies (CDtrs) or defective growth and differentiation-driven corneal dysplasias (CDyps). Both eye disorders have multifactorial etiologies. While previous work has elucidated many aspects of CDs, such as presenting symptoms, epidemiology, and pathophysiology, the genetic mechanisms remain incompletely understood. The purpose of this study was to analyze phenotype data from 8,707 knockout mouse lines to identify new genes associated with the development of CDs in humans. 
METHODS: 8,707 knockout mouse lines phenotyped by the International Mouse Phenotyping Consortium were queried for genes associated with statistically significant (P < 0.0001) abnormal cornea morphology to identify candidate CD genes. Corneal abnormalities were investigated by histopathology. A literature search was used to determine the proportion of candidate genes previously associated with CDs in mice and humans. Phenotypes of human orthologues of mouse candidate genes were compared with known human CD genes to identify protein-protein interactions and molecular pathways using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING), Protein Analysis Through Evolutionary Relationships (PANTHER), and Kyoto Encyclopedia of Genes and Genomes. RESULTS: Analysis of data from 8,707 knockout mouse lines identified 213 candidate CD genes. Of these, 37 (17%) genes were previously known to be associated with CD, including 14 in the mouse, 16 in humans, and 7 in both. The remaining 176 (83%) genes have not been previously implicated in CD. We also searched publicly available RNAseq data and found that 131 of the total 213 (61.5%) were expressed in adult human corneal tissue. STRING analysis showed several interactions within and between candidate and established CD proteins. All cellular pathways of the established genes were found in the PANTHER analysis of the candidate genes. Several of the candidate genes were implicated in corneal disease, such as TGF-β signaling. We also identified other possible underappreciated mechanisms relevant to the human cornea. CONCLUSIONS: We identified 213 mouse genes that resulted in statistically significant abnormal corneal phenotypes in knockout mice, many of which have not previously been implicated in corneal pathology. Bioinformatic analyses implicated candidate genes in several signaling pathways which are potential therapeutic targets. 
AU - Vo, P.* AU - Imai-Leonard, D.M.* AU - Yang, B.* AU - Briere, A.* AU - Shao, A.* AU - Casanova, M.I.* AU - Adams, D.* AU - Amano, T.* AU - Amarie, O.V. AU - Berberovic, Z.* AU - Bower, L.* AU - Braun, R.* AU - Brown, S.* AU - Burrill, S.* AU - Cho, S.Y.* AU - Clementson-Mobbs, S.* AU - D'Souza, A.R.* AU - Dickinson, M.* AU - Eskandarian, M.* AU - Flenniken, A.M.* AU - Fuchs, H. AU - Gailus-Durner, V. AU - Heaney, J.D.* AU - Herault, Y.* AU - Hrabě de Angelis, M. AU - Hsu, C.W.* AU - Jin, S.* AU - Joynson, R.* AU - Kang, Y.K.* AU - Kim, H.* AU - Masuya, H.* AU - Meziane, H.* AU - Murray, S.A.* AU - Nam, K.H.* AU - Noh, H.* AU - Nutter, L.M.J.* AU - Palkova, M.* AU - Prochazka, J.* AU - Raishbrook, M.J.* AU - Riet, F.* AU - Ryan, J.* AU - Salazar, J.* AU - Seavey, Z.* AU - Seavitt, J.R.* AU - Sedlacek, R.* AU - Selloum, M.* AU - Seo, K.Y.* AU - Seong, J.K.* AU - Shin, H.S.* AU - Shiroishi, T.* AU - Stewart, M.* AU - Svenson, K.L.* AU - Tamura, M.* AU - Tolentino, H.* AU - Udensi, U.* AU - Wells, S.* AU - White, J.* AU - Willett, A.M.* AU - Wotton, J.M.* AU - Wurst, W. AU - Yoshiki, A.* AU - Lanoue, L.* AU - Lloyd, K.C.K.* AU - Leonard, B.C.* AU - Roux, M.J.* AU - McKerlie, C.* AU - Moshiri, A.* C1 - 73128 C2 - 56855 CY - Campus, 4 Crinan St, London N1 9xw, England TI - Systematic ocular phenotyping of 8,707 knockout mouse lines identifies genes associated with abnormal corneal phenotypes. JO - BMC Genomics VL - 26 IS - 1 PB - Bmc PY - 2025 SN - 1471-2164 ER - TY - JOUR AB - Background: Fusarium head blight (FHB) is a devastating disease of wheat worldwide. Resistance to FHB is quantitatively controlled by the combined effects of many small to medium effect QTL. Flowering traits, especially the extent of extruded anthers, are strongly associated with FHB resistance. 
Results: To characterize the genetic basis of FHB resistance, we generated and analyzed phenotypic and gene expression data on the response to Fusarium graminearum (Fg) infection in 96 European winter wheat genotypes, including several lines containing introgressions from the highly resistant Asian cultivar Sumai3. The 96 lines represented a broad range in FHB resistance and were assigned to sub-groups based on their phenotypic FHB severity score. Comparative analyses were conducted to connect sub-group-specific expression profiles in response to Fg infection with FHB resistance level. Collectively, over 12,300 wheat genes were Fusarium responsive. The core set of genes induced in response to Fg was common across different resistance groups, indicating that the activation of basal defense response mechanisms was largely independent of the resistance level of the wheat line. Fg-induced genes tended to have higher expression levels in more susceptible genotypes. Compared to the more susceptible non-Sumai3 lines, the Sumai3-derivatives demonstrated higher constitutive expression of genes associated with cell wall and plant-type secondary cell wall biogenesis and higher constitutive and Fg-induced expression of genes involved in terpene metabolism. Gene expression analysis of the FHB QTL Qfhs.ifa-5A identified a constitutively expressed gene encoding a stress response NST1-like protein (TraesCS5A01G211300LC) as a candidate gene for FHB resistance. NST1 genes are key regulators of secondary cell wall biosynthesis in anther endothecium cells. Whether the stress response NST1-like gene affects anther extrusion, thereby affecting FHB resistance, needs further investigation. Conclusion: Induced and preexisting cell wall components and terpene metabolites contribute to resistance and limit fungal colonization early on. 
In contrast, excessive gene expression directs plant defense response towards programmed cell death which favors necrotrophic growth of the Fg pathogen and could thus lead to increased fungal colonization. AU - Buerstmayr, M.* AU - Wagner, C.* AU - Nosenko, T. AU - Omony, J. AU - Steiner, B.* AU - Nussbaumer, T. AU - Mayer, K.F.X. AU - Buerstmayr, H.* C1 - 62391 C2 - 50853 CY - Campus, 4 Crinan St, London N1 9xw, England TI - Fusarium head blight resistance in European winter wheat: Insights from genome-wide transcriptome analysis. JO - BMC Genomics VL - 22 IS - 1 PB - Bmc PY - 2021 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: Most insects are relatively short-lived, with a maximum lifespan of a few weeks, like the aging model organism, the fruit-fly Drosophila melanogaster. By contrast, the queens of many social insects (termites, ants and some bees) can live from a few years to decades. This makes social insects promising models in aging research providing insights into how a long reproductive life can be achieved. Yet, aging studies on social insect reproductives are hampered by a lack of quantitative data on age-dependent survival and time series analyses that cover the whole lifespan of such long-lived individuals. We studied aging in queens of the drywood termite Cryptotermes secundus by determining survival probabilities over a period of 15 years and performed transcriptome analyses for queens of known age that covered their whole lifespan. RESULTS: The maximum lifespan of C. secundus queens was 13 years, with a median maximum longevity of 11.0 years. Time course and co-expression network analyses of gene expression patterns over time indicated a non-gradual aging pattern. It was characterized by networks of genes that became differentially expressed only late in life, namely after ten years, which associates well with the median maximum lifespan for queens. These old-age gene networks reflect processes of physiological upheaval. 
We detected strong signs of stress, decline, defense and repair at the transcriptional level of epigenetic control as well as at the post-transcriptional level with changes in transposable element activity and the proteostasis network. The latter depicts an upregulation of protein degradation, together with protein synthesis and protein folding, processes which are often down-regulated in old animals. The simultaneous upregulation of protein synthesis and autophagy is indicative of a stress-response mediated by the transcription factor cnc, a homolog of human nrf genes. CONCLUSIONS: Our results show non-linear senescence with a rather sudden physiological upheaval at old-age. Most importantly, they point to a re-wiring in the proteostasis network and stress as part of the aging process of social insect queens, shortly before queens die. AU - Monroy Kuhn, J.M. AU - Meusemann, K.* AU - Korb, J.* C1 - 62039 C2 - 50597 CY - Campus, 4 Crinan St, London N1 9xw, England TI - Disentangling the aging gene expression network of termite queens. JO - BMC Genomics VL - 22 IS - 1 PB - Bmc PY - 2021 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: RNA-seq emerges as a valuable method for clinical genetics. The transcriptome is "dynamic" and tissue-specific, but typically the probed tissues to analyze (TA) are different from the tissue of interest (TI) based on pathophysiology. RESULTS: We developed Phenotype-Tissue Expression and Exploration (PTEE), a tool to facilitate the decision about the most suitable TA for RNA-seq. We integrated phenotype-annotated genes, used 54 tissues from GTEx to perform correlation analyses and identify expressed genes and transcripts between TAs and TIs. We identified skeletal muscle as the most appropriate TA to inquire for cardiac arrhythmia genes and skin as a good proxy to study neurodevelopmental disorders. 
We also explored RNA-seq limitations and show that on-off switching of gene expression during ontogenesis or circadian rhythm can cause blind spots for RNA-seq-based analyses. CONCLUSIONS: PTEE aids the identification of tissues suitable for RNA-seq for a given pathology to increase the success rate of diagnosis and gene discovery. PTEE is freely available at https://bioinf.eva.mpg.de/PTEE/. AU - Velluva, A.* AU - Radtke, M.* AU - Horn, S.* AU - Popp, B.* AU - Platzer, K.* AU - Gjermeni, E.* AU - Lin, C.C.* AU - Lemke, J.R.* AU - Garten, A.* AU - Schöneberg, T.* AU - Blüher, M. AU - Abou Jamra, R.* AU - Le Duc, D. C1 - 63507 C2 - 51573 CY - Campus, 4 Crinan St, London N1 9xw, England TI - Phenotype-tissue expression and exploration (PTEE) resource facilitates the choice of tissue for RNA-seq-based clinical genetics studies. JO - BMC Genomics VL - 22 IS - 1 PB - Bmc PY - 2021 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: Lake Baikal is one of the oldest freshwater lakes and has constituted a stable environment for millions of years, in stark contrast to small, transient bodies of water in its immediate vicinity. A highly diverse endemic amphipod fauna is found in one, but not the other habitat. We ask here whether differences in stress response can explain the immiscibility barrier between Lake Baikal and non-Baikal faunas. To this end, we conducted exposure experiments to increased temperature and the toxic heavy metal cadmium as stressors. RESULTS: Here we obtained high-quality de novo transcriptome assemblies, covering multiple conditions, of three amphipod species, and compared their transcriptomic stress responses. Two of these species, Eulimnogammarus verrucosus and E. cyaneus, are endemic to Lake Baikal, while the Holarctic Gammarus lacustris is a potential invader. CONCLUSIONS: Both Baikal species possess intact stress response systems and respond to elevated temperature with relatively similar changes in their expression profiles. G. 
lacustris reacts less strongly to the same stressors, possibly because its transcriptome is already perturbed by acclimation conditions. AU - Drozdova, P.* AU - Rivarola-Duarte, L. AU - Bedulina, D.* AU - Axenov-Gribanov, D.* AU - Schreiber, S.* AU - Gurkov, A.* AU - Shatilina, Z.* AU - Vereshchagina, K.* AU - Lubyaga, Y.* AU - Madyarova, E.* AU - Otto, C.* AU - Jühling, F.* AU - Busch, W.* AU - Jakob, L.* AU - Lucassen, M.* AU - Sartoris, F.J.* AU - Hackermüller, J.* AU - Hoffmann, S.* AU - Pörtner, H.O.* AU - Luckenbach, T.* AU - Timofeyev, M.* AU - Stadler, P.F.* C1 - 56906 C2 - 47423 TI - Comparison between transcriptomic responses to short-term stress exposures of a common Holarctic and endemic Lake Baikal amphipods. JO - BMC Genomics VL - 20 IS - 1 PY - 2019 SN - 1471-2164 ER - TY - JOUR AB - Background: Norepinephrine (NE) signaling has a key role in white adipose tissue (WAT) functions, including lipolysis, free fatty acid liberation and, under certain conditions, conversion of white into brite (brown-in-white) adipocytes. However, acute effects of NE stimulation have not been described at the transcriptional network level. Results: We used RNA-seq to uncover a broad transcriptional response. The inference of protein-protein and protein-DNA interaction networks allowed us to identify a set of immediate-early genes (IEGs) with high betweenness, validating our approach and suggesting a hierarchical control of transcriptional regulation. In addition, we identified a transcriptional regulatory network with IEGs as master regulators, including HSF1 and NFIL3 as novel NE-induced IEG candidates. 
Moreover, a functional enrichment analysis and gene clustering into functional modules suggest a crosstalk between metabolic, signaling, and immune responses. Conclusions: Altogether, our network biology approach explores for the first time the immediate-early systems level response of human adipocytes to acute sympathetic activation, thereby providing a first network basis of early cell fate programs and crosstalks between metabolic and transcriptional networks required for proper WAT function. AU - Higareda-Almaraz, J. AU - Karbiener, M.* AU - Giroud, M. AU - Pauler, F.M.* AU - Gerhalter, T.* AU - Herzig, S. AU - Scheideler, M. C1 - 54737 C2 - 45825 CY - Campus, 4 Crinan St, London N1 9xw, England TI - Norepinephrine triggers an immediate-early regulatory network response in primary human white adipocytes. JO - BMC Genomics VL - 19 IS - 1 PB - Bmc PY - 2018 SN - 1471-2164 ER - TY - JOUR AB - Background: Whole-genome bisulfite sequencing (WGBS) has become the standard method for interrogating plant methylomes at base resolution. However, deep WGBS measurements remain cost prohibitive for large, complex genomes and for population-level studies. As a result, most published plant methylomes are sequenced far below saturation, with a large proportion of cytosines having either missing data or insufficient coverage. Results: Here we present METHimpute, a Hidden Markov Model (HMM) based imputation algorithm for the analysis of WGBS data. Unlike existing methods, METHimpute enables the construction of complete methylomes by inferring the methylation status and level of all cytosines in the genome regardless of coverage. Application of METHimpute to maize, rice and Arabidopsis shows that the algorithm infers cytosine-resolution methylomes with high accuracy from data as low as 6X, compared to data with 60X, thus making it a cost-effective solution for large-scale studies. 
Conclusions: METHimpute provides methylation status calls and levels for all cytosines in the genome regardless of coverage, thus yielding complete methylomes even with low-coverage WGBS datasets. The method has been extensively tested in plants, but should also be applicable to other species. An implementation is available on Bioconductor. AU - Taudt, A.* AU - Roquis, D.* AU - Vidalis, A.* AU - Wardenaar, R.* AU - Johannes, F.* AU - Colomé-Tatché, M. C1 - 53764 C2 - 45015 CY - Elsevier House, Brookvale Plaza, East Park Shannon, Co, Clare, 00000, Ireland TI - METHimpute: Imputation-guided construction of complete methylomes from WGBS data. JO - BMC Genomics VL - 19 IS - 1 PB - Elsevier Ireland Ltd PY - 2018 SN - 1471-2164 ER - TY - JOUR AB - Background As CRISPR/Cas9 mediated screens with pooled guide libraries in somatic cells become increasingly established, an unmet need for rapid and accurate companion informatics tools has emerged. We have developed a lightweight and efficient software to easily manipulate large raw next generation sequencing datasets derived from such screens into informative relational context with graphical support. The advantages of the software entitled ENCoRE (Easy NGS-to-Gene CRISPR REsults) include a simple graphical workflow, platform independence, local and fast multithreaded processing, data pre-processing and gene mapping with custom library import. Results We demonstrate the capabilities of ENCoRE to interrogate results from a pooled CRISPR cellular viability screen following Tumor Necrosis Factor-alpha challenge. The results not only identified stereotypical players in extrinsic apoptotic signaling but two as yet uncharacterized members of the extrinsic apoptotic cascade, Smg7 and Ces2a. We further validated and characterized cell lines containing mutations in these genes against a panel of cell death stimuli and involvement in p53 signaling. 
Conclusions In summary, this software enables bench scientists with sensitive data or without access to informatic cores to rapidly interpret results from large scale experiments resulting from pooled CRISPR/Cas9 library screens. AU - Trümbach, D. AU - Pfeiffer, S. AU - Poppe, M. AU - Scherb, H. AU - Doll, S. AU - Wurst, W. AU - Schick, J. C1 - 52410 C2 - 43947 CY - London TI - ENCoRE: An efficient software for CRISPR screens identifies new players in extrinsic apoptosis. JO - BMC Genomics VL - 18 IS - 1 PB - Biomed Central Ltd PY - 2017 SN - 1471-2164 ER - TY - JOUR AB - Background: The evidence for epigenome-wide associations between smoking and DNA methylation continues to grow through cross-sectional studies. However, few large-scale investigations have explored the associations using observations for individuals at multiple time-points. Here, through the use of the Illumina 450K BeadChip and data collected at two time-points separated by approximately 7 years, we investigate changes in methylation over time associated with quitting smoking or remaining a former smoker, and those associated with continued smoking. Results: Our results indicate that after quitting smoking the most rapid reversion of altered methylation occurs within the first two decades, with reversion rates related to the initial differences in methylation. For 52 CpG sites, the change in methylation from baseline to follow-up is significantly different for former smokers relative to the change for never smokers (lowest p-value 3.61 x 10^-39 for cg26703534, gene AHRR). Most of these sites' respective regions have been previously implicated in smoking-associated diseases. Despite the early rapid change, dynamism of methylation appears greater in former smokers vs never smokers even four decades after cessation. 
Furthermore, our study reveals the heterogeneous effect of continued smoking: the methylation levels of some loci further diverge between smokers and non-smokers, while others re-approach. Though intensity of smoking habit appears more significant than duration, results remain inconclusive. Conclusions: This study improves the understanding of the dynamic link between cigarette smoking and methylation, revealing the continued fluctuation of methylation levels decades after smoking cessation and demonstrating that continuing smoking can have an array of effects. The results can facilitate insights into the molecular mechanisms behind smoking-induced disturbed methylation, improving the possibility for development of biomarkers of past smoking behavior and increasing the understanding of the molecular path from exposure to disease. AU - Wilson, R. AU - Wahl, S. AU - Pfeiffer, L. AU - Ward-Caviness, C.K. AU - Kunze, S. AU - Kretschmer, A. AU - Reischl, E. AU - Peters, A. AU - Gieger, C. AU - Waldenberger, M. C1 - 52214 C2 - 43798 TI - The dynamics of smoking-related disturbed methylation: A two time-point study of methylation change in smokers, non-smokers and former smokers. JO - BMC Genomics VL - 18 IS - 1 PY - 2017 SN - 1471-2164 ER - TY - JOUR AB - Background: Epigenetic information can be used to identify clinically relevant genomic variants single nucleotide polymorphisms (SNPs) of functional importance in cancer development. Super-enhancers are cell-specific DNA elements, acting to determine tissue or cell identity and driving tumor progression. Although previous approaches have been tried to explain risk associated with SNPs in regulatory DNA elements, so far epigenetic readers such as bromodomain containing protein 4 (BRD4) and super-enhancers have not been used to annotate SNPs. In prostate cancer (PC), androgen receptor (AR) binding sites to chromatin have been used to inform functional annotations of SNPs. 
Results: Here we establish criteria for enhancer mapping which are applicable to other diseases and traits to achieve the optimal tissue-specific enrichment of PC risk SNPs. We used stratified Q-Q plots and Fisher test to assess the differential enrichment of SNPs mapping to specific categories of enhancers. We find that BRD4 is the key discriminant of tissue-specific enhancers, showing that it is more powerful than AR binding information to capture PC specific risk loci, and can be used with similar effect in breast cancer (BC) and applied to other diseases such as schizophrenia. Conclusions: This is the first study to evaluate the enrichment of epigenetic readers in genome-wide associations studies for SNPs within enhancers, and provides a powerful tool for enriching and prioritizing PC and BC genetic risk loci. Our study represents a proof of principle applicable to other diseases and traits that can be used to redefine molecular mechanisms of human phenotypic variation. AU - Zuber, V.* AU - Bettella, F.* AU - Witoelar, A.W.* AU - Andreassen, O.A.* AU - Mills, I.G.* AU - Urbanucci, A.* AU - Eeles, R.A.* AU - Easton, D.F.* AU - Kote-Jarai, Z.* AU - Al Olama, A.A.* AU - Benlloch, S.* AU - Muir, K.* AU - Giles, G.G.* AU - Wiklund, F.* AU - Grönberg, H.* AU - Haiman, C.A.* AU - Schleutker, J.* AU - Weischer, M.* AU - Travis, R.C.* AU - Neal, D.* AU - Pharoah, P.* AU - Khaw, K.T.* AU - Stanford, J.L.* AU - Blot, W.J.* AU - Thibodeau, S.N.* AU - Maier, C.* AU - Kibel, A.S.* AU - Cybulski, C.* AU - Cannon-Albright, L.* AU - Brenner, H.* AU - Park, J.* AU - Kaneva, R.* AU - Batra, J.* AU - Teixeira, M.R.* AU - Pandha, H.* AU - Chenevix-Trench, G.* AU - Humphreys, M.K.* AU - Hung, R.J.* AU - Han, Y.* AU - Brennan, P.* AU - Bickeböller, H.* AU - Rosenberger, A.* AU - Houlston, R.S.* AU - Caporaso, N.* AU - Landi, M.T.* AU - Heinrich, J. 
AU - Risch, A.* AU - Wu, X.* AU - Ye, Y.* AU - Christiani, D.C.* AU - Amos, C.I.* AU - Michailidou, K.* AU - Bolla, M.K.* AU - Wang, Q.* AU - Berchuck, A.* AU - Antoniou, A.C.* AU - McGuffog, L.* AU - Couch, F.J.* AU - Offit, K.* AU - Dennis, J.* AU - Dunning, A.M.* AU - Lee, A.* AU - Dicks, E.* AU - Luccarini, C.* AU - Benítez, J.* AU - González-Neira, A.* AU - Simard, J.* AU - Tessier, D.C.* AU - Bacot, F.* AU - Vincent, D.* AU - Laboissiere, S.* C1 - 50910 C2 - 42639 TI - Bromodomain protein 4 discriminates tissue-specific super-enhancers containing disease-specific susceptibility loci in prostate and breast cancer. JO - BMC Genomics VL - 18 IS - 1 PY - 2017 SN - 1471-2164 ER - TY - JOUR AB - © 2016 The Author(s). Background: The bacterial CRISPR system is fast becoming the most popular genetic and epigenetic engineering tool due to its universal applicability and adaptability. The desire to deploy CRISPR-based methods in a large variety of species and contexts has created an urgent need for the development of easy, time- and cost-effective methods enabling large-scale screening approaches. Results: Here we describe CORALINA (comprehensive gRNA library generation through controlled nuclease activity), a method for the generation of comprehensive gRNA libraries for CRISPR-based screens. CORALINA gRNA libraries can be derived from any source of DNA without the need of complex oligonucleotide synthesis. We show the utility of CORALINA for human and mouse genomic DNA, its reproducibility in covering the most relevant genomic features including regulatory, coding and non-coding sequences and confirm the functionality of CORALINA generated gRNAs. Conclusions: The simplicity and cost-effectiveness make CORALINA suitable for any experimental system. The unprecedented sequence complexities obtainable with CORALINA libraries are a necessary pre-requisite for less biased large scale genomic and epigenomic screens. AU - Köferle, A.* AU - Worf, K. AU - Breunig, C. 
AU - Baumann, V. AU - Herrero, J.* AU - Wiesbeck, M. AU - Hutter, L.H.* AU - Götz, M. AU - Fuchs, C. AU - Beck, S.* AU - Stricker, S.H. C1 - 49962 C2 - 41942 CY - London TI - CORALINA: A universal method for the generation of gRNA libraries for CRISPR-based screening. JO - BMC Genomics VL - 17 IS - 1 PB - Biomed Central Ltd PY - 2016 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: The trichothecene mycotoxins deoxynivalenol (DON) and trichothecin (TTC) are inhibitors of eukaryotic protein synthesis. Their effect on cellular homeostasis is poorly understood. We report a systematic functional investigation of the effect of DON and TTC on the yeast Saccharomyces cerevisiae using genetic array, network and microarray analysis. To focus the genetic analysis on intracellular consequences of toxin action we eliminated the PDR5 gene coding for a potent pleiotropic drug efflux protein potentially confounding results. We therefore used a knockout library with a pdr5Δ strain background. RESULTS: DON or TTC treatment creates a fitness bottleneck connected to ribosome efficiency. Genes isolated by systematic genetic array analysis as contributing to toxin resistance function in ribosome quality control, translation fidelity, and in transcription. Mutants in the E3 ligase Hel2, involved in ribosome quality control, and several members of the Rpd3 histone deacetylase complex were highly sensitive to DON. DON and TTC have similar genetic profiles despite their different toxic potency. Network analysis shows a coherent and tight network of genetic interactions among the DON and TTC resistance conferring gene products. The networks exhibited topological properties commonly associated with efficient processing of information. Many sensitive mutants have a "slow growth" gene expression signature. DON-exposed yeast cells increase transcripts of ribosomal protein and histone genes indicating an internal signal for growth enhancement. 
CONCLUSIONS: The combination of gene expression profiling and analysis of mutants reveals cellular pathways which become bottlenecks under DON and TTC stress. These are generally directly or indirectly connected to ribosome biosynthesis such as the general secretory pathway, cytoskeleton, cell cycle delay, ribosome synthesis and translation quality control. Gene expression profiling points to an increased demand of ribosomal components and does not reveal activation of stress pathways. Our analysis highlights ribosome quality control and a contribution of a histone deacetylase complex as main sources of resistance against DON and TTC. AU - Kugler, K.G. AU - Jandric, Z.* AU - Beyer, R.* AU - Klopf, E.* AU - Glaser, W.* AU - Lemmens, M.* AU - Shams, M.* AU - Mayer, K.F.X. AU - Adam, G.* AU - Schüller, C.* C1 - 48720 C2 - 41294 CY - London TI - Ribosome quality control is a central protection mechanism for yeast exposed to deoxynivalenol and trichothecin. JO - BMC Genomics VL - 17 IS - 1 PB - Biomed Central Ltd PY - 2016 SN - 1471-2164 ER - TY - JOUR AB - Background: The Rhynchosporium species complex consists of hemibiotrophic fungal pathogens specialized to different sweet grass species including the cereal crops barley and rye. A sexual stage has not been described, but several lines of evidence suggest the occurrence of sexual reproduction. Therefore, a comparative genomics approach was carried out to disclose the evolutionary relationship of the species and to identify genes demonstrating the potential for a sexual cycle. Furthermore, due to the evolutionary very young age of the five species currently known, this genus appears to be well-suited to address the question at the molecular level of how pathogenic fungi adapt to their hosts. Results: The genomes of the different Rhynchosporium species were sequenced, assembled and annotated using ab initio gene predictors trained on several fungal genomes as well as on Rhynchosporium expressed sequence tags. 
Structures of the rDNA regions and genome-wide single nucleotide polymorphisms provided a hypothesis for intra-genus evolution. Homology screening detected core meiotic genes along with most genes crucial for sexual recombination in ascomycete fungi. In addition, a large number of cell wall-degrading enzymes that is characteristic for hemibiotrophic and necrotrophic fungi infecting monocotyledonous hosts were found. Furthermore, the Rhynchosporium genomes carry a repertoire of genes coding for polyketide synthases and non-ribosomal peptide synthetases. Several of these genes are missing from the genome of the closest sequenced relative, the poplar pathogen Marssonina brunnea, and are possibly involved in adaptation to the grass hosts. Most importantly, six species-specific genes coding for protein effectors were identified in R. commune. Their deletion yielded mutants that grew more vigorously in planta than the wild type. Conclusion: Both cryptic sexuality and secondary metabolites may have contributed to host adaptation. Most importantly, however, the growth-retarding activity of the species-specific effectors suggests that host adaptation of R. commune aims at extending the biotrophic stage at the expense of the necrotrophic stage of pathogenesis. Like other apoplastic fungi Rhynchosporium colonizes the intercellular matrix of host leaves relatively slowly without causing symptoms, reminiscent of the development of endophytic fungi. Rhynchosporium may therefore become an object for studying the mutualism-parasitism transition. AU - Penselin, D.* AU - Münsterkötter, M. 
AU - Kirsten, S.* AU - Felder, M.* AU - Taudien, S.* AU - Platzer, M.* AU - Ashelford, K.* AU - Paskiewicz, K.H.* AU - Harrison, R.J.* AU - Hughes, D.J.* AU - Wolf, T.* AU - Shelest, E.* AU - Graap, J.* AU - Hoffmann, J.C.* AU - Wenzel, C.* AU - Wöltje, N.* AU - King, K.M.* AU - Fitt, B.D.L.* AU - Güldener, U.* AU - Avrova, A.* AU - Knogge, W.* C1 - 50169 C2 - 42229 CY - London TI - Comparative genomics to explore phylogenetic relationship, cryptic sexual potential and host specificity of Rhynchosporium species on grasses. JO - BMC Genomics VL - 17 PB - Biomed Central Ltd PY - 2016 SN - 1471-2164 ER - TY - JOUR AB - Background: Whereas an increasing number of pathogenic and mutualistic ascomycetous species were sequenced in the past decade, species showing a seemingly neutral association such as root endophytes received less attention. In the present study, the genome of Phialocephala subalpina, the most frequent species of the Phialocephala fortinii s.l. - Acephala applanata species complex, was sequenced for insight in the genome structure and gene inventory of these wide-spread root endophytes. Results: The genome of P. subalpina was sequenced using Roche/454 GS FLX technology and a whole genome shotgun strategy. The assembly resulted in 205 scaffolds and a genome size of 69.7 Mb. The expanded genome size in P. subalpina was not due to the proliferation of transposable elements or other repeats, as is the case with other ascomycetous genomes. Instead, P. subalpina revealed an expanded gene inventory that includes 20,173 gene models. Comparative genome analysis of P. subalpina with 13 ascomycetes shows that P. subalpina uses a versatile gene inventory including genes specific for pathogens and saprophytes. Moreover, the gene inventory for carbohydrate active enzymes (CAZymes) was expanded including genes involved in degradation of biopolymers, such as pectin, hemicellulose, cellulose and lignin. 
Conclusions: The analysis of a globally distributed root endophyte allowed detailed insights in the gene inventory and genome organization of a yet largely neglected group of organisms. We showed that the ubiquitous root endophyte P. subalpina has a broad gene inventory that links pathogenic and saprophytic lifestyles. AU - Schlegel, M.* AU - Münsterkötter, M. AU - Güldener, U. AU - Bruggmann, R.* AU - Duò, A.* AU - Hainaut, M.* AU - Henrissat, B.* AU - Sieber, C.M.K. AU - Hoffmeister, D.* AU - Grünig, C.R.* C1 - 50151 C2 - 42217 CY - London TI - Globally distributed root endophyte Phialocephala subalpina links pathogenic and saprophytic lifestyles. JO - BMC Genomics VL - 17 PB - Biomed Central Ltd PY - 2016 SN - 1471-2164 ER - TY - JOUR AB - Background: The motif ACTAYRNNNCCCR (Y being C or T, R being A or G, and N any nucleotide), called M4, was discovered as a putative cis-regulatory element, present 520 times in human promoter regions. Of these, 317 (61 %) are conserved within promoter sequences of four related organisms: human, mouse, rat, and dog. Recent genome-wide studies have described M4 as a transcription factor (TF) binding site for THAP11 that does often overlap with SBS (STAF Binding Site) a second core-promoter associated TF binding module, which associates with the TFs STAF/ZNF143 and RBP-J. Human M4-promoter genes show enhanced expression in cells of hematopoietic origin, especially in B lymphoblasts and peripheral blood B and T cells. Apart from RBP-J that is well known to recruit ICN1 (the intracellular transcriptional mediator of activated Notch1), the functional role of the hyperconserved M4 cis-element in the context of transcriptional regulation of M4-genes in lymphoid cells remains poorly defined. Results: Here, we present a quantitative proteomic investigation of the M4 motif TF binding landscape in lymphoid cell lines that is further validated by ChIP experiments and functional assays. 
Our data strongly suggest that THAP11 and Ikaros interact directly, while NFKB1 (NF-kappa B p50) and HCF-1 are binding indirectly to M4-promoters in vitro and in living cells. Further analysis reveals that M4 is a bipartite composite cis-element, which is recognized by THAP11 via binding to the ACTAYR sequence module, thereby promoting ternary complex formation with HCF-1. Similarly, Ikaros binds to the CCCR module of the M4 motif and this interaction is crucial for recruiting NFKB1 to M4 harboring genes. Transient reporter assays in HEK293 and loss-of-function experiments in Molt4 T cells unequivocally demonstrate that binding of Ikaros and/or THAP11 to M4 bearing promoters is functionally important and therefore biologically relevant. Accordingly, this study validates our SILAC-based DNA protein interaction screening methodology as a valuable surrogate for a bona fide reverse ChIP technology. Conclusions: The M4 motif (ACTAYRNNNCCCR) is a functional regulatory bipartite cis-element, which engages a THAP11/HCF-1 complex via binding to the ACTAYR module, while the CCCRRNRNRC subsequence part constitutes a binding platform for Ikaros and NFKB1. AU - Trung, N.T.* AU - Kremmer, E. AU - Mittler, G.* C1 - 49446 C2 - 30863 CY - London TI - Biochemical and cellular characterization of transcription factors binding to the hyperconserved core promoter-associated M4 motif. JO - BMC Genomics VL - 17 PB - Biomed Central Ltd PY - 2016 SN - 1471-2164 ER - TY - JOUR AB - Background: The Lolium-Festuca complex incorporates species from the Lolium genera and the broad leaf fescues, both belonging to the subfamily Pooideae. This subfamily also includes wheat, barley, oat and rye, making it extremely important to world agriculture. Species within the Lolium-Festuca complex show very diverse phenotypes, and many of them are related to agronomically important traits. 
Analysis of sequenced transcriptomes of these non-model species may shed light on the molecular mechanisms underlying this phenotypic diversity. Results: We have generated de novo transcriptome assemblies for four species from the Lolium-Festuca complex, ranging from 52,166 to 72,133 transcripts per assembly. We have also predicted a set of proteins and validated it with a high-confidence protein database from three closely related species (H. vulgare, B. distachyon and O. sativa). We have obtained gene family clusters for the four species using OrthoMCL and analyzed their inferred phylogenetic relationships. Our results indicate that VRN2 is a candidate gene for differentiating vernalization and non-vernalization types in the Lolium-Festuca complex. Grouping of the gene families based on their BLAST identity enabled us to divide ortholog groups into those that are very conserved and those that are more evolutionarily relaxed. The ratio of the non-synonymous to synonymous substitutions enabled us to pinpoint protein sequences evolving in response to positive selection. These proteins may explain some of the differences between the more stress tolerant Festuca, and the less stress tolerant Lolium species. Conclusions: Our data present a comprehensive transcriptome sequence comparison between species from the Lolium-Festuca complex, with the identification of potential candidate genes underlying some important phenotypical differences within the complex (such as VRN2). The orthologous genes between the species have a very high %id (91.61%) and the majority of gene families were shared for all of them. It is likely that the knowledge of the genomes will be largely transferable between species within the complex. AU - Czaban, A.* AU - Sharma, S. AU - Byrne, S.L.* AU - Spannagl, M. AU - Mayer, K.F.X. AU - Asp, T.* C1 - 44389 C2 - 36935 CY - London TI - Comparative transcriptome analysis within the Lolium/Festuca species complex reveals high sequence conservation. 
JO - BMC Genomics VL - 16 IS - 1 PB - Biomed Central Ltd PY - 2015 SN - 1471-2164 ER - TY - JOUR AB - Background: Exome sequencing has become a popular method to evaluate undirected mutagenesis experiments in mice. However, the most suitable mouse strain for the biological model may be relatively distant from the standard mouse reference genome. For pinpointing causative variants, a matching reference with gene annotations is essential, but not always readily available. Results: We present an approach that allows to use murine Ensembl annotations on alternative mouse strain assemblies. We resolved ENU-induced mutation screening for 8 phenotypic mutant lines generated on C3HeB/FeJ background aligning the sequences against the closely related, but not annotated reference of C3H/HeJ. Variants occurring in all strains were filtered out as specific for the C3HeB/FeJ strain but unrelated to mutagenesis. Variants occurring exclusively in all individuals of one mutant line and matching the inheritance model were selected as mutagenesis-related. These variants were annotated with gene and exon names lifted over from the standard murine reference mm9 to C3H/HeJ using megablast. For each mutant line, we could restrict the results to exonic variants in between 1 and 23 genes. Conclusions: The presented method of exonic annotation lift-over proved to be a valuable tool in the search for mutagenesis-derived coding genomic variants and the assessment of genotype-phenotype relationships. AU - Derdak, S.* AU - Sabrautzki, S. AU - Hrabě de Angelis, M. AU - Gut, M.O.* AU - Gut, I.G.* AU - Beltran, S.* C1 - 44818 C2 - 36995 CY - London TI - Genomic characterization of mutant laboratory mouse strains by exome sequencing and annotation lift-over. JO - BMC Genomics VL - 16 PB - Biomed Central Ltd PY - 2015 SN - 1471-2164 ER - TY - JOUR AB - Background: Next Generation Sequencing has proven to be an exceptionally powerful tool in the field of genomics and transcriptomics. 
With recent development it is nowadays possible to analyze ultra-low input sample material down to single cells. Nevertheless, investigating such sample material often limits the analysis to either the genome or transcriptome. We describe here a combined analysis of both types of nucleic acids from the same sample material. Methods: The method described enables the combined preparation of amplified cDNA as well as amplified whole-genome DNA from an ultra-low input sample material derived from a sub-colony of in-vitro cultivated human embryonic stem cells. cDNA is prepared by the application of oligo-dT coupled magnetic beads for mRNA capture, first strand synthesis and 3'-tailing followed by PCR. Whole-genome amplified DNA is prepared by Phi29 mediated amplification. Illumina sequencing is applied to short fragment libraries prepared from the amplified samples. Results: We developed a protocol which enables the combined analysis of the genome as well as the transcriptome by Next Generation Sequencing from ultra-low input samples. The protocol was evaluated by sequencing sub-colony structures from human embryonic stem cells containing 150 to 200 cells. The method can be adapted to any available sequencing system. Conclusions: To our knowledge, this is the first report where sub-colonies of human embryonic stem cells have been analyzed both at the genomic as well as transcriptome level. The method of this proof of concept study may find useful practical applications for cases where only a limited number of cells are available, e.g. for tissues samples from biopsies, tumor spheres, circulating tumor cells and cells from early embryonic development. The results we present demonstrate that a combined analysis of genomic DNA and messenger RNA from ultra-low input samples is feasible and can readily be applied to other cellular systems with limited material available. AU - Mertes, F. 
AU - Lichtner, B.* AU - Kuhl, H.* AU - Blattner, M.* AU - Otte, J.* AU - Wruck, W.* AU - Timmermann, B.* AU - Lehrach, H.* AU - Adjaye, J.* C1 - 47347 C2 - 40515 TI - Combined ultra-low input mRNA and whole-genome sequencing of human embryonic stem cells. JO - BMC Genomics VL - 16 PY - 2015 SN - 1471-2164 ER - TY - JOUR AB - Background: The technical progress in the last decade has made it possible to sequence millions of DNA reads in a relatively short time frame. Several variant callers based on different algorithms have emerged and have made it possible to extract single nucleotide polymorphisms (SNPs) out of the whole-genome sequence. Often, only a few individuals of a population are sequenced completely and imputation is used to obtain genotypes for all sequence-based SNP loci for other individuals, which have been genotyped for a subset of SNPs using a genotyping array. Methods: First, we compared the sets of variants detected with different variant callers, namely GATK, freebayes and SAMtools, and checked the quality of genotypes of the called variants in a set of 50 fully sequenced white and brown layers. Second, we assessed the imputation accuracy (measured as the correlation between imputed and true genotype per SNP and per individual, and genotype conflict between father-progeny pairs) when imputing from high density SNP array data to whole-genome sequence using data from around 1000 individuals from six different generations. Three different imputation programs (Minimac, FImpute and IMPUTE2) were checked in different validation scenarios. Results: There were 1,741,573 SNPs detected by all three callers on the studied chromosomes 3, 6, and 28, which was 71.6 % (81.6 %, 88.0 %) of SNPs detected by GATK (SAMtools, freebayes) in total. 
Genotype concordance (GC) defined as the proportion of individuals whose array-derived genotypes are the same as the sequence-derived genotypes over all non-missing SNPs on the array were 0.98 (GATK), 0.97 (freebayes) and 0.98 (SAMtools). Furthermore, the percentage of variants that had high values (>0.9) for another three measures (non-reference sensitivity, non-reference genotype concordance and precision) were 90 (88, 75) for GATK (SAMtools, freebayes). With all imputation programs, correlation between original and imputed genotypes was >0.95 on average with randomly masked 1000 SNPs from the SNP array and >0.85 for a leave-one-out cross-validation within sequenced individuals. Conclusions: Performance of all variant callers studied was very good in general, particularly for GATK and SAMtools. FImpute performed slightly worse than Minimac and IMPUTE2 in terms of genotype correlation, especially for SNPs with low minor allele frequency, while it had lowest numbers in Mendelian conflicts in available father-progeny pairs. Correlations of real and imputed genotypes remained constantly high even if individuals to be imputed were several generations away from the sequenced individuals. AU - Ni, G.* AU - Strom, T.M. AU - Pausch, H.* AU - Reimer, C.* AU - Preisinger, R.* AU - Simianer, H.* AU - Erbe, M.* C1 - 47183 C2 - 40518 TI - Comparison among three variant callers and assessment of the accuracy of imputation from SNP array data to whole-genome sequence level in chicken. JO - BMC Genomics VL - 16 PY - 2015 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: Head and neck squamous cell carcinoma (HNSCC) is a very heterogeneous disease resulting in huge differences in the treatment response. New individualized therapy strategies including molecular targeting might help to improve treatment success. 
In order to identify potential targets, we developed a HNSCC radiochemotherapy cell culture model of primary HNSCC cells derived from two different patients (HN1957 and HN2092) and applied an integrative microRNA (miRNA) and mRNA analysis in order to gain information on the biological networks and processes of the cellular therapy response. We further identified potential target genes of four therapy-responsive miRNAs detected previously in the circulation of HNSCC patients by pathway enrichment analysis. RESULTS: The two primary cell cultures differ in global copy number alterations and P53 mutational status, thus reflecting heterogeneity of HNSCC. However, they also share many copy number alterations and chromosomal rearrangements as well as deregulated therapy-responsive miRNAs and mRNAs. Accordingly, six common therapy-responsive pathways (direct P53 effectors, apoptotic execution phase, DNA damage/telomere stress induced senescence, cholesterol biosynthesis, unfolded protein response, dissolution of fibrin clot) were identified in both cell cultures based on deregulated mRNAs. However, inflammatory pathways represented an important part of the treatment response only in HN1957, pointing to differences in the treatment responses of the two primary cultures. Focused analysis of target genes of four therapy-responsive circulating miRNAs, identified in a previous study on HNSCC patients, revealed a major impact on the pathways direct P53 effectors, the E2F transcription factor network and pathways in cancer (mainly represented by the PTEN/AKT signaling pathway). CONCLUSIONS: The integrative analysis combining miRNA expression, mRNA expression and the related cellular pathways revealed that the majority of radiochemotherapy-responsive pathways in primary HNSCC cells are related to cell cycle, proliferation, cell death and stress response (including inflammation). 
Despite the heterogeneity of HNSCC, the two primary cell cultures exhibited strong similarities in the treatment response. The findings of our study suggest potential therapeutic targets in the E2F transcription factor network and the PTEN/AKT signaling pathway. AU - Summerer, I. AU - Hess, J. AU - Pitea, A. AU - Unger, K. AU - Hieber, L. AU - Selmansberger, M. AU - Lauber, K. AU - Zitzelsberger, H. C1 - 46727 C2 - 37765 CY - London TI - Integrative analysis of the microRNA-mRNA response to radiochemotherapy in primary head and neck squamous cell carcinoma cells. JO - BMC Genomics VL - 16 IS - 1 PB - Biomed Central Ltd PY - 2015 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: Chlamydia pneumoniae (Cpn) are obligate intracellular bacteria that cause acute infections of the upper and lower respiratory tract and have been implicated in chronic inflammatory diseases. Although of significant clinical relevance, complete genome sequences of only four clinical Cpn strains have been obtained. All of them were isolated from the respiratory tract and shared more than 99% sequence identity. Here we investigate genetic differences on the whole-genome level that are related to Cpn tissue tropism and pathogenicity. RESULTS: We have sequenced the genomes of 18 clinical isolates from different anatomical sites (e.g. lung, blood, coronary arteries) of diseased patients, and one animal isolate. In total 1,363 SNP loci and 184 InDels have been identified in the genomes of all clinical Cpn isolates. These are distributed throughout the whole chlamydial genome and enriched in highly variable regions. The genomes show clear evidence of recombination in at least one potential region but no phage insertions. The tyrP gene was always encoded as single copy in all vascular isolates. Phylogenetic reconstruction revealed distinct evolutionary lineages containing primarily non-respiratory Cpn isolates. 
In one of these, clinical isolates from coronary arteries and blood monocytes were closely grouped together. They could be distinguished from all other isolates by characteristic nsSNPs in genes involved in RB to EB transition, inclusion membrane formation, bacterial stress response and metabolism. CONCLUSIONS: This study substantially expands the genomic data of Cpn and elucidates its evolutionary history. The translation of the observed Cpn genetic differences into biological functions and the prediction of novel pathogen-oriented diagnostic strategies have to be further explored. AU - Weinmaier, T.* AU - Hoser, J.D.S.* AU - Eck, S.* AU - Kaufhold, I.* AU - Shima, K.* AU - Strom, T.M. AU - Rattei, T.* AU - Rupp, J.* C1 - 44437 C2 - 36834 TI - Genomic factors related to tissue tropism in Chlamydia pneumoniae infection. JO - BMC Genomics VL - 16 IS - 1 PY - 2015 SN - 1471-2164 ER - TY - JOUR AB - Understanding the links between genetic, epigenetic and non-genetic factors throughout the lifespan and across generations and their role in disease susceptibility and disease progression offer entirely new avenues and solutions to major problems in our society. To overcome the numerous challenges, we have come up with nine major conclusions to set the vision for future policies and research agendas at the European level. AU - Almouzni, G.* AU - Altucci, L.* AU - Amati, B.* AU - Ashley, N.* AU - Baulcombe, D.* AU - Beaujean, N.* AU - Bock, C.* AU - Bongcam-Rudloff, E.* AU - Bousquet, J.* AU - Braun, S.* AU - de Paillerets, B.B.* AU - Bussemakers, M.* AU - Clarke, L.* AU - Conesa, A.* AU - Estivill, X.* AU - Fazeli, A.* AU - Grgurevi, N.A.* AU - Gut, I.* AU - Heijmans, B.T.* AU - Hermouet, S.* AU - Houwing Duistermaat, J.* AU - Iacobucci, I.* AU - Ila, J.* AU - Kandimalla, R.* AU - Krauss-Etschmann, S. 
AU - Lasko, P.* AU - Lehmann, S.* AU - Lindroth, A.* AU - Majdi, G.* AU - Marcotte, E.* AU - Martinelli, G.* AU - Martinet, N.* AU - Meyer, E.* AU - Miceli, C.* AU - Mills, K.* AU - Moreno-Villanueva, M.* AU - Morvan, G.* AU - Nickel, D.* AU - Niesler, B.* AU - Nowacki, M.* AU - Nowak, J.* AU - Ossowski, S.* AU - Pelizzola, M.* AU - Pochet, R.* AU - Potočnik, U.* AU - Radwanska, M.* AU - Raes, J.* AU - Rattray, M.* AU - Robinson, M.D.* AU - Roelen, B.* AU - Sauer, S.* AU - Schinzer, D.* AU - Slagboom, E.* AU - Spector, T.* AU - Stunnenberg, H.G.* AU - Tiligada, E.* AU - Torres-Padilla, M.E.* AU - Tsonaka, R.* AU - van Soom, A.* AU - Vidaković, M.* AU - Widschwendter, M.* C1 - 31675 C2 - 34643 CY - London TI - Relationship between genome and epigenome - challenges and requirements for future research. JO - BMC Genomics VL - 15 IS - 1 PB - Biomed Central Ltd PY - 2014 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: Genome wide association studies (GWAS) are applied to identify genetic loci, which are associated with complex traits and human diseases. Analogous to the evolution of gene expression analyses, pathway analyses have emerged as important tools to uncover functional networks of genome-wide association data. Usually, pathway analyses combine statistical methods with a priori available biological knowledge. To determine significance thresholds for associated pathways, correction for multiple testing and over-representation permutation testing is applied. RESULTS: We systematically investigated the impact of three different permutation test approaches for over-representation analysis to detect false positive pathway candidates and evaluate them on genome-wide association data of Dilated Cardiomyopathy (DCM) and Ulcerative Colitis (UC). Our results provide evidence that the gold standard - permuting the case-control status - effectively improves specificity of GWAS pathway analysis. 
Although permutation of SNPs does not maintain linkage disequilibrium (LD), these permutations represent an alternative for GWAS data when case-control permutations are not possible. Gene permutations, however, did not add significantly to the specificity. Finally, we provide estimates on the required number of permutations for the investigated approaches. CONCLUSIONS: To discover potential false positive functional pathway candidates and to support the results from standard statistical tests such as the Hypergeometric test, permutation tests of case control data should be carried out. The most reasonable alternative was case-control permutation, if this is not possible, SNP permutations may be carried out. Our study also demonstrates that significance values converge rapidly with an increasing number of permutations. By applying the described statistical framework we were able to discover axon guidance, focal adhesion and calcium signaling as important DCM-related pathways and Intestinal immune network for IgA production as most significant UC pathway. AU - Backes, C.* AU - Rühle, F.* AU - Stoll, M.* AU - Haas, J.* AU - Frese, K.* AU - Franke, A.* AU - Lieb, W.* AU - Wichmann, H.-E. AU - Weis, T.* AU - Kloos, W.* AU - Lenhof, H.P.* AU - Meese, E.* AU - Katus, H.A.* AU - Meder, B.* AU - Keller, A.* C1 - 32359 C2 - 34996 CY - London TI - Systematic permutation testing in GWAS pathway analyses: Identification of genetic networks in dilated cardiomyopathy and ulcerative colitis. JO - BMC Genomics VL - 15 PB - Biomed Central Ltd PY - 2014 SN - 1471-2164 ER - TY - JOUR AB - Background: The date palm is one of the oldest cultivated fruit trees. It is critical in many ways to cultures in arid lands by providing highly nutritious fruit while surviving extreme heat and environmental conditions. Despite its importance from antiquity, few genetic resources are available for improving the productivity and development of the dioecious date palm. 
To date there has been no genetic map and no sex chromosome has been identified. Results: Here we present the first genetic map for date palm and identify the putative date palm sex chromosome. We placed ~4000 markers on the map using nearly 1200 framework markers spanning a total of 1293 cM. We have integrated the genetic map, derived from the Khalas cultivar, with the draft genome and placed up to 19% of the draft genome sequence scaffolds onto linkage groups for the first time. This analysis revealed approximately 1.9 cM/Mb on the map. Comparison of the date palm linkage groups revealed significant long-range synteny to oil palm. Analysis of the date palm sex-determination region suggests it is telomeric on linkage group 12 and recombination is not suppressed in the full chromosome. Conclusions: Based on a modified genotyping-by-sequencing approach we have overcome challenges due to lack of genetic resources and provide the first genetic map for date palm. Combined with the recent draft genome sequence of the same cultivar, this resource offers a critical new tool for date palm biotechnology, palm comparative genomics and a better understanding of sex chromosome development in the palms. AU - Mathew, L. S.* AU - Spannagl, M. AU - Al-Malki, A.* AU - George, B.* AU - Torres, M.F.* AU - Al-Dous, E.K.* AU - Al-Azwani, E. K.* AU - Hussein, E.* AU - Mathew, S.* AU - Mayer, K.F.X. AU - Mohamoud, Y.A.* AU - Suhre, K. AU - Malek, J.A.* C1 - 31239 C2 - 34311 CY - London TI - A first genetic map of date palm (Phoenix dactylifera) reveals long-range genome structure conservation in the palms. JO - BMC Genomics VL - 15 IS - 1 PB - Biomed Central Ltd PY - 2014 SN - 1471-2164 ER - TY - JOUR AB - Background: Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal to decipher sequences originated from the euchromatic portion of the genome. 
The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011. Results: Here we describe a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG. The ALLPATHS-LG scaffolds were anchored onto the pseudomolecules on the basis of alignments to both the optical map and the genotyping-by-sequencing (GBS) map. The Mt4.0 pseudomolecules encompass ~360 Mb of actual sequences spanning 390 Mb of which ~330 Mb align perfectly with the optical map, presenting a drastic improvement over the BAC-based Mt3.5 which only contained 70% sequences (~250 Mb) of the current version. Most of the sequences and genes that previously resided on the unanchored portion of Mt3.5 have now been incorporated into the Mt4.0 pseudomolecules, with the exception of ~28 Mb of unplaced sequences. With regard to gene annotation, the genome has been re-annotated through our gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidences. A total of 50,894 genes (31,661 high confidence and 19,233 low confidence) are included in Mt4.0 which overlapped with ~82% of the gene loci annotated in Mt3.5. Of the remaining genes, 14% of the Mt3.5 genes have been deprecated to an "unsupported" status and 4% are absent from the Mt4.0 predictions. Conclusions: Mt4.0 and its associated resources, such as genome browsers, BLAST-able datasets and gene information pages, can be found on the JCVI Medicago web site (http://www.jcvi.org/medicago). The assembly and annotation have been deposited in GenBank (BioProject: PRJNA10791). The heavily curated chromosomal sequences and associated gene models of Medicago will serve as a better reference for legume biology and comparative genomics. 
AU - Tang, H.* AU - Krishnakumar, V.* AU - Bidwell, S.* AU - Rosen, B.* AU - Chan, A.* AU - Zhou, S.* AU - Gentzbittel, L.* AU - Childs, K.L.* AU - Yandell, M.D.* AU - Gundlach, H. AU - Mayer, K.F.X. AU - Schwartz, D.C.* AU - Town, C.D.* C1 - 31357 C2 - 34586 CY - London TI - An improved genome release (version Mt4.0) for the model legume Medicago truncatula. JO - BMC Genomics VL - 15 PB - Biomed Central Ltd PY - 2014 SN - 1471-2164 ER - TY - JOUR AB - Background High density genotyping data are indispensable for genomic analyses of complex traits in animal and crop species. Maize is one of the most important crop plants worldwide, however a high density SNP genotyping array for analysis of its large and highly dynamic genome was not available so far. Results We developed a high density maize SNP array composed of 616,201 variants (SNPs and small indels). Initially, 57 M variants were discovered by sequencing 30 representative temperate maize lines and then stringently filtered for sequence quality scores and predicted conversion performance on the array resulting in the selection of 1.2 M polymorphic variants assayed on two screening arrays. To identify high-confidence variants, 285 DNA samples from a broad genetic diversity panel of worldwide maize lines including the samples used for sequencing, important founder lines for European maize breeding, hybrids, and proprietary samples with European, US, semi-tropical, and tropical origin were used for experimental validation. We selected 616 k variants according to their performance during validation, support of genotype calls through sequencing data, and physical distribution for further analysis and for the design of the commercially available Affymetrix(R) Axiom(R) Maize Genotyping Array. This array is composed of 609,442 SNPs and 6,759 indels. Among these are 116,224 variants in coding regions and 45,655 SNPs of the Illumina(R) MaizeSNP50 BeadChip for study comparison. 
In a subset of 45,974 variants, apart from the target SNP, additional off-target variants are detected, which show only a minor bias towards intermediate allele frequencies. We performed principal coordinate and admixture analyses to determine the ability of the array to detect and resolve population structure and investigated the extent of LD within a worldwide validation panel. Conclusions The high density Affymetrix(R) Axiom(R) Maize Genotyping Array is optimized for European and American temperate maize and was developed based on a diverse sample panel by applying stringent quality filter criteria to ensure its suitability for a broad range of applications. With 600 k variants it is the largest currently publicly available genotyping array in crop species. AU - Unterseer, S.* AU - Bauer, E.* AU - Haberer, G. AU - Seidel, M. AU - Knaak, C.* AU - Ouzunova, M.* AU - Meitinger, T. AU - Strom, T.M. AU - Fries, R.* AU - Pausch, H.* AU - Bertani, C.* AU - Davassi, A.* AU - Mayer, K.F.X. AU - Schön, C.C.* C1 - 32476 C2 - 35039 CY - London TI - A powerful tool for genome analysis in maize: Development and evaluation of the high density 600k SNP genotyping array. JO - BMC Genomics VL - 15 IS - 1 PB - Biomed Central Ltd PY - 2014 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) that associate with clinical phenotypes, but these SNPs usually explain just a small part of the heritability and have relatively modest effect sizes. In contrast, SNPs that associate with metabolite levels generally explain a higher percentage of the genetic variation and demonstrate larger effect sizes. Still, the discovery of SNPs associated with metabolite levels is challenging since testing all metabolites measured in typical metabolomics studies with all SNPs comes with a severe multiple testing penalty. 
We have developed an automated workflow approach that utilizes prior knowledge of biochemical pathways present in databases like KEGG and BioCyc to generate a smaller SNP set relevant to the metabolite. This paper explores the opportunities and challenges in the analysis of GWAS of metabolomic phenotypes and provides novel insights into the genetic basis of metabolic variation through the re-analysis of published GWAS datasets. RESULTS: Re-analysis of the published GWAS dataset from Illig et al. (Nature Genetics, 2010) using a pathway-based workflow (http://www.myexperiment.org/packs/319.html), confirmed previously identified hits and identified a new locus of human metabolic individuality, associating Aldehyde dehydrogenase family1 L1 (ALDH1L1) with serine/glycine ratios in blood. Replication in an independent GWAS dataset of phospholipids (Demirkan et al., PLoS Genetics, 2012) identified two novel loci supported by additional literature evidence: GPAM (Glycerol-3 phosphate acyltransferase) and CBS (Cystathionine beta-synthase). In addition, the workflow approach provided novel insight into the affected pathways and relevance of some of these gene-metabolite pairs in disease development and progression. CONCLUSIONS: We demonstrate the utility of automated exploitation of background knowledge present in pathway databases for the analysis of GWAS datasets of metabolomic phenotypes. We report novel loci and potential biochemical mechanisms that contribute to our understanding of the genetic basis of metabolic variation and its relationship to disease development and progression.   AU - Dharuri, H.* AU - Henneman, P.* AU - Demirkan, A.* AU - Mook-Kanamori, D.O.* AU - Wang-Sattler, R. AU - Gieger, C. AU - Adamski, J. AU - Hettne, K.* AU - Roos, M.* AU - Suhre, K. 
AU - van Duijn, C.M.* AU - van Dijk, K.W.* AU - 't Hoen, P.A.* C1 - 28630 C2 - 33502 CY - London TI - Automated workflow-based exploitation of pathway databases provides new insights into genetic associations of metabolite profiles. JO - BMC Genomics VL - 14 IS - 1 PB - Biomed Central PY - 2013 SN - 1471-2164 ER - TY - JOUR AB - Background: Genome- and population-wide re-sequencing would allow for most efficient detection of causal trait variants. However, despite a strong decrease of costs for next-generation sequencing in the last few years, re-sequencing of large numbers of individuals is not yet affordable. We therefore resorted to re-sequencing of a limited number of bovine animals selected to explain a major proportion of the population's genomic variation, so-called key animals, in order to provide a catalogue of functional variants and a substrate for population- and genome-wide imputation of variable sites. Results: Forty-three animals accounting for about 69 percent of the genetic diversity of the Fleckvieh population, a cattle breed of Southern Germany and Austria, were sequenced with coverages ranging from 4.17 to 24.98 and averaging 7.46. After alignment to the reference genome (UMD3.1) and multi-sample variant calling, more than 17 million variant positions were identified, about 90 percent biallelic single nucleotide variants (SNVs) and 10 percent short insertions and deletions (InDels). The comparison with high-density chip data revealed a sensitivity of at least 92 percent and a specificity of 81 percent for sequencing based genotyping, and 97 percent and 93 percent when an imputation step was included. There are 91,733 variants in coding regions of 18,444 genes, 46 percent being non-synonymous exchanges, of which 575 variants are predicted to cause premature stop codons. Three variants are listed in the OMIA database as causal for specific phenotypes. 
Conclusions: Low- to medium-coverage re-sequencing of individuals explaining a major fraction of a population's genomic variation allows for the efficient and reliable detection of most variants. Imputation strongly improves genotype quality of lowly covered samples and thus enables maximum density genotyping by sequencing. The functional annotation of variants provides the basis for exhaustive genotype imputation in the population, e.g., for highest-resolution genome-wide association studies. AU - Jansen, S.* AU - Aigner, B.* AU - Pausch, H.* AU - Wysocki, M.* AU - Eck, S. AU - Benet-Pagès, A. AU - Graf, E. AU - Wieland, T. AU - Strom, T.M. AU - Meitinger, T. AU - Fries, R.* C1 - 26555 C2 - 32286 TI - Assessment of the genomic variation in a cattle population by re-sequencing of key animals at low to medium coverage. JO - BMC Genomics VL - 14 IS - 1 PB - Biomed Central PY - 2013 SN - 1471-2164 ER - TY - JOUR AB - Background The interaction between insect pests and their host plants is a never-ending race of evolutionary adaption. Plants have developed an armament against insect herbivore attacks, and attackers continuously learn how to address it. Using a combined transcriptomic and metabolomic approach, we investigated the molecular and biochemical differences between Quercus robur L. trees that resisted (defined as resistant oak type) or were susceptible (defined as susceptible oak type) to infestation by the major oak pest, Tortrix viridana L. Results Next generation RNA sequencing revealed hundreds of genes that exhibited constitutive and/or inducible differential expression in the resistant oak compared to the susceptible oak. Distinct differences were found in the transcript levels and the metabolic content with regard to tannins, flavonoids, and terpenoids, which are compounds involved in the defence against insect pests. 
The results of our transcriptomic and metabolomic analyses are in agreement with those of a previous study in which we showed that female moths prefer susceptible oaks due to their specific profile of herbivore-induced volatiles. These data therefore define two oak genotypes that clearly differ on the transcriptomic and metabolomic levels, as reflected by their specific defensive compound profiles. Conclusions We conclude that the resistant oak type seems to prefer a strategy of constitutive defence responses in contrast to more induced defence responses of the susceptible oaks triggered by feeding. These results pave the way for the development of biomarkers for an early determination of potentially green oak leaf roller-resistant genotypes in natural pedunculate oak populations in Europe. AU - Kersten, B.* AU - Ghirardo, A. AU - Schnitzler, J.-P. AU - Kanawati, B. AU - Schmitt-Kopplin, P. AU - Fladung, M.* AU - Schroeder, H.* C1 - 27978 C2 - 32887 TI - Integrated transcriptomics and metabolomics decipher differences in the resistance of pedunculate oak to the herbivore Tortrix viridana L. JO - BMC Genomics VL - 14 IS - 1 PB - Biomed Central PY - 2013 SN - 1471-2164 ER - TY - JOUR AB - Background: High density (HD) SNP genotyping arrays are an important tool for genetic analyses of animals and plants. Although the chicken is one of the most important farm animals, no HD array is yet available for high resolution genetic analysis of this species. Results: We report here the development of a 600 K Affymetrix(R) Axiom(R) HD genotyping array designed using SNPs segregating in a wide variety of chicken populations. In order to generate a large catalogue of segregating SNPs, we re-sequenced 243 chickens from 24 chicken lines derived from diverse sources (experimental, commercial broiler and layer lines) by pooling 10-15 samples within each line. 
About 139 million (M) putative SNPs were detected by mapping sequence reads to the new reference genome (Gallus_gallus_4.0) of which ~78 M appeared to be segregating in different lines. Using criteria such as high SNP-quality score, acceptable design scores predicting high conversion performance in the final array and uniformity of distribution across the genome, we selected ~1.8 M SNPs for validation through genotyping on an independent set of samples (n = 282). About 64% of the SNPs were polymorphic with high call rates (>98%), good cluster separation and stable Mendelian inheritance. Polymorphic SNPs were further analysed for their population characteristics and genomic effects. SNPs with extreme breach of Hardy-Weinberg equilibrium (P < 0.00001) were excluded from the panel. The final array, designed on the basis of these analyses, consists of 580,954 SNPs and includes 21,534 coding variants. SNPs were selected to achieve an essentially uniform distribution based on genetic map distance for both broiler and layer lines. Due to a lower extent of LD in broilers compared to layers, as reported in previous studies, the ratio of broiler and layer SNPs in the array was kept as 3:2. The final panel was shown to genotype a wide range of samples including broilers and layers with over 100 K to 450 K informative SNPs per line. A principal component analysis was used to demonstrate the ability of the array to detect the expected population structure which is an important pre-investigation step for many genome-wide analyses. Conclusions: This Affymetrix(R) Axiom(R) array is the first SNP genotyping array for chicken that has been made commercially available to the public as a product. This array is expected to find widespread usage both in research and commercial application such as in genomic selection, genome-wide association studies, selection signature analyses, fine mapping of QTLs and detection of copy number variants. 
AU - Kranis, A.* AU - Gheyas, A.A.* AU - Boschiero, C.* AU - Turner, F.* AU - Yu, L.* AU - Smith, S.* AU - Talbot, R.* AU - Pirani, A.* AU - Brew, F.* AU - Kaiser, P.* AU - Hocking, P.M.* AU - Fife, M.* AU - Salmon, N.* AU - Fulton, J.* AU - Strom, T.M. AU - Haberer, G. AU - Weigend, S.* AU - Preisinger, R.* AU - Gholami, M.* AU - Qanbari, S.* AU - Simianer, H.* AU - Watson, K.A.* AU - Woolliams, J.A.* AU - Burt, D.W.* C1 - 23658 C2 - 33653 TI - Development of a high density 600K SNP genotyping array for chicken. JO - BMC Genomics VL - 14 IS - 1 PB - Biomed Central PY - 2013 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: Fusarium head blight (FHB) caused by Fusarium graminearum Schwabe is one of the most prevalent diseases of wheat (Triticum aestivum L.) and other small grain cereals. Resistance against the fungus is quantitative and more than 100 quantitative trait loci (QTL) have been described. Two well-validated and highly reproducible QTL, Fhb1 and Qfhs.ifa-5A have been widely investigated, but to date the underlying genes have not been identified. RESULTS: We have investigated a gene co-expression network activated in response to F. graminearum using RNA-seq data from near-isogenic lines, harboring either the resistant or the susceptible allele for Fhb1 and Qfhs.ifa-5A. The network identified pathogen-responsive modules, which were enriched for differentially expressed genes between genotypes or different time points after inoculation with the pathogen. Central gene analysis identified transcripts associated with either QTL within the network. Moreover, we present a detailed gene expression analysis of four gene families (glucanases, NBS-LRR, WRKY transcription factors and UDP-glycosyltransferases), which take prominent roles in the pathogen response. CONCLUSIONS: A combination of a network-driven approach and differential gene expression analysis identified genes and pathways associated with Fhb1 and Qfhs.ifa-5A. 
We find G-protein coupled receptor kinases and biosynthesis genes for jasmonate and ethylene earlier induced for Fhb1. Similarly, we find genes involved in the biosynthesis and metabolism of riboflavin more abundant for Qfhs.ifa-5A.   AU - Kugler, K.G. AU - Siegwart, G.* AU - Nussbaumer, T. AU - Ametz, C.* AU - Spannagl, M. AU - Steiner, B.* AU - Lemmens, M.* AU - Mayer, K.F.X. AU - Buerstmayr, H.* AU - Schweiger, W.* C1 - 28487 C2 - 33421 TI - Quantitative trait loci-dependent analysis of a gene co-expression network associated with Fusarium head blight resistance in bread wheat (Triticum aestivum L.). JO - BMC Genomics VL - 14 IS - 1 PB - Biomed Central PY - 2013 SN - 1471-2164 ER - TY - JOUR AB - Background: Sunflower belongs to the largest plant family on earth, the genomically poorly explored Compositae. Downy mildew Plasmopara halstedii (Farlow) Berlese & de Toni is one of the major diseases of cultivated sunflower (Helianthus annuus L.). In the search for new sources of downy mildew resistance, the locus Pl(ARG) on linkage group 1 (LG1) originating from H. argophyllus is promising since it confers resistance against all known races of the pathogen. However, the mapping resolution in the Pl(ARG) region is hampered by significantly suppressed recombination and by limited availability of polymorphic markers. Here we examined a strategy developed for the enrichment of molecular markers linked to this specific genomic region. We combined bulked segregant analysis (BSA) with next-generation sequencing (NGS) and de novo assembly of the sunflower transcriptome for single nucleotide polymorphism (SNP) discovery in a sequence resource combining reads originating from two sunflower species, H. annuus and H. argophyllus. Results: A computational pipeline developed for SNP calling and pattern detection identified 219 candidate genes. For a proof of concept, 42 resistance gene-like sequences were subjected to experimental SNP validation. 
Using a high-resolution mapping population, 12 SNP markers were mapped to LG1. We successfully verified candidate sequences either co-segregating with or closely flanking Pl(ARG). Conclusions: This study is the first successful example to improve bulked segregant analysis with de novo transcriptome assembly using next generation sequencing. The BSTA pipeline we developed provides a useful guide for similar studies in other non-model organisms. Our results demonstrate this method is an efficient way to enrich molecular markers and to identify candidate genes in a specific mapping interval. AU - Livaja, M.* AU - Wang, Y.* AU - Wieckhorst, S.* AU - Haseneyer, G.* AU - Seidel, M. AU - Hahn, V.* AU - Knapp, S.J.* AU - Taudien, S.* AU - Schön, C.C.* AU - Bauer, E.* C1 - 27797 C2 - 32818 TI - BSTA: A targeted approach combines bulked segregant analysis with next-generation sequencing and de novo transcriptome assembly for SNP discovery in sunflower. JO - BMC Genomics VL - 14 IS - 1 PB - Biomed Central PY - 2013 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: Systems biology enables the identification of gene networks that modulate complex traits. Comprehensive metabolomic analyses provide innovative phenotypes that are intermediate between the initiator of genetic variability, the genome, and raw phenotypes that are influenced by a large number of environmental effects. The present study combines two concepts, systems biology and metabolic analyses, in an approach without prior functional hypothesis in order to dissect genes and molecular pathways that modulate differential growth at the onset of puberty in male cattle. Furthermore, this integrative strategy was applied to specifically explore distinctive gene interactions of non-SMC condensin I complex, subunit G (NCAPG) and myostatin (GDF8), known modulators of pre- and postnatal growth that are only partially understood for their molecular pathways affecting differential body weight. 
RESULTS: Our study successfully established gene networks and interacting partners affecting growth at the onset of puberty in cattle. We demonstrated the biological relevance of the created networks by comparison to randomly created networks. Our data showed that GnRH (Gonadotropin-releasing hormone) signaling is associated with divergent growth at the onset of puberty and revealed two highly connected hubs, BTC and DGKH, within the network. Both genes are known to directly interact with the GnRH signaling pathway. Furthermore, a gene interaction network for NCAPG containing 14 densely connected genes revealed novel information concerning the functional role of NCAPG in divergent growth. CONCLUSIONS: Merging both concepts, systems biology and metabolomic analyses, successfully yielded new insights into gene networks and interacting partners affecting growth at the onset of puberty in cattle. Genetic modulation in GnRH signaling was identified as key modifier of differential cattle growth at the onset of puberty. In addition, the benefit of our innovative concept without prior functional hypothesis was demonstrated by data suggesting that NCAPG might contribute to vascular smooth muscle contraction by indirect effects on the NO pathway via modulation of arginine metabolism. Our study shows for the first time in cattle that integration of genetic, physiological and metabolomics data in a systems biology approach will enable (or contribute to) an improved understanding of metabolic and gene networks and genotype-phenotype relationships.   AU - Widmann, P.* AU - Reverter, A.* AU - Fortes, M.R.* AU - Weikard, R.* AU - Suhre, K. AU - Hammon, H.* AU - Albrecht, E.* AU - Kuehn, C.* C1 - 28928 C2 - 33581 TI - A systems biology approach using metabolomic data reveals genes and pathways interacting to modulate divergent growth in cattle. 
JO - BMC Genomics VL - 14 IS - 1 PY - 2013 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: Genome-wide association studies (GWAS) have provided a large set of genetic loci influencing the risk for many common diseases. Association studies typically analyze one specific trait in single populations in an isolated fashion without taking into account the potential phenotypic and genetic correlation between traits. However, GWA data can be efficiently used to identify overlapping loci with analogous or contrasting effects on different diseases. RESULTS: Here, we describe a new approach to systematically prioritize and interpret available GWA data. We focus on the analysis of joint and disjoint genetic determinants across diseases. Using network analysis, we show that variant-based approaches are superior to locus-based analyses. In addition, we provide a prioritization of disease loci based on network properties and discuss the roles of hub loci across several diseases. We demonstrate that, in general, agonistic associations appear to reflect current disease classifications, and present the potential use of effect sizes in refining and revising these agonistic signals. We further identify potential branching points in disease etiologies based on antagonistic variants and describe plausible small-scale models of the underlying molecular switches. CONCLUSIONS: The observation that a surprisingly high fraction (>15%) of the SNPs considered in our study are associated both agonistically and antagonistically with related as well as unrelated disorders indicates that the molecular mechanisms influencing causes and progress of human diseases are in part interrelated. Genetic overlaps between two diseases also suggest the importance of the affected entities in the specific pathogenic pathways and should be investigated further. AU - Arnold, M. AU - Hartsperger, M.L. AU - Baurecht, H.* AU - Rodriguez, E.* AU - Wachinger, B. 
AU - Franke, A.* AU - Kabesch, M.* AU - Winkelmann, J. AU - Pfeufer, A. AU - Romanos, M.* AU - Illig, T. AU - Mewes, H.-W. AU - Stuempflen, V. AU - Weidinger, S.* C1 - 10694 C2 - 30423 TI - Network-based SNP meta-analysis identifies joint and disjoint genetic features across common human diseases. JO - BMC Genomics VL - 13 IS - 1 PB - BioMed Central PY - 2012 SN - 1471-2164 ER - TY - JOUR AB - Background: Single nucleotide polymorphisms (SNPs) are increasingly becoming the DNA marker system of choice due to their prevalence in the genome and their ability to be used in highly multiplexed genotyping assays. Although needed in high numbers for genome-wide marker profiles and genomics-assisted breeding, a surprisingly low number of validated SNPs are currently available for perennial ryegrass. Results: A perennial ryegrass unigene set representing 9,399 genes was used as a reference for the assembly of 802,156 high quality reads generated by 454 transcriptome sequencing and for in silico SNP discovery. Out of more than 15,433 SNPs in 1,778 unigenes fulfilling highly stringent assembly and detection parameters, a total of 768 SNP markers were selected for GoldenGate genotyping in 184 individuals of the perennial ryegrass mapping population VrnA, a population previously evaluated for important agronomic traits. A total of 592 (77%) of the SNPs tested were successfully called with a cluster separation above 0.9. Of these, 509 (86%) genic SNP markers segregated in the VrnA mapping population, out of which 495 were assigned to map positions. The genetic linkage map presented here comprises a total of 838 DNA markers (767 gene-derived markers) and spans 750 centiMorgan (cM) with an average marker interval distance of less than 0.9 cM. Moreover, it locates 732 expressed genes involved in a broad range of molecular functions of different biological processes in the perennial ryegrass genome. 
Conclusions: Here, we present an efficient approach of using next generation sequencing (NGS) data for SNP discovery, and the successful design of a 768-plex Illumina GoldenGate genotyping assay in a complex genome. The ryegrass SNPs along with the corresponding transcribed sequences represent a milestone in the establishment of genetic and genomics resources available for this species and constitute a further step towards molecular breeding strategies. Moreover, the high density genetic linkage map predominantly based on gene-associated DNA markers provides an important tool for the assignment of candidate genes to quantitative trait loci (QTL), functional genomics and the integration of genetic and physical maps in perennial ryegrass, one of the most important temperate grassland species. AU - Studer, B.* AU - Byrne, S.* AU - Nielsen, R.O.* AU - Panitz, F.* AU - Bendixen, C.* AU - Islam, M.S.* AU - Pfeifer, M. AU - Lübberstedt, T.* AU - Asp, T.* C1 - 10474 C2 - 30231 TI - A transcriptome map of perennial ryegrass (Lolium perenne L.). JO - BMC Genomics VL - 13 IS - 1 PB - Biomed Central Ltd. PY - 2012 SN - 1471-2164 ER - TY - JOUR AB - Background: Expansion of multi-C2H2 domain zinc finger (ZNF) genes, including the Kruppel-associated box (KRAB) subfamily, paralleled the evolution of tetrapods, particularly in mammalian lineages. Advances in their cataloging and characterization suggest that the functions of the KRAB-ZNF gene family contributed to mammalian speciation. Results: Here, we characterized the human 8q24.3 ZNF cluster on the genomic, the phylogenetic, the structural and the transcriptome level. Six (ZNF7, ZNF34, ZNF250, ZNF251, ZNF252, ZNF517) of the seven locus members contain exons encoding KRAB domains, one (ZNF16) does not. They form a paralog group in which the encoded KRAB and ZNF protein domains generally share more similarities with each other than with other members of the human ZNF superfamily. 
The closest relatives with respect to their DNA-binding domain were ZNF7 and ZNF251. The analysis of orthologs in therian mammalian species revealed strong conservation and purifying selection of the KRAB-A and zinc finger domains. These findings underscore structural/functional constraints during evolution. Gene losses in the murine lineage (ZNF16, ZNF34, ZNF252, ZNF517) and potential protein truncations in primates (ZNF252) illustrate ongoing speciation processes. Tissue expression profiling by quantitative real-time PCR showed similar but distinct patterns for all tested ZNF genes with the most prominent expression in fetal brain. Based on accompanying expression signatures in twenty-six other human tissues ZNF34 and ZNF250 revealed the closest expression profiles. Together, the 8q24.3 ZNF genes can be assigned to a cerebellum, a testis or a prostate/thyroid subgroup. These results are consistent with potential functions of the ZNF genes in morphogenesis and differentiation. Promoter regions of the seven 8q24.3 ZNF genes display common characteristics like missing TATA-box, CpG island-association and transcription factor binding site (TFBS) modules. Common TFBS modules partly explain the observed expression pattern similarities. Conclusions: The ZNF genes at human 8q24.3 form a relatively old mammalian paralog group conserved in eutherian mammals for at least 130 million years. The members persisted after initial duplications by undergoing subfunctionalizations in their expression patterns and target site recognition. KRAB-ZNF mediated repression of transcription might have shaped organogenesis in mammalian ontogeny. AU - Lorenz, P.* AU - Dietmann, S. 
AU - Wilhelm, T.* AU - Koczan, D.* AU - Autran, S.* AU - Gad, S.* AU - Wen, G.P.* AU - Ding, G.H.* AU - Li, Y.X.* AU - Rousseau-Merck, M.F.* AU - Thiesen, H.J.* C1 - 2945 C2 - 28118 TI - The ancient mammalian KRAB zinc finger gene cluster on human chromosome 8q24.3 illustrates principles of C2H2 zinc finger evolution associated with unique expression profiles in human tissues. JO - BMC Genomics VL - 11 IS - 1 PB - Biomed Central Ltd. PY - 2010 SN - 1471-2164 ER - TY - JOUR AB - MicroRNA-mediated control of gene expression via translational inhibition has substantial impact on cellular regulatory mechanisms. About 37% of mammalian microRNAs appear to be located within introns of protein coding genes, linking their expression to the promoter-driven regulation of the host gene. In our study we investigate this linkage towards a relationship beyond transcriptional co-regulation. Using measures based on both annotation and experimental data, we show that intronic microRNAs tend to support their host genes by regulation of target gene expression with significantly correlated expression patterns. We used expression data of three differentiating cell types and compared gene expression profiles of host and target genes. Many microRNA target genes show expression patterns significantly correlated with the expressions of the microRNA host genes. By calculating functional similarities between host and predicted microRNA target genes based on GO annotations, we confirm that many microRNAs link host and target gene activity in an either synergistic or antagonistic manner. These two regulatory effects may result from fine tuning of target gene expression functionally related to the host or knock-down of remaining opponent target gene expression. This finding allows the common practice of mapping large-scale gene expression data to protein-associated genes to be extended with the functionality of co-expressed intronic microRNAs. AU - Lutter, D. AU - Marr, C. AU - Krumsiek, J. 
AU - Lang, E.W.* AU - Theis, F.J. C1 - 6134 C2 - 28105 TI - Intronic microRNAs support their host genes by mediating synergistic and antagonistic regulatory effects. JO - BMC Genomics VL - 11 IS - 1 PB - BioMed Central Ltd. PY - 2010 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: The pivotal role of stress in the precipitation of psychiatric diseases such as depression is generally accepted. This study aims at the identification of genes that are directly or indirectly responding to stress. Inbred mouse strains that had been evidenced to differ in their stress response as well as in their response to antidepressant treatment were chosen for RNA profiling after stress exposure. Gene expression and regulation was determined by microarray analyses and further evaluated by bioinformatics tools including pathway and cluster analyses. RESULTS: Forced swimming as acute stressor was applied to C57BL/6J and DBA/2J mice and resulted in sets of regulated genes in the paraventricular nucleus of the hypothalamus (PVN), 4 h or 8 h after stress. Although the expression changes between the mouse strains were quite different, they unfolded in phases over time in both strains. Our search for connections between the regulated genes resulted in potential novel signalling pathways in stress. In particular, Guanine nucleotide binding protein, alpha inhibiting 2 (GNAi2) and amyloid β (A4) precursor protein (APP) were detected as stress-regulated genes, and together with other genes, seem to be integrated into stress-responsive pathways and gene networks in the PVN. CONCLUSIONS: This search for stress-regulated genes in the PVN revealed its impact on interesting genes (GNAi2 and APP) and a novel gene network. In particular the expression of APP in the PVN that is governing stress hormone balance, is of great interest. The reported neuroprotective role of this molecule in the CNS supports the idea that a short acute stress can elicit positive adaptational effects in the brain. 
AU - Tsolakidou, A.* AU - Czibere, L.* AU - Pütz, B.* AU - Trümbach, D. AU - Panhuysen, M.* AU - Deussing, J.M.* AU - Wurst, W. AU - Sillaber, I.* AU - Landgraf, R.* AU - Holsboer, F.* AU - Rein, T.* C1 - 6068 C2 - 27818 TI - Gene expression profiling in the stress control brain region hypothalamic paraventricular nucleus reveals a novel gene network including amyloid beta precursor protein. JO - BMC Genomics VL - 11 PB - BioMed Central Ltd. PY - 2010 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: De novo sequencing the entire genome of a large complex plant genome like the one of barley (Hordeum vulgare L.) is a major challenge both in terms of experimental feasibility and costs. The emergence and breathtaking progress of next generation sequencing technologies has put this goal into focus and a clone based strategy combined with the 454/Roche technology is conceivable. RESULTS: To test the feasibility, we sequenced 91 barcoded, pooled, gene containing barley BACs using the GS FLX platform and assembled the sequences under iterative change of parameters. The BAC assemblies were characterized by N50 of approximately 50 kb (N80 approximately 31 kb, N90 approximately 21 kb) and a Q40 of 94%. For approximately 80% of the clones, the best assemblies consisted of less than 10 contigs at 24-fold mean sequence coverage. Moreover we show that gene containing regions seem to assemble completely and uninterrupted thus making the approach suitable for detecting complete and positionally anchored genes. By comparing the assemblies of four clones to their complete reference sequences generated by the Sanger method, we evaluated the distribution, quality and representativeness of the 454 sequences as well as the consistency and reliability of the assemblies. CONCLUSION: The described multiplex 454 sequencing of barcoded BACs leads to sequence consensi highly representative for the clones. Assemblies are correct for the majority of contigs. 
Though the resolution of complex repetitive structures requires additional experimental efforts, our approach paves the way for a clone based strategy of sequencing the barley genome. AU - Steuernagel, B.* AU - Taudien, S.* AU - Gundlach, H. AU - Seidel, M. AU - Ariyadasa, R.* AU - Schulte, D.* AU - Petzold, A.* AU - Felder, M.* AU - Graner, A.* AU - Scholz, U.* AU - Mayer, K.F.X. AU - Platzer, M.* AU - Stein, N.* C1 - 847 C2 - 26530 TI - De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley. JO - BMC Genomics VL - 10 PB - Biomed Central Ltd PY - 2009 SN - 1471-2164 ER - TY - JOUR AB - Background: The spatiotemporal regulation of gene expression largely depends on the presence and absence of cis-regulatory sites in the promoter. In the economically highly important grass family, our knowledge of transcription factor binding sites and transcriptional networks is still very limited. With the completion of the sorghum genome and the available rice genome sequence, comparative promoter analyses now allow genome-scale detection of conserved cis-elements. Results: In this study, we identified thousands of phylogenetic footprints conserved between orthologous rice and sorghum upstream regions that are supported by co-expression information derived from three different rice expression data sets. In a complementary approach, cis-motifs were discovered by their highly conserved co-occurrence in syntenic promoter pairs. Sequence conservation and matches to known plant motifs support our findings. Expression similarities of gene pairs positively correlate with the number of motifs that are shared by gene pairs and corroborate the importance of similar promoter architectures for concerted regulation. This strongly suggests that these motifs function in the regulation of transcript levels in rice and, presumably also in sorghum. 
Conclusion: Our work provides the first large-scale collection of cis-elements for rice and sorghum and can serve as a paradigm for cis-element analysis through comparative genomics in grasses in general. AU - Wang, X.* AU - Haberer, G.* AU - Mayer, K.F.X.* C1 - 311 C2 - 26768 TI - Discovery of cis-elements between sorghum and rice using co-expression and evolutionary conservation. JO - BMC Genomics VL - 10 PB - Biomed Central PY - 2009 SN - 1471-2164 ER - TY - JOUR AB - A significant proportion of the human genome is comprised of human endogenous retroviruses (HERVs). HERV transcripts are found in every human tissue. Expression of proviruses of the HERV-K(HML-2) family has been associated with development of human tumors, in particular germ cell tumors (GCT). Very little is known about transcriptional activity of individual HML-2 loci in human tissues, though. RESULTS: By employing private nucleotide differences between loci, we assigned approximately 1500 HML-2 cDNAs to individual HML-2 loci, identifying, in total, 23 transcriptionally active HML-2 proviruses. Several loci are active in various human tissue types. Transcription levels of some HML-2 loci appear higher than those of other loci. Several HML-2 Rec-encoding loci are expressed in GCT and non-GCT tissues. A provirus on chromosome 22q11.21 appears strongly upregulated in pathologic GCT tissues and may explain high HML-2 Gag protein levels in GCTs. Presence of Gag and Env antibodies in GCT patients is not correlated with activation of individual loci. HML-2 proviruses previously reported capable of forming an infectious HML-2 variant are transcriptionally active in germ cell tissue. Our study furthermore shows that Expressed Sequence Tag (EST) data are insufficient to describe transcriptional activity of HML-2 and other HERV loci in tissues of interest. 
CONCLUSION: Our, to date, largest-scale study reveals in greater detail expression patterns of individual HML-2 loci in human tissues of clinical interest. Moreover, large-scale, specialized studies are indicated to better comprehend transcriptional activity and regulation of HERVs. We thus emphasize the need for a specialized HERV Transcriptome Project. AU - Flockerzi, A.* AU - Ruggieri, A.* AU - Frank, O.* AU - Sauter, M.* AU - Maldener, E.* AU - Kopper, B.* AU - Wullich, B.* AU - Seifarth, W.* AU - Müller-Lantzsch, N.* AU - Leib-Mösch, C. AU - Meese, E.* AU - Mayer, J.* C1 - 2209 C2 - 25629 TI - Expression patterns of transcribed human endogenous retrovirus HERV-K(HML-2) loci in human tissues and the need for a HERV Transcriptome Project. JO - BMC Genomics VL - 9 PB - BioMed Central PY - 2008 SN - 1471-2164 ER - TY - JOUR AB - Knowledge about the abundance of molecular components is an important prerequisite for building quantitative predictive models of cellular behavior. Proteins are central components of these models, since they carry out most of the fundamental processes in the cell. Thus far, protein concentrations have been difficult to measure on a large scale, but proteomic technologies have now advanced to a stage where this information becomes readily accessible. RESULTS: Here, we describe an experimental scheme to maximize the coverage of proteins identified by mass spectrometry of a complex biological sample. Using a combination of LC-MS/MS approaches with protein and peptide fractionation steps we identified 1103 proteins from the cytosolic fraction of the Escherichia coli strain MC4100. A measure of abundance is presented for each of the identified proteins, based on the recently developed emPAI approach which takes into account the number of sequenced peptides per protein. 
The values of abundance are within a broad range and accurately reflect independently measured copy numbers per cell. As expected, the most abundant proteins were those involved in protein synthesis, most notably ribosomal proteins. Proteins involved in energy metabolism as well as those with binding function were also found in high copy number while proteins annotated with the terms metabolism, transcription, transport, and cellular organization were rare. The barrel-sandwich fold was found to be the structural fold with the highest abundance. Highly abundant proteins are predicted to be less prone to aggregation based on their length, pI values, and occurrence patterns of hydrophobic stretches. We also find that abundant proteins tend to be predominantly essential. Additionally, we observe a significant correlation between protein and mRNA abundance in E. coli cells. CONCLUSION: Abundance measurements for more than 1000 E. coli proteins presented in this work represent the most complete study of protein abundance in a bacterial cell so far. We show significant associations between the abundance of a protein and its properties and functions in the cell. In this way, we provide both data and novel insights into the role of protein concentration in this model organism. AU - Ishihama, Y.* AU - Schmidt, T.* AU - Rappsilber, J.* AU - Mann, M.* AU - Hartl, F.* AU - Kerner, M.J.* AU - Frishman, D. C1 - 3433 C2 - 25527 TI - Protein abundance profiling of the Escherichia coli cytosol. JO - BMC Genomics VL - 9 PB - BioMed Central PY - 2008 SN - 1471-2164 ER - TY - JOUR AB - Background: In a transgenic mouse model of Alzheimer disease (AD), cleavage of the amyloid precursor protein (APP) by the alpha-secretase ADAM10 prevented amyloid plaque formation, and alleviated cognitive deficits. Furthermore, ADAM10 overexpression increased the cortical synaptogenesis. These results suggest that upregulation of ADAM10 in the brain has beneficial effects on AD pathology. 
Results: To assess the influence of ADAM10 on the gene expression profile in the brain, we performed a microarray analysis using RNA isolated from brains of five-month-old mice overexpressing either the alpha-secretase ADAM10, or a dominant-negative mutant (dn) of this enzyme. As compared to non-transgenic wild-type mice, in ADAM10 transgenic mice 355 genes, and in dnADAM10 mice 143 genes were found to be differentially expressed. A higher number of genes was differentially regulated in double-transgenic mouse strains additionally expressing the human APP([V717I]) mutant. Overexpression of proteolytically active ADAM10 affected several physiological pathways, such as cell communication, nervous system development, neuron projection as well as synaptic transmission. Although ADAM10 has been implicated in Notch and beta-catenin signaling, no significant changes in the respective target genes were observed in adult ADAM10 transgenic mice. Real-time RT-PCR confirmed a downregulation of genes coding for the inflammation-associated proteins S100a8 and S100a9 induced by moderate ADAM10 overexpression. Overexpression of the dominant-negative form dnADAM10 led to a significant increase in the expression of the fatty acid-binding protein Fabp7, which also has been found in higher amounts in brains of Down syndrome patients. Conclusion: In general, there was only a moderate alteration of gene expression in ADAM10 overexpressing mice. Genes coding for pro-inflammatory or pro-apoptotic proteins were not over-represented among differentially regulated genes. Even a decrease of inflammation markers was observed. These results further support the strategy of treating AD by increasing the alpha-secretase activity. AU - Prinzen, C.* AU - Trümbach, D. AU - Wurst, W. AU - Endres, K.* AU - Postina, R.* AU - Fahrenholz, F.* C1 - 345 C2 - 26121 TI - Differential gene expression in ADAM10 and mutant ADAM10 transgenic mice. 
JO - BMC Genomics VL - 10 PB - Biomed Central PY - 2008 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: The Spemann/Mangold organizer is a transient tissue critical for patterning the gastrula stage vertebrate embryo and formation of the three germ layers. Despite its important role during development, there are still relatively few genes with specific expression in the organizer and its derivatives. Foxa2 is a forkhead transcription factor that is absolutely required for formation of the mammalian equivalent of the organizer, the node, the axial mesoderm and the definitive endoderm (DE). However, the targets of Foxa2 during embryogenesis, and the molecular impact of organizer loss on the gastrula embryo, have not been well defined. RESULTS: To identify genes specific to the Spemann/Mangold organizer, we performed a microarray-based screen that compared wild-type and Foxa2 mutant embryos at late gastrulation stage (E7.5). We could detect genes that were consistently down-regulated in replicate pools of mutant embryos versus wild-type, and these included a number of known node and DE markers. We selected 314 genes without previously published data at E7.5 and screened for expression by whole mount in situ hybridization. We identified 10 novel expression patterns in the node and 5 in the definitive endoderm. We also found significant reduction of markers expressed in secondary tissues that require interaction with the organizer and its derivatives, such as cardiac mesoderm, vasculature, primitive streak, and anterior neuroectoderm. CONCLUSION: The genes identified in this screen represent novel Spemann/Mangold organizer genes as well as potential Foxa2 targets. Further investigation will be needed to define these genes as novel developmental regulatory factors involved in organizer formation and function. 
We have placed these genes in a Foxa2-dependent genetic regulatory network and we hypothesize how Foxa2 may regulate a molecular program of Spemann/Mangold organizer development. We have also shown how early loss of the organizer and its inductive properties in an otherwise normal embryo, impacts on the molecular profile of surrounding tissues. AU - Tamplin, O.J.* AU - Kinzel, D. AU - Cox, B.J.* AU - Bell, C.E.* AU - Rossant, J.* AU - Lickert, H. C1 - 5202 C2 - 25872 TI - Microarray analysis of Foxa2 mutant mouse embryos reveals novel gene expression and inductive roles for the gastrula organizer and its derivatives. JO - BMC Genomics VL - 9 PB - BioMed Central PY - 2008 SN - 1471-2164 ER - TY - JOUR AB - Background: We have recently released a comprehensive, manually curated database of mammalian protein complexes called CORUM. Combining CORUM with other resources, we assembled a dataset of over 2700 mammalian complexes. The availability of a rich information resource allows us to search for organizational properties concerning these complexes. Results: As the complexity of a protein complex in terms of the number of unique subunits increases, we observed that the number of such complexes and the mean non-synonymous to synonymous substitution ratio of associated genes tend to decrease. Similarly, as the number of different complexes a given protein participates in increases, the number of such proteins and the substitution ratio of the associated gene also tend to decrease. These observations provide evidence relating natural selection and the organization of mammalian complexes. We also observed greater homogeneity in terms of predicted protein isoelectric points, secondary structure and substitution ratio in annotated versus randomly generated complexes. A large proportion of the protein content and interactions in the complexes could be predicted from known binary protein-protein and domain-domain interactions. 
In particular, we found that large proteins interact preferentially with much smaller proteins. Conclusions: We observed similar trends in yeast and other data. Our results support the existence of conserved relations associated with the mammalian protein complexes. AU - Wong, P. AU - Althammer, S. AU - Hildebrand, A. AU - Kirschner, A.* AU - Pagel, P. AU - Geissler, B.* AU - Smialowski, P. AU - Blöchl, F. AU - Oesterheld, M. AU - Schmidt, T. AU - Strack, N. AU - Theis, F.J. AU - Ruepp, A. AU - Frishman, D. C1 - 2388 C2 - 25982 TI - An evolutionary and structural characterization of mammalian protein complex organization. JO - BMC Genomics VL - 9 PB - BioMed Central Ltd. PY - 2008 SN - 1471-2164 ER - TY - JOUR AB - Quantitative phenotypic variation of agronomic characters in crop plants is controlled by environmental and genetic factors (quantitative trait loci = QTL). To understand the molecular basis of such QTL, the identification of the underlying genes is of primary interest and DNA sequence analysis of the genomic regions harboring QTL is a prerequisite for that. QTL mapping in potato (Solanum tuberosum) has identified a region on chromosome V tagged by DNA markers GP21 and GP179, which contains a number of important QTL, among others QTL for resistance to late blight caused by the oomycete Phytophthora infestans and to root cyst nematodes. To obtain genomic sequence for the targeted region on chromosome V, two local BAC (bacterial artificial chromosome) contigs were constructed and sequenced, which corresponded to parts of the homologous chromosomes of the diploid, heterozygous genotype P6/210. Two contiguous sequences of 417,445 and 202,781 base pairs were assembled and annotated. Gene-by-gene co-linearity was disrupted by non-allelic insertions of retrotransposon elements, stretches of diverged intergenic sequences, differences in gene content and gene order. The latter was caused by inversion of a 70 kbp genomic fragment. 
These features were also found in comparison to orthologous sequence contigs from three homeologous chromosomes of Solanum demissum, a wild tuber bearing species. Functional annotation of the sequence identified 48 putative open reading frames (ORF) in one contig and 22 in the other, with an average of one ORF every 9 kbp. Ten ORFs were classified as resistance-gene-like, 11 as F-box-containing genes, 13 as transposable elements and three as transcription factors. Comparing potato to Arabidopsis thaliana annotated proteins revealed five micro-syntenic blocks of three to seven ORFs with A. thaliana chromosomes 1, 3 and 5. Comparative sequence analysis revealed highly conserved collinear regions that flank regions showing high variability and tandem duplicated genes. Sequence annotation revealed that the majority of the ORFs were members of multiple gene families. Comparing potato to Arabidopsis thaliana annotated proteins suggested fragmented structural conservation between these distantly related plant species. AU - Ballvora, A.* AU - Jöcker, A.* AU - Viehöver, P.* AU - Ishihara, H.* AU - Paal, J.* AU - Meksem, K.* AU - Bruggmann, R. AU - Schoof, H.* AU - Weisshaar, B.* AU - Gebhardt, C.* C1 - 4264 C2 - 24567 TI - Comparative sequence analysis of Solanum and Arabidopsis in a hot spot for pathogen resistance on potato chromosome V reveals a patchwork of conserved and rapidly evolving genome segments. JO - BMC Genomics VL - 8 PB - BioMed Central PY - 2007 SN - 1471-2164 ER - TY - JOUR AB - The common marmoset monkey (Callithrix jacchus), a small non-endangered New World primate native to eastern Brazil, is becoming increasingly used as a non-human primate model in biomedical research, drug development and safety assessment. In contrast to the growing interest for the marmoset as an animal model, the molecular tools for genetic analysis are extremely limited. 
Here we report the development of the first marmoset-specific oligonucleotide microarray (EUMAMA) containing probe sets targeting 1541 different marmoset transcripts expressed in hippocampus. These 1541 transcripts represent a wide variety of different functional gene classes. Hybridisation of the marmoset microarray with labelled RNA from hippocampus, cortex and a panel of 7 different peripheral tissues resulted in high detection rates of 85% in the neuronal tissues and on average 70% in the non-neuronal tissues. The expression profiles of the 2 neuronal tissues, hippocampus and cortex, were highly similar, as indicated by a correlation coefficient of 0.96. Several transcripts with a tissue-specific pattern of expression were identified. Besides the marmoset microarray we have generated 3215 ESTs derived from marmoset hippocampus, which have been annotated and submitted to GenBank [GenBank: EF214838 – EF215447, EH380242 – EH382846]. We have generated the first marmoset-specific DNA microarray and demonstrated its use to characterise large-scale gene expression profiles of hippocampus but also of other neuronal and non-neuronal tissues. In addition, we have generated a large collection of ESTs of marmoset origin, which are now available in the public domain. These new tools will facilitate molecular genetic research into this non-human primate animal model. AU - Datson, N.A.* AU - Morsink, M.C.* AU - Atanasova, S.* AU - Armstrong, V.W.* AU - Zischler, H.* AU - Schlumbohm, C.* AU - Dutilh, B.E.* AU - Huynen, M.A.* AU - Wägele, B. AU - Ruepp, A. AU - de Kloet, E.R.* AU - Fuchs, E.* C1 - 5881 C2 - 24570 TI - Development of the first marmoset-specific DNA microarray (EUMAMA): A new genetic tool for large-scale expression profiling in a non-human primate. JO - BMC Genomics VL - 8 PB - BioMed Central PY - 2007 SN - 1471-2164 ER - TY - JOUR AB - New technologies have enabled genome-wide association studies to be conducted with hundreds of thousands of genotyped SNPs. 
Several different first-generation genome-wide panels of SNPs have been commercialized. The total amount of common genetic variation is still unknown; however, the coverage of commercial panels can be evaluated against reference population samples genotyped by the International HapMap project. Less information is available about coverage in samples from other populations. RESULTS: In this study we compare four commercial panels: the HumanHap 300 and HumanHap 550 Array Sets from the Illumina Infinium series and the Mapping 100 K and Mapping 500 K Array Sets from the Affymetrix GeneChip series. Tagging performance is compared among HapMap CEPH (CEU), Asian (JPT, CHB) and Yoruba (YRI) population samples. It is also evaluated in an Estonian population sample with more than 1000 individuals genotyped in two 500-kbp ENCODE regions of chromosome 2: ENr112 on 2p16.3 and ENr131 on 2q37.1. CONCLUSION: We found that in a non-reference Caucasian population, commercial SNP panels provide levels of coverage similar to those in the HapMap CEPH population sample. We present the proportions of universal and population-specific SNPs in all the commercial platforms studied. AU - Mägi, R.* AU - Pfeufer, A. AU - Nelis, M.* AU - Montpetit, A.* AU - Metspalu, A.* AU - Remm, M. C1 - 3925 C2 - 24711 TI - Evaluating the performance of commercial whole-genome marker sets for capturing common genetic variation. JO - BMC Genomics VL - 8 PB - Biomed Central PY - 2007 SN - 1471-2164 ER - TY - JOUR AB - The corn smut fungus Ustilago maydis is a well-established model system for molecular phytopathology. In addition, it recently became evident that U. maydis and humans share proteins and cellular processes that are not found in the standard fungal model Saccharomyces cerevisiae. This prompted us to do a comparative analysis of the predicted proteome of U. maydis, S. cerevisiae and humans. AU - Münsterkötter, M. 
AU - Steinberg, G.* C1 - 4277 C2 - 25089 TI - The fungus Ustilago maydis and humans share disease-related proteins that are not found in Saccharomyces cerevisiae. JO - BMC Genomics VL - 8 PB - Biomed Central PY - 2007 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: The classical C2H2 zinc finger domain is involved in a wide range of functions and can bind to DNA, RNA and proteins. The comparison of zinc finger proteins in several eukaryotes has shown that there is a lot of lineage specific diversification and expansion. Although the number of characterized plant proteins that carry the classical C2H2 zinc finger motifs is growing, a systematic classification and analysis of a plant genome zinc finger gene set is lacking. RESULTS: We found through in silico analysis 176 zinc finger proteins in Arabidopsis thaliana that hence constitute the most abundant family of putative transcriptional regulators in this plant. Only a minority of 33 A. thaliana zinc finger proteins are conserved in other eukaryotes. In contrast, the majority of these proteins (81%) are plant specific. They are derived from extensive duplication events and form expanded families. We assigned the proteins to different subgroups and families and focused specifically on the two largest and evolutionarily youngest families (A1 and C1) that are suggested to be primarily involved in transcriptional regulation. The newly defined family A1 (24 members) comprises proteins with tandemly arranged zinc finger domains. Family C1 (64 members), earlier described as the EPF-family in Petunia, comprises proteins with one isolated or two to five dispersed fingers and a mostly invariant QALGGH motif in the zinc finger helices. Based on the amino acid pattern in these helices we could describe five different signature sequences prevalent in C1 zinc finger domains. We also found a number of non-finger domains that are conserved in these families. 
CONCLUSIONS: Our analysis of the few evolutionarily conserved zinc finger proteins of A. thaliana suggests that most of them could be involved in ancient biological processes like RNA metabolism and chromatin-remodeling. In contrast, the majority of the unique A. thaliana zinc finger proteins are known or suggested to be involved in transcriptional regulation. They exhibit remarkable differences in the features of their zinc finger sequences and zinc finger arrangements compared to animal zinc finger proteins. The different zinc finger helix signatures we found in family C1 may have important implications for the sequence specific DNA recognition and allow inferences about the evolution of the members in this family. AU - Engelbrecht, C.C. AU - Schoof, H. AU - Böhm, S.* C1 - 400 C2 - 22222 TI - Conservation, diversification and expansion of C2H2 zinc finger proteins in the Arabidopsis thaliana genome. JO - BMC Genomics VL - 5 PB - BIOMED CENTRAL LTD PY - 2004 SN - 1471-2164 ER - TY - JOUR AB - BACKGROUND: The understanding of whole genome sequences in higher eukaryotes depends to a large degree on the reliable definition of transcription units including exon/intron structures, translated open reading frames (ORFs) and flanking untranslated regions. The best currently available chicken transcript catalog is the Ensembl build based on the mappings of a relatively small number of full length cDNAs and ESTs to the genome as well as genome sequence derived in silico gene predictions. RESULTS: We use Long Serial Analysis of Gene Expression (LongSAGE) in bursal lymphocytes and the DT40 cell line to verify the quality and completeness of the annotated transcripts. 53.6% of the more than 38,000 unique SAGE tags (unitags) match to full length bursal cDNAs, the Ensembl transcript build or the genome sequence. The majority of all matching unitags show single matches to the genome, but no matches to the genome derived Ensembl transcript build. 
Nevertheless, most of these tags map close to the 3' boundaries of annotated Ensembl transcripts. CONCLUSIONS: These results suggest that rather few genes are missing in the current Ensembl chicken transcript build, but that the 3' ends of many transcripts may not have been accurately predicted. The tags with no match in the transcript sequences can now be used to improve gene predictions, pinpoint the genomic location of entirely missed transcripts and optimize the accuracy of gene finder software. AU - Wahl, M.B.* AU - Caldwell, R.B.* AU - Kierzek, A.M.* AU - Arakawa, H.* AU - Eyras, E.* AU - Hubner, N.* AU - Jung, C.* AU - Soeldenwagner, M.* AU - Cervelli, M.* AU - Wang, Y.-D.* AU - Liebscher, V.* C1 - 5211 C2 - 22371 TI - Evaluation of the chicken transcriptome by SAGE of B cells and the DT40 cell line. JO - BMC Genomics VL - 5 PB - Biomed Central Ltd PY - 2004 SN - 1471-2164 ER -