TY - JOUR AB - Gene-level rare variant association tests (RVATs) are essential for uncovering disease mechanisms and identifying therapeutic targets. Advances in sequence-based machine learning have generated diverse variant pathogenicity scores, creating opportunities to improve RVATs. However, existing methods often rely on rigid models or single annotations, limiting their ability to leverage these advances. We introduce BayesRVAT, a Bayesian rare variant association test that jointly models multiple annotations. By specifying priors on annotation effects and estimating gene-trait-specific posterior burden scores, BayesRVAT flexibly captures diverse rare-variant architectures. In simulations, BayesRVAT improves power while maintaining calibration. In UK Biobank analyses, it detects 10.2% more blood-trait associations and reveals novel gene-disease links, including PRPH2 with retinal disease. Integrating BayesRVAT within omnibus frameworks further increases discoveries, demonstrating that flexible annotation modeling captures complementary signals beyond existing burden and variance-component tests. AU - Nappi, A. AU - Shilova, L. AU - Karaletsos, T.* AU - Cai, N. AU - Casale, F.P. C1 - 75852 C2 - 58146 TI - BayesRVAT enhances rare-variant association testing through Bayesian aggregation of functional annotations. JO - Genome Res. PY - 2025 SN - 1088-9051 ER - TY - JOUR AB - Solve-RD is a pan-European rare disease (RD) research program that aims to identify disease-causing genetic variants in previously undiagnosed RD families. We utilized 10-fold coverage HiFi long-read sequencing (LRS) for detecting causative structural variants (SVs), single-nucleotide variants (SNVs), insertion-deletions (indels), and short tandem repeat (STR) expansions in previously studied RD families without a clear molecular diagnosis. Our cohort includes 293 individuals from 114 genetically undiagnosed RD families selected by European Reference Network (ERN) experts. 
Of these, 21 families were affected by so-called "unsolvable" syndromes for which genetic causes remain unknown and for which prior testing was not a prerequisite. The remaining 93 families had at least one individual affected by a rare neurological, neuromuscular, or epilepsy disorder without a genetic diagnosis despite extensive prior testing. Clinical interpretation and orthogonal validation of variants in known disease genes yielded 12 novel genetic diagnoses due to de novo and rare inherited SNVs, indels, SVs, and STR expansions. In an additional five families, we identified a candidate disease-causing variant, including an MCF2/FGF13 fusion and a PSMA3 deletion. However, no common genetic cause was identified in any of the "unsolvable" syndromes. Taken together, we found (likely) disease-causing genetic variants in 11.8% of previously unsolved families and additional candidate disease-causing SVs in another 5.4% of these families. In conclusion, our results demonstrate the potential added value of HiFi long-read genome sequencing in undiagnosed rare diseases. AU - Steyaert, W.* AU - Sagath, L.* AU - Demidov, G.* AU - Yépez, V.A.* AU - Esteve-Codina, A.* AU - Gagneur, J. 
AU - Ellwanger, K.* AU - Derks, R.* AU - Weiss, M.C.* AU - den Ouden, A.* AU - van den Heuvel, S.* AU - Swinkels, H.* AU - Zomer, N.* AU - Steehouwer, M.* AU - O'Gorman, L.* AU - Astuti, G.* AU - Neveling, K.* AU - Schüle, R.* AU - Xu, J.* AU - Synofzik, M.* AU - Beijer, D.* AU - Hengel, H.* AU - Schöls, L.* AU - Claeys, K.G.* AU - Baets, J.* AU - Van de Vondel, L.* AU - Ferlini, A.* AU - Selvatici, R.* AU - Morsy, H.* AU - Saeed Abd Elmaksoud, M.* AU - Straub, V.* AU - Müller, J.* AU - Pini, V.* AU - Perry, L.* AU - Sarkozy, A.* AU - Zaharieva, I.* AU - Muntoni, F.* AU - Bugiardini, E.* AU - Polavarapu, K.* AU - Horvath, R.* AU - Reid, E.* AU - Lochmüller, H.* AU - Spinazzi, M.* AU - Savarese, M.* AU - Matalonga, L.* AU - Laurie, S.* AU - Brunner, H.G.* AU - Graessner, H.* AU - Beltran, S.* AU - Ossowski, S.* AU - Vissers, L.E.L.M.* AU - Gilissen, C.* AU - Hoischen, A.* C1 - 73787 C2 - 57223 SP - 755-768 TI - Unraveling undiagnosed rare disease cases by HiFi long-read genome sequencing. JO - Genome Res. VL - 35 IS - 4 PY - 2025 SN - 1088-9051 ER - TY - JOUR AB - Dormancy is a key feature of stem cell function in adult tissues as well as in embryonic cells in the context of diapause. The establishment of dormancy is an active process that involves extensive transcriptional, epigenetic, and metabolic rewiring. How these processes are coordinated to successfully transition cells to the resting dormant state remains unclear. Here we show that microRNA activity, which is otherwise dispensable for preimplantation development, is essential for the adaptation of early mouse embryos to the dormant state of diapause. In particular, the pluripotent epiblast depends on miRNA activity, the absence of which results in the loss of pluripotent cells. 
Through the integration of high-sensitivity small RNA expression profiling of individual embryos and protein expression of miRNA targets with public data of protein-protein interactions, we constructed the miRNA-mediated regulatory network of mouse early embryos specific to diapause. We find that individual miRNAs contribute to the combinatorial regulation by the network, and the perturbation of the network compromises embryo survival in diapause. We further identified the nutrient-sensitive transcription factor TFE3 as an upstream regulator of diapause-specific miRNAs, linking cytoplasmic MTOR activity to nuclear miRNA biogenesis. Our results place miRNAs as a critical regulatory layer for the molecular rewiring of early embryos to establish dormancy. AU - Iyer, D.P.* AU - Moyon, L. AU - Wittler, L.* AU - Cheng, C.Y.* AU - Ringeling, F.R.* AU - Canzar, S.* AU - Marsico, A. AU - Bulut-Karslioğlu, A. C1 - 70663 C2 - 55809 CY - 1 Bungtown Rd, Cold Spring Harbor, NY 11724 USA SP - 572-589 TI - Combinatorial microRNA activity is essential for the transition of pluripotent cells from proliferation into dormancy. JO - Genome Res. VL - 34 IS - 4 PB - Cold Spring Harbor Lab Press, Publications Dept PY - 2024 SN - 1088-9051 ER - TY - JOUR AB - Accurate predictive models of future disease onset are crucial for effective preventive healthcare, yet longitudinal data sets linking early risk factors to subsequent health outcomes are limited. To overcome this challenge, we introduce a novel framework, Predictive Risk modeling using Mendelian Randomization (PRiMeR), which utilizes genetic effects as supervisory signals to learn disease risk predictors without relying on longitudinal data. To do so, PRiMeR leverages risk factors and genetic data from a healthy cohort, along with results from genome-wide association studies of diseases of interest. After training, the learned predictor can be used to assess risk for new patients solely based on risk factors. 
We validate PRiMeR through comprehensive simulations and in future type 2 diabetes predictions in UK Biobank participants without diabetes, using follow-up onset labels for validation. Moreover, we apply PRiMeR to predict future Alzheimer's disease onset from brain imaging biomarkers and future Parkinson's disease onset from accelerometer-derived traits. Overall, with PRiMeR we offer a new perspective in predictive modeling, showing it is possible to learn risk predictors leveraging genetics rather than longitudinal data. AU - Sens, D.W. AU - Shilova, L. AU - Gräf, L. AU - Grebenshchikova, M. AU - Eskofier, B.M. AU - Casale, F.P. C1 - 71860 C2 - 56453 CY - 1 Bungtown Rd, Cold Spring Harbor, NY 11724 USA SP - 1276-1285 TI - Genetics-driven risk predictions leveraging the Mendelian randomization framework. JO - Genome Res. VL - 34 IS - 9 PB - Cold Spring Harbor Lab Press, Publications Dept PY - 2024 SN - 1088-9051 ER - TY - JOUR AB - Acute myeloid leukemia (AML) is a molecularly complex disease characterized by heterogeneous tumor genetic profiles and involving numerous pathogenic mechanisms and pathways. Integration of molecular data types across multiple patient cohorts may advance current genetic approaches for improved subclassification and understanding of the biology of the disease. Here, we analyzed genome-wide DNA methylation in 649 AML patients using Illumina arrays and identified a configuration of 13 subtypes (termed "epitypes") using unbiased clustering. Integration of genetic data revealed that most epitypes were associated with a certain recurrent mutation (or combination) in a majority of patients, yet other epitypes were largely independent. Epitypes showed developmental blockage at discrete stages of myeloid differentiation, revealing epitypes that retain arrested hematopoietic stem-cell-like phenotypes. 
Detailed analyses of DNA methylation patterns identified unique patterns of aberrant hyper- and hypomethylation among epitypes, with variable involvement of transcription factors influencing promoter, enhancer, and repressed regions. Patients in epitypes with stem-cell-like methylation features showed inferior overall survival along with up-regulated stem cell gene expression signatures. We further identified a DNA methylation signature involving STAT motifs associated with FLT3-ITD mutations. Finally, DNA methylation signatures were stable at relapse for the large majority of patients, and rare epitype switching accompanied loss of the dominant epitype mutations and reversion to stem-cell-like methylation patterns. These results show that DNA methylation-based classification integrates important molecular features of AML to reveal the diverse pathogenic and biological aspects of the disease. AU - Giacopelli, B.* AU - Wang, M.* AU - Cleary, A.* AU - Wu, Y.Z.* AU - Schultz, A.R.* AU - Schmutz, M.* AU - Blachly, J.S.* AU - Eisfeld, A.K.* AU - Mundy-Bosse, B.* AU - Vosberg, S. AU - Greif, P.A.* AU - Claus, R.* AU - Bullinger, L.* AU - Garzon, R.* AU - Coombes, K.R.* AU - Bloomfield, C.D.* AU - Druker, B.J.* AU - Tyner, J.W.* AU - Byrd, J.C.* AU - Oakes, C.C.* C1 - 62118 C2 - 50621 CY - 1 Bungtown Rd, Cold Spring Harbor, NY 11724 USA SP - 747-761 TI - DNA methylation epitypes highlight underlying developmental and disease pathways in acute myeloid leukemia. JO - Genome Res. VL - 31 IS - 5 PB - Cold Spring Harbor Lab Press, Publications Dept PY - 2021 SN - 1088-9051 ER - TY - JOUR AB - Wheat has been domesticated into a large number of agricultural environments and has the ability to adapt to diverse environments. To understand this process, we survey genotype, repeat content, and DNA methylation across a bread wheat landrace collection representing global genetic diversity. We identify independent variation in methylation, genotype, and transposon copy number. 
We show that these, so far unexploited, sources of variation have had a significant impact on the wheat genome and that ancestral methylation states become preferentially "hard coded" as single nucleotide polymorphisms (SNPs) via 5-methylcytosine deamination. These mechanisms also drive local adaptation, impacting important traits such as heading date and salt tolerance. Methylation and transposon diversity could therefore be used alongside SNP-based markers for breeding. AU - Gardiner, L.J.* AU - Joynson, R.* AU - Omony, J. AU - Rusholme-Pilcher, R.* AU - Olohan, L.* AU - Lang, D. AU - Bai, C.* AU - Hawkesford, M.* AU - Salt, D.* AU - Spannagl, M. AU - Mayer, K.F.X. AU - Kenny, J.* AU - Bevan, M.* AU - Hall, N.* AU - Hall, A.* C1 - 54247 C2 - 45385 CY - 1 Bungtown Rd, Cold Spring Harbor, NY 11724 USA SP - 1319-1332 TI - Hidden variation in polyploid wheat drives local adaptation. JO - Genome Res. VL - 28 IS - 9 PB - Cold Spring Harbor Lab Press, Publications Dept PY - 2018 SN - 1088-9051 ER - TY - JOUR AB - Alternative splicing generates distinct mRNA isoforms and is crucial for proteome diversity in eukaryotes. The RNA-binding protein (RBP) U2AF2 is central to splicing decisions, as it recognizes 3′ splice sites and recruits the spliceosome. We establish “in vitro iCLIP” experiments, in which recombinant RBPs are incubated with long transcripts, to study how U2AF2 recognizes RNA sequences and how this is modulated by trans-acting RBPs. We measure U2AF2 affinities at hundreds of binding sites and compare in vitro and in vivo binding landscapes by mathematical modeling. We find that trans-acting RBPs extensively regulate U2AF2 binding in vivo, including enhanced recruitment to 3′ splice sites and clearance of introns. Using machine learning, we identify and experimentally validate novel trans-acting RBPs (including FUBP1, CELF6, and PCBP1) that modulate U2AF2 binding and affect splicing outcomes. 
Our study offers a blueprint for the high-throughput characterization of in vitro mRNP assembly and in vivo splicing regulation. AU - Reymond Sutandy, F.X.* AU - Ebersberger, S.* AU - Huang, L.* AU - Busch, A.* AU - Bach, M.* AU - Kang, H.-S. AU - Fallmann, J.* AU - Maticzka, D.* AU - Backofen, R.* AU - Stadler, P.F.* AU - Zarnack, K.* AU - Sattler, M. AU - Legewie, S.* AU - König, J.* C1 - 53884 C2 - 45111 SP - 699-713 TI - In vitro iCLIP-based modeling uncovers how the splicing factor U2AF2 relies on regulation by cofactors. JO - Genome Res. VL - 28 IS - 5 PY - 2018 SN - 1088-9051 ER - TY - JOUR AB - Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have generated a new wheat whole-genome shotgun sequence assembly using a combination of optimized data types and an assembly algorithm designed to deal with large and complex genomes. The new assembly represents >78% of the genome with a scaffold N50 of 88.8 kb that has a high fidelity to the input data. Our new annotation combines strand-specific Illumina RNA-seq and Pacific Biosciences (PacBio) full-length cDNAs to identify 104,091 high-confidence protein-coding genes and 10,156 noncoding RNA genes. We confirmed three known and identified one novel genome rearrangements. Our approach enables the rapid and scalable assembly of wheat genomes, the identification of structural variants, and the definition of complete gene models, all powerful resources for trait analysis and breeding of this key global crop. 
AU - Clavijo, B.J.* AU - Venturini, L.* AU - Schudoma, C.* AU - Accinelli, G.G.* AU - Kaithakottil, G.* AU - Wright, J.* AU - Borrill, P.* AU - Kettleborough, G.* AU - Heavens, D.* AU - Chapman, H.* AU - Lipscombe, J.* AU - Barker, T.* AU - Lu, F.H.* AU - McKenzie, N.* AU - Raats, D.* AU - Ramirez-Gonzalez, R.H.* AU - Coince, A.* AU - Peel, N.* AU - Percival-Alwyn, L.* AU - Duncan, O.* AU - Trösch, J.* AU - Yu, G.* AU - Bolser, D.M.* AU - Namaati, G.* AU - Kerhornou, A.* AU - Spannagl, M. AU - Gundlach, H. AU - Haberer, G. AU - Davey, R.P.* AU - Fosker, C.* AU - Palma, F.D.* AU - Phillips, A.* AU - Millar, A.H.* AU - Kersey, P.J.* AU - Uauy, C.* AU - Krasileva, K.V.* AU - Swarbreck, D.* AU - Bevan, M.W.* AU - Clark, M.D.* C1 - 50966 C2 - 43029 SP - 885-896 TI - An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations. JO - Genome Res. VL - 27 IS - 5 PY - 2017 SN - 1088-9051 ER - TY - JOUR AB - DNase I hypersensitive sites (DHSs) are a hallmark of chromatin regions containing regulatory DNA such as enhancers and promoters; however, the factors affecting the establishment and maintenance of these sites are not fully understood. We now show that HMGN1 and HMGN2, nucleosome-binding proteins that are ubiquitously expressed in vertebrate cells, maintain the DHSs landscape of mouse embryonic fibroblasts (MEFs) synergistically. Loss of one of these HMGN variants led to a compensatory increase of binding of the remaining variant. Genome wide mapping of the DHSs in Hmgn1-/-, Hmgn2-/- and Hmgn1-/-n2-/- MEFs reveals that loss of both, but not a single HMGN variant, leads to significant remodeling of the DHSs landscape, especially at enhancer regions marked by H3K4me1 and H3K27ac. 
Loss of HMGN variants affects the induced expression of stress responsive genes in MEFs, the transcription profiles of several mouse tissues, and leads to altered phenotypes that are not seen in mice lacking only one variant. We conclude that the compensatory binding of HMGN variants to chromatin maintains the DHSs landscape and the transcription fidelity and is necessary to retain wild type phenotypes. Our study provides insights into mechanisms that maintain regulatory sites in chromatin and into functional compensation among nucleosome binding architectural proteins. AU - Deng, T.* AU - Zhu, Z.I.* AU - Zhang, S.* AU - Postnikov, Y.* AU - Huang, D.* AU - Horsch, M. AU - Furusawa, T.* AU - Beckers, J. AU - Rozman, J. AU - Klingenspor, M. AU - Amarie, O.V. AU - Graw, J. AU - Rathkolb, B. AU - Wolf, E.* AU - Adler, T. AU - Busch, D.H.* AU - Gailus-Durner, V. AU - Fuchs, H. AU - Hrabě de Angelis, M. AU - van der Velde, A.* AU - Tessarollo, L.* AU - Ovcherenko, I.* AU - Landsman, D.* AU - Bustin, M.* C1 - 46333 C2 - 37498 CY - Cold Spring Harbor SP - 1295-1308 TI - Functional compensation among HMGN variants modulates the DNase I hypersensitive sites at enhancers. JO - Genome Res. VL - 25 IS - 9 PB - Cold Spring Harbor Lab Press, Publications Dept PY - 2015 SN - 1088-9051 ER - TY - JOUR AB - Glucocorticoids (GCs) are commonly prescribed drugs, but their anti-inflammatory benefits are mitigated by metabolic side effects. Their transcriptional effects, including tissue-specific gene activation and repression, are mediated by the glucocorticoid receptor (GR), which is known to bind as a homodimer to a palindromic DNA sequence. Using ChIP-exo in mouse liver under endogenous corticosterone exposure, we report here that monomeric GR interaction with a half-site motif is more prevalent than homodimer binding. 
Monomers colocalize with lineage-determining transcription factors in both liver and primary macrophages, and the GR half-site motif drives transcription, suggesting that monomeric binding is fundamental to GR's tissue-specific functions. In response to exogenous GC in vivo, GR dimers assemble on chromatin near ligand-activated genes, concomitant with monomer evacuation of sites near repressed genes. Thus, pharmacological GCs mediate gene expression by favoring GR homodimer occupancy at classic palindromic sites at the expense of monomeric binding. The findings have important implications for improving therapies that target GR. AU - Lim, H.W.* AU - Uhlenhaut, N.H. AU - Rauch, A.* AU - Weiner, J.* AU - Hübner, S.* AU - Hubner, N.* AU - Won, K.J.* AU - Lazar, M.A.* AU - Tuckermann, J.P.* AU - Steger, D.J.* C1 - 44812 C2 - 37062 CY - Cold Spring Harbor SP - 836-844 TI - Genomic redistribution of GR monomers and dimers mediates transcriptional response to exogenous glucocorticoid in vivo. JO - Genome Res. VL - 25 IS - 6 PB - Cold Spring Harbor Lab Press, Publications Dept PY - 2015 SN - 1088-9051 ER - TY - JOUR AB - Genome-wide association studies (GWAS) identified the MEIS1 locus for Restless Legs Syndrome (RLS), but causal single nucleotide polymorphisms (SNPs) and their functional relevance remain unknown. This locus contains a large number of highly conserved noncoding regions (HCNRs) potentially functioning as cis-regulatory modules. We analyzed these HCNRs for allele-dependent enhancer activity in zebrafish and mice and found that the risk allele of the lead SNP rs12469063 reduces enhancer activity in the Meis1 expression domain of the murine embryonic ganglionic eminences (GE). CREB1 binds this enhancer and rs12469063 affects its binding in vitro. In addition, MEIS1 target genes suggest a role in the specification of neuronal progenitors in the GE, and heterozygous Meis1-deficient mice exhibit hyperactivity, resembling the RLS phenotype. 
Thus, in vivo and in vitro analysis of a common SNP with small effect size showed allele-dependent function in the prospective basal ganglia representing the first neurodevelopmental region implicated in RLS. AU - Spieler, D. AU - Kaffe, M. AU - Knauf, F. AU - Bessa, J.* AU - Tena, J.J.* AU - Giesert, F. AU - Schormair, B. AU - Tilch, E. AU - Lee, H.* AU - Horsch, M. AU - Czamara, D.* AU - Karbalai, N.* AU - von Toerne, C. AU - Waldenberger, M. AU - Gieger, C. AU - Lichtner, P. AU - Claussnitzer, M.* AU - Naumann, R.* AU - Müller-Myhsok, B.* AU - Torres, M.* AU - Garrett, L. AU - Rozman, J. AU - Klingenspor, M. AU - Gailus-Durner, V. AU - Fuchs, H. AU - Hrabě de Angelis, M. AU - Beckers, J. AU - Hölter, S.M. AU - Meitinger, T. AU - Hauck, S.M. AU - Laumen, H. AU - Wurst, W. AU - Casares, F.* AU - Gómez-Skarmeta, J.L.* AU - Winkelmann, J. C1 - 30806 C2 - 36325 CY - Cold Spring Harbor SP - 592-603 TI - Restless Legs Syndrome-associated intronic common variant in Meis1 alters enhancer function in the developing telencephalon. JO - Genome Res. VL - 24 IS - 4 PB - Cold Spring Harbor Lab Press, Publications Dept PY - 2014 SN - 1088-9051 ER - TY - JOUR AB - Nearly three-quarters of the 143 genetic signals associated with platelet and erythrocyte phenotypes identified by meta-analyses of genome-wide association (GWA) studies are located at non-protein-coding regions. Here, we assessed the role of candidate regulatory variants associated with cell type-restricted, closely related hematological quantitative traits in biologically relevant hematopoietic cell types. We used formaldehyde-assisted isolation of regulatory elements followed by next-generation sequencing (FAIRE-seq) to map regions of open chromatin in three primary human blood cells of the myeloid lineage. 
In the precursors of platelets and erythrocytes, as well as in monocytes, we found that open chromatin signatures reflect the corresponding hematopoietic lineages of the studied cell types and associate with the cell type-specific gene expression patterns. Dependent on their signal strength, open chromatin regions showed correlation with promoter and enhancer histone marks, distance to the transcription start site, and ontology classes of nearby genes. Cell type-restricted regions of open chromatin were enriched in sequence variants associated with hematological indices. The majority (63.6%) of such candidate functional variants at platelet quantitative trait loci (QTLs) coincided with binding sites of five transcription factors key in regulating megakaryopoiesis. We experimentally tested 13 candidate regulatory variants at 10 platelet QTLs and found that 10 (76.9%) affected protein binding, suggesting that this is a frequent mechanism by which regulatory variants influence quantitative trait levels. Our findings demonstrate that combining large-scale GWA data with open chromatin profiles of relevant cell types can be a powerful means of dissecting the genetic architecture of closely related quantitative traits. AU - Paul, D.S.* AU - Albers, C.A.* AU - Rendon, A.* AU - Voss, K.* AU - Stephens, J.* AU - HaemGen Consortium (Döring, A. AU - Gieger, C. AU - Illig, T. AU - Meisinger, C. AU - Ried, J.S. AU - Wichmann, H.-E.) AU - van der Harst, P.* AU - Chambers, J.C.* AU - Soranzo, N.* AU - Ouwehand, W.H.* AU - Deloukas, P.* C1 - 28870 C2 - 33556 SP - 1130-1141 TI - Maps of open chromatin highlight cell type-restricted patterns of regulatory sequence variation at hematological trait loci. JO - Genome Res. VL - 23 IS - 7 PY - 2013 SN - 1088-9051 ER - TY - JOUR AB - RNA synthesis and decay rates determine the steady-state levels of cellular RNAs. 
Metabolic tagging of newly transcribed RNA by 4-thiouridine (4sU) can reveal the relative contributions of RNA synthesis and decay rates. The kinetics of RNA processing, however, had so far remained unresolved. Here, we show that ultrashort 4sU-tagging not only provides snapshot pictures of eukaryotic gene expression but, when combined with progressive 4sU-tagging and RNA-seq, reveals global RNA processing kinetics at nucleotide resolution. Using this method, we identified classes of rapidly and slowly spliced/degraded introns. Interestingly, each class of splicing kinetics was characterized by a distinct association with intron length, gene length, and splice site strength. For a large group of introns, we also observed long lasting retention in the primary transcript, but efficient secondary splicing or degradation at later time points. Finally, we show that processing of most, but not all small nucleolar (sno)RNA-containing introns is remarkably inefficient with the majority of introns being spliced and degraded rather than processed into mature snoRNAs. In summary, our study yields unparalleled insights into the kinetics of RNA processing and provides the tools to study molecular mechanisms of RNA processing and their contribution to the regulation of gene expression. AU - Windhager, L.* AU - Bonfert, T.* AU - Burger, K. AU - Ruzsics, Z.* AU - Krebs, S.* AU - Kaufmann, S.* AU - Malterer, G.* AU - L'hernault, A.* AU - Schilhabel, M.* AU - Schreiber, S.* AU - Rosenstiel, P.* AU - Zimmer, R.* AU - Eick, D. AU - Friedel, C.C.* AU - Dölken, L.* C1 - 10704 C2 - 30414 SP - 2031-2042 TI - Ultrashort and progressive 4sU-tagging reveals key characteristics of RNA processing at nucleotide resolution. JO - Genome Res. VL - 22 IS - 10 PB - Cold Spring Harbor Laboratory Press PY - 2012 SN - 1088-9051 ER - TY - JOUR AB - Mutational screens are an effective means used in the functional annotation of a genome. 
We present a method for a mutational screen of the mouse X chromosome using gene trap technologies. This method has the potential to screen all of the genes on the X chromosome without establishing mutant animals, as all gene-trapped embryonic stem (ES) cell lines are hemizygous null for mutations on the X chromosome. Based on this method, embryonic morphological phenotypes and expression patterns for 58 genes were assessed, approximately 10% of all human and mouse syntenic genes on the X chromosome. Of these, 17 are novel embryonic lethal mutations and nine are mutant mouse models of genes associated with genetic disease in humans, including BCOR and PORCN. The rate of lethal mutations is similar to previous mutagenic screens of the autosomes. Interestingly, some genes associated with X-linked mental retardation (XLMR) in humans show lethal phenotypes in mice, suggesting that null mutations cannot be responsible for all cases of XLMR. The entire data set is available via the publicly accessible website (http://xlinkedgenes.ibme.utoronto.ca/). AU - Cox, B.J.* AU - Vollmer, M. AU - Tamplin, O.* AU - Lu, M.* AU - Biechele, S.* AU - Gertsenstein, M.* AU - van Campenhout, C.A. AU - Floß, T. AU - Kühn, R. AU - Wurst, W. AU - Lickert, H. AU - Rossant, J.* C1 - 3213 C2 - 27274 SP - 1154-1164 TI - Phenotypic annotation of the mouse X chromosome. JO - Genome Res. VL - 20 IS - 8 PB - Cold Spring Harbor Lab Press PY - 2010 SN - 1088-9051 ER - TY - JOUR AB - We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. 
Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ~32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene. AU - Itoh, T.* AU - Tanaka, T.* AU - Barrero, R.A.* AU - Yamasaki, C.* AU - Fujii, Y.* AU - Hilton, P.B.* AU - Antonio, B.A.* AU - Aono, H.* AU - Apweiler, R.* AU - Bruskiewich, R.* AU - Bureau, T.* AU - Burr, F.* AU - Costa de Oliveira, A.* AU - Fuks, G.* AU - Habara, T.* AU - Haberer, G. 
AU - Han, B.* AU - Harada, E.* AU - Hiraki, A.T.* AU - Hirochika, H.* AU - Hoen, D.* AU - Hokari, H.* AU - Hosokawa, S.* AU - Hsing, Y.I.* AU - Ikawa, H.* AU - Ikeo, K.* AU - Imanishi, T.* AU - Ito, Y.* AU - Jaiswal, P.* AU - Kanno, M.* AU - Kawahara, Y.* AU - Kawamura, T.* AU - Kawashima, H.* AU - Khurana, J.P.* AU - Kikuchi, S.* AU - Komatsu, S.* AU - Koyanagi, K.O.* AU - Kubooka, H.* AU - Lieberherr, D.* AU - Lin, Y.C.* AU - Lonsdale, D.* AU - Matsumoto, T.* AU - Matsuya, A.* AU - McCombie, W.R.* AU - Messing, J.* AU - Miyao, A.* AU - Mulder, N.* AU - Nagamura, Y.* AU - Nam, J.* AU - Namiki, N.* AU - Numa, H.* AU - Nurimoto, S.* AU - O'Donovan, C.* AU - Ohyanagi, H.* AU - Okido, T.* AU - Oota, S.* AU - Osato, N.* AU - Palmer, L.E.* AU - Quétier, F.* AU - Raghuvanshi, S.* AU - Saichi, N.* AU - Sakai, H.* AU - Sakai, Y.* AU - Sakata, K.* AU - Sakurai, T.* AU - Sato, F.* AU - Sato, Y.* AU - Schoof, H. AU - Seki, M.* AU - Shibata, M.* AU - Shimizu, Y.* AU - Shinozaki, K.* AU - Shinso, Y.* AU - Singh, N.K.* AU - Smith-White, B.* AU - Takeda, J.* AU - Tanino, M.* AU - Tatusova, T.* AU - Thongjuea, S.* AU - Todokoro, F.* AU - Tsugane, M.* AU - Tyagi, A.K.* AU - Vanavichit, A.* AU - Wang, A.* AU - Wing, R.A.* AU - Yamaguchi, K.* AU - Yamamoto, M.* AU - Yamamoto, N.* AU - Yu, Y.* AU - Zhang, H.* AU - Zhao, Q.* AU - Higo, K.* AU - Burr, B.* AU - Gojobori, T.* AU - Sasaki, T.* AU - Rice Annotation Project C1 - 5817 C2 - 24574 SP - 175-183 TI - Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. JO - Genome Res. VL - 17 IS - 2 PB - Cold Spring Harbor Laboratory Pr. PY - 2007 SN - 1088-9051 ER - TY - JOUR AB - Maize (Zea mays or corn), both a major food source and an important cytogenetic model, evolved from a tetraploid that arose about 4.8 million years ago (Mya). As a result, maize has extensive duplicated regions within its genome. 
We have sequenced the two copies of one such region, generating 7.8 Mb of sequence spanning 17.4 cM of the short arm of chromosome 1 and 6.6 Mb (25.6 cM) from the long arm of chromosome 9. Rice, which did not undergo a similar whole genome duplication event, has only one orthologous region (4.9 Mb) on the short arm of chromosome 3, and can be used as reference for the maize homoeologous regions. Alignment of the three regions allowed identification of syntenic blocks, and indicated that the maize regions have undergone differential contraction in genic and intergenic regions and expansion by the insertion of retrotransposable elements. Approximately 9% of the predicted genes in each duplicated region are completely missing in the rice genome, and almost 20% have moved to other genomic locations. Predicted genes within these regions tend to be larger in maize than in rice, primarily because of the presence of predicted genes in maize with larger introns. Interestingly, the general gene methylation patterns in the maize homoeologous regions do not appear to have changed with contraction or expansion of their chromosomes. In addition, no differences in methylation of single genes and tandemly repeated gene copies have been detected. These results, therefore, provide new insights into the diploidization of polyploid species. ©2006 by Cold Spring Harbor Laboratory Press. AU - Bruggmann, R. AU - Bharti, A.K.* AU - Gundlach, H. AU - Lai, J.* AU - Young, S.* AU - Pontaroli, A.C.* AU - Wie, F.* AU - Haberer, G. AU - Fuks, G.* AU - Du, C.* AU - Raymond, C.* AU - Estep, M.C.* AU - Liu, R.* AU - Bennetzen, J.L.* AU - Chan, A.P.* AU - Rabinowicz, P.D.* AU - Quackenbush, J.* AU - Barbazuk, W.B.* AU - Wing, R.A.* AU - Birren, B.* AU - Nusbaum, C.* AU - Rounsley, S.* AU - Mayer, K.F.X. AU - Messing, J.* C1 - 3560 C2 - 23885 SP - 1241-1251 TI - Uneven chromosome contraction and expansion in the maize genome. JO - Genome Res. 
VL - 16 IS - 10 PY - 2006 SN - 1088-9051 ER - TY - JOUR AB - The cereal endosperm is a major organ of the seed and an important component of the world's food supply. To understand the development and physiology of the endosperm of cereal seeds, we focused on the identification of genes expressed at various times during maize endosperm development. We constructed several cDNA libraries to identify full-length clones and subjected them to a twofold enrichment. A total of 23,348 high-quality sequence-reads from 5'- and 3'-ends of cDNAs were generated and assembled into a unigene set representing 5326 genes with paired sequence-reads. Additional sequencing yielded a total of 3160 (59%) completely sequenced, full-length cDNAs. From 5326 unigenes, 4139 (78%) can be aligned with 5367 predicted rice genes and by taking only the "best hit" be mapped to 3108 positions on the rice genome. The 22% unigenes not present in rice indicate a rapid change of gene content between rice and maize in only 50 million years. Differences in rice and maize gene numbers also suggest that maize has lost a large number of duplicated genes following tetraploidization. The larger number of gene copies in rice suggests that as many as 30% of its genes arose from gene amplification, which would extrapolate to a significant proportion of the estimated 44,027 candidate genes of its entire genome. Functional classification of the maize endosperm unigene set indicated that more than a fourth of the novel functionally assignable genes found in this study are involved in carbohydrate metabolism, consistent with its role as a storage organ. AU - Lai, J.* AU - Dey, N.* AU - Kim, C.-S.* AU - Bharti, A.K.* AU - Rudd, S. AU - Mayer, K.F.X. AU - Larkins, B.A.* AU - Becraft, P.* AU - Messing, J.* C1 - 502 C2 - 22220 SP - 1932-1937 TI - Characterization of the maize endosperm transcriptome and its comparison to the rice genome. JO - Genome Res. 
VL - 14 PY - 2004 SN - 1088-9051 ER - TY - JOUR AB - Scaffold/matrix attachment regions (S/MARs) are essential regulatory DNA elements of eukaryotic cells. They are major determinants of locus control of gene expression and can shield gene expression from position effects. Experimental detection of S/MARs requires substantial effort and is not suitable for large-scale screening of genomic sequences. In silico prediction of S/MARs can provide a crucial first selection step to reduce the number of candidates. We used experimentally defined S/MAR sequences as the training set and generated a library of new S/MAR-associated, AT-rich patterns described as weight matrices. A new tool called SMARTest was developed that identifies potential S/MARs by performing a density analysis based on the S/MAR matrix library (http://www.genomatix.de/cgi-bin/smartest_pd/smartest.pl). S/MAR predictions were evaluated by using six genomic sequences from animal and plant for which S/MARs and non-S/MARs were experimentally mapped. SMARTest reached a sensitivity of 38% and a specificity of 68%. In contrast to previous algorithms, the SMARTest approach does not depend on the sequence context and is suitable to analyze long genomic sequences up to the size of whole chromosomes. To demonstrate the feasibility of large-scale S/MAR prediction, we analyzed the recently published chromosome 22 sequence and found 1198 S/MAR candidates. AU - Frisch, M.* AU - Frech, K.* AU - Klingenhoff, A.* AU - Cartharius, K.* AU - Liebich, I.* AU - Werner, T. C1 - 22030 C2 - 20605 SP - 349-354 TI - In Silico Prediction of Scaffold/Matrix Attachment Regions in Large Genomic Sequences. JO - Genome Res. VL - 12 IS - 2 PY - 2002 SN - 1088-9051 ER - TY - JOUR AB - The publication of the first almost complete sequence of a human chromosome (chromosome 22) is a major milestone in human genomics. 
Together with the sequence, an excellent annotation of genes was published which certainly will serve as an information resource for numerous future projects. We noted that the annotation did not cover regulatory regions; in particular, no promoter annotation has been provided. Here we present an analysis of the complete published chromosome 22 sequence for promoters. A recent breakthrough in specific in silico prediction of promoter regions enabled us to attempt large-scale prediction of promoter regions on chromosome 22. Scanning of sequence databases revealed only 20 experimentally verified promoters, of which 10 were correctly predicted by our approach. Nearly 40% of our 465 predicted promoter regions are supported by the currently available gene annotation. Promoter finding also provides a biologically meaningful method for "chromosomal scaffolding", by which long genomic sequences can be divided into segments starting with a gene. As one example, the combination of promoter region prediction with exon/intron structure predictions greatly enhances the specificity of de novo gene finding. The present study demonstrates that it is possible to identify promoters in silico on the chromosomal level with sufficient reliability for experimental planning and indicates that a wealth of information about regulatory regions can be extracted from current large-scale (megabase) sequencing projects. Results are available on-line at http://genomatix.gsf.de/chr22/. AU - Scherf, M. AU - Klingenhoff, A. AU - Frech, K.* AU - Quandt, K.* AU - Schneider, R. AU - Grote, K.* AU - Frisch, M.* AU - Gailus-Durner, V. AU - Seidel, A. AU - Brack-Werner, R. AU - Werner, T. C1 - 10158 C2 - 22359 SP - 333-340 TI - First pass annotation of promoters on human chromosome 22. JO - Genome Res. VL - 11 PY - 2001 SN - 1088-9051 ER -