Tomaz da Silva, P.* ; Karollus, A.* ; Hingerl, J.* ; Galindez, G.S.T.* ; Wagner, N.* ; Hernandez-Alias, X.* ; Incarnato, D.* ; Gagneur, J.
Nucleotide dependency analysis of genomic language models detects functional elements.
Nat. Genet. 57, 2589-2602 (2025)
Deciphering how nucleotides in genomes encode regulatory instructions and molecular machines is a long-standing goal. Genomic language models (gLMs) implicitly capture functional elements and their organization from genomic sequences alone by modeling probabilities of each nucleotide given its sequence context. However, discovering functional genomic elements from gLMs has been challenging due to the lack of interpretable methods. Here we introduce nucleotide dependencies, which quantify how nucleotide substitutions at one genomic position affect the probabilities of nucleotides at other positions. We demonstrate that nucleotide dependencies are more effective at indicating the deleteriousness of genetic variants than alignment-based conservation and gLM reconstruction. Dependency analysis accurately detects regulatory motifs and highlights bases in contact within RNAs, including pseudoknots and tertiary structure contacts, revealing new, experimentally validated RNA structures. Finally, we leverage dependency maps to reveal critical limitations of several gLM architectures and training strategies. Altogether, nucleotide dependency analysis opens a new avenue for discovering and studying functional elements and their interactions in genomes.
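The definition of a nucleotide dependency in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical stand-in for a gLM (it simply favours the complement of the mirrored position, like a hairpin stem), and scoring by the maximum absolute change in log-odds over all substitutions and target nucleotides is an assumed choice of metric.

```python
import numpy as np

NUCS = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def toy_model(seq):
    """Hypothetical stand-in for a gLM (assumption, not a real model).

    Returns an (L, 4) array of per-position nucleotide probabilities in
    which position j favours the complement of the nucleotide at the
    mirrored position L-1-j.
    """
    n = len(seq)
    probs = np.full((n, 4), 0.1)
    for j in range(n):
        partner = seq[n - 1 - j]
        probs[j, NUCS.index(COMPLEMENT[partner])] = 0.7
    return probs


def log_odds(p):
    """Element-wise log(p / (1 - p))."""
    return np.log(p) - np.log1p(-p)


def dependency_map(seq, model):
    """Score how substituting position i changes predictions at position j.

    Assumed metric: maximum absolute log-odds change over the three
    alternative nucleotides at i and the four target nucleotides at j.
    """
    n = len(seq)
    ref = log_odds(model(seq))
    dep = np.zeros((n, n))
    for i in range(n):
        for alt in NUCS:
            if alt == seq[i]:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            delta = np.abs(log_odds(model(mutated)) - ref)  # shape (n, 4)
            dep[i] = np.maximum(dep[i], delta.max(axis=1))
        dep[i, i] = 0.0  # ignore the effect of a position on itself
    return dep


dep = dependency_map("ACGTAC", toy_model)
```

With this toy model, each row of `dep` is non-zero only at the mirrored partner position, which is the kind of pairwise signal that would indicate bases in contact.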
Publication type
Article: Journal article
Document type
Scientific article
Keywords
RNA; Identification; Classification; Prediction; Alignment; Database; Genes
Language
English
Publication year
2025
HGF reporting year
2025
ISSN (print) / ISBN
1061-4036
e-ISSN
1546-1718
Source details
Volume: 57
Issue: 10
Pages: 2589-2602
Publisher
Nature Publishing Group
Place of publication
New York, NY
Review status
Peer reviewed
POF Topic(s)
30205 - Bioengineering and Digital Health
Research field(s)
Enabling and Novel Technologies
PSP element(s)
G-503800-001
Funding
European Union
Helmholtz Association under the joint research school 'Munich School for Data Science-MUDS'
Dutch Research Council (NWO)
NWO Open Competitie ENW-XS
European Research Council (ERC), European Union's Horizon Europe research and innovation program
EMBO Postdoctoral Fellowship
German Bundesministerium für Bildung und Forschung (BMBF) through the Model Exchange for Regulatory Genomics project MERGE
Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)
EVUK program ('Next-generation AI for Integrated Diagnostics') of the Free State of Bavaria
DFG (German Research Foundation)
DFG (German Research Foundation) through the IT Infrastructure for Computational Molecular Medicine
ERC (EPIC)
Munich Center for Machine Learning
Date of record entry
2025-10-14