PuSH - Publication Server of Helmholtz Zentrum München

Investigating the performance of foundation models on human 3'UTR sequences.

Nucleic Acids Res. 53:gkaf871 (2025)
Open Access Gold
Creative Commons license
Foundation models, such as DNABERT and Nucleotide Transformer, have recently shaped a new direction in DNA research. Trained in an unsupervised manner on a vast quantity of genomic data, they can be used for a variety of downstream tasks, such as promoter prediction, DNA methylation prediction, gene network prediction, or functional variant prioritization. However, these models are often trained and evaluated on entire genomes, neglecting genome partitioning into different functional regions. In our study, we investigate the efficacy of various unsupervised approaches, including genome-wide and 3' untranslated region (3'UTR)-specific foundation models, on human 3'UTR regions. To this end, we train a set of popular transformer architectures on a 3'UTR-specific dataset comprising 3 783 714 3'UTR sequences (6.6B bp) of 241 Zoonomia species. Our evaluation includes downstream tasks specific to RNA biology, such as recognition of binding motifs of RNA-binding proteins, detection of functional genetic variants, prediction of expression levels in massively parallel reporter assays, and estimation of messenger RNA half-life. Remarkably, models specifically trained on 3'UTR sequences demonstrate superior performance when compared to established genome-wide foundation models in three out of four downstream tasks. Our results underscore the importance of considering genome partitioning into distinct functional regions when training and evaluating foundation models. In addition, the proposed set of 3'UTR-specific tasks can be used to benchmark future models.
Impact Factor 13.100
Scopus SNIP 0.000
Publication type Article: Journal article
Document type Scientific article
Language English
Publication year 2025
HGF reporting year 2025
ISSN (print) / ISBN 0305-1048
e-ISSN 1362-4962
Source Volume: 53, Issue: 17, Article number: gkaf871
Publisher Oxford University Press
Place of publication Great Clarendon St, Oxford OX2 6DP, England
Review status Peer reviewed
POF Topic(s) 30205 - Bioengineering and Digital Health
Research field(s) Enabling and Novel Technologies
PSP element(s) G-553500-001
Funding Deutsche Forschungsgemeinschaft
Scopus ID 105016548756
PubMed ID 40966500
Date of record entry 2025-10-21