PuSH - Publication Server of Helmholtz Zentrum München

Inferring protein from transcript abundances using convolutional neural networks.

BioData Min. 18:18 (2025)
Open Access Gold
Creative Commons license
BACKGROUND: Although transcript abundance is often used as a proxy for protein abundance, it is an unreliable predictor. As proteins execute biological functions and their expression levels influence phenotypic outcomes, we developed a convolutional neural network (CNN) to predict protein abundances from mRNA abundances, protein sequence, and mRNA sequence in Homo sapiens (H. sapiens) and the reference plant Arabidopsis thaliana (A. thaliana).

RESULTS: After hyperparameter optimization and initial data exploration, we implemented distinct training modules for value-based and sequence-based data. By analyzing the learned weights, we revealed common and organism-specific sequence features that influence protein-to-mRNA ratios (PTRs), including known and putative sequence motifs. Adding condition-specific protein interaction information identified genes correlated with many PTRs but did not improve predictions, likely due to insufficient data. The integrated model predicted protein abundance on unseen genes with a coefficient of determination (r²) of 0.30 in H. sapiens and 0.32 in A. thaliana.

CONCLUSIONS: For H. sapiens, our model improves prediction performance by nearly 50% compared to previous sequence-based approaches, and for A. thaliana it represents the first model of its kind. The model's learned motifs recapitulate known regulatory elements, supporting its utility in systems-level and hypothesis-driven research approaches related to protein regulation.
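The abstract rests on three technical ideas: CNN filters scanning encoded sequence for motifs, protein-to-mRNA ratios (PTRs) as the quantity being explained, and r² as the performance metric. The following Python sketch illustrates each under stated assumptions. It is not the authors' model. The function names (`one_hot`, `scan`, `log_ptr`, `r_squared`), the ACGU alphabet, the log10 transform for PTRs, the standard 1 - SS_res/SS_tot definition of r², and the hand-built `AUG_FILTER` are all hypothetical choices made for illustration. A trained CNN learns its filter weights from data rather than having them set by hand.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def one_hot(seq: str, alphabet: str = "ACGU") -> Matrix:
    """One-hot encode a nucleotide string; letters outside the alphabet become all-zero rows."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in seq.upper():
        row = [0.0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows

def scan(encoded: Matrix, filt: Matrix) -> List[float]:
    """Slide a (k x alphabet) filter along the sequence ('valid' cross-correlation, as CNN layers do)."""
    k = len(filt)
    scores = []
    for pos in range(len(encoded) - k + 1):
        score = sum(encoded[pos + i][j] * filt[i][j]
                    for i in range(k) for j in range(len(filt[i])))
        scores.append(score)
    return scores

def log_ptr(protein: float, mrna: float) -> float:
    """Protein-to-mRNA ratio on a log10 scale (the log base is an assumption)."""
    return math.log10(protein) - math.log10(mrna)

def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical hand-built filter that responds to the start codon AUG (columns: A, C, G, U).
AUG_FILTER: Matrix = [
    [1.0, 0.0, 0.0, 0.0],  # A
    [0.0, 0.0, 0.0, 1.0],  # U
    [0.0, 0.0, 1.0, 0.0],  # G
]

if __name__ == "__main__":
    scores = scan(one_hot("CCAUGCC"), AUG_FILTER)
    print(scores)  # [0.0, 0.0, 3.0, 0.0, 0.0]: the peak marks where "AUG" starts
    print(log_ptr(100.0, 10.0))
    print(r_squared([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

Under the standard definition assumed here, an r² of 0.30 means the predictions account for roughly 30% of the variance in the observed protein abundances of the held-out genes.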
Publication type Article: Journal article
Document type Scientific Article
Keywords Convolutional Neural Networks; Explainable AI; Protein-to-mRNA Ratio; Regression Analysis; Translational Regulation; RNA-binding Proteins; Messenger RNA; Translation; Interactome; Regions; Codon; Tool; Seq
ISSN (print) / ISBN 1756-0381
e-ISSN 1756-0381
Journal BioData Mining
Source details Volume: 18, Issue: 1, Article Number: 18
Publisher BioMed Central
Publishing Place London
Non-patent literature Publications
Reviewing status Peer reviewed
Institute(s) Institute of Network Biology (INET)
Grants Horizon 2020 Framework Programme