PuSH - Publication Server of Helmholtz Zentrum München

Flöge, K. ; Udayakumar, S.* ; Sommer, J.* ; Piraud, M. ; Kesselheim, S. ; Fortuin, V. ; Günnemann, S.* ; van der Weg, K.J.* ; Gohlke, H.* ; Merdivan, E. ; Bazarova, A.

OneProt: Towards multi-modal protein foundation models via latent space alignment of sequence, structure, binding sites and text encoders.

PLoS Comput. Biol. 21:e1013679 (2025)
Open Access Gold
Creative Commons license
Recent advances in Artificial Intelligence have enabled multi-modal systems to model and translate diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal Deep Learning model for proteins that integrates structural, sequence, text, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of protein modality encoders in a lightweight fine-tuning scheme that focuses on pairwise alignment with sequence data, rather than requiring full matches. This novel approach comprises a mix of Graph Neural Networks and transformer architectures. It demonstrates good performance in retrieval tasks and showcases the efficacy of multi-modal systems in Protein Machine Learning through a broad spectrum of downstream baselines, including enzyme function prediction and binding site analysis. Furthermore, OneProt enables the transfer of representational information from specialized encoders to the sequence encoder, enhancing capabilities for distinguishing evolutionarily related and unrelated sequences and exhibiting representational properties where evolutionarily related proteins align in similar directions within the latent space. In addition, we extensively investigate modality ablations to identify the encoders that contribute the most to predictive performance, highlighting the significance of the binding site encoder, which has not been used in similar models previously. This work expands the horizons of multi-modal protein models, paving the way for transformative applications in drug discovery, biocatalytic reaction planning, and protein engineering.
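The pairwise alignment described in the abstract can be illustrated with a minimal sketch. It assumes a symmetric InfoNCE contrastive objective of the kind used in ImageBind. The function names, the temperature value, and the random stand-in embeddings are hypothetical and are not taken from the OneProt code base. Each non-sequence modality is paired only with sequence embeddings, so no sample needs to carry all modalities at once.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def info_nce(seq_emb, other_emb, temperature=0.07):
    """Symmetric InfoNCE between sequence embeddings and one other modality.

    Row i of both arrays is assumed to describe the same protein; all other
    rows in the batch act as negatives. The temperature is an assumed value.
    """
    s = l2_normalize(seq_emb)
    o = l2_normalize(other_emb)
    logits = s @ o.T / temperature
    idx = np.arange(len(s))
    loss_seq_to_other = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_other_to_seq = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_seq_to_other + loss_other_to_seq)

def pairwise_alignment_loss(batches, temperature=0.07):
    """Sum InfoNCE over (sequence, modality) pairs.

    `batches` maps a modality name to a (seq_emb, modality_emb) tuple; each
    pair may come from a different dataset, so full matches are not required.
    """
    return sum(info_nce(seq, other, temperature) for seq, other in batches.values())

# Random stand-ins for encoder outputs (illustrative only).
rng = np.random.default_rng(0)
seq = rng.normal(size=(8, 16))
aligned = seq + 0.01 * rng.normal(size=(8, 16))  # nearly matching pairs
mismatched = np.roll(aligned, 1, axis=0)         # every pair is wrong

print("aligned loss:   ", info_nce(seq, aligned))
print("mismatched loss:", info_nce(seq, mismatched))
print("total:          ", pairwise_alignment_loss({"structure": (seq, aligned),
                                                   "text": (seq, aligned)}))
```

In this sketch the loss is low when matching rows point in similar directions and high when they do not, which is the property a retrieval task relies on.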
Publication type Article: Journal article
Document type Scientific article
ISSN (print) / ISBN 1553-734X
e-ISSN 1553-7358
Citation details Volume: 21, Issue: 11, Article number: e1013679
Publisher Public Library of Science (PLoS)
Place of publication 1160 Battery Street, Ste 100, San Francisco, CA 94111, USA
Review status Peer reviewed
Institute(s) Helmholtz Artificial Intelligence Cooperation Unit (HAICU)
Helmholtz AI - FZJ (HAI - FZJ)
Funding Gauss Centre for Supercomputing
Branco Weiss Fellowship - Society in Science
Helmholtz Association Initiative and Networking Fund
Helmholtz Foundational Model Initiative