PuSH - Publication Server of Helmholtz Zentrum München

Karpov, P. ; Godin, G.* ; Tetko, I.V.

Transformer-CNN: Swiss knife for QSAR modeling and interpretation.

J. Cheminformatics 12:17 (2020)
Open Access Gold
Creative Commons license
We present SMILES embeddings derived from the internal encoder state of a Transformer [1] model trained to canonicalize SMILES as a Seq2Seq problem. Applying a CharNN [2] architecture to these embeddings yields higher-quality, interpretable QSAR/QSPR models on diverse benchmark datasets covering both regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for training and inference, so each prediction is based on an internal consensus. Because both the augmentation and the transfer learning are based on embeddings, the method provides good results for small datasets. We discuss the reasons for this effectiveness and outline future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available at https://github.com/bigchem/transformer-cnn. The repository also contains a standalone program for QSAR prediction that calculates individual atom contributions, thus interpreting the model's result. The OCHEM [3] environment (https://ochem.eu) hosts an online implementation of the proposed method.
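To illustrate the consensus idea mentioned in the abstract, the following minimal sketch averages a per-string model over several SMILES strings that describe the same molecule. It is not the authors' implementation: `consensus_predict` and `toy_model` are hypothetical names, the toy model is a placeholder with no chemical meaning, the ethanol variants are written by hand rather than generated by an augmentation routine, and the use of a plain mean as the consensus is an assumption.

```python
from statistics import mean, stdev
from typing import Callable, Sequence, Tuple


def consensus_predict(
    smiles_variants: Sequence[str],
    predict_one: Callable[[str], float],
) -> Tuple[float, float]:
    """Average a per-string model over several SMILES of one molecule.

    Returns the mean prediction and the sample standard deviation
    (0.0 when only one variant is given).
    """
    if not smiles_variants:
        raise ValueError("need at least one SMILES string")
    preds = [predict_one(s) for s in smiles_variants]
    spread = stdev(preds) if len(preds) > 1 else 0.0
    return mean(preds), spread


def toy_model(smiles: str) -> float:
    # Hypothetical stand-in for a trained model. It has no chemical meaning;
    # it only makes the output depend on how the string is written.
    return float(len(smiles)) + 0.1 * smiles.index("O")


# Three hand-written, equivalent SMILES strings for ethanol (assumed input;
# a real pipeline would generate such variants automatically).
ethanol_variants = ["CCO", "OCC", "C(O)C"]
value, spread = consensus_predict(ethanol_variants, toy_model)
print(f"consensus = {value:.3f}, spread = {spread:.3f}")
```

The spread across variants gives a rough indication of how sensitive a model is to the way a molecule is written; a model that had learned a representation-independent property would show a small spread.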
Impact Factor 5.318
Scopus SNIP 1.541
Web of Science Times Cited 38
Scopus Cited By 53
Publication type Article: Journal article
Document type Scientific article
Keywords Augmentation ; Character-based Models ; Cheminformatics ; Classification ; Convolutional Neural Networks ; Embeddings ; QSAR ; Regression ; SMILES ; Transformer Model ; Aqueous Solubility ; Neural Networks
Language English
Publication year 2020
HGF reporting year 2020
e-ISSN 1758-2946
Source Volume: 12, Issue: 1, Article number: 17
Publisher BMC
Place of publication Campus, 4 Crinan St, London N1 9XW, England
Review status Peer reviewed
POF Topic(s) 30203 - Molecular Targets and Therapies
Research field(s) Enabling and Novel Technologies
PSP element(s) G-503000-001
Scopus ID 85083271107
PubMed ID 33431004
Date of entry 2020-04-15