Transformer-CNN: Swiss knife for QSAR modeling and interpretation.
J. Cheminformatics 12:17 (2020)
We present SMILES embeddings derived from the internal encoder state of a Transformer [1] model trained to canonicalize SMILES as a Seq2Seq problem. Applying a CharNN [2] architecture to these embeddings yields higher-quality interpretable QSAR/QSPR models on diverse benchmark datasets, including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for both training and inference, so each prediction is based on an internal consensus. Because both the augmentation and the transfer learning are based on embeddings, the method provides good results for small datasets. We discuss the reasons for this effectiveness and outline future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available at https://github.com/bigchem/transformer-cnn. The repository also includes a standalone program for QSAR prediction that calculates individual atom contributions, thereby interpreting the model's result. The OCHEM [3] environment (https://ochem.eu) hosts the online implementation of the proposed method.
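The consensus idea mentioned in the abstract can be illustrated with a minimal sketch: one molecule is written as several equivalent SMILES strings, a model scores each string, and the scores are combined. The sketch below is not the authors' implementation. The names `consensus_prediction` and `toy_model` are hypothetical, the toy model is only a stand-in for a trained QSAR model, and plain averaging is an assumption about how the individual predictions might be combined.

```python
from statistics import mean
from typing import Callable, Sequence


def consensus_prediction(
    smiles_variants: Sequence[str],
    predict_one: Callable[[str], float],
) -> float:
    """Average a model's predictions over several SMILES of one molecule."""
    if not smiles_variants:
        raise ValueError("need at least one SMILES string")
    return mean(predict_one(s) for s in smiles_variants)


# Three equivalent SMILES spellings of ethanol (same molecule, different
# atom order). In practice a cheminformatics toolkit would generate these.
ethanol_variants = ["CCO", "OCC", "C(O)C"]


def toy_model(smiles: str) -> float:
    """Hypothetical stand-in for a trained model, NOT the Transformer-CNN.

    Its output depends on the position of the oxygen in the string, which
    mimics a model whose prediction varies with the SMILES spelling.
    """
    return 1.0 + 0.1 * smiles.index("O")


consensus = consensus_prediction(ethanol_variants, toy_model)
print(f"consensus prediction: {consensus:.3f}")
```

Averaging over spellings smooths out the dependence of a string-based model on one particular atom ordering, which is the intuition behind using augmentation at inference time as well as during training.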
Publication type
Article: Journal article
Document type
Scientific Article
Keywords
Augmentation; Character-based Models; Cheminformatics; Classification; Convolutional Neural Networks; Embeddings; QSAR; Regression; SMILES; Transformer Model; Aqueous Solubility; Neural Networks
Language
English
Publication Year
2020
HGF-reported in Year
2020
e-ISSN
1758-2946
Source details
Volume: 12
Issue: 1
Article Number: 17
Publisher
BMC
Publishing Place
Campus, 4 Crinan St, London N1 9XW, England
Reviewing status
Peer reviewed
POF-Topic(s)
30203 - Molecular Targets and Therapies
Research field(s)
Enabling and Novel Technologies
PSP Element(s)
G-503000-001
Date of record entry
2020-04-15