PuSH - Publication Server of Helmholtz Zentrum München

Van Deursen, R.* ; Ertl, P.* ; Tetko, I.V. ; Godin, G.*

GEN: Highly efficient SMILES explorer using autodidactic generative examination networks.

J. Cheminformatics 12:22 (2020)
Open Access Gold
Creative Commons license
Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep generative networks for SMILES generation. In our GENs, we have used an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of generated SMILES. GENs autonomously learn the target space in a few epochs and are stopped early using an independent online examination mechanism that measures the quality of the generated set. Herein we have used online statistical quality control (SQC) on the percentage of valid molecular SMILES as the examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95-98%) can be generated using multiple parallel encoding layers in combination with SMILES augmentation using unrestricted SMILES randomization. Our trained models combine an excellent novelty rate (85-90%) with strong conservation of the property space (95-99%) in the generated SMILES. In GENs, both the generative network and the examination mechanism are open to other architectures and quality criteria.
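The examination mechanism described in the abstract (sample after each epoch, measure the percentage of valid SMILES, keep the earliest stable weights) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the window of three epochs, and the thresholds on mean and standard deviation are hypothetical, and `toy_is_valid` is only a crude stand-in for a real SMILES parser (in practice one would typically test whether RDKit's `Chem.MolFromSmiles` returns a molecule).

```python
from statistics import mean, pstdev


def toy_is_valid(smiles: str) -> bool:
    """Crude stand-in for a real SMILES parser (assumption for this sketch).

    Checks only that parentheses balance and that single-digit
    ring-closure labels occur in pairs.
    """
    if not smiles:
        return False
    depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            # A ring label opens on first sight and closes on the second.
            open_rings ^= {ch}
    return depth == 0 and not open_rings


def percent_valid(samples, is_valid=toy_is_valid) -> float:
    """Percentage of generated strings accepted by the validity check."""
    if not samples:
        return 0.0
    return 100.0 * sum(1 for s in samples if is_valid(s)) / len(samples)


def first_stable_epoch(validity_by_epoch, window=3, min_mean=95.0, max_std=1.0):
    """Return the 0-based index of the earliest epoch that closes a run of
    `window` consecutive epochs with high, steady validity, or None.

    The stability rule (mean >= min_mean, population std <= max_std) is a
    hypothetical choice, not the SQC rule used by the authors.
    """
    for end in range(window, len(validity_by_epoch) + 1):
        recent = validity_by_epoch[end - window:end]
        if mean(recent) >= min_mean and pstdev(recent) <= max_std:
            return end - 1
    return None


if __name__ == "__main__":
    # Hypothetical per-epoch validity percentages from sampled SMILES.
    history = [40.0, 82.0, 93.5, 96.0, 96.5, 97.0, 96.8]
    print(first_stable_epoch(history))
```

In a training loop, one would append `percent_valid(generated_samples)` to the history after every epoch and save the weights of the epoch returned by `first_stable_epoch`. Because the check is passed in as a function, other quality criteria can be substituted, which matches the abstract's remark that the examination mechanism is open to other criteria.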
Impact Factor 5.318
Scopus SNIP 1.541
Web of Science Times Cited 6
Scopus Cited By 12
Publication type Article: Journal article
Document type Scientific Article
Keywords Autonomous Learning ; GEN ; GAN ; RNN ; LSTM ; GRU ; BiLSTM ; BiGRU ; AI ; SMILES ; Generator ; Quality Control ; SQC ; Chemical Space ; Design
Language English
Publication Year 2020
HGF-reported in Year 2020
e-ISSN 1758-2946
Citation details Volume: 12, Issue: 1, Article Number: 22
Publisher BioMed Central
Publishing Place Campus, 4 Crinan St, London N1 9XW, England
Reviewing status Peer reviewed
POF-Topic(s) 30203 - Molecular Targets and Therapies
Research field(s) Enabling and Novel Technologies
PSP Element(s) G-503000-001
Scopus ID 85083172661
PubMed ID 33430998
Date of entry 2020-05-05