PuSH - Publication Server of Helmholtz Zentrum München

Hunklinger, A. ; Hartog, P. ; Šícho, M.* ; Godin, G.* ; Tetko, I.V.

The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS Joint Compound Solubility Challenge.

SLAS Discov. 29:100144 (2024)
Creative Commons license
The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data for 100K compounds. In total, one hundred teams took part in the challenge to predict low, medium and highly soluble compounds as measured by a nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM), available at https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus of 28 models calculated using descriptor-based and representation learning methods obtained the best score, which was higher than the scores of the individual approaches and of the consensus models developed separately for each individual approach. Combining diverse models decreased both the bias and the variance of the individual models and yielded the highest score. The Transformer CNN model achieved the best individual score, highlighting the power of Natural Language Processing (NLP) methods. Providing information about aleatoric uncertainty would help contestants better understand and use the challenge data.
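To illustrate the general idea of a consensus over several classifiers, the sketch below averages per-class probabilities from multiple models and assigns each compound to the class with the highest mean probability. This is a minimal sketch under stated assumptions, not the OCHEM implementation: simple unweighted averaging is assumed, and the function name `consensus_predict`, the `CLASSES` tuple and the toy probabilities are hypothetical.

```python
from statistics import fmean

# Hypothetical labels for the three solubility classes (index 0, 1, 2).
CLASSES = ("low", "medium", "high")


def consensus_predict(prob_list):
    """Unweighted consensus: average class probabilities across models, then take argmax.

    prob_list: list of models; each model is a list of per-compound
    probability triples [p_low, p_medium, p_high] (assumed format).
    Returns a list of class indices, one per compound.
    """
    n_compounds = len(prob_list[0])
    preds = []
    for i in range(n_compounds):
        # Mean probability of each class for compound i over all models.
        mean_probs = [fmean(model[i][k] for model in prob_list) for k in range(len(CLASSES))]
        # Index of the class with the highest averaged probability.
        preds.append(max(range(len(CLASSES)), key=mean_probs.__getitem__))
    return preds


# Toy, made-up probabilities for two compounds from three hypothetical models.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]
model_c = [[0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]

preds = consensus_predict([model_a, model_b, model_c])
print([CLASSES[i] for i in preds])  # ['low', 'high']
```

In this toy case, `model_a` alone would label the second compound "medium", while the averaged consensus labels it "high"; averaging smooths out the idiosyncratic errors of individual models, which is the variance-reduction effect the abstract refers to.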
Publication type: Article: Journal article
Document type: Scientific article
Keywords: Kaggle Challenge; OCHEM; Transformer CNN; Consensus; Descriptor-Based Models; Graph Neural Networks; Representation Learning; Solubility Prediction; Molecular Descriptors; Neural Networks; In-vitro; Coefficient; Agreement
ISSN (print) / ISBN: 2472-5552
e-ISSN: 2472-5560
Journal: SLAS Discovery
Source details: Volume 29, Issue 2, Article number 100144
Publisher: Sage
Place of publication: Thousand Oaks, Calif.
Non-patent literature: Publications
Review status: Peer reviewed
Funding: Ministry of Education, Youth and Sports of the Czech Republic; European Union