PuSH - Publication Server of Helmholtz Zentrum München

Cirino, T.* ; Caron, G.* ; Ermondi, G.* ; Charochkina, L.L.* ; Tetko, I.V.

SangsterLogP - the largest publicly available dataset of logP values.

Sci. Data, DOI: 10.1038/s41597-026-07357-2 (2026)
Gold Open Access possible once the publisher's version has been submitted to the ZB.
We present SangsterLogP, the largest publicly available curated dataset of experimental logP values, comprising more than 23k unique molecules, with experimental logP values ranging from -3.8 to 11.7 (a span of about 15.5 log units). The dataset originated from Dr. James Sangster's comprehensive literature review of over 3k sources. We implemented a systematic curation workflow including (a) logD-to-logP adjustment for ionised compounds and (b) consensus-based residual analysis for outlier and duplicate removal. External validation using retrospective and prospective test sets demonstrated robust predictive performance (RMSE of 0.34 and 0.47 log units, respectively). SangsterLogP also substantially expands coverage of chemical space compared to the widely used legacy PHYSPROP database, including compounds in the beyond-Rule-of-5 domain. The fully annotated dataset, including experimental conditions and sources, is freely accessible via the Zenodo repository and on the Online Chemical database and Modelling Environment website.
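The abstract does not spell out how the logD-to-logP adjustment or the RMSE evaluation were implemented. As a minimal sketch only, the textbook relationship for a monoprotic compound can be written as below, assuming that only the neutral species partitions into octanol. The function names `logd_to_logp` and `rmse` and their parameters are hypothetical and are not taken from the paper or from any SangsterLogP tooling.

```python
import math


def logd_to_logp(logd, ph, pka, kind):
    """Estimate logP of the neutral form from logD measured at a given pH.

    Assumption (not from the paper): monoprotic compound, and only the
    neutral species partitions into the octanol phase.
    """
    if kind == "acid":
        exponent = ph - pka   # acids ionise above their pKa
    elif kind == "base":
        exponent = pka - ph   # bases ionise below their pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    # logP = logD + log10(1 + 10^exponent)
    return logd + math.log10(1.0 + 10.0 ** exponent)


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical acid with pKa 4.4 whose logD was measured at pH 7.4
    print(round(logd_to_logp(1.0, 7.4, 4.4, "acid"), 3))
    print(round(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 3))
```

Under these assumptions the correction is always non-negative, so the estimated logP is never lower than the measured logD. Compounds with several ionisable groups or with ion-pair partitioning would need a more detailed treatment than this sketch provides.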
Publication type: Article: Journal article
Document type: Scientific article
Keywords: Workflow ; Outlier ; Data Curation ; Residual ; Chemical Space ; Data Source
ISSN (print) / ISBN: 2052-4463
e-ISSN: 2052-4463
Journal: Scientific Data
Publisher: Springer
Place of publication: London
Review status: Peer reviewed
Funding: European Commission (Erasmus Mundus Joint Master)
HORIZON EUROPE Marie Sklodowska-Curie Actions