Gold Open Access is possible once the publisher's version has been submitted to the ZB.
SangsterLogP - the largest publicly available dataset of logP values.
Sci. Data, DOI: 10.1038/s41597-026-07357-2 (2026)
We present SangsterLogP, the largest publicly available curated dataset of experimental logP values. It comprises more than 23k unique molecules, with experimental logP values ranging from -3.8 to 11.7 (a span of about 15.5 log units). The dataset originated from Dr. James Sangster's comprehensive literature review of over 3k sources. We implemented a systematic curation workflow including a) logD-to-logP adjustment for ionised compounds and b) consensus-based residual analysis for removing outliers and duplicates. External validation using retrospective and prospective test sets demonstrated robust predictive performance (RMSE of 0.34 and 0.47 log units, respectively). SangsterLogP also substantially expands coverage of chemical space compared to the widely used legacy PHYSPROP database, including compounds in the beyond-Rule-of-5 domain. The fully annotated dataset, including experimental conditions and sources, is freely accessible via the Zenodo repository and on the Online Chemical Database and Modelling Environment website.
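The abstract does not spell out how the logD-to-logP adjustment is done. A common textbook form, assuming a monoprotic compound for which only the neutral species partitions into the organic phase, is logP = logD + log10(1 + 10^(pH - pKa)) for acids and logP = logD + log10(1 + 10^(pKa - pH)) for bases. The sketch below implements that assumption; the function name logp_from_logd and its parameters are hypothetical, and this is not necessarily the correction used in the SangsterLogP curation workflow.

```python
import math


def logp_from_logd(logd: float, pka: float, ph: float, kind: str) -> float:
    """Estimate logP from a measured logD for a monoprotic compound.

    Hypothetical helper. Assumes only the neutral species partitions
    (Henderson-Hasselbalch based correction); zwitterions and
    multiprotic compounds are out of scope.
    """
    if kind == "acid":
        # Acids ionise as the pH rises above the pKa.
        correction = math.log10(1.0 + 10.0 ** (ph - pka))
    elif kind == "base":
        # Bases ionise as the pH falls below the pKa.
        correction = math.log10(1.0 + 10.0 ** (pka - ph))
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    # logD underestimates logP by the log of the ionised-to-neutral ratio plus one.
    return logd + correction
```

For example, an acid measured at pH 7.4 with pKa 4.2 gets a correction of roughly 3.2 log units, while a compound measured far from its pKa on the neutral side keeps a logP essentially equal to its logD.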
Publication type
Article: Journal article
Document type
Scientific article
Keywords
Workflow ; Outlier ; Data Curation ; Residual ; Chemical Space ; Data Source
ISSN (print) / ISBN
2052-4463
e-ISSN
2052-4463
Journal
Scientific Data
Publisher
Springer
Place of publication
London
Review status
Peer reviewed
Institute(s)
Institute of Structural Biology (STB)
Funding
European Commission (Erasmus Mundus Joint Master)
HORIZON EUROPE Marie Sklodowska-Curie Actions