Unleashing high content screening in hit detection - Benchmarking AI workflows including novelty detection.
Comp. Struc. Biotech. J. 20, 5453-5465 (2022)
Complex mixtures containing natural products remain an interesting source of novel drug candidates, and high content screening (HCS) is a popular tool to screen them. In particular, multiplexed HCS assays promise comprehensive bioactivity profiles but also generate large amounts of data. Yet only a few machine learning (ML) applications are available for analyzing these data, and they usually require profound knowledge of the underlying cell biology. Unfortunately, there are no applications that simply predict whether samples are biologically active or not (any kind of bioactivity). In this work, we benchmark ML algorithms for binary classification. We start with classical ML models, which are the standard classifiers of the scikit-learn library or ensemble models of these classifiers (92 models tested in total), followed by a partial least squares regression (PLSR)-based classification (44 models tested in total) and simple artificial neural networks (ANNs) with dense layers (72 models tested in total). In addition, we examined novelty detection (ND), which is intended to handle unknown patterns. For the final analysis, the models, with and without upstream ND, were tested with two independent data sets. In our analysis, a stacking model, an ensemble model of classical ML algorithms, performed best at predicting new and unknown data. ND improved the predictions of the models and was useful for handling unknown patterns. Importantly, the classifier presented here can be easily rebuilt and adapted to the data and demands of other groups. The hit detector (ND + stacking model) is universal and suitable for broader application to support the search for new drug candidates.
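The combination of upstream novelty detection and a stacking classifier described in the abstract could look roughly like the following scikit-learn sketch. It is an illustration under stated assumptions, not the authors' implementation: the base estimators, the use of LocalOutlierFactor as the novelty detector, the synthetic data, and the helper names build_hit_detector and predict_hits are all hypothetical choices made here. How novelty flags are combined with the class prediction is left to the caller, since the abstract does not specify it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_hit_detector(X_train, y_train):
    """Fit a scaler, a novelty detector and a stacking classifier.

    Assumption: the novelty detector is fitted on all training profiles;
    the original work may have used a different detector or reference set.
    """
    scaler = StandardScaler().fit(X_train)
    Xs = scaler.transform(X_train)

    # novelty=True enables predict() on unseen samples (+1 known, -1 novel).
    nd = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(Xs)

    # Stacking: base classifiers feed a logistic-regression meta-learner.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("svc", SVC(random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ).fit(Xs, y_train)
    return scaler, nd, stack


def predict_hits(model, X):
    """Return (predicted labels, boolean novelty flags) for new samples."""
    scaler, nd, stack = model
    Xs = scaler.transform(X)
    is_novel = nd.predict(Xs) == -1
    return stack.predict(Xs), is_novel


# Synthetic stand-in for HCS feature profiles (1 = active, 0 = inactive).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = build_hit_detector(X[:200], y[:200])
labels, novel = predict_hits(model, X[200:])
print("predicted active:", int(labels.sum()), "flagged novel:", int(novel.sum()))
```

Returning the novelty flags separately keeps the policy open: flagged samples could be reported as hits, sent for manual review, or excluded, depending on the screening campaign.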
Publication type
Article: Journal article
Document type
Scientific Article
Keywords
Bioactives ; Cell Painting ; Classifier ; Deep Learning ; High-content Screening ; Hit Detection ; Machine Learning ; Novelty Detection
Language
English
Publication Year
2022
HGF-reported in Year
2022
ISSN (print) / ISBN
2001-0370
e-ISSN
2001-0370
Source details
Volume: 20
Pages: 5453-5465
Publisher
Research Network of Computational and Structural Biotechnology (RNCSB)
Reviewing status
Peer reviewed
POF-Topic(s)
30202 - Environmental Health
30203 - Molecular Targets and Therapies
Research field(s)
Environmental Sciences
Enabling and Novel Technologies
PSP Element(s)
G-504800-001
G-505293-001
Date of record entry
2022-11-23