PuSH - Publication Server of Helmholtz Zentrum München

A data-driven solution for the cold start problem in biomedical image classification.

In: Proceedings - International Symposium on Biomedical Imaging (27-30 May 2024, Athens). 2024. DOI: 10.1109/ISBI56570.2024.10635886
The demand for large quantities of high-quality annotated images poses a significant bottleneck for developing effective deep learning-based classifiers in the biomedical domain. We present a simple yet powerful solution to the cold start problem, i.e., selecting the most informative data for annotation within an unlabeled dataset. Our framework consists of three key components: (i) a self-supervised encoder to construct meaningful representations of unlabeled data, (ii) a sampling method that selects the most representative data points for annotation, and (iii) a classifier head that uses model ensembling to overcome the lack of validation data. We test our approach on four challenging public biomedical datasets. Our strategy outperforms the state-of-the-art approach in detecting the representative data points in all datasets and achieves a 7% improvement on a leukemia blood cell classification task. Our work offers a practical and efficient solution to the challenges associated with tedious and costly high-quality data annotation in the biomedical field. We make our framework's code publicly available at https://github.com/marrlab/initial-data-point-selection.
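As an illustration of component (ii) only, the following is a minimal sketch of one common way to pick representative points from precomputed embeddings: cluster them and annotate the sample nearest to each cluster centre. The abstract does not state which sampling method the framework uses, so the k-means procedure, the farthest-point initialisation, and all function names here are assumptions made for illustration and are not taken from the authors' repository.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic initialisation (an assumption of this sketch):
    start at the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means on lists of floats."""
    centroids = farthest_point_init(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_representatives(embeddings, budget):
    """Return `budget` distinct indices: for each cluster centre, the
    not-yet-chosen sample closest to it."""
    chosen = []
    for centroid in kmeans(embeddings, budget):
        ranked = sorted(
            range(len(embeddings)),
            key=lambda i: squared_distance(embeddings[i], centroid),
        )
        chosen.append(next(i for i in ranked if i not in chosen))
    return chosen


# Toy embeddings: two well-separated groups of three points each.
toy_embeddings = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
]
selected = select_representatives(toy_embeddings, budget=2)
print(selected)
```

In practice the embeddings would come from the self-supervised encoder of component (i), and the selected indices would be handed to annotators before training the classifier head of component (iii).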
Publication type Article: Conference contribution
ISSN (print) / ISBN 1945-7928
e-ISSN 1945-8452
Conference Title Proceedings - International Symposium on Biomedical Imaging
Conference Date 27-30 May 2024
Conference Location Athens
Non-patent literature Publications
Institute(s) Institute of AI for Health (AIH)
Helmholtz Artificial Intelligence Cooperation Unit (HAICU)