Human ratings are abstract representations of segmentation quality. To approximate human quality ratings on scarce expert data, we train surrogate quality estimation models. We evaluate on a complex multi-class segmentation problem, namely glioma segmentation following the BraTS annotation protocol. The training data features quality ratings from 15 expert neuroradiologists on a scale from 1 to 6 stars for various computer-generated and manual 3D annotations. Even though the networks operate on 2D images and are trained on scarce data, we can approximate segmentation quality within a margin of error comparable to human intra-rater reliability. Segmentation quality prediction has broad applications. Beyond being imperative for the successful clinical translation of automatic segmentation algorithms, an understanding of segmentation quality can play an essential role in training new segmentation models. Owing to split-second inference times, a quality estimation model can be applied directly within a loss function or serve as a fully automatic dataset curation mechanism in a federated learning setting.
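The claim that the surrogate models stay "within a margin of error comparable to human intra-rater reliability" can be made concrete as a comparison of two error values, and dataset curation can be expressed as a threshold on predicted stars. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: it assumes mean absolute error (MAE) as the agreement measure (the abstract does not name the metric), uses hypothetical helper names (`clamp_stars`, `within_intra_rater_margin`, `curate`), a hypothetical curation threshold of 4 stars, and made-up ratings for illustration only.

```python
from collections.abc import Sequence
from statistics import mean

STAR_MIN, STAR_MAX = 1.0, 6.0  # rating scale described in the abstract


def clamp_stars(x: float) -> float:
    """Clip a raw regression output onto the 1-6 star scale."""
    return min(max(x, STAR_MIN), STAR_MAX)


def mean_absolute_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed agreement metric: average absolute difference in stars."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(x - y) for x, y in zip(a, b))


def within_intra_rater_margin(
    predicted: Sequence[float],   # surrogate model outputs per annotation
    consensus: Sequence[float],   # reference human rating per annotation
    first_pass: Sequence[float],  # a rater's first rating of each annotation
    second_pass: Sequence[float], # the same rater re-rating the same annotations
) -> tuple[float, float, bool]:
    """Return (model MAE, intra-rater MAE, whether the model error is not larger)."""
    model_err = mean_absolute_error([clamp_stars(p) for p in predicted], consensus)
    intra_err = mean_absolute_error(first_pass, second_pass)
    return model_err, intra_err, model_err <= intra_err


def curate(sample_ids: Sequence[str], predicted: Sequence[float],
           min_stars: float = 4.0) -> list[str]:
    """Hypothetical curation rule: keep samples whose predicted quality meets a threshold."""
    return [sid for sid, p in zip(sample_ids, predicted) if clamp_stars(p) >= min_stars]


if __name__ == "__main__":
    # Made-up illustrative ratings, not data from the study.
    predicted = [4.6, 2.1, 5.8, 3.3]
    consensus = [5.0, 2.0, 6.0, 3.0]
    first_pass = [5, 2, 6, 3]
    second_pass = [4, 3, 6, 3]
    print(within_intra_rater_margin(predicted, consensus, first_pass, second_pass))
    print(curate(["a", "b", "c", "d"], predicted))
```

Because inference is cheap, a rule like `curate` could in principle run locally at each site of a federated setup before any training round; the threshold and the choice of MAE are design assumptions that would need to be set against the actual rating data.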
Funding: AIME GPU cloud services; NIH/NINDS; National Institutes of Health (NIH); Helmut Horten Foundation; ERC, DFG, BMBF; Graduate School of Bioengineering, Technical University of Munich; Technical University of Munich - Institute for Advanced Study - German Excellence Initiative; Anna Valentina Lioba Eleonora Claire Javid Mamasani; Translational Brain Imaging Training Network (TRABIT) under the European Union's 'Horizon 2020' research & innovation program; Deutsche Forschungsgemeinschaft (DFG) through TUM International Graduate School of Science and Engineering (IGSSE).