Single-cell reference atlases are large-scale,
cell-level maps that capture cellular heterogeneity
within an organ using single-cell genomics. Given
their size and cellular diversity, these atlases serve
as high-quality training data for the transfer of cell
type labels to new datasets. Such label transfer,
however, must be robust to domain shifts in gene
expression caused by measurement technique,
lab-specific factors, and more general batch effects.
This requires methods that provide uncertainty estimates
on the cell type predictions to ensure correct
interpretation. Here, for the first time, we introduce
uncertainty quantification methods for cell type
classification on single-cell reference atlases. We
benchmark four model classes and show that currently
used models lack calibration, robustness,
and actionable uncertainty scores. Furthermore,
we demonstrate how models that quantify uncertainty
are better suited to detect unseen cell types
in the setting of atlas-level cell type transfer.