Single-cell RNA sequencing has greatly enhanced our understanding of the
lungs and airways, although rare cell types and unique subtypes are
often overlooked in individual studies. Recently, the Human Lung Cell
Atlas (HLCA) was developed to identify these rare cell types in human
studies and to standardize cell type identification across various
datasets. However, there is a notable lack of references for mouse
single-cell studies, especially for disease states. In response,
we developed the Mouse Lung Disease Cell Atlas (MLDCA), which integrates
17 single-cell datasets encompassing 200 mice and a total of 773,732
cells across 24 disease models. To select an integration method, we
benchmarked candidates with scIB, which identified the scArches scANVI
pipeline as the most effective. We selected 2,000 highly variable genes
(HVGs) across our studies for this integration. After integration, we
analyzed gene signature profiles, including those related to interferon
response, and performed multiple rounds of reclustering to identify
unique cell types. These findings were further validated through
spatial transcriptomics. In
total, we defined five hierarchical levels of cell type annotation; the
most detailed level comprises 53 unique cell types, most of which are
immune cells. This includes rare cells such as
mast cells, neutrophils, pericytes, and plasmacytoid dendritic cells
(pDCs), as well as cell types specific to certain disease states.
Notably, our atlas identified virus-specific and smoking-specific cell
subtypes present only during viral infections (such as COVID-19,
influenza, and herpesvirus) and cigarette smoke exposure, respectively.
Furthermore, we
developed a deconvolution reference matrix to accurately predict cell
types in bulk RNA sequencing data of mouse lungs, which we validated
against histological results. Our analysis revealed that biological
factors, particularly age, have a greater influence on the composition
of single-cell datasets in mice than technical factors, such as total
RNA counts and sequencing platforms. Nevertheless, the choice of
sequencing platform remains crucial when rare cell types are of
interest. In addition to providing a comprehensive atlas, the MLDCA
offers publicly available resources that allow other researchers to
annotate and map cell types in single-cell datasets and to define
candidate genes across mouse models.
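
To illustrate how a deconvolution reference matrix can be used, the sketch below estimates cell type proportions in a bulk sample by non-negative least squares, solved with projected gradient descent. It is a minimal illustration only and is not the MLDCA deconvolution method: the function name `deconvolve`, the toy reference values, and the choice of solver are all assumptions made for this example.

```python
def deconvolve(reference, bulk, n_iter=500):
    """Estimate cell type proportions from a bulk expression profile.

    Hypothetical helper for illustration only.
    reference: list of rows, one per gene; each row holds the mean
               expression of that gene in each cell type.
    bulk:      list of bulk expression values, one per gene.
    Returns non-negative weights normalized to sum to 1.
    """
    n_genes = len(reference)
    n_types = len(reference[0])

    # The squared Frobenius norm bounds the largest eigenvalue of
    # reference^T reference, so its inverse is a safe step size.
    frob_sq = sum(v * v for row in reference for v in row)
    if frob_sq == 0:
        raise ValueError("reference matrix must contain non-zero values")
    step = 1.0 / frob_sq

    # Start from equal proportions.
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # residual = reference @ weights - bulk
        residual = [
            sum(reference[g][k] * weights[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # gradient of 0.5 * ||residual||^2 with respect to the weights
        gradient = [
            sum(reference[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        weights = [max(0.0, w - step * d) for w, d in zip(weights, gradient)]

    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Toy reference (assumed values): two marker genes and one shared gene.
toy_reference = [
    [10.0, 0.0],  # marker of cell type A
    [0.0, 10.0],  # marker of cell type B
    [5.0, 5.0],   # expressed equally in both
]
# Bulk profile generated from 70% type A and 30% type B.
toy_bulk = [7.0, 3.0, 5.0]
proportions = deconvolve(toy_reference, toy_bulk)
```

In this toy case the bulk profile is an exact mixture of the two reference columns, so the estimated proportions approach 0.7 and 0.3. Real bulk data are noisy, and dedicated deconvolution tools add gene weighting and normalization steps that this sketch omits.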