High-content imaging and single-cell genomics are two of the most prominent high-throughput technologies for studying cellular properties and functions at scale. Recent studies have demonstrated that information in large imaging datasets can be used to estimate gene mutations and to predict cell-cycle state and cellular decision making directly from cellular morphology. Thus, high-throughput imaging methodologies such as imaging flow cytometry can potentially aim beyond simple sorting of cell populations. We introduce IFC-seq, a machine learning methodology for predicting the expression profile of every cell in an imaging flow cytometry experiment. Since it is to date infeasible to observe both single-cell gene expression and morphology in flow, we integrate uncoupled imaging data with an independent transcriptomics dataset by leveraging common surface markers. We demonstrate that IFC-seq successfully models the gene expression of a moderate number of key gene markers for two independent imaging flow cytometry datasets: (i) human blood mononuclear cells and (ii) mouse myeloid progenitor cells. In the case of mouse myeloid progenitor cells, IFC-seq can predict gene expression directly from brightfield images in a label-free manner, using a convolutional neural network. The proposed method promises to add gene expression information to existing and new imaging flow cytometry datasets at no additional cost.
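The integration idea described above, learning a mapping from shared surface markers to gene expression on the transcriptomics dataset and then applying it to marker intensities measured in the imaging dataset, can be illustrated with a minimal sketch. This is not the authors' implementation: the ridge regression is a stand-in for whatever model IFC-seq actually uses, all data are synthetic, and every name (`fit_ridge`, `predict`, `markers_seq`, `expr_seq`, `markers_img`) is hypothetical. It also assumes that marker intensities from both modalities have already been brought onto a comparable scale.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Fit a multi-output ridge regression with intercept; return (W, b)."""
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean
    n_features = X.shape[1]
    # Closed-form ridge solution: (Xc^T Xc + alpha * I) W = Xc^T Yc
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def predict(X, W, b):
    """Predict expression profiles from surface-marker intensities."""
    return X @ W + b


# Synthetic stand-ins for the two uncoupled datasets (assumption, not real data).
rng = np.random.default_rng(0)
n_seq, n_img, n_markers, n_genes = 500, 200, 4, 10
true_W = rng.normal(size=(n_markers, n_genes))

# Transcriptomics dataset: surface markers and gene expression for the same cells.
markers_seq = rng.normal(size=(n_seq, n_markers))
expr_seq = markers_seq @ true_W + 0.1 * rng.normal(size=(n_seq, n_genes))

# Imaging dataset: only surface-marker intensities are available per cell.
markers_img = rng.normal(size=(n_img, n_markers))

# Train on the transcriptomics cells, then transfer to the imaged cells.
W, b = fit_ridge(markers_seq, expr_seq, alpha=1.0)
expr_img_pred = predict(markers_img, W, b)
```

In this sketch `expr_img_pred` holds one predicted expression vector per imaged cell. A label-free variant, as described for the mouse myeloid progenitor cells, would replace the marker intensities with features learned from brightfield images by a convolutional neural network.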
Grants: Deutsche Forschungsgemeinschaft; Chan Zuckerberg Initiative DAF (advised fund of Silicon Valley Community Foundation); Helmholtz Association (Incubator grant sparse2big); BMBF; DFG Fellowship through the Graduate School of Quantitative Biosciences Munich (QBM)