PuSH - Publication Server of Helmholtz Zentrum München

Ostner, J.; Li, H.*; Müller, C.L.

Score matching for differential abundance testing of compositional high-throughput sequencing data.

Stat. Med. 45:e70534 (2026)
Publisher's version, Research data, DOI, PMC
Open Access Hybrid
Creative Commons license agreement
The class of a-b power interaction models, proposed by [1], provides a general framework for modeling sparse compositional data with pairwise feature interactions. This class includes many distributions as special cases and enables modeling of zero entries through power transformations, making it particularly suitable for modern high-throughput sequencing data with excess zeros, including single-cell RNA-Seq and microbial amplicon data. Here, we present an extension of this class of models that allows inclusion of covariate information, thus enabling accurate characterization of covariate dependencies in heterogeneous populations. Combining this model with a tailored differential abundance (DA) test leads to a novel DA testing scheme, cosmoDA, that can reduce the false positive detection rate caused by correlated features. cosmoDA uses penalized generalized score matching for parsimonious model fitting. We show on simulated benchmarks that cosmoDA can accurately estimate feature interactions in the presence of population heterogeneity and significantly reduces the false discovery rate when testing for differential abundance of correlated features. Using single-cell and amplicon data, we illustrate cosmoDA's ability to estimate data-adaptive Box-Cox-type data transformations and assess the impact of zero replacement and power transformations on downstream differential abundance results. cosmoDA is available at https://github.com/bio-datascience/cosmoDA.
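The abstract notes that power transformations let a model handle zero entries directly, whereas a log-type transform needs zeros replaced first. The minimal sketch below illustrates that contrast with a Box-Cox-type transform (x^a - 1)/a, which tends to log(x) as a approaches 0, applied to proportions. It is not cosmoDA's API and not the a-b model itself. The function names (`closure`, `replace_zeros`, `box_cox`) and the default pseudocount of 0.5 are hypothetical choices made for illustration.

```python
import math


def closure(counts):
    """Scale non-negative counts to proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return [c / total for c in counts]


def replace_zeros(counts, pseudocount=0.5):
    """Hypothetical simple zero replacement: add a pseudocount to every entry."""
    return [c + pseudocount for c in counts]


def box_cox(x, a):
    """Box-Cox-type power transform (x**a - 1) / a, using the log limit at a = 0.

    For a > 0 a zero entry maps to the finite value -1/a.
    For a <= 0 a zero entry is undefined and must be replaced beforehand.
    """
    if x < 0:
        raise ValueError("proportions must be non-negative")
    if a == 0:
        if x == 0:
            raise ValueError("log transform is undefined at zero; replace zeros first")
        return math.log(x)
    if a < 0 and x == 0:
        raise ValueError("negative powers are undefined at zero; replace zeros first")
    return (x ** a - 1) / a


# Usage: one sample of counts for four features, one of which is zero.
counts = [10, 0, 30, 60]
props = closure(counts)                              # proportions, zero kept as 0.0
power_scale = [box_cox(p, 0.5) for p in props]       # zero maps to -1/0.5 = -2.0
log_scale = [box_cox(p, 0) for p in closure(replace_zeros(counts))]  # needs replacement
```

With a > 0 the zero entry stays in the data as a finite value. The log route (a = 0) only works after zero replacement, and the value assigned to the former zero depends on the arbitrary pseudocount. That dependence is the kind of downstream effect of zero replacement that the abstract says cosmoDA is used to assess.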
Publication type: Article: Journal article
Document type: Scientific article
Keywords: Compositional Data; Differential Abundance; Generative Model; Microbiome; Score Matching; Single-cell RNA Sequencing; Distributions; Models
ISSN (print) / ISBN: 0277-6715
e-ISSN: 1097-0258
Source details: Volume 45, Issue 8-9, Article number e70534
Publisher: Wiley
Place of publication: 111 River St, Hoboken, NJ 07030-5774, USA
Review status: Peer reviewed
Funding: Helmholtz-Gemeinschaft; National Institutes of Health