PuSH - Publication Server of Helmholtz Zentrum München

Huang, S.* ; Alier, E. ; Kilbertus, N. ; Pfister, N.*

Supervised learning and model analysis with compositional data.

PLoS Comput. Biol. 19:e1011240 (2023)
Open Access Gold
Creative Commons license
Supervised learning, such as regression and classification, is an essential tool for analyzing modern high-throughput sequencing data, for example in microbiome research. However, due to the compositionality and sparsity of such data, existing techniques are often inadequate. They either rely on extensions of the linear log-contrast model (which adjust for compositionality but cannot account for complex signals or sparsity) or are based on black-box machine learning methods (which may capture useful signals, but lack interpretability due to the compositionality). We propose KernelBiome, a kernel-based nonparametric regression and classification framework for compositional data. It is tailored to sparse compositional data and is able to incorporate prior knowledge, such as phylogenetic structure. KernelBiome captures complex signals, including in the zero-structure, while automatically adapting model complexity. We demonstrate on-par or improved predictive performance compared with state-of-the-art machine learning methods on 33 publicly available microbiome datasets. Additionally, our framework provides two key advantages: (i) We propose two novel quantities to interpret contributions of individual components and prove that they consistently estimate average perturbation effects of the conditional mean, extending the interpretability of linear log-contrast coefficients to nonparametric models. (ii) We show that the connection between kernels and distances aids interpretability and provides a data-driven embedding that can augment further analysis. KernelBiome is available as an open-source Python package on PyPI and at https://github.com/shimenghuang/KernelBiome.
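The general idea of kernel-based regression on compositional data can be illustrated with a minimal sketch. This is not the KernelBiome API: the function names, the choice of an RBF kernel on the Aitchison distance, the pseudo-count used to handle zeros, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np

def clr(X, pseudo=1e-6):
    """Centred log-ratio transform; a pseudo-count (an assumption) handles zeros."""
    X = X / X.sum(axis=1, keepdims=True)   # close each row to the simplex
    L = np.log(X + pseudo)
    return L - L.mean(axis=1, keepdims=True)

def aitchison_rbf_kernel(A, B, gamma=1.0, pseudo=1e-6):
    """RBF kernel on the Euclidean distance between clr-transformed rows."""
    Za, Zb = clr(A, pseudo), clr(B, pseudo)
    sq_dist = ((Za[:, None, :] - Zb[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def fit_kernel_ridge(K, y, lam=1e-2):
    """Solve (K + lam * I) alpha = y for the dual coefficients."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

# Simulated compositions (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(5), size=60)
X[::7, 0] = 0.0                            # inject some zeros into one component
X = X / X.sum(axis=1, keepdims=True)       # renormalise rows to sum to one
y = X[:, 1] * X[:, 2] + 0.01 * rng.standard_normal(60)

X_train, X_test = X[:50], X[50:]
y_train = y[:50]

K = aitchison_rbf_kernel(X_train, X_train)
alpha = fit_kernel_ridge(K, y_train)
pred = aitchison_rbf_kernel(X_test, X_train) @ alpha
```

Because the clr transform is applied after closing each row to the simplex, the kernel in this sketch is unchanged when a sample is rescaled, which is the basic requirement for a method that respects compositionality.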
Impact Factor: 4.300
Scopus SNIP: 1.278
Publication type Article: Journal article
Document type Scientific Article
Language English
Publication Year 2023
HGF-reported in Year 2023
ISSN (print) / ISBN 1553-734X
e-ISSN 1553-7358
Citation details Volume: 19, Issue: 6, Article Number: e1011240
Publisher Public Library of Science (PLoS)
Reviewing status Peer reviewed
POF-Topic(s) 30205 - Bioengineering and Digital Health
Research field(s) Enabling and Novel Technologies
PSP Element(s) G-530003-001
Scopus ID 85164748144
PubMed ID 37390111
Date of entry 2023-10-18