Tree-aggregated predictive modeling of microbiome data.
    
    
        
    
    
        
        Sci. Rep. 11:14505 (2021)
    
    
    
      
      
	
	    Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this contribution, we leverage the hierarchical structure of amplicon data and propose a data-driven and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. By contrast, our framework, which we call trac (tree-aggregation of compositional data), learns data-adaptive taxon aggregation levels for predictive modeling, greatly reducing the need for user-defined aggregation in preprocessing while simultaneously integrating seamlessly into the compositional data analysis framework. We illustrate the versatility of our framework in the context of large-scale regression problems in human gut, soil, and marine microbial ecosystems. We posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbiome researchers gain insights into the structure and functioning of the underlying ecosystem of interest.
	
	
	    
	
       
      
	
	    
		
	     
	    
	 
       
      
     
    
        Publication type
        Article: Journal article
    
 
    
        Document type
        Scientific Article
    
 
    
    
    
        Keywords
        Gut Microbiome; Regression; Diversity; Selection; Ph
    
 
    
    
 
    
    
        Language
        English
    
 
    
        Publication Year
        2021
    
 
    
    
 
    
        HGF-reported in Year
        2021
    
 
    
    
        ISSN (print) / ISBN
        2045-2322
    
 
    
        e-ISSN
        2045-2322
    
 
    
    
 
     
	
    
        Source details
	    Volume: 11
	    Issue: 1
	    Article Number: 14505
	
    
 
    
        
        
 
        
            Publisher
            Nature Publishing Group
        
 
        
            Publishing Place
            London
        
 
	
        
        
 
    
        Reviewing status
        Peer reviewed
    
 
     
    
        POF-Topic(s)
        30205 - Bioengineering and Digital Health
    
 
    
        Research field(s)
        Enabling and Novel Technologies
    
 
    
        PSP Element(s)
        G-503800-001
    
 
    
        Grants
        National Science Foundation
        NIH HHS
    
 
    
    
 	
    
    
    
    
    
        Date of record entry
        2021-08-04