Kasmanas, J.C.* ; Magnúsdóttir, S.* ; Zhang, J.* ; Smalla, K.* ; Schloter, M. ; Stadler, P.F.* ; de Leon Ferreira de Carvalho, A.C.P.* ; Rocha, U.*
Integrating comparative genomics and risk classification by assessing virulence, antimicrobial resistance, and plasmid spread in microbial communities with gSpreadComp.
BACKGROUND: Comparative genomics, genetic spread analysis, and context-aware ranking are crucial in understanding the impact of microbial dynamics on public health. gSpreadComp streamlines the path from in silico analysis to hypothesis generation for complex microbial datasets by integrating comparative genomics, genome annotation, normalization, plasmid-mediated gene transfer, and microbial resistance-virulence risk-ranking into a unified workflow. FINDINGS: The gSpreadComp workflow comprises 6 modular steps: taxonomy assignment, genome quality estimation, antimicrobial resistance (AMR) gene annotation, plasmid/chromosome classification, virulence factor annotation, and downstream analysis. Our workflow calculates gene spread using normalized weighted average prevalence, ranks potential resistance-virulence risk by integrating microbial resistance, virulence, and plasmid transmissibility data, and produces an HTML report. As a use case, we analyzed 3,566 metagenome-assembled genomes recovered from human gut microbiomes across diets. Our findings indicated consistent AMR across diets, with diet-specific resistance patterns, such as increased bacitracin resistance in vegans and tetracycline resistance in omnivores. Notably, ketogenic diets showed a slightly higher resistance-virulence rank, while vegan and vegetarian diets encompassed more plasmid-mediated gene transfer. CONCLUSIONS: The gSpreadComp workflow aims to facilitate hypothesis generation for targeted experimental validations by identifying concerning resistance hotspots in complex microbial datasets. Our study calls for a more thorough investigation of the critical role of diet in microbial community dynamics and the spread of AMR. This research underscores the importance of integrating genomic data into public health strategies to combat AMR. The gSpreadComp workflow is available at https://github.com/mdsufz/gSpreadComp/.
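The abstract does not define "normalized weighted average prevalence", so the following is only an illustrative sketch of one plausible reading, not the gSpreadComp implementation. It assumes that prevalence is the fraction of genomes in a sample that carry a gene, and that samples are averaged within a group (for example, a diet) with weights based on the number of genomes per sample. The function name, the record layout, and the `weight` parameter are hypothetical.

```python
from collections import defaultdict


def weighted_average_prevalence(records, weight=lambda n_genomes: n_genomes):
    """Illustrative weighted average prevalence of one gene per group.

    records: iterable of (group, sample, has_gene) tuples, one per genome.
    weight:  maps a sample's genome count to its weight (assumption:
             larger samples count more; this is not taken from the paper).
    Returns a dict {group: weighted average prevalence in [0, 1]}.
    """
    # Count genomes and gene carriers for every (group, sample) pair.
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for group, sample, has_gene in records:
        totals[(group, sample)] += 1
        carriers[(group, sample)] += int(bool(has_gene))

    # Average the per-sample prevalences within each group, using the weights.
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for (group, sample), n_genomes in totals.items():
        prevalence = carriers[(group, sample)] / n_genomes
        w = weight(n_genomes)
        weighted_sum[group] += w * prevalence
        weight_total[group] += w

    return {g: weighted_sum[g] / weight_total[g] for g in weighted_sum}


# Made-up toy data: (diet group, sample id, genome carries the gene?)
toy_records = [
    ("omnivore", "s1", True), ("omnivore", "s1", True),
    ("omnivore", "s1", False), ("omnivore", "s1", False),
    ("omnivore", "s2", True),
    ("vegan", "s3", False), ("vegan", "s3", False),
]
result = weighted_average_prevalence(toy_records)
```

With the default weight (genomes per sample), the weighted average reduces to the pooled fraction of carrier genomes in each group. Passing a different `weight` function, such as a constant, gives every sample equal influence regardless of how many genomes were recovered from it.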