Rare genetic variants can have strong effects on phenotypes, yet accounting for rare variants in genetic analyses is statistically challenging due to the limited number of allele carriers and the burden of multiple testing. While rich variant annotations promise to enable well-powered rare variant association tests, methods integrating variant annotations in a data-driven manner are lacking. Here we propose deep rare variant association testing (DeepRVAT), a model based on set neural networks that learns a trait-agnostic gene impairment score from rare variant annotations and phenotypes, enabling both gene discovery and trait prediction. On 34 quantitative and 63 binary traits, using whole-exome-sequencing data from UK Biobank, we find that DeepRVAT yields substantial gains in gene discoveries and improved detection of individuals at high genetic risk. Finally, we demonstrate how DeepRVAT enables calibrated and computationally efficient rare variant tests at biobank scale, aiding the discovery of genetic risk factors for human disease traits.
Funding: Model Exchange for Regulatory Genomics project (MERGE); Initiative and Networking Fund of the Helmholtz Association; State Parliament of Baden-Württemberg for the Innovation Campus Health + Life Science Alliance Heidelberg Mannheim; Helmholtz Association; Deutsche Forschungsgemeinschaft (DFG, German Research Foundation); German Bundesministerium für Bildung und Forschung (BMBF) through the ERA PerMed project PerMiM; UKBB