Transposon screens are powerful in vivo assays used to identify loci driving carcinogenesis. These loci are identified as Common Insertion Sites (CISs), i.e. regions with more transposon insertions than expected by chance. However, the identification of CISs is affected by biases in the insertion behaviour of transposon systems. Here, we introduce Transmicron, a novel method that differs from previous methods by (i) modelling neutral insertion rates based on chromatin accessibility, transcriptional activity and sequence context and (ii) estimating oncogenic selection for each genomic region using Poisson regression to model insertion counts while controlling for neutral insertion rates. To assess the benefits of our approach, we generated a dataset by applying two different transposon systems under comparable conditions. Benchmarking for enrichment of known cancer genes showed improved performance of Transmicron over state-of-the-art methods. Modelling neutral insertion rates allowed for better control of false positives and stronger agreement of the results between transposon systems. Moreover, using Poisson regression to consider intra-sample and inter-sample information proved beneficial in small and moderately sized datasets. Transmicron is open-source and freely available. Overall, this study contributes to the understanding of transposon biology and introduces a novel approach that uses this knowledge to discover cancer driver genes.
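The idea of modelling insertion counts with Poisson regression while controlling for neutral insertion rates can be illustrated with a minimal sketch. This is not the Transmicron implementation: the function name `selection_test`, its inputs, and the intercept-only model are assumptions made for illustration, and the neutral expectations are assumed to come from some upstream model of insertion bias. For a single region, the per-sample neutral expectation enters as a log-offset, so the fitted intercept is the log ratio of observed to expected insertions and can be tested against zero.

```python
import math


def selection_test(counts, neutral_expected):
    """Intercept-only Poisson regression with a log-offset for one region.

    Hypothetical helper, not part of Transmicron.
    Model: counts[s] ~ Poisson(neutral_expected[s] * exp(beta)).
    beta > 0 means more insertions than the neutral model predicts.
    Returns (beta_hat, one_sided_p) for the alternative beta > 0.
    """
    total_obs = float(sum(counts))
    total_exp = float(sum(neutral_expected))
    if total_exp <= 0:
        raise ValueError("total neutral expectation must be positive")

    if total_obs == 0:
        # No insertions observed: the MLE of beta is -infinity.
        beta_hat = -math.inf
        lr_stat = 2.0 * total_exp
    else:
        # Closed-form MLE for an intercept-only model with an offset.
        beta_hat = math.log(total_obs / total_exp)
        # Likelihood-ratio statistic against the neutral model (beta = 0).
        lr_stat = 2.0 * (total_obs * beta_hat - (total_obs - total_exp))
        lr_stat = max(0.0, lr_stat)  # guard against rounding below zero

    # Signed root of the LR statistic, compared to a standard normal.
    z = math.copysign(math.sqrt(lr_stat), beta_hat)
    p_enriched = 0.5 * math.erfc(z / math.sqrt(2.0))
    return beta_hat, p_enriched


# Illustrative numbers only: four samples, 10 insertions observed in a
# region where the neutral model expects 2 in total.
beta, p = selection_test([3, 2, 4, 1], [0.5, 0.5, 0.5, 0.5])
print(f"beta = {beta:.3f}, one-sided p = {p:.2e}")
```

In this simplified form only the totals matter; adding covariates or sharing information across regions would require an iterative fit, for which a generalised linear model routine with an offset term is the usual choice.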
Grants: Bundesministerium für Bildung und Forschung; Deutsche Krebshilfe; Deutsche Forschungsgemeinschaft (DFG); German Bundesministerium für Bildung und Forschung (BMBF) through the VALE (Entdeckung und Vorhersage der Wirkung von genetischen Varianten durch Artifizielle Intelligenz für LEukämie Diagnose und Subtyp-Identifizierung) project