RT Journal Article
SR Electronic
T1 Katdetectr: utilising unsupervised changepoint analysis for robust kataegis detection
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.07.11.499364
DO 10.1101/2022.07.11.499364
A1 Hazelaar, Daan M.
A1 van Riet, Job
A1 Hoogstrate, Youri
A1 Lolkema, Martijn P.
A1 van de Werken, Harmen J. G.
YR 2022
UL http://biorxiv.org/content/early/2022/07/11/2022.07.11.499364.abstract
AB Motivation: Kataegis refers to regional hypermutation in cancer genomes and has been observed in a wide range of malignancies. Robust kataegis detection is necessary to advance research into its origins and clinical impact. Multiple kataegis detection packages are publicly available; however, the performance of their respective approaches has not been evaluated extensively. Here, we introduce katdetectr, an R-based, open-source, computationally fast, and robust package for the detection, characterisation and visualisation of kataegis.
Results: The performance of katdetectr and five publicly available kataegis detection packages was evaluated on an in-house generated synthetic dataset and on an a priori labelled pan-cancer dataset of whole-genome sequenced malignancies. Katdetectr achieved the highest accuracy and normalised Matthews correlation coefficient for kataegis classification on both the synthetic and the a priori labelled dataset. In particular, katdetectr is more robust for kataegis detection in samples with a high tumour mutational burden.
Availability and Implementation: Katdetectr imports standardised variant formats (MAF and VCF) as well as standard Bioconductor classes (GRanges and VRanges). Katdetectr segments genomic variants using unsupervised changepoint detection with the Pruned Exact Linear Time (PELT) search algorithm.
Kataegis foci are flagged according to the historical definition: a kataegis focus is a contiguous segment that harbours ≥6 variants with a mean intermutation distance ≤1000 bp. Additionally, katdetectr's implementation of changepoint detection is computationally fast. Furthermore, katdetectr is distributed through Bioconductor, which ensures reliability and operability on common operating systems (Windows, macOS and Linux). Katdetectr is available on Bioconductor at https://www.bioconductor.org/packages/devel/bioc/html/katdetectr.html and on GitHub at https://github.com/ErasmusMC-CCBC/katdetectr. All code used for the performance evaluation is available on GitHub at https://github.com/ErasmusMC-CCBC/evaluation_katdetectr
Contact: h.vandewerken{at}erasmusmc.nl
Competing Interest Statement: The authors have declared no competing interest.
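The abstract states that katdetectr segments variants with unsupervised changepoint detection and the PELT search algorithm. Below is a minimal, illustrative PELT sketch in Python. It is not katdetectr's R code. Several choices are assumptions made here for illustration only: the segmented signal is a sequence of intermutation distances (IMDs); each segment's IMDs are modelled as exponentially distributed; and the penalty is a BIC-style log(n). The names `exp_cost` and `pelt` are hypothetical.

```python
import math
from typing import List, Sequence


def exp_cost(prefix: List[float], s: int, t: int) -> float:
    """Twice the negative log-likelihood of y[s:t] under an exponential
    distribution whose rate is fitted by maximum likelihood (assumed cost)."""
    n = t - s
    total = max(prefix[t] - prefix[s], 1e-9)  # guard against log(0)
    return 2.0 * n * (math.log(total / n) + 1.0)


def pelt(y: Sequence[float], penalty: float = None) -> List[int]:
    """Return changepoints (indices where a new segment starts) in y."""
    n = len(y)
    if n == 0:
        return []
    if penalty is None:
        penalty = math.log(n)  # BIC-style penalty; an illustrative choice
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
    best = [0.0] * (n + 1)  # best[t]: optimal penalised cost of y[:t]
    best[0] = -penalty
    last = [0] * (n + 1)    # last[t]: start of the final segment for y[:t]
    candidates = [0]
    for t in range(1, n + 1):
        costs = [best[s] + exp_cost(prefix, s, t) + penalty for s in candidates]
        i = min(range(len(costs)), key=costs.__getitem__)
        best[t], last[t] = costs[i], candidates[i]
        # PELT pruning: a start s with best[s] + cost(s, t) > best[t]
        # can never be optimal for any later t, so it is discarded.
        candidates = [s for s, c in zip(candidates, costs) if c - penalty <= best[t]]
        candidates.append(t)
    # Backtrack through the stored segment starts.
    cps, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)
```

For example, `pelt([50000] * 10 + [200] * 8 + [50000] * 10)` places changepoints at indices 10 and 18, which isolates the run of short IMDs. The pruning step is what separates PELT from plain optimal partitioning: both return the same exact optimum, but pruning discards candidate segment starts that can no longer win.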
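The historical flagging rule (≥6 variants with a mean intermutation distance ≤1000 bp) can be sketched as a filter over segments. This sketch makes three assumptions. Variants lie on a single chromosome and are given as base-pair positions. Segment boundaries are given as variant indices, for instance from a changepoint step. A segment's mean IMD is the average distance between consecutive variants within it. The function name `flag_kataegis` and its parameters are hypothetical and do not come from katdetectr's API.

```python
from typing import List, Sequence, Tuple


def flag_kataegis(
    positions: Sequence[int],
    boundaries: Sequence[int],
    min_variants: int = 6,
    max_mean_imd: float = 1000.0,
) -> List[Tuple[int, int, int, float]]:
    """Return (start_bp, end_bp, n_variants, mean_imd) for each segment
    meeting the historical kataegis definition."""
    pos = sorted(positions)
    # Boundaries are indices into pos (0 < b < len(pos)) where a new segment starts.
    edges = [0] + sorted(boundaries) + [len(pos)]
    foci = []
    for a, b in zip(edges, edges[1:]):
        n = b - a
        if n < min_variants:
            continue
        # Mean of the n - 1 consecutive distances equals (last - first) / (n - 1).
        mean_imd = (pos[b - 1] - pos[a]) / (n - 1)
        if mean_imd <= max_mean_imd:
            foci.append((pos[a], pos[b - 1], n, mean_imd))
    return foci
```

For example, with seven variants spaced 300 bp apart between sparse flanking variants and boundaries at indices 3 and 10, only the middle segment is returned, with a mean IMD of 300.0 bp.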
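The evaluation reports accuracy and a normalised Matthews correlation coefficient (MCC). The sketch below assumes the common normalisation nMCC = (MCC + 1) / 2, which rescales MCC from [-1, 1] to [0, 1]. Under that assumption, both metrics can be computed from a binary confusion matrix of kataegis calls. The function names are illustrative. Returning an MCC of 0 when a confusion-matrix marginal is empty is a convention assumed here.

```python
import math


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when any marginal is empty
    return (tp * tn - fp * fn) / denom


def normalised_mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC rescaled to [0, 1] via (MCC + 1) / 2 (assumed normalisation)."""
    return (mcc(tp, fp, tn, fn) + 1.0) / 2.0
```

A perfect classifier scores 1.0, a perfectly inverted one 0.0, and a classifier no better than chance 0.5. Unlike accuracy, MCC stays informative when kataegis-positive samples are rare.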