PT - JOURNAL ARTICLE
AU - Lauris Kaplinski
AU - Märt Möls
AU - Tarmo Puurand
AU - Fanny-Dhelia Pajuste
AU - Maido Remm
TI - KATK: fast genotyping of rare variants directly from unmapped sequencing reads
AID - 10.1101/2020.12.23.424124
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.12.23.424124
4099 - http://biorxiv.org/content/early/2020/12/23/2020.12.23.424124.short
4100 - http://biorxiv.org/content/early/2020/12/23/2020.12.23.424124.full
AB - Motivation: KATK is a fast and accurate software tool for calling variants directly from raw NGS reads. It uses predefined k-mers to retrieve only the reads of interest from the FASTQ file and calls genotypes by aligning the retrieved reads locally. KATK does not use data about known polymorphisms and has NC (No Call) as the default genotype. The reference or variant allele is called only if there is sufficient evidence for its presence in the data. Thus it is not biased against rare variants or de novo mutations. Results: With simulated datasets, we achieved a false negative rate of 0.23% (sensitivity 99.77%) and a false discovery rate of 0.19%. Calling all human exonic regions with KATK requires 1-2 h, depending on sequencing coverage. Availability: KATK is distributed under the terms of GNU GPL v3. The k-mer databases are distributed under the Creative Commons CC BY-NC-SA license. The source code is available on GitHub as part of the GenomeTester4 package (https://github.com/bioinfo-ut/GenomeTester4/). The binaries of the KATK package and the k-mer databases described in the current paper are available at http://bioinfo.ut.ee/KATK/. Competing Interest Statement: The authors have declared no competing interest.
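
The two ideas in the abstract, retrieving reads by predefined k-mers and defaulting to NC unless an allele has enough support, can be illustrated with a toy sketch. This is not KATK's code: the function names, the `min_support` threshold, and the example sequences are hypothetical, and reading the base that follows a flanking k-mer is a deliberate simplification that stands in for the local alignment step the abstract describes.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def retrieve_reads(reads, target_kmers, k):
    """Keep only reads that contain at least one predefined k-mer."""
    return [r for r in reads if any(km in target_kmers for km in kmers(r, k))]


def observed_alleles(reads, flank):
    """Count the base that immediately follows the flanking k-mer.

    Assumption: this exact-match lookup replaces local alignment,
    purely to keep the example short.
    """
    counts = Counter()
    for r in reads:
        i = r.find(flank)
        j = i + len(flank)
        if i != -1 and j < len(r):
            counts[r[j]] += 1
    return counts


def call_genotype(allele_counts, min_support=3):
    """Return 'NC' unless one or two alleles reach min_support reads.

    min_support is a hypothetical fixed cutoff, not KATK's evidence model.
    """
    supported = sorted(a for a, n in allele_counts.items() if n >= min_support)
    if not supported or len(supported) > 2:
        return "NC"  # default genotype: no call
    if len(supported) == 1:
        return supported[0] + "/" + supported[0]
    return "/".join(supported)


# Hypothetical data: one flanking 6-mer and seven short reads.
flank = "ACGTAC"
reads = [
    "TTACGTACGGA", "ACGTACGTT", "CACGTACGA",  # support allele G
    "TTACGTACTGA", "ACGTACTTT", "CACGTACTA",  # support allele T
    "GGGGGGGGGG",                             # unrelated read, filtered out
]
selected = retrieve_reads(reads, {flank}, len(flank))
genotype = call_genotype(observed_alleles(selected, flank))
```

With this data six reads are retrieved and the site is called as `G/T`; raising `min_support` to 4 leaves neither allele with enough support, so the call falls back to `NC`.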