TY  - JOUR
T1  - Merfin: improved variant filtering and polishing via k-mer validation
JF  - bioRxiv
DO  - 10.1101/2021.07.16.452324
SP  - 2021.07.16.452324
AU  - Giulio Formenti
AU  - Arang Rhie
AU  - Brian P. Walenz
AU  - Françoise Thibaud-Nissen
AU  - Kishwar Shafin
AU  - Sergey Koren
AU  - Eugene W. Myers
AU  - Erich D. Jarvis
AU  - Adam M. Phillippy
Y1  - 2021/01/01
UR  - http://biorxiv.org/content/early/2021/07/18/2021.07.16.452324.abstract
N2  - Read mapping and variant calling approaches have been widely used for accurate genotyping and improving consensus quality assembled from noisy long reads. Variant calling accuracy relies heavily on the read quality, the precision of the read mapping algorithm and variant caller, and the criteria adopted to filter the calls. However, it is impossible to define a single set of optimal parameters, as they vary depending on the quality of the read set, the variant caller of choice, and the quality of the unpolished assembly. To overcome this issue, we have devised a new tool called Merfin (k-mer based finishing tool), a k-mer based variant filtering algorithm for improved genotyping and polishing. Merfin evaluates the accuracy of a call based on expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller’s internal score. Moreover, we introduce novel assembly quality and completeness metrics that account for the expected genomic copy numbers. Merfin significantly increased the precision of a variant call and reduced frameshift errors when applied to PacBio HiFi, PacBio CLR, or Nanopore long read based assemblies. We demonstrate the utility while polishing the first complete human genome, a fully phased human genome, and non-human high-quality genomes. Competing Interest Statement: The authors have declared no competing interest.
ER  - 