RT Journal Article
SR Electronic
T1 TandemAligner: a new parameter-free framework for fast sequence alignment
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.09.15.507041
DO 10.1101/2022.09.15.507041
A1 Andrey V. Bzikadze
A1 Pavel A. Pevzner
YR 2022
UL http://biorxiv.org/content/early/2022/09/17/2022.09.15.507041.abstract
AB The recent advances in “complete genomics” revealed the previously inaccessible genomic regions (such as centromeres) and enabled analysis of their associations with diseases. However, analysis of variations in centromeres, immunoglobulin loci, and other extra-long tandem repeats (ETRs) faces an algorithmic bottleneck since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, the classical alignment approaches, such as the Smith-Waterman algorithm, that work well for most sequences, fail to construct biologically adequate alignments of ETRs. This limitation was overlooked in previous studies since the ETR sequences across multiple genomes only became available in the last year. We present TandemAligner, the first parameter-free sequence alignment algorithm that introduces a sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. We apply TandemAligner to various human centromeres and primate immunoglobulin loci, arrive at the first accurate estimate of the mutation rates in human centromeres, and quantify the extremely high rate of large insertions/duplications in centromeres. This extremely high rate (that the standard alignment algorithms fail to uncover) suggests that centromeres represent the most rapidly evolving regions of the human genome with respect to their structural organization. Competing Interest Statement: The authors have declared no competing interest.