PT - JOURNAL ARTICLE
AU - Robert Hubley
AU - Travis J. Wheeler
AU - Arian F.A. Smit
TI - Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families
AID - 10.1101/2021.08.17.456740
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.08.17.456740
4099 - http://biorxiv.org/content/early/2021/08/18/2021.08.17.456740.short
4100 - http://biorxiv.org/content/early/2021/08/18/2021.08.17.456740.full
AB - The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Such alignments play an important role in understanding and representing TE family history. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly-neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution, and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low to medium divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning high-divergent and fragmented instances of a family. As a result, consensus sequences derived from Refiner-based MSAs are more similar to the true consensus. Competing Interest Statement: The authors have declared no competing interest.