Abstract
Motivation Whole-genome alignment methods scale poorly to the generation of large-scale whole-genome alignments (WGAs). Profile alignment-based approaches revolutionized the field of multiple sequence alignment construction by significantly reducing computational complexity and runtime. However, WGAs must account for genomic rearrangements between genomes, which makes the profile-based extension of alignments of several whole genomes challenging. Currently, none of the available methods offers the possibility to align or extend WGA profiles.
Results Here, we present GPA, an approach that aligns the profiles of WGAs and produces large-scale WGAs many times faster than conventional methods. Our concept relies on existing whole-genome aligners, which are used to compute several smaller sets of aligned genomes; these are then combined into a full WGA using a divide-and-conquer approach. We make use of the SuperGenome data structure, which provides a bidirectional mapping between individual sequence coordinates and alignment coordinates. This data structure is used to efficiently transfer different coordinate systems into a common one, based on the principles of profile alignment. The approach allows the computation of a WGA in which alignments are subsequently merged along a guide tree. The current implementation uses progressiveMauve (Darling et al., 2010) and supports parallel computation of independent genome alignments. Our results on data sets of up to 326 genomes show that we can reduce the runtime from months to hours, with a quality that is only negligibly worse than that of the WGA computed with the conventional progressiveMauve tool.
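To illustrate what a bidirectional mapping between sequence and alignment coordinates can look like, the following minimal Python sketch builds such a mapping for a single gapped alignment row. It then uses the shared alignment coordinate to project a position from one genome onto another. This is not the GPA or SuperGenome implementation, which is written in Java and must also handle rearranged and reverse-complemented blocks. The class `CoordinateMap` and the function `transfer` are hypothetical names, and the sketch assumes a collinear, forward-strand alignment with 0-based coordinates.

```python
from bisect import bisect_left
from typing import Optional


class CoordinateMap:
    """Hypothetical bidirectional map between one genome's sequence positions
    and alignment column positions, built from a single gapped alignment row.

    Simplifying assumption: the row is collinear and on the forward strand;
    a real SuperGenome also has to deal with rearranged blocks and strands.
    """

    def __init__(self, aligned_row: str, gap: str = "-") -> None:
        # seq_to_col[i] is the (0-based) alignment column holding sequence
        # position i; the list is strictly increasing by construction.
        self.seq_to_col = [col for col, ch in enumerate(aligned_row) if ch != gap]
        self.num_columns = len(aligned_row)

    def to_alignment(self, seq_pos: int) -> int:
        """Map a sequence position to its alignment column."""
        return self.seq_to_col[seq_pos]

    def to_sequence(self, col: int) -> Optional[int]:
        """Map an alignment column back to a sequence position.

        Returns None if this genome has a gap in that column. Because
        seq_to_col is sorted, a binary search suffices for the reverse lookup.
        """
        i = bisect_left(self.seq_to_col, col)
        if i < len(self.seq_to_col) and self.seq_to_col[i] == col:
            return i
        return None


def transfer(pos: int, source: CoordinateMap, target: CoordinateMap) -> Optional[int]:
    """Project a position from the source genome onto the target genome
    through the common alignment coordinate system."""
    return target.to_sequence(source.to_alignment(pos))


if __name__ == "__main__":
    a = CoordinateMap("AC-GTA")
    b = CoordinateMap("A-TGT-")
    print(transfer(2, a, b))  # G in genome A is aligned to position 2 in genome B
    print(transfer(1, a, b))  # C in genome A faces a gap in genome B
```

With the rows `AC-GTA` and `A-TGT-`, position 2 of genome A (the `G` in alignment column 3) maps to position 2 of genome B, whereas position 1 (the `C` in column 1) has no counterpart because genome B has a gap there. Storing only the forward map as a sorted list keeps memory to one array per genome at the cost of a logarithmic reverse lookup; a second explicit column-to-position array would trade memory for constant-time lookups.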
Availability GPA is freely available at https://lambda.informatik.uni-tuebingen.de/gitlab/ahennig/SuperGenome. GPA is implemented in Java 8, uses progressiveMauve and supports parallel computation of WGAs.
Contact andre.hennig{at}uni-tuebingen.de
Supplementary information Supplementary data are available at Bioinformatics online.