RT Journal Article
SR Electronic
T1 A coarse-graining, ultrametric approach to resolve the phylogeny of prokaryotic strains with frequent recombination
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 094599
DO 10.1101/094599
A1 Tin Yau Pang
YR 2017
UL http://biorxiv.org/content/early/2017/02/21/094599.abstract
AB Introduction: Homologous recombination happens when a foreign DNA stretch replaces a similar stretch on the genome of a prokaryotic cell. For a genome pair, recombination affects their phylogenetic reconstruction in multiple ways: (i) a genome can recombine with a DNA stretch that is similar to the other genome of the pair, thereby reducing their pairwise sequence divergence; (ii) a genome can also recombine with a stretch from an outgroup-genome and increase the pairwise divergence. Most phylogenetic algorithms cannot account for recombination; while some do, they cannot account for all effects of recombination. Results: We develop a fast algorithm that reconstructs ultrametric-trees while explicitly accounting for recombination. Instead of considering individual positions of genome sequences, we use a coarse-graining approach, which divides a genome sequence into short segments to account for local density of nucleotide-substitution. For each genome pair considered, our coarse-graining-phylogenetic (CGP) algorithm enumerates the pairwise single-site-polymorphisms (SSPs) on each segment to obtain the pairwise SSP-distribution; we fit each empirical SSP-distribution to a theoretical SSP-distribution. We test the accuracy of our algorithm against other state-of-the-art algorithms on simulated and real genomes. For genomes with a substantial level of recombination, such as E. coli, we show that the age prediction of internal nodes by CGP is more accurate than other algorithms, while the tree topology is at least as accurate. Conclusion: The CGP algorithm is more accurate and faster than alternative recombination-aware methods for ultrametric phylogenetic reconstructions.