A three-sample test for introgression

Many methods exist for detecting introgression between non-sister species, but the most commonly used require either a single sequence from four or more taxa or multiple sequences from each of three taxa. Here we present a test for introgression that uses only a single sequence from three taxa. This test, denoted D3, uses similar logic as the standard D-test for introgression, but by using pairwise distances instead of site patterns it is able to detect the same signal of introgression with fewer species. We use simulations to show that D3 has statistical power almost equal to D, demonstrating its use on a dataset of wild bananas (Musa). The new test is easy to apply and easy to interpret, and should find wide use among currently available datasets.

Genome-scale data have revealed extensive evidence for postspeciation introgression across the tree of life (reviewed in Mallet et al. 2016).Many of these analyses have been carried out in a phylogenetic context, using only a single sample from each population or species.Some methods use gene tree topologies themselves as input (e.g., Huson et al. 2005;Meng and Kubatko 2009;Yu et al. 2011;Edelman et al. 2018), whereas others use counts of shared derived alleles that reflect the underlying topologies (e.g., Green et al. 2010;Lohse and Frantz 2014;Pease and Hahn 2015).
All of these methods depend on the expectation under incomplete lineage sorting (ILS) that the two lessfrequent topologies in a rooted triplet should be equal in frequency.Asymmetry in gene tree topologies is taken as evidence for introgression, though ancestral population structure can produce similar patterns (Slatkin and Pollack 2008;Durand et al. 2011;Lohse and Frantz 2014).Importantly, the need to distinguish among topologies or between ancestral and derived sites using these methods means that at least four taxa must be sampled, and sometimes more (e.g., Pease and Hahn 2015;Elworth et al. 2018).
Here, we present a test for introgression that only requires a single sample from each of three taxa.With three taxa we cannot infer the frequencies of alternative gene tree topologies.Instead, our test is based on a related prediction of the ILS model: that there is also an expected symmetry in the branch lengths among topologies.While previous papers have used this expectation informally as an argument for gene flow (e.g., Brandvain et al. 2014), we develop an explicit model and test statistic based on pairwise distances to detect the presence of introgression.

A Test for Introgression
Assume that lineages A and B are sister in the species tree, with divergence time t 1 (measured in units of 2N generations), and that the ancestor of A and B split from lineage C at time t 2 (fig.1a).We refer to gene trees having this topology as AB, such that the two discordant topologies are AC and BC (fig.1b and c, respectively).
When ILS is the only cause of gene tree incongruence, topology AB may be generated in two different ways, with different expected frequencies and branch lengths.Looking backwards in time, we refer to the topology in which lineages A and B coalesce before t 2 as AB1 (this is the history shown in fig.1a).Alternatively, the same topology can occur when these lineages coalesce in the ancestral population of all three lineages; we refer to this topology as AB2.
The expected frequencies of these four topologies are (Hudson 1983): (1) As mentioned in the first three paragraphs, here we see that the two discordant topologies (AC and BC) are expected to have the same frequencies.See chapter 9 in Hahn (2018) for more details on the underlying assumptions of this model.The same model leads naturally to expectations for the times to coalescence between lineages in each of the different topologies.Here, we focus on the expected times to coalescence between B and C (t B-C ) and between A and C (t A-C ).These times are (Hibbins and Hahn 2019):

Letter
 The Author(s) 2019.Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.All rights reserved.For permissions, please e-mail: journals.permissions@oup.com These times can be transformed into genetic distances between tip sequences by assuming an infinite sites mutation model and multiplying by two to account for mutations along both lineages since their common ancestor.Summing the weighted length of branches between any two taxa across all possible topologies leads to the following expected distances: (leaving off the shared mutation parameter, l, for clarity).
Because of the underlying symmetries in topology frequencies and branch lengths under ILS, the expected values of d B-C and d A-C are exactly the same.Notably, these expectations hold for distances calculated without rooted gene trees or polarized substitutions (e.g., fig.1d-f).
Given these results, a natural test of the ILS-only model can be formed using the statistic: Because the two terms in the numerator have the same expected values under ILS alone, the expectation of D 3 is 0.
The denominator is a normalizing factor that bounds D 3 between -1 and þ1.D 3 can be significantly different from zero in the presence of gene flow.While the exchange of alleles between lineages A and B will have no effect on D 3 , unequal amounts of introgression between either B and C (fig.2a) or A and C (fig.2b) can lead to deviations from zero.This occurs because gene flow between a pair of nonsister lineages leads to a breakdown in the symmetry of branch lengths predicted under ILS alone.In particular, introgression between B and C leads to both more trees with a BC topology and a shorter pairwise distance between these two lineages (fig.2a).As a result, d B-C will be smaller than d A-C , leading to a negative value of D 3 .Conversely, gene flow between A and C leads to positive values of D 3 .Exact expectations for D 3 in the presence of introgression are presented in the Appendix.

Application of D 3
The D 3 test is straightforward to carry out, requiring only pairwise distances between three species.Distances can be measured as the percent of sites that differ in an alignment, or Three-Sample Test for Introgression .doi:10.1093/molbev/msz178any measure of genetic distance corrected for multiple hits.
Ideally, distances should be calculated from regions for which all three lineages have sequences present in the alignment.This will avoid biases that could possibly occur if regions with different ancestral effective population sizes (for example, in regions with different recombination rates; Pease and Hahn 2013) are sampled unequally for the two relevant distances.Otherwise, variation in either N or l across sites should not affect the expectation of D 3 (see below).
As an example application of this method, we calculated D 3 for whole-genome data from three subspecies within Musa acuminata (wild bananas; the alignment can be found at https://doi.org/10.6084/m9.figshare.7924727.v1;last accessed August 8, 2019).The tree relating the three subspecies used here is (M.a. burmannica (M.a. malaccensis, M. a. banksii)) (Rouard et al. 2018).As was found using the original D-test on these three taxa and an outgroup (Rouard et al. 2018), D 3 indicated gene flow between malaccensis and burmannica (D 3 ¼ -0.06; P < 0.0001), or species closely related to them (see "Assumptions of D 3 " below).Distances were calculated as the proportion of sites that differed between sequences and the significance of D 3 was determined by a block bootstrap of the Musa alignment, as is normally done for the D-test (Green et al. 2010).

Statistical Power of D 3 and Comparison with D
We tested the power of D 3 to detect gene flow with increasing levels of introgression (fig.3a).As the fraction of the genome introgressed approaches 10%, D 3 can detect gene flow in 94% of simulated data sets (at P < 0.05) with an alignment length of 1 Mb.If we reduce the alignment length to 100 kb we slightly reduce the power to detect gene flow (supplementary fig. 1, Supplementary Material online), while if we increase h by 200-fold the power of D 3 is increased (supplementary fig.2, Supplementary Material online).Simulations with variation in h across sites reduced power a small amount (supplementary fig.3, Supplementary Material online).The timing of t 1 has no effect on D 3 (supplementary fig.4, Supplementary Material online), while, as expected, the timing of introgression (t m ) does have an effect (supplementary fig.5, Supplementary Material online): when introgression occurs only a short time after speciation, there is little power to detect an asymmetry in branch lengths due to introgression.We also explored an alternative normalization to D 3 , similar to the y-distance introduced in Percentages reported above each violin plot represent the proportion of simulated data sets that were significantly different from 0 at P < 0.05.Ashander et al. (2018), but found little difference in power (supplementary fig.6, Supplementary Material online).
In addition to good power to reject the null in the presence of introgression, when there is no gene flow (c ¼ 0), the proportion of false positives in D 3 is the number we would expect at this significance threshold (fig.3a).We can also see that the expected values of D 3 under different levels of introgression (calculated according to the equations given in the Appendix) closely match the mean of simulated data sets (fig.3a).This indicates that we have developed an accurate model for the effect of gene flow on D 3 .
In order to directly compare these power calculations to the traditional D-test, we included an outgroup in the same simulated data sets (the outgroup was simply ignored for D 3 calculations).As shown in figure 3b, D has only slightly more statistical power, despite requiring more data than D 3 .Our results match similar calculations for D carried out previously (e.g., Good et al. 2015;Martin et al. 2015), demonstrating the general power of this class of tests to detect introgression between nonsister lineages.
D 3 also has some obvious advantages over similar tests, as it does not require an outgroup (as does the D-test) or population samples.Methods such as the f 3 -test (Reich et al. 2009) require polymorphism data from three taxa, though there are other statistics that only require polymorphism data from two when detecting gene flow between sister species (e.g., Joly et al. 2009;Geneva et al. 2015;Rosenzweig et al. 2016).Even when data from outgroups are available, if there is either ILS or introgression involving these species the D-test may not be appropriate.D 3 can also detect introgression in both directions (i.e., from B into C and from C into B), similar to D but unlike f 3 , which can only detect it in one direction (Peter 2016).It should be noted, however, that the D-test will be more robust to sequencing error than D 3 because it does not consider mutations on terminal branches (see next section).
The D-test has been used with ancient DNA samples, as in the use of Neandertal sequences in the paper introducing this statistic (Green et al. 2010).Although the expectations of branch lengths for D 3 given here obviously assume that all sequences are sampled from the present (or are sampled contemporaneously from the past), all of the symmetry expectations hold if the ancient sample is the unpaired lineage (i.e., species C in fig.1).Therefore, there may also be limited cases in which D 3 can be applied to ancient samples.

Assumptions of D 3
Several points about the test introduced here merit further discussion and explanation.Although the expectations underlying D 3 require few assumptions, there are a few things to be cautious about.
First, we have assumed that the pairwise distances used as input to D 3 accurately reflect coalescence times.This will only strictly be true for sequences evolving under an infinite sites model with the same shared mutation rate across lineages.Such conditions likely hold only for relatively closely related species, limiting the use of D 3 to recent divergences.To test the robustness of this assumption, we conducted further simulations in which lineage B was made to have a faster rate of substitution after the split with A. While D 3 appears robust to small changes in substitution rates, above a 0.01%difference in rates there is an extremely high rate of false positives (supplementary fig.7, Supplementary Material online).These simulations are also analogous to cases in which there are unequal rates of sequencing errors among lineages, or other causes of asymmetry such as mapping bias to a reference genome.To detect such situations, we recommend quantifying the proportion of genomic windows with either a positive or negative value of D 3 .When there is no introgression the expectation is that 50% of windows will have positive (or negative) values, and under introgression there is only a slight excess of such windows (supplementary fig.8, Supplementary Material online).In contrast, lineage-specific differences in rates of mutation and/or sequence quality will cause most genomic windows to have either a positive or negative value of D 3 .Using these expectations should help to distinguish the causes of significant results.
Second, while values of D 3 significantly different from zero can be interpreted as rejecting an ILS-only model (given the above assumptions), such results do not strictly mean that introgression is the cause of rejection, or that introgression occurred between the sampled lineages.As with the D-test, population structure in the ancestor of all three lineages can produce deviations from the ILS-only expectations (Slatkin and Pollack 2008;Durand et al. 2011).In these cases additional analyses may be needed to distinguish among alternative causes of significant D 3 values (e.g., Lohse and Frantz 2014).Likewise, significant results among the three lineages tested do not necessarily mean that gene flow occurred between these species or their direct ancestors.Either unsampled extant lineages or extinct "ghost" taxa may have been the ones directly involved in the introgression event, while the sampled lineages show the effects of reduced divergence.In either of the cases discussed here, one must simply be cautious as to how significant results are interpreted.
Finally, we have assumed that the rooted species tree is known, even though the test does not require an outgroup.Of course it is often the case that the species tree can be inferred from either smaller amounts of sequence data or morphological characters, and so a rooted species tree may be known despite the lack of genome-scale data from an outgroup taxon.However, if the species relationships are not known, a conservative approach would be to test all three combinations of pairwise distances (i.e., d B-C -d A-C , d B-Cd A-B , and d A-C -d A-B ).If all three are significantly different from zero, then it is likely that introgression has acted in the system.

Materials and Methods
In order to determine the statistical power of the tests discussed here, we simulated multi-locus data sets.For each of four different values of the admixture proportion (c), our main results are based on 100 simulated data sets consisting of 1,000 nonrecombining loci each using the coalescent simulator ms (Hudson 2002).Except where stated explicitly, the Three-Sample Test for Introgression .doi:10.1093/molbev/msz178

)FIG. 1 .
FIG.1.Topologies produced by incomplete lineage sorting.The top row shows the same species tree (thick lines, with divergence times denoted by t 1 and t 2 ) within which three different topologies arise: (a) AB1, (b) AC, and (c) BC.The bottom row (d-f) shows the same unrooted topologies as in a-c, with approximate branch lengths.

FIG. 2 .FIG. 3 .
FIG. 2.Topologies produced by introgression.The top row shows the same species tree as in figure1, but with introgression between (a) lineages B and C, or (b) lineages A and C. Introgression occurs at time t m in both scenarios.The bottom row again shows the approximate unrooted topologies resulting from introgression.Note how the distance between lineages (a) B and C, or (b) A and C are smaller than in the ILS-only case (fig.1).