Abstract
The goal of Phylogenetic Comparative Methods (PCMs) is to study the distribution of quantitative traits among related species. The observed traits are often seen as the result of a Brownian Motion (BM) running along a phylogenetic tree. Reticulation events such as hybridization, gene flow or horizontal gene transfer can substantially affect a species’ traits, but are not modeled by a tree. Phylogenetic networks have been designed to represent reticulate evolution. As they become available for downstream analyses, new models of trait evolution that are applicable to networks are needed. One natural extension of the BM is to use a weighted average model for the trait of a hybrid, at a reticulation point. We develop here an efficient recursive algorithm to compute the phylogenetic variance matrix of a trait on a network, in only one preorder traversal of the network. We then extend the standard PCM tools to this new framework, including phylogenetic regression with covariates (or phylogenetic ANOVA), ancestral trait reconstruction, and Pagel’s λ test of phylogenetic signal. The trait of a hybrid is sometimes outside the range of its two parents’ traits. Hybrid vigor and hybrid depression are indeed rather common phenomena observed in present-day hybrids. Transgressive evolution can be modeled as a shift in the trait value following a reticulation point. We develop a general framework to handle such shifts, and take advantage of the phylogenetic regression view of the problem to design statistical tests for ancestral transgressive evolution in the evolutionary history of a group of species. We study the power of these tests in several scenarios, and show that recent events have indeed the strongest impact on the trait distribution of present-day taxa. We apply those methods to a dataset of Xiphophorus fishes, to confirm and complete previous analyses in this group. All the methods developed here are available in the user-friendly Julia package PhyloNetworks.
The evolutionary history of species is known to shape the present-day distribution of observed characters (Felsenstein 1985). Phylogenetic Comparative Methods (PCMs) have been developed to account for correlations induced by a shared history in the analysis of a quantitative dataset (Pennell and Harmon 2013). They usually rely on two main ingredients: a time-calibrated phylogenetic tree, and a dynamical model of trait evolution, which should be chosen to capture the features of the trait evolution over time. Much work has been done on the second ingredient, with increasingly sophisticated models of trait evolution and numerous variations around the original Brownian Motion (BM); see for instance Felsenstein (1985); Hansen and Martins (1996); Hansen (1997); Blomberg et al. (2003); Butler and King (2004); Beaulieu et al. (2012); Landis et al. (2013); Blomberg (2016), to cite but a few.
In contrast, the first ingredient has seldom been questioned until recently (Jhwueng and O’Meara 2017). However, phylogenetic trees are not always suited to capture relationships between species, and phylogenetic networks are sometimes needed. Phylogenetic networks differ from trees by added reticulation points, where two distinct branches come together to create a new species. Such reticulations can represent various biological mechanisms, like hybridization, gene flow or horizontal gene transfer, that are known to be common in some groups of organisms (Mallet 2005, 2007). Ignoring those events can lead to misleading tree inference (Kubatko 2009; Solís-Lemus et al. 2016; Long and Kubatko 2017). Thanks to recent methodological developments, the statistical inference of reliable phylogenetic networks has become possible (Maddison 1997; Degnan and Salter 2005; Kubatko 2009; Yu et al. 2012, 2014; Yu and Nakhleh 2015; Solís-Lemus and Ané 2016). Although these state-of-the-art methods are still limited by their computational burden, we believe that the use of these networks will increase in the future. The goal of this work is to propose an adaptation of standard PCMs to a group of species with reticulate evolution, related by a network instead of a tree.
We describe an extension of the BM model of trait evolution to a network. The main modeling choice is about the fate of hybrid species. How should these species inherit their trait from their two parents? In this work, we first choose a weighted-average merging rule: the trait of a hybrid is a mixture of its two parents’, weighted by their relative genetic contributions. This rule can be seen as a reasonable null model. But in some cases, the trait of a hybrid is observed to be outside of the range of its two parents. This phenomenon can be modeled by a shift in the trait value occurring right after the reticulation point: the hybrid trait value being the weighted average of the two parents, plus an extra term specific to the hybridization event at hand. Such a shift can model several biological mechanisms, such as transgressive segregation (Rieseberg et al. 1999) or heterosis (Fiévet et al. 2010; Chen 2013), with hybrid vigor (when the hybrid species is particularly fit to its environment) or depression (when the hybrid is ill-fit). In the following, we will refer to this class of phenomena using the generic term transgressive evolution. Here, this term only refers to the hybrid trait being different from the weighted average of its parents. This model allows for an explicit mathematical derivation of the trait distribution at the tips of the network and extends to networks the use of standard PCM tools such as phylogenetic regression (Grafen 1989, 1992), ancestral state reconstruction (Felsenstein 1985; Schluter et al. 1997) or tests of phylogenetic signal (Pagel 1999).
In the following, we first describe this BM model of trait evolution and show how it fits into the standard PCM framework. We then show how to add shifts in the trait values to model transgressive evolution. We propose a statistical test for transgressive evolution. These methods are validated with a simulation study, and with the theoretical study of the power of the tests in a range of scenarios. Finally, we revisit the analysis of a Xiphophorus dataset about sword index and female preference made by Cui et al. (2013), in the light of our new network methods.
MODEL
In our model for trait evolution on a phylogenetic network, the novel aspect compared to standard PCMs on trees is the merging rule at reticulation events. Our model is very similar to that defined in Jhwueng and O’Meara (2017), but we adopt a different statistical viewpoint, based on the phylogenetic linear regression representation.
Trait Evolution on Networks
Phylogenetic Network
In this work, we assume that we have access to a rooted, calibrated and weighted phylogenetic network that describes the relationships between a set of observed species (Huson et al. 2010). In such a network, reticulations, or hybrids, are nodes that have two parent nodes. They receive a given proportion of their genetic material from each parent. This proportion is controlled by a weight γe that represents the inheritance probability associated with each branch e of the network. A branch that is tree-like, i.e. that ends at a non-hybrid node, has a weight γe = 1. We further assume that the length ℓe of a branch e represents evolutionary time. In the example in Figure 1a, the two hybrid edges have length zero, but this need not be the case, see Jhwueng and O’Meara (2017); Degnan (2017).
Brownian Motion
Since the seminal article of Felsenstein (1985), the Brownian Motion (BM) has been intensively used to model trait evolution on phylogenetic trees. It is well adapted to model several biological processes, from random genetic drift, to rapid adaptation to a fluctuating environment (see e.g. Felsenstein 2004, Chap. 24). In order to adapt this process to a network instead of a tree, we define a weighted average merging rule at hybrids, as defined below. This rule expresses the idea that a hybrid inherits its trait from both its parents, with a relative weight determined by the proportion of genetic material received from each: if it inherits 90% of its genes from parent A, then 90% of its trait value should be determined by the trait of A. Because the BM usually models the evolution of a polygenic character, that is the additive result of the expression of numerous genes, this rule is a natural null hypothesis.
Definition 1 (BM on a Network). Consider a rooted phylogenetic network with branch lengths and inheritance probabilities. Let Xv be the random variable describing the trait value of node (or vertex) v.
At the root node ρ, we assume that Xρ = μ is fixed.
For a tree node v with parent node a, we assume that Xv is normally distributed with mean Xa + Δe and with variance σ2ℓe, with σ2 the variance rate of the BM, and ℓe the length of the parent edge e from a to v. Δe is a (fixed) shift value associated with branch e, possibly equal to 0.
For a hybrid node v with parent nodes a and b, we assume that Xv is normally distributed with mean γea Xa + γeb Xb, where ea and eb are the hybrid edges from a (and b) to v. If these edges have length 0, meaning that a, b and their hybrid v are contemporary, then Xv is assumed to have variance 0, conditional on the parent traits Xa and Xb. In general, the conditional variance of Xv is $\gamma_{e_a}^2 \sigma^2 \ell_{e_a} + \gamma_{e_b}^2 \sigma^2 \ell_{e_b}$, with squared inheritance weights, as required for consistency with the path-sum expression of the variance matrix given in the next section. For the sake of identifiability, shifts are not allowed on hybrid branches (see Section on Transgressive Evolution for further details).
An example of such a process (without shift) is presented in Figure 1b. This process is the same as in Jhwueng and O’Meara (2017), except that the shifts are treated differently. See Section on Transgressive Evolution and Discussion for more information on the links and differences between the two models. For the sake of generality, shifts are allowed on any tree edge. We will see in the next section how they are used to model transgressive evolution. In the rest of this section, we take all shifts to be zero, and only consider the un-shifted BM (Δe = 0 for all edges e).
Note that the state at the root, μ, could also be drawn from a Gaussian distribution, instead of being fixed. This would not change the derivations below, and would simply add a constant value to all terms in the variance matrix.
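The recursive definition above lends itself to direct simulation along a preorder. The following Python sketch is only an illustration, not the PhyloNetworks implementation: the toy network, the tuple layout (parent index, length, γ, shift) and the function name simulate_bm are all hypothetical, and the sketch is restricted to hybrid edges of length zero (as in Figure 1a), so that a hybrid trait is exactly the weighted average of its parents.

```python
import random

# Hypothetical toy network, listed in preorder (not the network of Figure 1).
# Each node lists its parent edges as (parent_index, length, gamma, shift).
NETWORK = [
    [],                                         # 0: root
    [(0, 1.0, 1.0, 0.0)],                       # 1: tree node
    [(0, 1.0, 1.0, 0.0)],                       # 2: tree node
    [(1, 0.0, 0.6, 0.0), (2, 0.0, 0.4, 0.0)],   # 3: hybrid, zero-length hybrid edges
    [(1, 1.0, 1.0, 0.0)],                       # 4: tip
    [(3, 1.0, 1.0, 0.0)],                       # 5: tip below the hybrid
    [(2, 1.0, 1.0, 0.0)],                       # 6: tip
]

def simulate_bm(network, mu=0.0, sigma2=1.0, seed=None):
    """Draw one trait value per node, visiting nodes in preorder."""
    rng = random.Random(seed)
    x = []
    for parent_edges in network:
        if not parent_edges:
            # Root: fixed ancestral value.
            x.append(mu)
        elif len(parent_edges) == 1:
            # Tree node: parent value + shift + Gaussian increment of variance sigma2 * length.
            parent, length, _gamma, shift = parent_edges[0]
            sd = (sigma2 * length) ** 0.5
            x.append(x[parent] + shift + sd * rng.gauss(0.0, 1.0))
        else:
            # Hybrid node: weighted average of the parents (zero-length hybrid edges only).
            if any(length != 0.0 for _, length, _, _ in parent_edges):
                raise ValueError("this sketch only handles zero-length hybrid edges")
            x.append(sum(gamma * x[parent] for parent, _, gamma, _ in parent_edges))
    return x

traits = simulate_bm(NETWORK, mu=0.0, sigma2=1.0, seed=1)
```

Setting sigma2 = 0 returns the mean vector of the process, which is a convenient way to check the effect of a shift placed on a given tree edge.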
Variance Matrix
From a Tree to a Network
The distribution of trait values at all nodes, X, can be fully characterized as a multivariate Gaussian with mean μ1m+n and variance matrix σ2V, where 1m+n is the vector of ones, n is the number of tips and m the number of internal nodes. The variance matrix V, which depends on the topology of the network, encodes the correlations induced by the phylogenetic relationships between taxa. When the network reduces to a tree (if there are no hybrids), then V is the well-known BM covariance (Felsenstein 1985): Vij = tij is the time of shared evolution between nodes i and j, i.e. the time elapsed between the root and the most recent common ancestor (mrca) of i and j.
When the network contains hybrids, this formula is no longer valid. To see this, let us rewrite tij as:
$$t_{ij} = \sum_{e \in p_i \cap p_j} \ell_e,$$
where pi is the path going from the root to node i. This formula simply expresses that tij is the length of the shared path between the two nodes, which ends at their mrca. On a network, the difficulty is that there is not a unique path going from the root to a given node. Indeed, if there is a hybrid among the ancestors of node i, then a path might go “right” or “left” of the hybrid loop to go from the root to i.
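Under the BM model in Definition 1 (with a fixed root), it turns out that we need to sum over all the possible paths going from the root to a given node, weighting paths by the inheritance probabilities γe of all the traversed edges:
$$V_{ij} = \sum_{p_i \in \mathcal{P}_i,\; p_j \in \mathcal{P}_j} \Big(\prod_{e \in p_i} \gamma_e\Big) \Big(\prod_{e \in p_j} \gamma_e\Big) \sum_{e \in p_i \cap p_j} \ell_e,$$
where 𝒫i denotes the set of all the paths going from the root to node i.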
This general formula for V was first presented in Pickrell and Pritchard (2012) in the context of population genomics. A formal proof is provided here (Appendix).
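To make the path-sum expression concrete, here is a brute-force Python sketch that enumerates all root-to-node paths on a small hypothetical network (one hybrid with inheritance 0.6/0.4 and zero-length hybrid edges). The dictionary layout and the function names are illustrative assumptions; edges are identified by their (parent, child) pair, which assumes there are no parallel edges.

```python
from itertools import product

# Hypothetical toy network: PARENTS[node] = list of (parent, edge_length, gamma).
PARENTS = {
    0: [],                               # root
    1: [(0, 1.0, 1.0)],
    2: [(0, 1.0, 1.0)],
    3: [(1, 0.0, 0.6), (2, 0.0, 0.4)],   # hybrid node
    4: [(1, 1.0, 1.0)],                  # tip
    5: [(3, 1.0, 1.0)],                  # tip below the hybrid
    6: [(2, 1.0, 1.0)],                  # tip
}

def root_paths(node, parents):
    """All root-to-node paths, each given as a tuple of (parent, child) edges."""
    if not parents[node]:
        return [()]
    paths = []
    for parent, _length, _gamma in parents[node]:
        for path in root_paths(parent, parents):
            paths.append(path + ((parent, node),))
    return paths

def path_sum_covariance(i, j, parents):
    """V_ij as a sum over pairs of paths, weighted by the gammas of traversed edges."""
    edge = {(p, c): (length, gamma)
            for c, plist in parents.items() for p, length, gamma in plist}
    total = 0.0
    for path_i, path_j in product(root_paths(i, parents), root_paths(j, parents)):
        weight = 1.0
        for e in path_i:
            weight *= edge[e][1]
        for e in path_j:
            weight *= edge[e][1]
        shared_length = sum(edge[e][0] for e in set(path_i) & set(path_j))
        total += weight * shared_length
    return total

v_hybrid_tip = path_sum_covariance(5, 5, PARENTS)
```

On this toy network the tip below the hybrid sits at distance 2 from the root but has variance 1.52, while the two other tips have variance 2. The number of paths can grow exponentially with the number of reticulations, which is why such an enumeration is only useful as a check on small examples.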
Remark 1 (Variance reduction). From the expression above, we can show that the variance of any tip i decreases with each hybridization ancestral to i. Consider a time-consistent network, in the sense that all paths from the root to a given hybrid node have the same length, as expected if branch lengths measure calendar time. Note that this is the opposite of the “NELP” property (No Equally Long Paths) defined by Pardi and Scornavacca (2015). For tip i, let ti be the length of any path from the root to i. If the network is a tree, then Vii = ti. If the history of tip i involves one or more reticulations, then we show (Appendix) that:
$$V_{ii} < t_i.$$
This shows that hybridization events, which imply taking a weighted mean of two traits, cause the trait variance to decrease.
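A small worked case may help to see where the reduction comes from. The setting is an illustrative assumption, not taken from the text: a single hybrid node i whose parents a and b are both at time t from the root, with no reticulation above them (so that V_aa = V_bb = t), zero-length hybrid edges, and inheritance weights γ and 1 − γ with 0 < γ < 1. The variance of the weighted average is then:

```latex
\begin{aligned}
V_{ii} &= \gamma^2 V_{aa} + (1-\gamma)^2 V_{bb} + 2\gamma(1-\gamma) V_{ab} \\
       &= \bigl(\gamma^2 + (1-\gamma)^2\bigr)\, t + 2\gamma(1-\gamma)\, V_{ab} \\
       &= t - 2\gamma(1-\gamma)\,(t - V_{ab}).
\end{aligned}
```

Since a and b are distinct contemporary nodes, their shared time V_ab is smaller than t, so V_ii is smaller than t. For instance, with γ = 0.4, t = 1 and V_ab = 0, one gets V_ii = 1 − 2 × 0.4 × 0.6 = 0.52.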
Algorithm
The formula above, although general, is not practical to compute. Using the recursive characterization of the process given in Definition 1, we can derive an efficient way to compute this covariance matrix across all nodes in the network (tips and internal nodes), in a single traversal of the network. This traversal needs to be in “preorder”, from the root to the tips, such that any given node is listed after all of its parent(s): for any two nodes numbered i and j, if there is a directed path from i to j, then i ≤ j. Such an ordering (also called topological sorting) can be obtained in linear time in the number of nodes and edges (Kahn 1962). On Figure 1a, nodes are numbered from 1 to 13 in preorder. The result below, proved in the Appendix, provides an efficient algorithm to compute the phylogenetic variance matrix V in a time linear in the number of nodes of the network, in a single preorder traversal.
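The preorder numbering can be sketched with Kahn's algorithm as follows. This is a generic Python illustration on a hypothetical network given as a child-list dictionary; the node names and the function name preorder are assumptions made for the example, and the sketch is not the preorder! function of PhyloNetworks.

```python
from collections import deque

def preorder(children, root):
    """Kahn's algorithm: list nodes so that every node comes after all of its parents."""
    # Count the parents of each node (1 for tree nodes, 2 for hybrid nodes).
    indegree = {node: 0 for node in children}
    for node in children:
        for child in children[node]:
            indegree[child] += 1
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:   # all parents of child are already listed
                queue.append(child)
    return order

# Hypothetical network with one hybrid node "h" whose parents are "a" and "b".
CHILDREN = {
    "root": ["a", "b"],
    "a": ["h", "t1"],
    "b": ["h", "t3"],
    "h": ["t2"],
    "t1": [], "t2": [], "t3": [],
}
order = preorder(CHILDREN, "root")
```

Each edge is examined once, which gives the linear running time in the number of nodes and edges mentioned above.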
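Proposition 1 (Iterative computation of the phylogenetic variance). Assume that the nodes of a network are numbered in preorder. Then V can be calculated using the following step for each node i, from i = 1 to i = n + m:
If i = 1, then i is the root, and V11 = 0.
If i is a tree node, denote by a the parent of i, and by ℓea the length of the branch ea going from a to i. Then:
$$V_{ij} = V_{ji} = V_{aj} \quad \text{for all } 1 \le j \le i-1, \qquad V_{ii} = V_{aa} + \ell_{e_a}.$$
If i is a hybrid node, denote by a and b the parents of i, by ℓea and ℓeb the lengths of the branches ea and eb going from a or b to i, and by γea and γeb the associated inheritance probabilities. Then:
$$V_{ij} = V_{ji} = \gamma_{e_a} V_{aj} + \gamma_{e_b} V_{bj} \quad \text{for all } 1 \le j \le i-1, \qquad V_{ii} = \gamma_{e_a}^2 \left(V_{aa} + \ell_{e_a}\right) + \gamma_{e_b}^2 \left(V_{bb} + \ell_{e_b}\right) + 2 \gamma_{e_a} \gamma_{e_b} V_{ab}.$$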
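The single-pass computation can be sketched in Python as below. The update rules coded here are the ones implied by the path-sum formula: a tree node copies the covariances of its parent and adds its branch length on the diagonal, and a hybrid node takes the γ-weighted combination of its parents, with squared weights on the diagonal. The toy network and the function name network_variance are hypothetical, and this is an illustration rather than the PhyloNetworks code.

```python
def network_variance(network):
    """Variance matrix V over all nodes, given nodes in preorder.

    network[i] is the list of parent edges of node i, each as
    (parent_index, length, gamma); the root has an empty list.
    """
    size = len(network)
    V = [[0.0] * size for _ in range(size)]
    for i, parent_edges in enumerate(network):
        if not parent_edges:
            continue                      # root: V[0][0] = 0 (fixed root value)
        if len(parent_edges) == 1:        # tree node
            a, length, _gamma = parent_edges[0]
            for j in range(i):
                V[i][j] = V[j][i] = V[a][j]
            V[i][i] = V[a][a] + length
        else:                             # hybrid node with parents a and b
            (a, la, ga), (b, lb, gb) = parent_edges
            for j in range(i):
                V[i][j] = V[j][i] = ga * V[a][j] + gb * V[b][j]
            V[i][i] = (ga ** 2 * (V[a][a] + la) + gb ** 2 * (V[b][b] + lb)
                       + 2 * ga * gb * V[a][b])
    return V

# Hypothetical toy network in preorder, with one hybrid (node 3).
NETWORK = [
    [],                               # 0: root
    [(0, 1.0, 1.0)],                  # 1
    [(0, 1.0, 1.0)],                  # 2
    [(1, 0.0, 0.6), (2, 0.0, 0.4)],   # 3: hybrid
    [(1, 1.0, 1.0)],                  # 4: tip
    [(3, 1.0, 1.0)],                  # 5: tip below the hybrid
    [(2, 1.0, 1.0)],                  # 6: tip
]
V = network_variance(NETWORK)
TIPS = [4, 5, 6]
V_tip = [[V[i][j] for j in TIPS] for i in TIPS]
```

Because parents always come before their children in preorder, every entry read on the right-hand side has already been filled when node i is processed.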
Phylogenetic Regression
We can now define a phylogenetic regression on networks, the same way it is defined for phylogenetic trees (Grafen 1989, 1992).
Linear Regression Framework
Define Y as the vector of trait values observed at the tips of the network. This is a sub-vector of the larger vector of trait values at all nodes. Let Vtip be the sub-matrix of V, with covariances between the observed taxa (tips). The phylogenetic linear regression can be written as:
$$Y = R\theta + \sigma E, \qquad E \sim \mathcal{N}\left(0_n, V^{\mathrm{tip}}\right), \tag{5}$$
where R is an n × q matrix of regressors, and θ a vector of q coefficients. We can recover the distribution of Y under a simple BM with a fixed root value equal to μ (and no shift) by taking R = 1n and θ = μ (with q = 1). The regression matrix R can also contain some explanatory trait variables of interest. In this phylogenetic regression, the BM model applies to the residual variation E not explained by the predictors.
This formulation is very powerful, as it recasts the problem into the well-known linear regression framework. The variance matrix Vtip is known (it is entirely characterized by the network used) so that, through a Cholesky factorization, we can reduce this regression to the canonical case of independent sampling units. This problem hence inherits all the features of the standard linear regression, such as confidence intervals for coefficients or data prediction, as explained in the next paragraph.
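The reduction to independent sampling units can be illustrated with a short pure-Python sketch for the intercept-only case (R = 1n, θ = μ). The helper names, the example matrix and the choice of dividing the residual sum of squares by n − 1 for the variance rate are assumptions made for this illustration; the package itself relies on GLM for this step.

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L L^T, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def gls_intercept(V_tip, y):
    """Estimate mu and sigma2 by whitening with L^{-1}, then ordinary least squares."""
    L = cholesky(V_tip)
    y_w = forward_solve(L, y)                    # whitened response
    one_w = forward_solve(L, [1.0] * len(y))     # whitened intercept column
    mu_hat = sum(a * b for a, b in zip(one_w, y_w)) / sum(a * a for a in one_w)
    residuals = [a - mu_hat * b for a, b in zip(y_w, one_w)]
    sigma2_hat = sum(r * r for r in residuals) / (len(y) - 1)   # assumed n - q divisor
    return mu_hat, sigma2_hat

# Hypothetical tip variance matrix for three tips, the middle one of hybrid origin.
V_TIP = [[2.0, 0.6, 0.0], [0.6, 1.52, 0.4], [0.0, 0.4, 2.0]]
mu_hat, sigma2_hat = gls_intercept(V_TIP, [1.2, 0.7, -0.3])
```

After whitening, the transformed residuals are independent with common variance σ2, so the usual confidence intervals and tests of linear regression apply unchanged.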
Ancestral State Reconstruction and Missing Data
The phylogenetic variance matrix can also be used to do ancestral state reconstruction, or missing data imputation. Both tasks are equivalent from a mathematical point of view, rely on the Best Linear Unbiased Predictor (BLUP, see e.g. Robinson 1991) and are well known in the standard PCM toolbox. They have been implemented in many R packages, such as ape (Paradis et al. 2004, function ace), phytools (Revell 2012, function fastAnc) or more recently Rphylopars (Goolsby et al. 2017, function phylopars). In our Julia package PhyloNetworks, it is available as function ancestralStateReconstruction.
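As a simplified illustration of the prediction step, the sketch below computes the conditional expectation of an unobserved node given the tips, treating μ as known. This is a simplification: the BLUP mentioned above also accounts for the estimation of μ. The function names and the two-tip example are hypothetical, and this is not the ancestralStateReconstruction function of PhyloNetworks.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination without pivoting (adequate for SPD A)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for c in range(k, n + 1):
                M[i][c] -= factor * M[k][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z

def predict_node(mu, cov_node_tips, V_tip, y):
    """E[X_node | Y = y] = mu + C V_tip^{-1} (y - mu 1), with C = Cov(X_node, Y) / sigma2."""
    centered = [value - mu for value in y]
    weights = solve(V_tip, centered)
    return mu + sum(c * w for c, w in zip(cov_node_tips, weights))

# Hypothetical tree: an internal node at time 1 with two tip children at time 2.
prediction = predict_node(0.0, [1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]], [3.0, 3.0])
```

In this small example both tips are observed at 3 and the root is fixed at 0, so the prediction for the internal node (2) lies between the two. The variance rate σ2 cancels in the formula, so only the matrix V is needed; the entries of C are read from the rows of V computed over all nodes.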
Pagel’s λ
The variance structure induced by the BM can be made more flexible using standard transformations of the network branch lengths, such as Pagel’s λ (Pagel 1999). Because the network is calibrated with node ages, it is time-consistent: the time ti elapsed between the root and a given node i is well defined, and does not depend on the path taken. Hence, the lambda transform used on a tree can be extended to networks, as shown below.
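Definition (Pagel’s λ transform). First, for any hybrid tip in the network, add a child edge of length 0 to change this tip into an internal (hybrid) node, and transfer the data from the former hybrid tip to the new tip. Next, let e be a branch of the network, with child node i, parent node pa(i), and length ℓe. Then its transformed length is given by:
$$\ell_e(\lambda) = \begin{cases} \lambda\, \ell_e = \lambda \left(t_i - t_{\mathrm{pa}(i)}\right) & \text{if } i \text{ is an internal node,} \\ \ell_e + (1 - \lambda)\, t_{\mathrm{pa}(i)} = t_i - \lambda\, t_{\mathrm{pa}(i)} & \text{if } i \text{ is a tip,} \end{cases} \tag{6}$$
where ti and tpa(i) are the times elapsed from the root to node i and to its parent.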
The interpretation of this transformation in terms of phylogenetic signal is as usual: when λ decreases to zero, the phylogenetic structure is less and less important, and traits at the tips are completely independent for λ = 0. The first step of resolving hybrid tips is similar to a common technique to resolve polytomies in trees, using extra branches of length 0. This transformation does not change the interpretation of the network or the age of the hybrid. However, under Pagel’s λ transformation, the added external edge does allow extra variation specific to the hybrid species, immediately after the hybridization. The second part of (6) applies to the new external tree edge, and hybrid edges are only affected by the first part of (6). The transformation’s impact on the matrix Vtip is not exactly the same as on trees. It still involves a simple multiplication of the off-diagonal terms by λ, but the diagonal terms are also modified. The following proposition is proved in the Appendix.
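Proposition 2 (Pagel’s λ effect on the variance). The phylogenetic variance of a BM running on a network transformed by a parameter λ, V(λ), is given for any two tips i ≠ j by:
$$V(\lambda)_{ij} = \lambda V_{ij}, \qquad V(\lambda)_{ii} = \lambda V_{ii} + (1 - \lambda)\, t_i,$$
where V = V(1) is the variance of the BM process on the non-transformed network, and ti is the time elapsed from the root to tip i.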
On a tree, we have V(λ)ii = ti for any tip i and any λ, so that the diagonal terms remain unchanged. This is not true on a network, however, as the Pagel transformation erases the variance-reduction effect of ancestral hybridizations.
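The effect on the tip variance matrix can be sketched numerically. The rule coded below is an assumption chosen to match the description in the text: off-diagonal terms are multiplied by λ, and each diagonal term becomes λ Vii + (1 − λ) ti, which reduces to ti on a tree. The matrix, the tip heights and the function name are hypothetical.

```python
def pagel_lambda_variance(V_tip, tip_heights, lam):
    """Tip variance matrix after Pagel's lambda transform (assumed rule, see lead-in)."""
    n = len(V_tip)
    out = [[lam * V_tip[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal: part of the variance is moved to the external branch of tip i.
        out[i][i] += (1.0 - lam) * tip_heights[i]
    return out

# Hypothetical example: three tips at height 2, the middle one below a hybrid.
V_TIP = [[2.0, 0.6, 0.0], [0.6, 1.52, 0.4], [0.0, 0.4, 2.0]]
HEIGHTS = [2.0, 2.0, 2.0]
V_half = pagel_lambda_variance(V_TIP, HEIGHTS, 0.5)
```

With λ = 0.5, the variance of the hybrid-derived tip moves from 1.52 to 1.76, halfway back to its height of 2, while the two tree-like tips keep a variance of 2. At λ = 0 the matrix is diagonal with entries ti.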
Other transformations, for instance based on Pagel’s κ or δ (Pagel 1999), could be adapted to the phylogenetic network setting. Although these are not implemented for the moment, they would be straightforward to add in our linear regression framework.
Shifted BM and Transgressive Evolution
In our BM model, we allowed for shifts on non-hybrid edges. In this section, we show how those shifts can be inferred from the linear regression framework, and how they can be used to test for ancestral transgressive evolution events. When considering shifts, we again require that all tips are tree nodes. If a tip is a hybrid node, then the network is first resolved by adding a child edge of length 0 to the hybrid, making this node an internal node. This network resolution does not affect the interpretation of the network or the variance of the BM model. It adds more flexibility to the mean vector of the BM process, because the extra edge is a tree edge on which a shift can be placed.
Shift Vector
We first describe an efficient way to represent the shifts on the network branches in a vector format. In Definition 1, we forbade shifts on hybrid branches. This does not lose generality, and is just for the sake of identifiability. Indeed, a hybrid node connects to three branches, two incoming and one outgoing. A shift on any of these three branches would impact the same set of nodes (apart from the hybrid itself), and hence would produce the same data distribution at the tips. The position of a shift on these three branches is consequently not identifiable. By restricting shifts to tree branches, the combined effect of branches with the same set of descendants is identified by a shift on a single (tree) edge. We can combine all shift values in a vector Δ indexed by nodes:
$$\Delta_i = \begin{cases} \mu & \text{if } i = \rho \text{ is the root node,} \\ \Delta_e & \text{if } i \text{ is a tree node with parent edge } e, \\ 0 & \text{if } i \text{ is a hybrid node.} \end{cases}$$
Note that any tree edge e is associated to its child node i in this definition. In the following, when there is no ambiguity, we will refer indifferently to one or the other.
Descendence Matrix
For a rooted tree, a matrix of 0/1 values where each column corresponds to a clade can fully represent the tree topology. In column j, entries are equal to 1 for descendants of node number j, and 0 otherwise (Ho and Ané 2014; Bastide et al. 2017b). On a network, a node i can be a “partial” descendant of j, with the proportion of inherited genetic material represented by the inheritance probabilities γe. Hence, the descendence matrix of a network can be defined with non-binary entries between 0 and 1 as follows.
Definition (Descendence Matrix). The descendence matrix U of a network, given some ordering of its n tips and m internal nodes, is defined as an (n + m) × (n + m) matrix by:
$$U_{ij} = \sum_{p \in \mathcal{P}_{j \to i}} \prod_{e \in p} \gamma_e,$$
where 𝒫j→i is the set of all the paths going from node j to node i (respecting the direction of edges). Note that, if i is not a descendant of j, then 𝒫j→i is empty and Uij = 0. By convention, if i = j, we take Uii = 1 (that is, a node is considered to be a descendant of itself). If the network is a tree, we recover the usual definition (all the γe are equal to 1). In general, the set of nodes i for which Uij > 0 is the hardwired cluster of j, or the clade below j if the network is a tree.
Further define T as the (non-square) submatrix of U made of the rows that correspond to tip nodes (see example below).
Example 1 (Descendence Matrix and Shift Vector). The descendence matrices U and T associated with the network presented in Figure 2 are shown below, with zeros replaced by dots to improve readability:
The associated shift vector and associated trait means at the tips are shown below, where the only non-zero shift is assumed to correspond to transgressive evolution at the hybridization event, captured by Δ10 on edge 10:
Note that rapid trait evolution or jumps in the trait value in other parts of the phylogeny could also be modeled, by letting Δi be non-zero for other tree edges i.
Linear Model
The shifted BM model in Definition 1 can be expressed by:
$$Y = T\Delta + \sigma E, \qquad E \sim \mathcal{N}\left(0_n, V^{\mathrm{tip}}\right),$$
where Y is the trait vector at the tips, and Δ and T are the shift vector and the descendence matrix as defined above (see the Appendix for the proof).
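The descendence matrix and the resulting tip means can be sketched in Python with a single preorder pass. The sketch assumes that the root entry of the shift vector holds the ancestral value μ, so that the product TΔ gives the expected trait at each tip. The toy network and the function name are hypothetical and unrelated to Figure 2.

```python
def descendence_matrix(network):
    """U[i][j] = sum over directed paths from j to i of the product of gammas.

    network[i] lists the parent edges of node i as (parent_index, gamma),
    with nodes given in preorder.
    """
    size = len(network)
    U = [[0.0] * size for _ in range(size)]
    for i, parent_edges in enumerate(network):
        U[i][i] = 1.0   # a node is a descendant of itself
        for j in range(i):
            # Every path from j to i ends with an edge from a parent of i.
            U[i][j] = sum(gamma * U[parent][j] for parent, gamma in parent_edges)
    return U

# Hypothetical toy network in preorder, with one hybrid (node 3).
NETWORK = [
    [],                     # 0: root
    [(0, 1.0)],             # 1
    [(0, 1.0)],             # 2
    [(1, 0.6), (2, 0.4)],   # 3: hybrid
    [(1, 1.0)],             # 4: tip
    [(3, 1.0)],             # 5: tip below the hybrid
    [(2, 1.0)],             # 6: tip
]
U = descendence_matrix(NETWORK)
T = [U[i] for i in (4, 5, 6)]   # rows of U for the tips

# Shift vector: mu = 1 at the root, a shift of 0.5 on the edge above node 1,
# and a transgressive shift of 2 on the tree edge below the hybrid (node 5).
delta = [1.0, 0.5, 0.0, 0.0, 0.0, 2.0, 0.0]
tip_means = [sum(t * d for t, d in zip(row, delta)) for row in T]
```

The tip below the hybrid inherits only 60% of the shift placed above node 1 (0.3 out of 0.5), plus the full transgressive shift on its own edge.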
Transgressive Evolution
Even though the linear formulation above makes it easier to study, the problem of locating the non-zero shifts on the branches of a phylogenetic tree is difficult, and is still an active research area (see e.g. Uyeda and Harmon 2014; Bastide et al. 2017b; Khabbazian et al. 2016; Bastide et al. 2017a).
On networks as on trees, a shift can represent various biological processes. In the present work, we limit our study to shifts occurring on branches that are outgoing from a hybrid node (see Figure 2 for an example). Such shifts might represent a transgressive evolution effect, as defined in the introduction, and as a component of hybridization: the new species inherits its trait as a weighted average of the traits of its two parents, plus a shift representing extra variation, perhaps as a result of rapid selection.
Limiting shifts to being right after reticulations avoids the difficult exploration of all the possible locations of an unknown number of shifts on all the tree branches.
Statistical Tests for Transgressive Evolution
As there are typically only a few hybridization events in a phylogenetic network, we can test for transgressive evolution on each one individually. Thanks to the linear framework described above, this amounts to a well-known test of fixed effects.
Statistical Model
Denote by N the n × h sub-matrix of T containing only the columns corresponding to tree branches outgoing from hybrid nodes. We assume that N has full rank, that is, that the transgressive evolution shifts are identifiable. This is likely to be the case in networks that can be inferred by current methods, which typically have a small number of reticulations. We further denote by N̄ = N1h the vector of size n containing the row sums of N: for tip i, N̄i = Σk Nik. Then the phylogenetic linear regression extending (5) with transgressive evolution can be written as:
$$Y = R\beta + \bar{N} b + N d + \sigma E, \qquad E \sim \mathcal{N}\left(0_n, V^{\mathrm{tip}}\right),$$
where R is a given matrix of regressors, with associated coefficients β. These are included for the sake of generality, but usually only represent the ancestral state of the BM: R = 1n and β = μ. The coefficient b represents a common transgressive evolution effect, that would affect all the hybridization events uniformly, while the vector d has h entries with a specific deviation from this common effect for each event, and represents heterogeneity.
Fisher Test
When written this way, the problem of testing for transgressive evolution just amounts to testing the fixed effects b and d. Some hypotheses that can be tested are summarized in the next table. ℋ0 corresponds to the null model where the hybrids inherit their parents’ weighted average. ℋ1 is a model where all hybridization events share the same transgressive evolution effect, the trait being shifted by a common coefficient b. Finally, ℋ2 is a model where each hybridization event k has its own transgressive evolution effect, with a shift b + dk.
Tests of fixed effects are classical in the statistics literature (see e.g. Lehmann 1986; Searle 1987). Compared to a likelihood ratio test, an F-test is exact and more powerful, when available. Here we can define two F (Fisher) statistics F10 and F21 (see the Appendix). To see if ℋ2 fits the data significantly better than ℋ1, we compare F21 to an F distribution with degrees of freedom r[R N] − r[R N̄] and n − r[R N], where r is the matrix rank, N̄ is the vector of row sums of N, and [R N] is the matrix obtained by pasting the columns of R and N together. To test ℋ1 versus the null model ℋ0, we compare F10 to an F distribution with degrees of freedom r[R N̄] − r[R] and n − r[R N̄]. We study these tests for several symmetric networks in the following section.
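For reference, the generic F statistic for comparing two nested linear models can be sketched as follows. This is the textbook nested-model form, given here as an assumption about how F10 and F21 are built (their exact expressions are in the Appendix); the residual sums of squares are meant to be computed on the whitened data, after the Cholesky transformation by Vtip. The function name and the numbers are hypothetical.

```python
def nested_f_statistic(rss_small, rss_big, rank_small, rank_big, n):
    """F statistic and degrees of freedom for a small model nested in a big one.

    rss_small, rss_big: residual sums of squares of the two fits (whitened data).
    rank_small, rank_big: ranks of the two design matrices.
    n: number of tips.
    """
    df_num = rank_big - rank_small
    df_den = n - rank_big
    f_stat = ((rss_small - rss_big) / df_num) / (rss_big / df_den)
    return f_stat, df_num, df_den

# Hypothetical numbers: H0 (intercept only, rank 1) against H1 (intercept plus
# a common transgressive effect, rank 2) on n = 12 tips.
f10, df1, df2 = nested_f_statistic(rss_small=10.0, rss_big=5.0,
                                   rank_small=1, rank_big=2, n=12)
```

The resulting value is then compared to the quantiles of an F distribution with (df1, df2) degrees of freedom.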
SIMULATION AND POWER STUDY
In this section, we first analyse the performance of the PCM tools described above, and then provide a theoretical power study of our statistical tests for transgressive evolution.
Implementation of the Network PCMs
All the tools described above, as well as simulation tools, were implemented in the Julia package PhyloNetworks (Solís-Lemus et al. 2017). To perform a phylogenetic regression, the main function is phyloNetworklm. It relies on functions preorder! and sharedPathMatrix to efficiently compute the variance matrix using the algorithm in Proposition 1, and on the Julia package GLM (Bates 2016) for the linear regression. All the analysis and extraction tools provided by this GLM package can hence be used, including the ftest function to perform the Fisher statistical tests for transgressive evolution. For the Xiphophorus fishes study (see below), we developed function calibrateFromPairwiseDistances! to calibrate a network topology based on pairwise genetic distances.
Simulation Study
Setting
We considered 4 network topologies, all based on the same symmetric backbone tree with unit height and 32 tips, to which we added several hybridization events (Fig. 3, top). Those events were either very recent and numerous (h = 8 events each affecting 1 taxon) or quite ancient and scarce (h = 2 events each affecting 4 taxa). All networks had 8 tips with a hybrid ancestry. All the hybridization events had inheritance probability γ = 0.3. We then simulated datasets on these networks with μ = 0, σ2 = 1, and Pagel’s λ transformation with λ in {0, 0.25, 0.5, 0.75, 1}. Recall that λ = 0 corresponds to all tips being independent, and λ = 1 is the simple BM on the original network. Each simulation scenario was replicated 500 times. To study the scalability of the implementation, we then reproduced these analyses on networks with 32 to 256 tips, and 1 to 8 hybridization events, each affecting 8 tips.
We analysed each dataset assuming either a BM or a λ model of evolution. When λ ≠ 1, we could study the effect of wrongly using the BM. All the analyses were conducted on a laptop computer, with four Intel Core i7-6600U, and a 2.60GHz CPU speed.
Results
When the vanilla BM model is used for both the simulation and the inference, the two parameters μ and σ2 are well estimated, with no bias, for all the network topologies tested (Fig. 3, last two rows, red boxes for λ = 1). The estimation of μ is quite robust to the misspecification of the model, while σ2 tends to be over-estimated (Fig. 3, last two rows, red boxes for λ ≠ 1). This is expected, as in this case, the BM model wrongly tries to impose a strong phylogenetic correlation structure on the data, and can only account for the observed diversity by raising the estimated variance, to accommodate both phylogenetic variance and independent intra-specific variation. When we use the true λ model for the inference, this bias is corrected, and both μ and σ2 are correctly estimated (Fig. 3, last two rows, blue boxes). Furthermore, the λ estimate has a small bias but rather high variance (Fig. 3, second row). As expected, when the number of taxa increases, this variance decreases (data not shown). Finally, our implementation is quite fast (Fig. 4), with computing times ranging between 1 and 10 ms for a BM fit, and between 10 ms and 1 s for a Pagel’s λ fit.
Power Study of the Statistical Tests for Transgressive Evolution
We determined that our test statistics have the following noncentral Fisher distributions:
The noncentrality coefficients are determined by Δ10 and Δ21, detailed in the Appendix. These Δ terms are zero under the null hypothesis (ℋ0 for Δ10 and ℋ1 for Δ21), and depend on the network topology through the metric defined by Vtip, and through the regression matrix N.
Because we know the exact distribution of our F statistics, we do not need to resort to simulations to assess the power of these tests. In the following, we present a theoretical power study.
Test ℋ0 vs ℋ1
We first studied the theoretical power to detect a single transgressive evolution effect, depending on the size b of this effect, and on the position of the hybridization event on the network. We considered 4 network topologies, using the same backbone tree as in the simulation study above, but adding only one hybridization event, occurring at various depths, from a very recent event affecting a single taxon to a very ancient event affecting 8 taxa (Fig. 5, top). The inheritance probability of this added hybrid branch was fixed to γ = 0.4. This parameter proved to have little influence on the power to detect transgressive evolution (data not shown), for all the values tested, between 0 and 0.5. The underlying BM process had fixed ancestral value μ = 0, and variance rate σ2 = 1. Finally, for each network topology, we varied the transgressive evolution effect from 0 to 5, and computed the power of the test ℋ0 vs ℋ1 for three fixed standard levels (α in {0.01, 0.05, 0.1}).
As expected, the power improves with the size of the effect, reaching approximately 1 for b = 5 in all scenarios (Fig. 5, bottom). In addition, the transgressive evolution effect seems easier to detect for recent hybridization events, even if they affect fewer tips. One intuition for that is that ancient hybridization effects are “diluted” by the variance of the BM, and are hence harder to detect, even if they affect more tips. This may be similar to the difficulty of detecting ancient hybridization compared to recent hybridizations.
Test ℋ1 vs ℋ2
We used a similar framework to study the power of the test to detect heterogeneity in the transgressive evolution effects. We used here the same 4 networks as in the simulation study, with 32 tips and 2 to 8 hybridization events (Fig. 6, top), but with inheritance probabilities fixed to γ = 0.4. Transgressive evolution effects were set to d = d·du, with du fixed to (du)i = 1 for i ≤ h/2 and (du)i = −1 for i > h/2, h being the number of hybrids, which was even in all the scenarios we considered. Note that the average transgressive evolution effect was 0, because the values sum up to 0. This allowed us to reduce the “strength of heterogeneity” to a single parameter d, which we varied between 0 and 5 (see appendices for the reduced expression of the noncentral coefficient). As before, we computed the power of the test ℋ1 vs ℋ2 for three fixed standard levels (α in {0.01, 0.05, 0.1}).
Figure 6 (bottom) shows a similar pattern: the test is more powerful for a high heterogeneity coefficient, and for recent hybridization events. For variation of about 3.5 in transgressive evolution, the power is close to one in all the scenarios considered here.
Xiphophorus FISHES
Methods
Network inference
We revisited the example in Solís-Lemus and Ané (2016) and re-analyzed transcriptome data from Cui et al. (2013) to reconstruct the evolutionary history of 23 swordtails and platyfishes (Xiphophorus: Poeciliidae). The original work included 24 taxa, but we eliminated X. nezahualcoyotl, because the individual sequenced in Cui et al. (2013) was found to be a lab hybrid not representative of the wild species X. nezahualcoyotl (personal communication). We re-analyzed their first set of 1183 transcripts, and BUCKy (Larget et al. 2010) was run on each of the 8,855 4-taxon sets. The resulting quartet CFs were used in SNaQ (Solís-Lemus and Ané 2016), using h = 0 to h = 5 and 10 runs each. The network scores (negative log-pseudolikelihood) decreased very sharply from h = 0 to 1, strongly from h = 1 to 3, then decreased only slightly and somewhat linearly beyond h = 3 (Fig. 7, top left). Using a broken stick heuristic, one might suggest that h = 1 or perhaps h = 3 best fits the data. Given our focus on PCMs, we used both networks (h = 1 and 3) as well as the tree (h = 0) to study trait evolution and to compare results across the different choices of reticulation numbers.
Network calibration
SNaQ estimates branch lengths in coalescent units, which are not expected to be proportional to time, and are also not estimable for some edges (like external branches to taxa represented by a single individual). To calibrate the network, we estimated pairwise genetic distances between taxa, and then optimized node divergence times using a least-squares criterion, as detailed below.
To estimate pairwise distances, individual gene trees were estimated with RAxML, using the HKY model and gamma-distributed rate variation among sites. For each locus, branch lengths were rescaled to a median of 1 to reduce rate variation across loci, before obtaining a pairwise distance matrix from each rescaled gene tree. Loci with one or more missing taxa were then excluded (leaving 1019 loci), and pairwise distance matrices were averaged across loci.
This average pairwise distance matrix was used to estimate node ages on each network (h = 0, 1, 3). The network pairwise distance between taxa i and j was taken as the weighted average distance between i and j on the trees displayed by the network, where the weight of a displayed tree is the product of the inheritance probabilities γe for all edges e retained in the tree. We estimated node ages that minimized the ordinary least-squares mismatch between the genetic pairwise distances and the network pairwise distances.
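The weighted average over displayed trees can be illustrated with a minimal Python sketch. It is not the PhyloNetworks implementation: the function names, the hybrid label "H1", the edge labels "major" and "minor", and the two tree distances are all hypothetical. Because tree edges have γ = 1, the weight of a displayed tree reduces to the product of γ over the hybrid edges it retains.

```python
from itertools import product

def displayed_tree_weights(hybrid_edges):
    """Enumerate displayed trees of a network (illustrative helper).

    hybrid_edges maps each hybrid node to its parent edges, given as
    (edge_label, gamma) pairs. A displayed tree keeps exactly one parent
    edge per hybrid node; its weight is the product of the retained gammas.
    """
    hybrids = sorted(hybrid_edges)
    for combo in product(*(hybrid_edges[h] for h in hybrids)):
        weight = 1.0
        for _, gamma in combo:
            weight *= gamma
        yield tuple(label for label, _ in combo), weight

def network_distance(hybrid_edges, tree_distances):
    """Weighted average, over displayed trees, of the distance between two taxa."""
    return sum(w * tree_distances[kept]
               for kept, w in displayed_tree_weights(hybrid_edges))

# Hypothetical network with a single reticulation (gamma = 0.6 and 0.4).
hybrid_edges = {"H1": [("major", 0.6), ("minor", 0.4)]}
# Hypothetical distances between taxa i and j on each displayed tree.
tree_distances = {("major",): 2.0, ("minor",): 3.0}

weights = dict(displayed_tree_weights(hybrid_edges))
d_net = network_distance(hybrid_edges, tree_distances)
print(weights, d_net)
```

In a calibration, the tree distances would themselves be functions of the node ages, and the node ages would be chosen to minimize the squared mismatch between such network distances and the observed genetic distances.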
Traits
With data presented in Cui et al. (2013) and following their study on sword evolution, we revisited the hypotheses that females have a preference for males with longer swords, and that the common ancestor of the genus Xiphophorus likely had a sword. Rather than using the methods of parsimony character mapping and independent contrasts as in Cui et al. (2013), we tested the effect of hybridization on the ancestral state reconstructions and the correlation between both traits using networks with zero, one or three hybridization events, using phyloNetworklm. For each network, the topology and branch lengths were assumed to be perfectly estimated, and fixed. We also tested for phylogenetic signal in both traits on all networks using Pagel’s λ, as well as for transgressive evolution, using the F statistics defined above. For the phylogenetic regression, more than half of the species were excluded because they lack information on female preference.
Along with the datasets used, two executable Julia markdown (.jmd) files are provided in the online supplementary material, allowing the interested reader to reproduce all the analyses described here.
Results
The Xiphophorus fish topologies with zero, one, and three hybridization events were calibrated using pairwise genetic distances (Fig. 7, bottom, for h = 0 and 3). With h = 1, the reticulation event did not necessarily imply the existence of unsampled or extinct taxa, so we constrained this reticulation to occur between contemporary populations (with an edge length of 0). For the network with h = 3, two reticulation events implied the existence of unsampled taxa, so we calibrated this network without constraint, to allow minor reticulation edges of positive lengths. Optimized branch lengths were similar between networks. Branch lengths were estimated to be 0 for some tree edges and some unconstrained hybrid edges, creating polytomies.
Using networks with 0, 1 or 3 hybridization events, we found a positive correlation between female preference and longer swords in males, but this relationship was not statistically significant (h = 0: p = 0.096; h = 1: p = 0.110; h = 3: p = 0.106). Ancestral state reconstruction of sword index shows the presence of a sword at the MRCA of each network because unsworded species were assigned a value of 0.275 in Cui et al. (2013) and the ancestral state in all networks was reconstructed to be 0.46. Phylogenetic signal was high for both traits with estimated λ = 1.0 on all networks (or above 1.0 with unconstrained maximum likelihood).
We also applied our tests for transgressive evolution on both traits, using the network with 3 hybridization events (Fig. 7, lower right). For the sword index, we found no evidence of transgressive evolution (p = 0.55 and p = 0.28, respectively, for homogeneous or heterogeneous transgressive evolution). However, we did find some evidence for a heterogeneous transgressive evolution effect for female preference. Testing ℋ2 against ℋ1 gives p = 0.0087. Testing ℋ2 against ℋ0 directly, we get p = 0.0064 (see the Appendix for a description of this third test, also based on a Fisher statistic). However, transgressive evolution effects were in opposite directions (one positive and two negative), such that the common effect was not significantly different from 0: ℋ1 vs ℋ0 gave p = 0.11.
DISCUSSION
Impact of the Network
The results from the fish dataset analysis using a tree (h = 0) or a network (h = 1 or h = 3) show that taking the hybridization events into account has a small impact on the ancestral state reconstruction and on the estimation of parameters, both for the regression analysis and for the test for phylogenetic signal. This finding was corroborated by simulations: when we ignored hybridization events, using a tree while the true underlying model was a network, we found that the estimation of parameters μ and σ² was only slightly affected (data not shown). These results may indicate that major previous findings, that used a phylogenetic tree where a phylogenetic network might have been more appropriate, are likely to be robust to a violation of the tree-like ancestry assumption. Our new model may simply refine previous estimates in many cases.
However, the structure of the network has a strong impact on the study of transgressive evolution. This is expected, as the model allows for shifts below each inferred hybrid. If one reticulation is undetected, or if one was incorrectly located on the network, then the model will be ill-fitted, leading to potentially misleading conclusions. As an example, we reproduced the analysis of transgressive evolution for female preference on the network with three hybridization events, but this time pruning the network, to keep only the taxa with a measured trait. Preference data were missing for species X. signum, X. alvarezi and X. mayae, such that X. helleri became the only species impacted by one of the reticulation events, which became a simple loop in the network. In other words, X. helleri was the only descendant of the reticulation, and also the closest relative of the hybrid’s parent among the remaining taxa. The reticulation could be dropped from the pruned network. This new and simplified network only retained the two hybridization events associated with negative shifts. As a consequence, and contrary to the conclusion we found in the main text, we found support for homogeneous transgressive evolution (p = 0.0071 for ℋ1 vs ℋ0), and no support for heterogeneous effects (p = 0.88 for ℋ2 vs ℋ1). This illustrates that caution is needed for the interpretation of tests of transgressive evolution, as those highly depend on the quality of the input network inference, which is a recognized hard problem.
Network Calibration
To conduct PCMs, we developed a distance-based method to calibrate a network topology into a time-consistent network. This is a basic method that makes a molecular clock assumption on the input pairwise distance matrix. Important improvements could be made to account for rate variation across lineages, and to use calibration dates from fossil data, like in relaxed clock calibration methods for phylogenetic trees such as r8s (Sanderson 2003) or BEAST (Drummond et al. 2006). In our fish example, we averaged pairwise distances across loci, to mitigate a violation of the molecular clock that might be specific to each locus.
Our method estimated some branch lengths to be 0, thereby creating polytomies. This behavior is shared by other well-tested distance-based methods like Neighbor-Joining (Saitou and Nei 1987), which can also estimate 0 or even negative branch lengths.
We also noticed that several calibrations could fit the same matrix of genetic pairwise distances equally well, pointing to a lack of identifiability of some node ages. This issue occurred for the age of hybrid nodes and of their parent nodes. Branch lengths and node ages around reticulation points were also found to be non-identifiable by Pardi and Scornavacca (2015), when the input data consist of the full set of trees displayed by the network, and when these trees are calibrated. This information on gene trees can only identify the “unzipped” version of the network, where unzipping a reticulation means moving the hybrid point as close as possible to its child node (see Pardi and Scornavacca 2015, for a rigorous description of “canonical” networks). This unzipping operation creates a polytomy after the reticulation point. We observed such polytomies for two events in our calibrated network (Fig. 7, bottom right). Pardi and Scornavacca (2015) proved that the lack of identifiability is worse for time-consistent networks, which violate their “NELP” property (no equally-long paths). Lack of identifiable branch lengths around reticulations is thus observed from different sources of input data, and requires more study. Methods utilizing multiple sources of data might be able to resolve the issue. For instance, gene tree discordance is informative about branch lengths in coalescent units around reticulation nodes, and could rescue the lack of information from other input data like pairwise distances or calibrated displayed trees. More work is also needed to study the robustness of transgressive evolution tests to errors in estimated branch lengths.
Comparison with Jhwueng and O’Meara (2017)
In their model, Jhwueng and O’Meara (2017) include hybridization events as random shifts. Using their notations, each hybrid k shifts by a coefficient log β + δk, with δk a random Gaussian with variance νH: δk ∼ 𝒩 (0, νH). This formulation provides a mixed effects linear model, with shifts appearing as random effects. In this framework, the test of heterogeneity (ℋ2 vs ℋ1) amounts to a test of null variance, νH = 0. In the context of mixed effects linear models, such tests are also well studied, but are known to be more difficult than tests of fixed effects (Lehman 1986; Khuri et al. 1998). Assuming that the variance νH is 0, our test for a common transgressive evolution effect (ℋ1 vs ℋ0) is then similar to the likelihood-based test for log β = 0 in Jhwueng and O’Meara (2017). A mixed-effect model is legitimate, although it might be more difficult to study theoretically, and its inference can be trickier. Jhwueng and O’Meara (2017) indeed report some numerical problems, and rather large sampling error for both log β and νH. Current state-of-the-art methods to infer phylogenetic networks cannot handle more than 30 taxa or more than a handful of reticulation events (Hejase and Liu 2016). Hence, it might not be surprising that estimating a variance νH for an event that is only observed two or three times is indeed difficult. On data sets with few reticulations, we believe that our fixed effect approach can be better suited. However, our approach adds a parameter for each hybridization event, whereas the random-effect approach of Jhwueng and O’Meara (2017) maintains only two parameters (mean and variance). As the available networks are likely to grow over the next few decades, this latter approach might be preferable in the future.
Perspectives
As stated in the introduction, PCMs rely on two fundamental components: the species relationship model (tree or network), and the model of trait evolution. Here, we showed how a network could be used instead of a tree, but we used the simplest model of trait evolution (BM). Future developments could adapt some of the more refined models to the network framework, in order to capture the diverse tempo and modes of evolution. In doing so, the salient point to be careful about is the merging rule adopted at reticulation points for each of these processes.
For instance, the Ornstein-Uhlenbeck (OU) process is popular to model trait evolution (Hansen 1997). It has extra parameters compared to the BM: a primary optimum for the trait, and α, a rubber band parameter that controls how the trait is pulled toward its optimum. Either one might vary across lineages. What behavior would be biologically realistic at reticulation points? For an OU with one single optimum value over the whole tree, the weighted average merging rule could be adopted. But how should transgressive evolution be modeled? With the OU process, shifts have been traditionally considered on the optimal value rather than directly on the process’ value, as we did for the BM (Butler and King 2004; Beaulieu et al. 2012). If a transgressive evolution shift is allowed on the optimum value, this would result in several optima on different regions of the network, which might not capture biological realism. A related problem is to find a realistic merging rule for reticulations between two species evolving in two different phylogenetic groups with different optima.
More generally, the numerous improvements that have been developed for PCMs on trees should be adapted to phylogenetic networks, such as support for measurement error or intra-specific variation (as in, e.g. Lynch 1991; Ives et al. 2007; Felsenstein 2008; Goolsby et al. 2017); distinct regimes of evolution on different regions of the network (see Beaulieu et al. 2012); and multivariate processes (Felsenstein 1985; Bartoszek et al. 2012; Clavel et al. 2015).
Sticking with the vanilla BM, it could also be interesting to look into other merging rules at reticulation points. For instance, instead of taking a weighted average, one could draw either one of the two parents’ trait for the hybrid, with probabilities defined by the weights γa and γb of the parents. If such a rule could be justified from a modelling point of view, further work would be needed to derive the induced distribution of the trait at the tips of the network.
FUNDING
The visit of PB to the University of Wisconsin-Madison during the fall of 2015 was funded by a grant from the Franco-American Fulbright Commission. This work was funded in part by the National Science Foundation (DEB 1354793) and by a Vilas Associate award to CA from the University of Wisconsin-Madison.
PROOF OF THE VARIANCE FORMULA AND ALGORITHM
We prove here both formula (1) for the BM variance matrix and Proposition 1 giving an efficient algorithm to calculate this matrix. We do so by induction on the number of nodes in the network: N = n + m. When the network is made of a single node i = 1, equation (1) and Proposition 1 are obviously correct. We now assume that these results are correct for any phylogenetic network with up to N − 1 nodes, and we consider a network with N nodes. When these nodes are sorted in preorder, the last node i = N is necessarily a tip (with no descendants), so removing it and its parent edges from the original network gives a valid phylogenetic network with N − 1 nodes. Using the same notations as in the main text, we can focus on the case i = N. Because of the preorder, there is no directed path from i to j for any j < i. We use here the formulas of Definition 1, and assume σ² = 1 without loss of generality.
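If i is a tree node, then Xi = Xa + ε, with ε ∼ 𝒩 (0, ℓea), and ε independent of the values Xj in the subnetwork (j < i). Moreover, a < i because of the preorder. Then:
$$\mathrm{Cov}[X_i, X_j] = \mathrm{Cov}[X_a, X_j] \quad \text{for } j < i, \qquad \mathrm{Var}[X_i] = \mathrm{Var}[X_a] + \ell_{e_a},$$
and all the needed quantities on the right-hand side have already been computed because a < i. This proves (3) in Proposition 1. Next, we seek to prove (1). Note that it is valid by induction for all nodes in the subnetwork, and we just need to prove it for i = N and any j ≤ i. By induction, we have that, for any j < i,
$$\mathrm{Cov}[X_a, X_j] = \sum_{p_a \in \mathcal{P}_a} \sum_{p_j \in \mathcal{P}_j} \Big(\prod_{e \in p_a} \gamma_e\Big) \Big(\prod_{e \in p_j} \gamma_e\Big) \sum_{e \in p_a \cap p_j} \ell_e .$$
Because a is the only parent of node i = N, any path from the root to i must start as a path from the root to a, and then follow ea between a and i. In other words, any path from the root to a corresponds to a unique path from the root to i:
$$\mathcal{P}_i = \{(p_a, e_a) : p_a \in \mathcal{P}_a\}.$$
Moreover, the inheritance weights of paths pa and pi = (pa, ea) are the same, because ea is a tree edge with γea = 1. Now take j < i. Any path pj from the root to j cannot go through i (because of the preorder), therefore it cannot go through ea, and the edges shared by pi and pj are exactly the same as the edges shared by pa and pj. Putting these considerations together, we get:
$$\mathrm{Cov}[X_i, X_j] = \mathrm{Cov}[X_a, X_j] = \sum_{p_i \in \mathcal{P}_i} \sum_{p_j \in \mathcal{P}_j} \Big(\prod_{e \in p_i} \gamma_e\Big) \Big(\prod_{e \in p_j} \gamma_e\Big) \sum_{e \in p_i \cap p_j} \ell_e ,$$
which proves (1) for i = N and j < i. For j = i, any two paths pi = (pa, ea) and qi = (qa, ea) from the root to i must go through a and ea, so that the shared edges between pi and qi are the edges shared by pa and qa, plus edge ea. Therefore, we get that
$$\sum_{p_i, q_i \in \mathcal{P}_i} \Big(\prod_{e \in p_i} \gamma_e\Big) \Big(\prod_{e \in q_i} \gamma_e\Big) \sum_{e \in p_i \cap q_i} \ell_e = \sum_{p_a, q_a \in \mathcal{P}_a} \Big(\prod_{e \in p_a} \gamma_e\Big) \Big(\prod_{e \in q_a} \gamma_e\Big) \Big(\sum_{e \in p_a \cap q_a} \ell_e + \ell_{e_a}\Big) = \mathrm{Var}[X_a] + \ell_{e_a},$$
where the last equality follows from (1) at node a and from $\sum_{p_a \in \mathcal{P}_a} \prod_{e \in p_a} \gamma_e = 1$. Since Var[Xi] = Var[Xa] + ℓea, this completes the proof of (1), for i = j.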
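If i is a hybrid node, then Xi = (γea Xa + γeb Xb) + (γea εa + γeb εb), with εk ∼ 𝒩 (0, ℓek), and εk independent of all the values Xj in the subnetwork (j < i) for k = a and k = b. Again, a < i and b < i because of the preorder. Then:
$$\mathrm{Cov}[X_i, X_j] = \gamma_{e_a} \mathrm{Cov}[X_a, X_j] + \gamma_{e_b} \mathrm{Cov}[X_b, X_j] \quad \text{for } j < i,$$
$$\mathrm{Var}[X_i] = \gamma_{e_a}^2 \big(\mathrm{Var}[X_a] + \ell_{e_a}\big) + \gamma_{e_b}^2 \big(\mathrm{Var}[X_b] + \ell_{e_b}\big) + 2 \gamma_{e_a} \gamma_{e_b} \mathrm{Cov}[X_a, X_b].$$
This proves (4) in Proposition 1. Next, we focus on proving (1). Again, it is valid by induction for all nodes in the subnetwork, and we need to prove it for i = N and any j ≤ i. By induction, (1) holds for a, b, and any j < i. Then, because a and b are the only parents of i, any path pi from the root to i must go through a and ea, or through b and eb (and not both). In other words:
$$\mathcal{P}_i = \{(p_a, e_a) : p_a \in \mathcal{P}_a\} \cup \{(p_b, e_b) : p_b \in \mathcal{P}_b\}.$$
Now considering node j < i and a path pj from the root to j, pj cannot go through i so it cannot go through ea or eb. Therefore, the shared edges between pj and pi = (pa, ea) are exactly the same edges as those shared between pj and pa, and the shared edges between pj and pi = (pb, eb) are also the same as those shared between pj and pb. For j < i, we get:
$$\mathrm{Cov}[X_i, X_j] = \sum_{k \in \{a, b\}} \gamma_{e_k} \sum_{p_k \in \mathcal{P}_k} \sum_{p_j \in \mathcal{P}_j} \Big(\prod_{e \in p_k} \gamma_e\Big) \Big(\prod_{e \in p_j} \gamma_e\Big) \sum_{e \in p_k \cap p_j} \ell_e = \sum_{p_i \in \mathcal{P}_i} \sum_{p_j \in \mathcal{P}_j} \Big(\prod_{e \in p_i} \gamma_e\Big) \Big(\prod_{e \in p_j} \gamma_e\Big) \sum_{e \in p_i \cap p_j} \ell_e ,$$
proving (1) for i = N and j < i. For j = i = N, we similarly decompose the set of paths 𝒫i into two sets, either going through a or through b:
$$\sum_{p_i, q_i \in \mathcal{P}_i} \Big(\prod_{e \in p_i} \gamma_e\Big) \Big(\prod_{e \in q_i} \gamma_e\Big) \sum_{e \in p_i \cap q_i} \ell_e = \sum_{k, l \in \{a, b\}} \gamma_{e_k} \gamma_{e_l} \sum_{p_k \in \mathcal{P}_k} \sum_{q_l \in \mathcal{P}_l} \Big(\prod_{e \in p_k} \gamma_e\Big) \Big(\prod_{e \in q_l} \gamma_e\Big) \Big(\sum_{e \in p_k \cap q_l} \ell_e + \mathbb{1}_{\{k = l\}} \ell_{e_k}\Big)$$
$$= \gamma_{e_a}^2 \big(\mathrm{Var}[X_a] + \ell_{e_a}\big) + \gamma_{e_b}^2 \big(\mathrm{Var}[X_b] + \ell_{e_b}\big) + 2 \gamma_{e_a} \gamma_{e_b} \mathrm{Cov}[X_a, X_b] = \mathrm{Var}[X_i].$$
This completes the proof of (1), for i = j, and for the last case when i is a hybrid node.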
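The recursion established in this proof can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the PhyloNetworks code: the function name, the encoding of a network as a preorder list of (parent index, edge length, γ) tuples, and the 7-node example network are all hypothetical. As in the proof, σ² = 1, a tree node copies the covariances of its parent and adds its edge length to the variance, and a hybrid node takes the γ-weighted combination of its two parents.

```python
def network_variance(parents):
    """BM variance matrix (sigma^2 = 1) over all nodes, in one preorder pass.

    parents[i] lists the parent edges of node i as (parent, length, gamma)
    tuples: empty for the root, one tuple for a tree node, two for a hybrid.
    Nodes must be sorted in preorder (every parent index is smaller than i).
    """
    n = len(parents)
    V = [[0.0] * n for _ in range(n)]
    for i, edges in enumerate(parents):
        if not edges:  # root: fixed ancestral value, zero variance
            continue
        # Covariance with every earlier node: weighted average over parents.
        for j in range(i):
            cov = sum(gamma * V[a][j] for a, _, gamma in edges)
            V[i][j] = V[j][i] = cov
        # Variance: gamma^2 * (parent variance + edge length) for each parent,
        var = sum(gamma ** 2 * (V[a][a] + length) for a, length, gamma in edges)
        # plus the cross term between the two parents of a hybrid node.
        if len(edges) == 2:
            (a, _, ga), (b, _, gb) = edges
            var += 2 * ga * gb * V[a][b]
        V[i][i] = var
    return V

# Hypothetical time-consistent network: root 0, internal nodes 1 and 2,
# hybrid node 3 (gamma = 0.6 from node 1, 0.4 from node 2, hybrid edges of
# length 0), and tips 4, 5, 6 all at distance 2 from the root.
network = [
    [],
    [(0, 1.0, 1.0)],
    [(0, 1.0, 1.0)],
    [(1, 0.0, 0.6), (2, 0.0, 0.4)],
    [(1, 1.0, 1.0)],
    [(3, 1.0, 1.0)],
    [(2, 1.0, 1.0)],
]
V = network_variance(network)
tips = [4, 5, 6]
for i in tips:
    print([round(V[i][j], 2) for j in tips])
```

In this toy example the hybrid tip (node 5) has variance 1.52, below the value 2 shared by the two non-hybrid tips, and its covariances with tips 4 and 6 are 0.6 and 0.4, the inheritance weights times the length of the shared edge.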
VARIANCE REDUCTION
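Here, we prove Formula (2). As in the main text, consider a time-consistent network. For tip i, let ti be the length of any path from the root to i. If the history of tip i involves one or more reticulations then take any two paths pi and qi in 𝒫i. We have: $\sum_{e \in p_i \cap q_i} \ell_e \leq t_i$, with a strict inequality if pi and qi are different paths. Seeing $\prod_{e \in p_i} \gamma_e$ as the probability associated with the path pi, so that these weights sum to 1 over 𝒫i, we get from Equation (1):
$$\mathrm{Var}[Y_i] = \sigma^2 \sum_{p_i, q_i \in \mathcal{P}_i} \Big(\prod_{e \in p_i} \gamma_e\Big) \Big(\prod_{e \in q_i} \gamma_e\Big) \sum_{e \in p_i \cap q_i} \ell_e \leq \sigma^2 t_i \Big(\sum_{p_i \in \mathcal{P}_i} \prod_{e \in p_i} \gamma_e\Big)^2 = \sigma^2 t_i ,$$
with the equality fulfilled if there is a unique path from the root to taxon i, i.e. if i has no hybrid ancestry.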
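As a worked illustration of this variance reduction, consider the simplest case, which is our own hypothetical example: a tip i below a single reticulation, with exactly two paths from the root, of inheritance weights γ and 1 − γ. Both paths have length t_i (time consistency), and we write s < t_i for the total length of the edges they share. Reading Formula (2) as the bound Var[Y_i] ≤ σ² t_i, Equation (1) then gives the size of the reduction explicitly:

```latex
\begin{align*}
\mathrm{Var}[Y_i]
  &= \sigma^2 \left[ \gamma^2 t_i + (1-\gamma)^2 t_i + 2\gamma(1-\gamma)\, s \right] \\
  &= \sigma^2 \left[ t_i - 2\gamma(1-\gamma)\,(t_i - s) \right]
  \;\leq\; \sigma^2 t_i ,
\end{align*}
% using gamma^2 + (1-gamma)^2 = 1 - 2 gamma (1-gamma).
```

The reduction vanishes when γ is 0 or 1 (a single path) and is largest for γ = 0.5. For instance, with σ² = 1, t_i = 2, s = 1 and γ = 0.6, the variance is 2 − 0.48 = 1.52.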
PAGEL’S λ VARIANCE
Proof of Proposition 2. In Equation 1, the first equation is straightforward, because all the edges shared by the paths to i and to j are internal edges, whose lengths are multiplied by λ. Now take a tip node i. The first step of the transformation ensures that i is a tree node. Let a be its parent node, and parent branch ea. From the recursive formula given in Proposition 1, the variance at node i is proportional to: hence the announced formulas.
SHIFTED BM MODEL WITH THE DESCENDENCE MATRIX
Proof of Formula (7). The shifts are fixed, so they do not impact the variance structure of the traits, and we only need to show that 𝔼 [Y] = TΔ. Here, we prove a slightly more general formula on the complete vector of trait values at all the nodes, that is: 𝔼 [X] = UΔ. The original equality is easily derived from this one by keeping the tip values only.
We show this equality recursively. Assume that the nodes are numbered in preorder. Denote by Ui the ith row-vector of U. Node i = 1 is the root, which is the descendant of no other node than itself, so $U_1 = (1, 0, \dots, 0)$ and $\mathbb{E}[X_1] = \Delta_1 = U_1 \Delta$.
We now assume that 𝔼 [Xj] = UjΔ for all nodes j < i, and we seek to prove that this property is also true for node i.
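If i is a tree node, then denote by a its unique parent and by ea the edge from a to i. For any node k ≠ i, 𝒫k→i = {(pa, ea): pa ∈ 𝒫k→a}. Since ea is a tree edge with γea = 1, we get from definition 3 that: $U_{ik} = U_{ak}$ for any k ≠ i, and $U_{ii} = 1$, hence $\mathbb{E}[X_i] = \mathbb{E}[X_a] + \Delta_i = U_a \Delta + \Delta_i = U_i \Delta$.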
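If i is a hybrid, then denote by a and b its two parents, by ea and eb the corresponding edges, with coefficients γea and γeb. Then for any node k ≠ i, we have: 𝒫k→i = {(pa, ea): pa ∈ 𝒫k→a} ∪ {(pb, eb): pb ∈ 𝒫k→b}, and using definition 3: $U_{ik} = \gamma_{e_a} U_{ak} + \gamma_{e_b} U_{bk}$.
Since no shift can occur on the hybrid branches, Δi = 0 by convention and: $\mathbb{E}[X_i] = \gamma_{e_a} \mathbb{E}[X_a] + \gamma_{e_b} \mathbb{E}[X_b] = (\gamma_{e_a} U_a + \gamma_{e_b} U_b) \Delta = U_i \Delta$, where the last equality holds because Ui differs from γea Ua + γeb Ub only in its ith entry, which multiplies Δi = 0.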
This ends the recursion, and the proof of (7).
Note that this proof also gives an efficient recursive way to compute the descendence matrix U.
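A minimal Python sketch of this recursive computation is given below. It is an illustration under our own assumptions rather than the package implementation: the function name, the preorder encoding of the network as (parent index, edge length, γ) tuples, the example network and the shift values are hypothetical. Each row of U is the γ-weighted sum of the rows of the node's parents, with a 1 on the diagonal.

```python
def descendence_matrix(parents):
    """Descendence matrix U over all nodes, computed in one preorder pass.

    parents[i] lists the parent edges of node i as (parent, length, gamma)
    tuples (empty for the root). U[i][k] is the sum, over all paths from
    node k to node i, of the product of the gammas along the path.
    """
    n = len(parents)
    U = [[0.0] * n for _ in range(n)]
    for i, edges in enumerate(parents):
        for a, _, gamma in edges:
            for k in range(i):
                U[i][k] += gamma * U[a][k]
        U[i][i] = 1.0  # every node is its own descendant
    return U

# Hypothetical network: root 0, internal nodes 1 and 2, hybrid node 3
# (gamma = 0.6 from node 1 and 0.4 from node 2), tips 4, 5 and 6.
network = [
    [],
    [(0, 1.0, 1.0)],
    [(0, 1.0, 1.0)],
    [(1, 0.0, 0.6), (2, 0.0, 0.4)],
    [(1, 1.0, 1.0)],
    [(3, 1.0, 1.0)],
    [(2, 1.0, 1.0)],
]
U = descendence_matrix(network)

# Shift vector: root value 2 in the first entry, a shift of 1 on the tree
# edge leading to node 1, and a transgressive shift of 3 on the edge below
# the hybrid (leading to node 5). The entry of the hybrid node 3 stays 0.
delta = [2.0, 1.0, 0.0, 0.0, 0.0, 3.0, 0.0]
expected = [sum(u * d for u, d in zip(row, delta)) for row in U]
print([round(x, 2) for x in expected])
```

In this example the expectation at the hybrid tip (node 5) is 2 + 0.6 × 1 + 3 = 5.6: the shift on the edge to node 1 is inherited with weight 0.6, and the shift below the hybrid is inherited in full.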
FISHER TEST FOR TRANSGRESSIVE EVOLUTION
The Fisher statistics used in Section Transgressive Evolution have the following expression: where ProjM denotes the projection onto the linear space spanned by the columns of matrix M, with respect to the metric defined by In other words, for any vector X:
These statistics follow a noncentral Fisher distribution as given in (9) and (10) of the main text, where
When studying the power of the test ℋ1 vs ℋ2, we took $\mathbf{d} = d\,\mathbf{d}^u$, so that the noncentral coefficient is: and, as the networks are fixed, it only varies with the heterogeneity coefficient d.
Note that a third statistic, F20, can be defined in a similar way to test ℋ2 vs ℋ0 directly. We first re-write the linear model as: where there are no constraints on coefficients δ. Then the F statistic can be written as:
In the same way, it follows under ℋ2 a noncentral Fisher distribution:
With
Thanks to the flexible framework provided by the GLM ftest function, all these tests are readily implemented, as long as one can fit the three models (ℋ0, ℋ1, and ℋ2).
ACKNOWLEDGMENTS
PB would like to thank Mahendra Mariadassou and Stéphane Robin for enlightening discussions, and useful comments on an early version of this work. PB also thanks Tristan Mary-Huard for sharing his extensive knowledge on the linear mixed model. The authors thank Mohammad Khabbazian for insights on the topological sorting algorithm.