Phylogenetic analysis suggests joint control of transmission mode in a grass-endophyte symbiosis

How symbionts are transmitted between hosts is key to determining whether symbioses evolve to be harmful or beneficial. Vertical transmission favors mutualistic symbionts, and horizontal transmission more virulent ones. Transmission mode evolution itself depends on whether the host or symbiont can respond to selection on transmission mode. When hosts control the transmission mode, vertical transmission should evolve under more restrictive circumstances than when symbionts are in control. We take a phylogenetic approach to determine whether the host, symbiont, or both control transmission mode using the pooid grass-epichloid endophyte symbiosis as a model system. This study is the first to investigate control of transmission mode evolution in a phylogenetic context. We find a signal of host phylogeny but only in conjunction with symbiont identity. This pattern suggests joint control of transmission mode by the host and symbiont. It also suggests that non-genetic or non-conserved symbiont traits may determine whether host traits lead to vertical or horizontal transmission.


Introduction
We study phylogenetic patterns of transmission mode in the symbiosis between cool-  Figure 2: Phylogenetic and transmission mode data. Rows represent host species, columns symbiont species. Squares represent the transmission mode exhibited by a host-symbiont pair, as in Figure 1. We detected a symbiont-specific host effect. The posterior mode of its intraclass correlation 95 (ICC) was 0.12 (12% of the total variance in transmission mode explained), with a 95% cred-96 ible interval of 1% to 27%. The host, symbiont, coevolutionary, and host-specific symbiont 97 effects each explained no more than than 0.2% of the variance in transmission mode. The 98 multivariate potential scale reduction factor [22] was 1.02. All effective sample sizes were > 99 170. When we used data from only one host and one symbiont tree, the symbiont-specific 100 host effect explained 11% of the variance. The other effects each explained < 0.4% of the 101 variance (Table S5). 102 In three out of four simulations of coevolution with fast symbiont evolution, we detected 103 a symbiont-specific host effect. In simulations of individual phylogenetic effects at normal 104 evolutionary rates, we generally detected the simulated effects, except for the coevolutionary 105 effect. We detected effects that we did not simulate in seven of fifteen simulations, although 106 in all but two simulations they appeared to be small (ICC ≤ 3%). In simulations of missing 107 data, we never detected effects not found in the original simulated data set, although in 108 seven of nine simulations we failed to detect effects found in the original data sets. Studies of specific grass-endophyte symbioses provide independent support for joint con-120 trol. Within-species genetic variation in horizontal transmission rate has been found in sym-121 bionts in the Pooideae-Epichloë interaction [12] and in both partners in the closely related 122 Danthonia spicata-Balansia hypoxylon symbiosis [23]. Furthermore, vertical transmission 123 rate is phylogenetically conserved in some pooid grasses [24] and epichloid endophytes [25].

124
Growth rate is a possible mechanism of transmission mode control. Horizontally-transmitted 125 endophytes outpace vertically-transmitted on certain sugars [26], while fast-growing host in-126 florescences can prevent symbionts from transmitting horizontally [27].

127
Two factors may have affected our estimates. First, some transmission data may be miss-128 ing or inaccurate, given that new interactions are still being discovered [28].

Transmission Mode Data
We collected transmission mode data from published studies. We searched Web of Science using the following search terms: (neotyphodium OR epichloe) AND ('transmission mode' OR 'horizontal transmission' OR 'vertical transmission' OR 'mixed-mode transmission' OR 'mixed mode transmission' OR 'pleiotropic symbiosis'). (Asexual species in Epichloë were formerly in the genus Neotyphodium). This returned 65 papers. After discarding 18 papers whose abstracts indicated that they were unlikely to contain transmission mode data, we gathered transmission mode data from the remaining papers. We obtained transmission mode data from 32 papers [references . 15 additional papers did not have any transmission mode data. We recorded transmission mode as horizontal transmission, vertical transmission, mixedmode transmission, or no transmission for each pair of host and symbiont species. A species pair was recorded as employing horizontal transmission if this was the only transmission mode reported for the pair. Similarly, vertical transmission was recorded when this was the only reported transmission mode for the species pair. A species pair was recorded as exhibiting mixed-mode transmission if it was reported to show both vertical and horizontal transmission. If no transmission data was available for a species pair, we recorded the pair as not forming a symbiosis.
Because horizontal transmission occurs via the dispersal of ascospores (although recent evidence suggests sexual reproduction is not always necessary for horizontal transmission [18,32,33]), a report that a symbiont was capable of reproducing sexually on a host was considered to be evidence of horizontal transmission.

Phylogenetic Data
We gathered phylogenetic data from TreeBASE. We used the "All text" search option and searched for the genera of the host and symbiont species in the transmission mode data set. The search terms we used are given in Table 1. Because asexual Epichloë were previously members of the genus Neotyphodium, we also used Neotyphodium as a search term. Furthermore, we appended a space when searching for members of the genus Poa to reduce irrelevant results. "Pooideae"; "Achnatherum"; "Agrostis"; "Ammophia"; "Anthoxanthum"; "Avena"; "Brachyelytrum"; "Brachypodium"; "Bromus"; "Calamagrostis"; "Cinna"; "Dactylis"; "Echinopogon"; "Elymus"; "Festuca"; "Glyceria"; "Holcus"; "Hordelymus"; "Hordeum"; "Hystrix"; "Koeleria"; "Leymus"; "Lolium"; "Melica"; "Milium"; "Phleum"; "Poa " [a space was appended to prevent return of results related only to Poacea]; "Puccinellia"; "Roegneria"; "Sphenopholis" Symbiont phylogenies "Epichloe"; "Epichlo*"; "Neotyphodium" We used the R package APE [34] to remove species not present in the transmission mode data set from the trees. We deleted any trees with fewer than two species in the transmission mode data set. Because the host tree search results contained some endophyte phylogenies, we deleted any host search results that contained endophyte species. The host and symbiont phylogenies we used in the analysis are given in Tables 2 and 3, respectively. Some phylogenies had multiple tips corresponding to the same species. We used Dendroscope's "MUL to Network, Cluster-based" algorithm [35] to merge those species that were present twice or more in a single tree. The algorithm indicated a hybrid origin for some symbiont species. These were deleted from the trees in which they appeared to be hybrids, because we were unable to use phylogenetic networks for further analysis. To maintain as much phylogenetic information as possible, we did not delete these species from trees in which the the algorithm did not indicate a hybrid origin.

Supertree Analysis
We used Clann [36] to find a set of equally probable host and symbiont supertreees from the trees produced merging identical tips and removing hybrids. We used the "Sub-tree Pruning and Regrafting" search algorithm, the "Most Similar Supertree" criterion, with the maximum number of steps as 3, the maximum number of swaps as 1,000,000, and 10 repetitions of the search. We used the comparisons weighting scheme and started from a neighbor-joining tree found from the average consensus distances. Missing data were estimated using the 4 point condition distances.
We combined the equally probable supertrees into a single majority consensus tree with Dendroscope's "MUL to Network, Cluster-based" algorithm. We used these majority consensus supertrees for the main phylogenetic effects analysis.

Single Tree Analysis
To test the effect of combining multiple phylogenetic trees in the analysis, we also estimated phylogenetic effects using the individual phylogenetic trees with the largest overlap with the transmission data set (Figure 1. These were T5125 for the hosts and T362 for the symbionts, both from Schardl et al. [37]. Species not present in the transmission mode data set were removed from both trees. The trees were then ultrametricized. They and the transmission data for the species in them were then used to estimate phylogenetic effects.

Model of Phylogenetic Effects
In our data set, transmission mode is a categorical trait that can take on four possible values (horizontal transmission, vertical transmission, mixed-mode transmission, and no transmission). Currently, there is no method for estimating host and symbiont phylogenetic effects directly from categorical data. Thus, we modeled transmission mode for each host-symbiont pair as a 3-dimensional binary trait, following the recommendation of Hadfield and Nakagawa for estimating phylogenetic effects on categorical traits [68]. We modeled phylogenetic effects as covariances induced between the logarithms of the probabilities of each species pair expressing a given transmission mode [69,70].   We can estimate the phylogenetic effect strengths most likely to have produced the observed transmission data if we make some assumptions about the distribution of Y . Specifically, we assume that Y has the matrix normal distribution given below, where σ 2 i are the phylogenetic effect strengths to be estimated and V i are the covariance structures induced by phylogenetic effects (see Table 4).

Phylogenetic Effects Estimation
We estimated phylogenetic effects using the R package MCMCglmm [71]. We ran two MCMC chains for 10 6 iterations each, with overdispersed starting values of (1) 10 −10 for the phylogenetic effects and −8 for the latent variables for one chain and (2) 5 for the phylogenetic effects and −8 for the latent variables for the other. We used a thinning interval of 500 iterations and no burn-in. The priors for the phylogenetic effect strengths were F distributions of σ 2 i 1000 with df1 = 1, df2 = 0.002. We used the default prior for the mean, which was a multivarite normal distribution with mean 0 and variance 1 · 10 10 · I 3 , where I 3 is the 3 × 3 identity matrix.
We used slightly different analysis parameters for the single tree transmission data set because it was smaller and harder to get the MCMC chains to converge. We ran the chains for 10 7 iterations to allow more time for convergence. We also used a burn-in of 2000 to allow MCMCglmm to adjust the proposal distribution for the first 2000 iterations in hopes of getting a better acceptance rate. Finally, we decreased the among-column covariance (the covariance between the log ratios of the transmission modes) from to decrease the chance of the latent variables taking on extreme values and causing numerical problems [72]. (We adjusted for this change in the among-column covariance when comparing the single tree and supertree results, discussed below.) Despite this effort to avoid numerical problems, we had to stop the analysis after 10 7 iterations because the latent variables in one chain become too small. Fortunately, the chains appear to have converged before this point.
We rescaled our estimates of the means and calculated intraclass correlations for the phylogenetic effects, which together should have removed any difference due to differences in the among-column covariance. For all analyses, we rescaled our estimates of the means to reflect the case where the among-column covariance was 0 using the method of Diggle et al. 2002 as cited in [72]. We calculated intraclass correlations (ICCs) for the phylogenetic effects using the formulaσ 2 i σ2 i +π 2 +tr(among-column covariance) , whereσ 2 i is the estimate of phylogenetic effect i. We did this for each saved iteration of the MCMC chains to get a posterior distribution of the ICCs.
We used the posterior mode as a point estimate of the ICC. We calculated the posterior mode using MCMCglmm's posterior.mode function with the parameter adjust (the scaling for the bandwidth) set to 1. We obtained credible intervals for the ICCs using Coda's HPDinterval function to get 95% highest posterior density intervals.
We checked for chain convergence using the multivariate potential scale reduction factor (MPSRF) [73] and the effective sample size, both calculated with Coda.
To analyzed our simulation results, we needed a simple way to determine whether we detected a phylogenetic effect or not. We considered an effect to have been detected if the posterior mode of its intraclass correlation was ≥ 0.02, or 2% of the total variance.

Simulations
To determine whether the observed phylogenetic pattern can emerge from a combination of coevolutionary interactions and faster rates of evolution along the symbiont phylogeny relative to the host, we simulated transmission mode data for the case where hosts and symbionts coevolved control of transmission mode. We used the same host phylogeny used for the supertree analysis. We modified the symbiont supertree to simulate faster evolution in the symbiont. We did this by reducing the correlation between related symbionts by a factor of either 2 or 20 (but not the correlation of a symbiont with itself, which is always 1). We then simulated Y sim as a matrix normal random variable with the variance structure given for Y above, with all phylogenetic effects equal to 10 −8 except the coevolutionary effect, which we set to 4. We set the mean of Y sim using The frequencies of each transmission mode were obtained from the transmission mode data set used for the supertree analysis. We also simulated each phylogenetic alone to test the accuracy of the phylogenetic effect estimates. For these simulations, we set one phylogenetic effect at a time to 4, and the others to 10 −8 . For each phylogenetic effect, we simulation transmission data three times. In two simulations (simulations 1 and 2 in Table 7) we simulated the case where the transmission modes were about four times as prevalent as in the dataset used for the supertree data set. Thus for Y sim we had where the frequencies of transmission modes and symbioses are those in the data set used for the supertree analysis.
In the third case (simulation 3 in Table 7), we simulated the case where the transmission modes were as prevalent as in the supertree data set. In this case, Y sim was the same as for the fast symbiont evolution simulations above.
We also simulated missing data for the simulations where the phylogenetic effects were four times as prevalent as in our supertree data set. We did this by randomly labeling 75% of host-symbiont pairs as not forming a symbiosis, whether or not they really did form a symbiosis. This meant that the simulated missing data sets had approximately the same fraction of symbioses recorded as our actual data set.  2 Results for Analysis of Real Data

Supertree Analysis Results
The posterior distribution of the intraclass correlations of the phylogenetic effects is given in Figure 2. The symbiont-specific host effect's distribution is centered around about 12% of the total variance. The host, symbiont, coevolutionary, and host-specific symbiont effects all have most of their mass near 0% of the variance explained.

Single Tree Analysis Results
Like the supertree, the symbiont-specific host effect had the only the posterior distribution with a large amount of mass on nonzero values (Figure 3). The posterior mode of the symbiont-specific host effect's intraclass correlation was 0.11, meaning it explained 11% of  Table 5: Estimated phylogenetic effects for analysis using single phylogenetic trees. Intraclass correlation given is the posterior mode.
the total variance, with a 95% credible interval of 0% of the total variance to 82% ( Table 5). The other phylogenetic effects had posterior modes of ≤ 0.4% of the total variance. All the phylogenetic effects had larger credible intervals than their counterparts in the supertree data set. Further, the lower bounds of the credible intervals of the phylogenetic effects were all very close to 0% of the total variance. This may be due to the difficult of inferring phylogenetic effects on a small transmission data set.
Because of numerical problems with the MCMC chains used to estimate the phylogenetic effects for the single tree data set, we had to stop the analysis after 10 million iterations. Although the chains appeared to converge when we examined the trace plots, the multivariate potential scale reduction factor was 1.30, and the effective sample sizes of the means of log ratios of the transmission modes were in the range of 67 to 83. The symbiont-specific host effect had an effective sample size of 139. The other phylogenetic effects had effective sample sizes > 1000.

Fast Symbiont Evolution
Three of our four simulations of coevolution combined with fast symbiont evolution appeared to have a symbiont-specific host effect ( Table 6). The posterior modes of the ICC of the symbiont specific host effect ranged from 3.8% of the total variance to 11% in these three simulations. In the fourth simulation the symbiont-specific host effect explained only 0.6% of the variance. In this simulation, the host effect had the largest ICC, explaining 3.4% of the variance. The only other phylogenetic effect detected in any simulation was a coevolutionary effect which explained 3% of the variance in one of the simulations where a symbiont-specific host effect was detected.
Based on our simulation results, it looks like it is possible for a coevolutionary effect  to be mistaken for a symbiont-specific host effect when the symbiont is evolving quickly. However, we did have some difficulty in detecting a coevolutionary effect in simulations where the symbiont was evolving at the same speed as the host. It is possible that regardless of the speed of symbiont evolution, the coevolutionary effect is generally easy to mistake for a symbiont-specific host effect (which we detected in all three simulations of normal-speed coevolution below).

Single Phylogenetic Effects
We didn't see much difference between the data sets simulated with four times the frequency of symbioses in the real data set and those simulated with the same same frequency as in the real data. We never detected a coevolutionary effect, but otherwise we were generally successful detecting the phylogenetic effects we simulated, detecting them in at least two out of three simulations. Unfortunately, we also detected phylogenetic effects that were not present in seven of our fifteen simulated data sets (see Table 7). When we simulated the coevolutionary effect, we detected effects other than it in three out of three simulations. In two cases we detected a symbiont-specific host effect, once in conjunction with a host effect. We detected a host-specific symbiont effect in the third simulation. Besides the coevolutionary effect simulation, there didn't seem to be a pattern to which simulations produced false positives.
When we detected phylogenetic effects that weren't simulated, their posterior modes were ≤ 3% in five cases. One larger effect was the symbiont-specific host effect detected in one symbiont effect simulation, which had a posterior mode of 5.8%. And in one simulation of the coevolutionary effect, a host effect was detected with posterior mode of 8.2%, and the symbiont-specific host effect had a posterior mode of 4.6%. In all cases, the MCMC chains appeared to converge. The MPSRF was ≤ 1.04 in all analyses, and the effective sample size was ≥ 200.
Our simulation results suggest that we don't have much difficulty detecting phylogenetic effects other than the coevolutionary effect, which appears to be strangely difficult to detect. However, our data set is too small for us to be certain that we will only detect phylogenetic effects that are really present in the data.

Missing Data
When we simulated missing data, we never detected effects that weren't detected in the original simulated data set (see Table 8). We failed to detect at least one effect that was detected in the original simulated data set in seven out of nine cases where at least one phylogenetic effect was detected in the original data sets. In six cases where a phylogenetic effect was detected originally, we detected no phylogenetic effects at all.
The multivariate potential scale reduction factor was ≤ 1.1 for all but one analysis, which had a MPSRF of 1.11. The effective sample size was > 100 in all analyses, and generally much larger.
Our results suggest that missing data in our real data set may cause us to fail to detect phylogenetic effects that are really present. It is possible but less likely that any missing data caused us to detect phylogenetic effects that are not really present. Table 7: Estimated phylogenetic effects for simulated data. Colored boxes indicated phylogenetic effects that were detected (posterior mode of intraclass correlation was > 0.04). Blue boxes and bold text indicate that the effect detected was the one simulated. Red boxes and italic text indicate that the effect detected was not simulated. Simulations 1 and 2 for each phylogenetic effect had the likelihood of each transmission mode set to four times its frequency in the true supertree data set. Simulation 3 had the likelihood of each transmission mode the same as in the true data set.