On the importance of accounting for intraspecific genomic relatedness in multi-species studies

1. Analyses in many fields of ecology are increasingly considering multiple species and multiple individuals per species. Premises of statistical tests are often violated with such datasets because of the non-independence of residuals due to phylogenetic relationships or intraspecific population structure. Comparative approaches that account for the phylogenetic relationship of species—for which the benefits are demonstrated—are well developed. However, the importance of accounting for the intraspecific genetic structure, especially with the phylogenetic structure, is rarely addressed in the ecological literature. 2. We investigated whether it is beneficial to account for intraspecific genomic relatedness in multi-species studies. For this, we used a Phylogenetic Mixed Model to analyze simulated data and results from a budburst experiment where clippings of 10 tree and shrub species were subjected to different treatments in terms of temperature and photoperiod. 3. We found that accounting for intraspecific genomic relatedness yields more accurate and precise fixed effects as well as increased statistical power, but more so when the relative importance of the intraspecific to the phylogenetic genetic structure is greater. Analysis of the budburst experiment further showed that accounting for intraspecific and phylogenetic structures yields improved estimates of warming and photoperiod effects and their interactions in explaining the time to budburst. 4. Our results show that important statistical gains can be made by incorporating information on the intraspecific genomic relatedness of individuals in multi-species studies. This is relevant for investigations that are interested in intraspecific variation and that plan to include such observations in statistical tests.


Introduction 25
The reactions of different species to external stimuli are not independent. Because physiological 26 responses have a genetic basis, closely related species are more likely to have similar responses to a 27 specific treatment. This phylogenetic non-independence of species responses violates the assump-28 tions of most statistical tests, such as the independence of residuals in regression, and negatively 29 impacts the results in terms of parameter estimates and p-values (e.g., Revell, 2010). This has been 30 recognized for some time and a family of methods-comparative methods-have been developed to 31 address this problem (Felsenstein, 1985;Grafen, 1989;Lynch, 1991;Ives et al., 2007;Felsenstein, 32 2008; Revell, 2010;Hadfield and Nakagawa, 2010). 33 Recently there has been growing awareness among ecologists of the need to also consider in- studies rarely account for both phylogenetic and intraspecific genetic correlations simultaneously, 43 even though the sampling structure in many ecological studies calls for such a design. 44 Presently, studies that account for phylogenetic correlation almost always ignore intraspecific 45 genetic structure and as such assume that intraspecific samples are drawn from a single population. 46 In contrast, many ecological studies explicitly sample individuals across important geographical 47 ranges or from populations among which gene flow could be restricted, resulting in a potentially 48 non-trivial correlation structure among samples. If this correlation is important, statistical tests 49 that do not account for it are expected to be biased. 50 3 Until recently, the difficulty of obtaining genetic data to accurately estimate intraspecific genetic 51 correlations provided sufficient justification for ignoring this source of variance in ecological stud-52 ies. But the rapid development of sequencing techniques now allows precise estimation of genetic 53 relatedness (Gienapp et al., 2017), at relatively affordable prices. This could allow a better under-54 standing of how ecological responses are influenced by the genetic relationships among species and 55 populations at once. One area where this potential is particularly high is climate change research, 56 where evidence of rapid ecological and evolutionary change is growing. Research has highlighted 57 that species responses to climate change appear phylogenetically patterned, with species from cer-58 tain clades and with particular traits appearing most vulnerable to local extinctions with warming 59 (Willis et al., 2008). At the same time other work has highlighted discrepancies in species re-60 sponses when studied over space (Charmantier et al., 2008), suggesting populations within species 61 may show different responses to climate change. This is supported by population-level research 62 that has found large differences in the range shifts of northern versus southern populations' with 63 warming (Anderson et al., 2009;Chen et al., 2011). Such results make clear that the best estimates 64 of responses will need methods that consider variation at both the species and population levels, 65 and the connections between different populations and different species, all at once.

66
The objectives of this report are to assess the importance of accounting for intraspecific genetic 67 correlations in ecological studies. To achieve this, we use the phylogenetic mixed model (PMM) 68 that allows multiple levels of genetic structure to be considered simultaneously (Lynch, 1991 none currently provides as much flexibility as the PMM. We assess the importance of accounting 72 for intraspecific genetic correlation structure using simulated data and provide a climate change-73 related empirical example where we investigate the importance of temperature and photoperiod on 74 the timing of budburst of ten tree and shrub species. where y is the response variable, µ is the intercept, x is an explanatory variable, β the regression 86 coefficient, a represents the effects due to the phylogenetic structure, b the effects due to the 87 intraspecific structure, and e the residuals. x is a fixed effect (there could be more than one), 88 whereas a and b are random effects. The random effects and residuals are assumed to follow 89 normal distributions: σ 2 a is the phylogenetic variance, σ 2 b is the intraspecific variance, and σ 2 e is the residual variance.

91
The matrices A and B represent the phylogenetic and the intraspecific correlation structures, 92 respectively. The identity matrix I indicates that the residuals are independent and identically 93 distributed. Accordingly, the (co)variance structure (V) of the model is V = σ 2 a A + σ 2 b B + σ 2 e I.

94
The PMM allows the estimation of the proportion of the total variance (σ 2 = σ 2 a + σ 2 b + σ 2 e ) that 95 is due to the genetic structure; this is the heritability (h 2 ) parameter of quantitative geneticists 96 (Housworth et al. 2006). In the present context, heritability is the quotient obtained by dividing 97 the sum of the phylogenetic and intraspecific variances by the total variance: h 2 = (σ 2 a + σ 2 b )/σ 2 .

98
The remaining variance, 1 − h 2 = σ 2 e /σ 2 , is the non-genetic variance that could be due to the 99 environment or other effects that impact the individuals in a way that is not defined by the genetic 100 correlation structures. The PMM also allow the estimation of the relative contribution of the 101 intraspecific correlation structure to the total genetic structure, which is σ 2 b /(σ 2 a + σ 2 b ).

102
Phylogenetic generalized least squares

103
A brief mention of PGLS seems important as it is a popular comparative method. A PGLS model 104 that would include phylogenetic and intraspecific correlation structures could be denoted as: In other words, the residuals of the regression are normally distributed according to a correlation 106 structure that is a combination of phylogenetic and intraspecific effects, with respective weights 107 determined by the parameter δ. The main difference with the PMM is the absence of a residual 108 term: in PGLS residuals are completely structured by the genetic correlation matrices provided.

109
This assumption can be relaxed by rescaling the phylogenetic tree to give more or less weight to 110 the terminal branches of the tree (Revell, 2010). We do not consider this PGLS model further here, 111 but it is further developed and compared to the PMM using simulations in Appendix S1.

159
The data were analyzed in MCMCglmm with warming, photoperiod, and their interaction as fixed 160 effects. For the random effects, we fitted the four models used in the simulations in terms of 161 variance structure. We used the same priors as for the simulated data, but ran the chains for 166

167
Simulations 168 The null model without genetic correlation structure always performed worst in terms of precision 169 and accuracy, whereas the intra and inter+intra models performed best (Fig. 1). The inter model 170 with only phylogenetic structure did not perform as well as models intra and inter + intra, but its

179
All models had similar type I error rates (Fig. 2), except for the intra model that was slightly 180 higher, especially for increasing importance of the phylogenetic structure. The power of the models 181 was similar for β = 0.1, whereas the models intra and inter+intra had the best power for β = 0.25.

182
The power of model inter with β = 0.25 improved with increasing importance of the phylogenetic 183 effect and approached the performance of intra and inter + intra when the phylogenetic effect was 184 three times as important as the intraspecific effect (i.e., when σ 2 a :σ 2 b = 1.5 : 0.5).

185
Varying the amount of population structure had little impact on the results (Appendix S1; Figs.  Figure 1: Results of the simulation study for the four variance structure models in terms of slope accuracy and precision, and for estimates of heredity (h 2 ) with 10 species and 10 individuals per species. Accuracy is the mean absolute distance between the estimated slope (β) and the true slope (β), precision is the mean of the standard deviation of the posterior distribution ofβ for each simulation, and h 2 is proportion of the total variance explained by the genetic correlation structure (the dashed line indicates the true value). The x-axis indicates the ratio of phylogenetic (σ 2 a ) to intraspecific (σ 2 b ) variances used in the simulations. Only the results for β = 0.25 are shown as these results were not influenced by the slope.  when data were simulated for a single species, that is without phylogenetic effects (Figs. S5, S6).  The wider posterior intervals obtained for the fixed effects with the null model illustrate the 205 importance of taking into account the genetic structure present in the data (Fig. 3). The other 206 models gave similar results, but there was a slight improvement in precision when the intraspecific 207 genetic structure was included. Notably, the interaction between warming and photoperiod was 208 estimated with less variance when the intraspecific structure was accounted for, whereas the same 209 interaction had higher variance (and the 95% posterior interval included 0) under the null model.

210
The random effects can be inspected to see how the total variance of the models was partitioned 211 ( Table 1). The model where only the phylogenetic structure was incorporated (inter) explained 65% 212 of the total variance, which is much more than when only the intraspecific variance was accounted 213 for (intra; 33%). When both genetic sources of variance were estimated (model inter + intra), the  Table 1: Mean proportion of the total variance explained by the random effects of the models fitted to explain change in days for budburst, with their 95% posterior intervals (in brackets). The heredity (h 2 ) and the proportion of the total genetic structure due to the intraspecific correlation (σ 2 b /(σ 2 a + σ 2 b )) are given for models where they can be estimated. explained by the intraspecific variance is small in model inter + intra (4.4%), it is significant and 217 helped to reduce the unexplained variance in the model (Table 1). The proportion of the total 218 genetic variance due to the intraspecific structure in the inter + intra model was estimated to be 219 6.8%, which is modest but significantly greater than 0.

220
The results from the best model (inter + intra) suggest that the 5 • C warming treatment had 221 the strongest effect on budburst, followed by a longer (four hours) photoperiod (Fig. 3). The 222 interaction between these effects was positive and significant, suggesting that they are not additive.  The models incorporating intraspecific genetic correlations performed well because they parti-235 tion the total variance in the data-that would otherwise be classified mostly to the error term-to 236 the genetic correlation structure(s). This is shown by the higher values of the variance due to ge-237 netic effects (heredity; h 2 ) in models that included intraspecific genetic correlations ( Fig. 1; Table   238 1).

239
In addition to providing more accurate fixed effects, the finer partitioning of the total variance 240 gives a better understanding of the study system by quantifying the proportion of the variance that 241 is due to the intraspecific genetic structure. Notably, the improved performance observed when 242 incorporating the intraspecific structure is not due to the specific modelling framework used in this 243 study, namely the phylogenetic mixed model. Indeed, the simulations we performed under a PGLS 244 model that incorporates intraspecific structure also showed a marked improvement in performance 245 compared to standard ordinary least squares (Appendix S1; Figs. S7, S8). correlation structure has little effect on the data; in such cases the estimated variance due to 269 intraspecific structure will simply be small.

270
Another advantage of the PMM is that it allows modeling several random effects simultaneously 271 (Garamszegi, 2014). In our analyses, the total variance of the model included a phylogenetic 272 fraction, an intraspecific fraction, and a residual fraction. But it would be straight-forward to also 273 add a random effect that could account for measurement error, given the appropriate study design.

274
In contrast, the PGLS approach we introduced only considered a phylogenetic and an intraspecific The importance of accounting for intraspecific genomic relatedness will depend on the importance 287 of the intraspecific genetic structure. Our results showed that the advantages gained from this 288 approach are more important with greater population structure (obtained with smaller effective 289 population sizes, restricted gene flow and longer divergence times) and when the intraspecific vari-290 ance has a greater relative importance compared to the phylogenetic variance (provided that the 291 phylogenetic structure is corrected for). In our empirical example, the variance explained by the 292 intraspecific correlation structure was small but significant. The modest effect might be due to 293 the fact that the ten species sampled belonged to distinct angiosperm orders, representing impor-294 tant evolutionary distances that may strongly affect the reactions of individuals to the treatments 295 compared to the intraspecific structure. However, the genetic data showed important population 296 structure between the two sampled populations, which probably explains why the inter + intra 297 model clearly better fitted the data.

298
The gain from modelling intraspecific correlation structure also depends on the genetic nature 299 of the traits studied. Indeed, traits under strong genetic control could react in a way that more 300 closely fit the genetic correlation structure. It has recently been reported that spring phenology is 301 particularly plastic across populations (Aitken and Bemmels, 2016), and this might have contributed 302 to the small estimated relative importance of the intraspecific genetic structure. It is possible that 303 the study of other traits with a stronger genetic basis (e.g., timing of budset) could have resulted 304 in larger improvements when accounting for the intraspecific correlation structure.

305
Comparative methods are being increasingly used to correct for the phylogenetic non-independence 306 of species in statistical tests, in part because of the ease with which one can obtain a well resolved 307 phylogeny. Our results show that important gains can also be obtained by accounting for the 308 intraspecific genetic structure.