Genetic differentiation is determined by geographic distance in Clarkia pulchella

Both environmental differences and geographic distances may contribute to the genetic differentiation of populations on the landscape. Understanding the relative importance of these drivers is of particular interest in the context of geographic range limits, as both swamping gene flow and lack of genetic diversity are hypothesized causes of range limits. We investigated the landscape genetic structure of 32 populations of the annual wildflower Clarkia pulchella from across the species’ geographic range in the interior Pacific North-west. We tested whether climatic differences between populations influenced the magnitude of their genetic differentiation. We also investigated patterns of population structure and geographic gradients in genetic diversity. Contrary to our expectations, we found an increase in genetic diversity near the species’ northern range edge. We found no notable contribution of climatic differences to genetic differentiation, indicating that any processes that might operate to differentiate populations based on temperature or precipitation are not affecting the putatively neutral loci in these analyses. Rather, these results support seed and pollen movement at limited distances relative to the species’ range and that this movement and the subsequent incorporation of immigrants into the local gene pool are not influenced by temperature or precipitation similarities among populations. We found that populations in the northern and southern parts of the range tended to belong to distinct genetic groups and that central and eastern populations were admixed between these two groups. This pattern could be the result of a past or current geographic barrier associated with the Columbia Plateau, or it could be the result of spread from separate sets of refugia after the last glacial maximum.

Introduction that did not have satisfactory 260/230 or 260/280 ratios were cleaned with ethanol precipitation. DNA was eluted and stored in 10mM Tris-HCl pH 8.

124
Library preparation and sequencing 125 We prepared for two lanes of sequencing, with six individually barcoded samples from each population in 126 each lane (191 or 192 individuals per lane, because we only had DNA of a high enough quality from a total 127 11 individuals from one population). Our library preparation protocol was a modified version of Poland  Reactions were then cleaned with 1.6 volumes of SPRI beads and two 80% ethanol washes and resuspended 135 in 12 µL of Tris-HCl pH 8. 136 Amplification was carried out in 10 µL reactions using 4 µL of cleaned ligation product, Kapa HIFI  After amplification, samples were quantified using fluorometry, then each plate was pooled according to 140 individual concentrations to yield a final product with equal amounts of library from each individual. This 141 pooled library was run out on a 1.5% agarose gel and bands containing fragments 400 to 600 bp long were 142 excised and cleaned using a gel extraction kit (Qiagen). The eluted product was cleaned and concentrated 143 using SPRI beads. 144 Finally, we reduced the number of high copy fragments from our library using a protocol modified by M. 100 bp reads on the Illumina HiSeq 2000 platform at the Biodiversity Research Centre at UBC. 155 Sequences were processed and aligned using components of the Stacks pipeline (version 1.40, Catchen et al., 156 2011(version 1.40, Catchen et al., 156 , 2013. Reads with uncalled bases or low quality scores (average quality in a 14-base sliding window <10) 157 were discarded. Ten samples had far fewer reads than the rest and these were excluded prior to alignment. 158 Paired end reads were pooled with first end reads, i.e. during alignment and SNP detection the two ends of 159 each read were treated as if they were independent loci (we later checked for linkage disequilibrium among 160 SNPs). During initial "stacking" and catalog building we allowed sequences to diverge at 3 bases, and set 161 the minimum depth of coverage required to create a stack at 3 (Rochette and Catchen, 2017). Modifications 162 to these parameters did not result in substantial differences in values of pairwise F ST (data not shown). The 163 maximum number of stacks per locus was set to 3, and gapped alignments were not allowed. We enabled the 164 removal algorithm, which drops highly repetitive stacks (removes initial stacks that have >2 SD coverage 165 relative to individual sample mean), and the deleveraging algorithm, which breaks up or removes over-merged 166 sequences. Our catalog was built using all samples. We employed the rxstacks corrections module to correct 167 or omit loci with putative sequencing errors, loci with low log-likelihoods (<-10), confounded loci, and loci 168 with excess haplotypes.

169
SNP tables were generated using the populations module of Stacks. Initial inspection of PCA plots using 170 SNPRelate (Zheng et al., 2012) revealed three individuals that were not clustering with the other individuals 171 from their populations. We consider it more plausible that these represent mis-labeled samples in the field, 172 greenhouse, or lab than long-distance migration events. Downstream analyses were performed without these 173 individuals. Therefore, in our final dataset, seven populations had only 11 individuals, one population 174 had only 10, one population had only 8, and the remaining 23 populations were each represented by 12 175 individuals. In our analyses we included only loci that had coverage of at least 12x in 75% of individuals 176 in 75% of populations, with a minimum minor allele frequency of 0.05 and a maximum heterozygosity of 177 70% across all populations. We checked that pairwise F ST was not sensitive to these parameter choices. In 178 case of multiple SNPs occurring in a single locus, we kept just the first one. After applying these filters, 179 2983 SNPs were retained. Linkage disequilibrium was generally low among our loci (r 2 <0.2 for 99.9% of 180 pairs of SNPs). F ST was calculated using the implementation of Weir and Cockerham (1984) and expected 181 heterozygosity (within-population gene diversity) was calculated using methods from Nei (1987) in the R 182 package hierfstat (Goudet and Jombart, 2015). Because populations varied in the average proportion of loci 183 that were successfully genotyped (three populations had <60% success; among all populations the median 184 success rate was 78% and the range was 23-92%), we checked that expected heterozygosity did not correlate with genotyping success rate (r = 0.27, P = 0.13).

186
Quantifying isolation by environment vs. isolation by distance 187 We used BEDASSLE (Bradburd et al., 2013) to estimate the relative contributions of geographic distance 188 and climatic differences to genetic differentiation. BEDASSLE is implemented in R (R Core Team, 2017), 189 and it employs a Markov chain Monte Carlo (MCMC) algorithm to estimate the relative effect sizes of 190 geographic distance and environmental differences on covariance in allele frequencies among populations. As 191 environmental covariates, we used pairwise differences in average September-July temperature and average 192 spring/summer precipitation (April-July). We initially generated resistance-weighted distances between 193 populations using projected habitat suitability as a conductance matrix, but these distances were highly 194 correlated with actual geographic distances and did not produce better model fits in preliminary analyses, so 195 we did not use them in these models. We estimated effect sizes of geography, temperature, and precipitation 196 differences using all 32 populations, but also ran BEDASSLE for subsets consisting of populations clustered 197 in the central and northern parts of the range (indicated in Table S1) to see if we could detect effects 198 of the environment that may be obscured or weakened at large geographic scales. Prior to analysis, we 199 divided pairwise geographic distance and precipitation differences by their standard deviations so that these 200 predictors were on a scale more similar to pairwise temperature differences. We ran these models for 10 201 million generations, and thinned the chains by sampling every 1000 generations. We visually inspected 202 MCMC traces and marginal distributions to ensure that models reached stationary distributions. All results 203 are reported after a burn-in of 20%, with effect sizes back-transformed to the scale of the original data. 204 We checked these results against partial Mantel tests of pairwise geographic, temperature, and precipitation 205 differences on pairwise F ST using the R package phytools (Revell, 2012). We did not rely upon partial Mantel 206 tests as our main analytical method because of their potential to have inflated Type I error rates (Guillot 207 and Rousset, 2013).

208
Assessment of spatially continuous vs. discrete genetic differentiation 209 We were interested in evaluating whether population structure was well-described by modelling populations 210 as admixtures between multiple discrete genetic groups, as might be caused by geographic barriers (i.e., the 211 Rocky Mountains) or historic phylogeographic processes. We evaluated how well models prescribing various 212 numbers of discrete genetic groups described differentiation and similarity among our populations using 213 conStruct (Bradburd et al., 2017). conStruct is implemented in R (R Core Team, 2017), and is similar to the 214 frequently-used program Structure (Pritchard et al., 2000) but allows genetic differentiation to increase with 215 geographic distance between populations even when these populations draw from the same genetic groups.

216
In the spatial implementation of this program, populations are composed of admixture from a user-specified 217 number of discrete layers (K), and genetic similarity decays with geographic distance within each of these 218 layers. We ran conStruct for 1000 iterations setting the number of layers to 1, 2, 3, 4, and 5. We compared 219 the fits of each of these different parameterizations using cross-validation and by evaluating the contribution 220 of each additional layer to the total covariance of these loci. For cross-validation, we fit models with subsets 221 containing 90% of loci and evaluated the resulting model fit by calculating the log likelihood of the remaining 222 loci. We performed 100 replicate cross-validation runs.

223
Exploring spatial patterns in genetic diversity 224 We examined whether population genetic diversity (as estimated by expected heterozygosity) exhibited 225 geographic trends. We used linear models in R (R Core Team, 2017) to test whether expected heterozygosity 226 was predicted by latitude or by proximity to the range edge (as measured by the distance of a population 227 to the nearest edge of a polygon drawn around all localities of the species; Figure 1).

229
Isolation by environment vs. geographic distance 230 Overall F ST among these populations is 0.135. Genetic differentiation between populations of Clarkia pul-231 chella is primarily structured by geographic distance, with no apparent contribution of the environmental 232 variables that we have considered here ( Figure 3). The effect size of a temperature difference of one degree 233 (C) relative to the effect of 100 km of geographic distance is 1.18 x 10 -7 (95% credible interval = 8.52 x 10 -8 234 -1.58 x 10 -7 ; Figure S1A), and the effect of 10 mm of spring/summer precipitation difference relative to 235 the effect of 100 km of geographic distance is 5.84 x 10 -7 (95% credible interval = 1.50 x 10 -8 -2.98 x 10 -6 ; 236 Figure S1B). The scales at which these ratios are presented are arbitrary, but they were chosen so that the 237 range of values among populations is on the same order of magnitude: 100 km represents about one sixth of 238 the maximum pairwise geographic distance, 1 • C represents approximately one fourth of the maximum pair-239 wise temperature difference, and 10 mm precipitation represents about one fourth of the maximum pairwise 240 precipitation difference. The climatic effect sizes we found are so small that the effects of these variables 241 can be considered nonexistent in terms of their biological importance. Effects of environmental differences 242 did not emerge at smaller geographic scales in subsets of populations in the north (effect of temperature 243 differences relative to geographic distance: 5.89 x 10 -8 (8.61 x 10 -9 -1.14 x 10 -7 ), effect of precipitation 244 differences relative to geographic distance: 9.73 x 10 -6 (5.81 x 10 -7 -2.11 x 10 -5 ); Figure S2) or center (effect 245 of temperature: 2.34 x 10 -7 (1.44 x 10 -8 -4.73 x 10 -7 ), effect of precipitation: 9.46 x 10 -7 (3.06 x 10 -8 -4.80 246 x 10 -6 ); Figure S3). These conclusions are consistent with the results of partial Mantel tests, in which only 247 pairwise geographic distance is a significant predictor of pairwise F ST (Table 1).

248
Genetic structure of populations 249 The genetic structure of these populations is explained slightly better by a model of admixture between 250 two genetic groups than by a model of continuous genetic differentiation across space, as indicated by 251 the increase in predictive accuracy in models where two layers were allowed rather than one (Figure 4).

252
Northern populations primarily belong to one genetic group, while southern populations belong to another, 253 and populations from mid-latitudes are a mix of the two ( Figure 5). Allowing more than two layers did not shown). Although models with two layers did have greater predictive accuracy than those with one, when 257 K = 2 the amount of covariance contributed by the second layer was small relative to the first (Table 2).

259
Genetic diversity increases with latitude among these populations (estimate = 0.0104, SE = 0.0019, df = 30, 260 P < 0.0001 , Figure 6A), but is not related to distance from the range edge (df = 30, P = 0.811). Genetic 261 diversity appears to be lower in populations in the southern half of the range, and also in populations near 262 the eastern range edge, but is higher in central and northern populations ( Figure 6B).

264
We contrasted the relative effects of geographic vs. climatic distances on genetic differentiation in Clarkia 265 pulchella, examined whether geographic structure in this species could be described by assigning populations 266 to distinct genetic groups, and tested for geographic gradients in genetic diversity. Our analyses revealed a 267 genetic structure that is predominantly shaped by geographic distances between populations. In addition to 268 this pattern of isolation by distance, populations partition into northern and southern groups, with admixed 269 populations in the center of the range. Genetic diversity was highest in northern and central populations, 270 resulting in a trend of increasing genetic diversity with latitude. 272 At the scale of the geographic range in Clarkia pulchella, isolation by distance is the dominant pattern.

273
This likely reflects gene flow that is strongly restricted by geographic distances between populations. This 274 is perhaps not surprising, given that this species has no obvious mechanism for seed dispersal and our best 275 guess is that gene flow between populations is facilitated by occasional pollen movement by bumblebees, 276 hummingbirds, and other floral visitors. In the case of an absence of climatically structured seed and 277 pollen movement, selection against migrants and their offspring is the remaining mechanism that could drive 278 isolation by environment. While C. pulchella does appear to be locally adapted to historic climate (Bontrager 279 and Angert, in prep), selection against foreign genotypes may not be strong enough to preempt the spread 280 of neutral loci, even as recently-arrived loci that confer poor performance in a given environment are purged.

281
This could lead to a signal of isolation by distance at neutral loci, while populations are still adaptively 282 differentiated based on their local climate.

283
It is possible that the absence of an effect of temperature and precipitation differences on genetic structure 284 is the result of our experimental design, and that environmental differences might matter in other contexts.

285
There may be environmental variables other than those we have considered here that are more important 286 in determining the movement of genes or the realized rate of gene flow among populations. These could be 287 climatic, but also could include soil characteristics, or local adaptation to competitors, pollinators, or soil 288 biota. It is also possible that the effects of environmental differences are more detectable at smaller spatial  Phylogeographic research on species occupying the arid inter-mountain region is less common. In the Great 302 Basin pocketmouse, a species with a range that overlaps with that of C. pulchella, a north-south split in 303 genetic structure was detected in approximately the same location as in our results (Riddle et al., 2014). It 304 is possible that the Columbia Basin (or some geographic feature within it) represents a barrier to gene flow, 305 either past or ongoing, for a variety of taxa that occupy the dry intermountain region. The habitat affinity 306 of species can influence the effect of glaciation events on genetic structure (Massatti and Knowles, 2014), 307 therefore further work on C. pulchella, including paleoclimate modelling or modelling demographic history, 308 might allow for an interesting contrast with the relatively well-studied mesic flora of the Pacific Northwest.

309
Genetic diversity increases with latitude 310 We expected we would see lower genetic diversity at higher latitudes, but we detected the opposite: genetic  The more common expectation for geographic patterns in genetic diversity is that range edge populations 324 will have lower genetic diversity (Vucetich and Waite, 2003). This prediction is based on the assumption of 325 an abundant center distribution pattern, in which edge populations are small, and may experience frequent 326 turnover or constant directional selection (if they are far from the phenotypic optima of an extreme environ-327 ment). Our results are not consistent with this being the case for C. pulchella, at least not at all range edges. 328 We note however that populations at southern and eastern edges do appear to have lower genetic diversity 329 relative to the northern and north-central populations, and further work could be done to investigate the 330 processes that might generate this pattern.

332
Our investigation of the genetic structure of Clarkia pulchella has revealed some intuitive patterns, as well as 333 surprising ones. Despite substantial heterogeneity in climate across the species' range, genetic similarity is 334 primarily determined by geographic proximity. Though a signal of isolation by distance is not surprising in a 335 sessile organism studied at a large spatial scale, the absence of any effect of environment indicates that to the 336 extent that populations experience gene flow, it may be from both similar and divergent environments. This  Table 1 Results of partial Mantel tests of pairwise geographic distance (km), pairwise temperature differences ( • C, September-July, 1951-1980, and pairwise precipitation differences (mm, April-July, 1951-1980 Table 2 Covariance contributions of each layer in conStruct models with the number of layers (K) set to 1, 2, 3, 4, or 5.  Table S1. Background shading shows elevation. The Columbia Basin is the unsampled area west of population D11. Number of layers Relative explanatory power Figure 4 Results of 100 replicate cross-validation runs of conStruct with the number of layers set to 1, 2, 3, 4, or 5. In each replicate, the model is built using 90% of loci, and the log-likelihood of the remaining loci is calculated. Predictive accuracy is then calculated as the difference in log-likelihood between each model and the best model (i.e. the best number of layers) in each replicate. These results indicate that models constructed with two layers are best, because they provide as much explanatory power as other models without further complexity.    Expected heterozygosity appears to be higher in central and northern parts of the range, but lower in the south and east.