Genome-wide sampling suggests island divergence accompanied by cryptic epigenetic plasticity in Canada lynx

Determining the molecular signatures of adaptive differentiation is a fundamental component of evolutionary biology. A key challenge remains for identifying such signatures in wild organisms, particularly between populations of highly mobile species that undergo substantial gene flow. The Canada lynx (Lynx canadensis) is one species where mainland populations appear largely undifferentiated at traditional genetic markers, despite inhabiting diverse environments and displaying phenotypic variation. Here, we used high-throughput sequencing to investigate both neutral genetic structure and epigenetic differentiation across the distributional range of Canada lynx. Using a customized bioinformatics pipeline we scored both neutral SNPs and methylated nucleotides across the lynx genome. Newfoundland lynx were identified as the most differentiated population at neutral genetic markers, with diffusion approximations of allele frequencies indicating that divergence from the panmictic mainland occurred at the end of the last glaciation, with minimal contemporary admixture. In contrast, epigenetic structure revealed hidden levels of differentiation across the range coincident with environmental determinants including winter conditions, particularly in the peripheral Newfoundland and Alaskan populations. Several biological pathways related to morphology were differentially methylated between populations, with Newfoundland being disproportionately methylated for genes that could explain the observed island dwarfism. Our results indicate that epigenetic modifications, specifically DNA methylation, are powerful markers to investigate population differentiation and functional plasticity in wild and non-model systems. SIGNIFICANCE Populations experiencing high rates of gene flow often appear undifferentiated at neutral genetic markers, despite often extensive environmental and phenotypic variation. We examined genome-wide genetic differentiation and DNA methylation between three interconnected regions and one insular population of Canada lynx (Lynx canadensis) to determine if epigenetic modifications characterized climatic associations and functional molecular plasticity. Demographic approximations indicated divergence of Newfoundland during the last glaciation, while cryptic epigenetic structure identified putatively functional differentiation that might explain island dwarfism. Our study suggests that DNA methylation is a useful marker for differentiating wild populations, particularly when faced with functional plasticity and low genetic differentiation.


Background 27
Investigations into divergent selection have often relied on quantifying phenotypic variation and 28 the heritability of such traits, the latter assumed to be genetic polymorphisms transmitted via an 29 organism's DNA 1 . However, detecting adaptive divergence in species that experience high rates 30 of gene flow is challenging due to the homogenization of genomic regions that are neutral or 31 under weak selection 2 . Environmental conditions can be a powerful driver of adaptive 32 divergence, where relationships are traditionally ascertained by correlating allele frequencies to 33 environmental variation 3 . An aspect of adaptive molecular evolution that is undetected by 34 standard genetic sequencing involves direct modifications to the structure of DNA. Epigenetic 35 modifications like DNA methylation are influenced by environmental conditions, directly affect 36 gene expression, and may be indicative of early adaptive divergence due to local adaptation 4-6 . 37 DNA methylation could play a role in local adaptation due to its regulatory role in transcription by 38 modifying chromatin structure, repressing transcription factors, or recruiting protein complexes 39 that block transcriptional machinery, especially around CpG islands 7-9 . CpG islands are dense 40 clusters of cytosine-guanine dinucleotides and frequently occur near the transcription start site 41 of genes and have functional role with gene expression 10-12 . Consequently, DNA methylation, 42 especially around CpG islands, could explain the molecular basis of local adaptation due to its 43 regulatory function on gene expression, and might be an overlooked molecular marker of 44 adaptive divergence and rapid evolutionary adaptation. 45 allele frequencies are correlated to climatic gradients in both population time-series and fine-52 scale genetic analyses 13,15,16 , suggesting climate might also influence patterns of methylation in 53 lynx. Second, Canada lynx show a subtle cline in body size with larger individuals in Alaska 17 to 54 smaller individuals in insular populations including Newfoundland and Cape Breton Island 18 . 55 Body size changes in island populations appear to be consistent with the 'Island Rule' 18 , where 56 insular mammals are smaller in size compared to their mainland counterparts [19][20][21][22] . If functional 57 genes related to body size are repressed in geographically isolated populations, then epigenetic 58 modifications might be an underlying mechanism driving population divergence 23 . Based on 59 these two mechanisms (climate and island isolation), we tested two predictions: i) climatic 60 conditions are the driving force behind spatial epigenetic structuring 24 ; and ii) patterns of DNA 61 methylation over CpG islands and gene bodies are more correlated with biogeographical 62 substructuring than methylation patterns over DNA of unknown function, due to the regulation of 63 transcription arising from adaptation to local conditions 25 . 64

Results 65
We identified differential methylation at base-pair resolution across the genome in 95 Canada 66 lynx epidermal tissue samples from four populations (n = 23-24 per sampling area) across North 67 America, including one insular population in Newfoundland (Figure 1; Supplemental Table S4). 68 The sampled populations have a wide geographical spread, with an average minimum distance 69 between populations (Québec and Newfoundland) of 1,158 km and a maximum distance 70 (Alaska and Newfoundland) of 5,520 km. Habitats around these populations present a dynamic 71 range of environmental conditions, ranging from 32 to 432 mm of winter precipitation and a 72 mean annual temperature range 26 of -6.3 to 4.7° C. To determine associations between these 73 environmental conditions and genome-wide patterns of DNA methylation, we created a reduced 74 representation bisulfite sequencing (RRBS) library (full protocol in Supplemental). Paired-end 75 sequencing on a HiSeq2500 generated a total of 210,773,612 filtered and demultiplexed reads 76 that were aligned to the domestic cat genome (85.0% average mapping success; Supplemental 77 Table S4; Felis catus; NCBI felcat9.0) and variants were called using specially designed 78 software for bisulfite converted reads 27 . We mapped our reads to the human (Homo sapiens) 79 and lambda phage genomes to rule out contamination (2.7% and <0.1% success, respectively), 80 and assessed bisulfite conversion efficiency by including non-methylated lambda phage DNA in 81 the sequencing lane. Furthermore, we quantified the temporal effects of methylation, the 82 ramifications of missing data, and model parameterization with sensitivity analyses, with no 83 implications on overall inferences (Supplemental Figures S5-S8). 84

Pronounced population structure revealed by methylation patterns 85
To examine regions with putative regulatory function, CpG islands were first bioinformatically 86 identified de novo (N = 28,127) using hidden Markov models and contained an average GC 87 content of 59.2% with a posterior probability of observed-to-expected GC content (CpGo/e) of 88 1.13. Our DNA methylation analysis was broken into two subsets based on proximity to genomic 89 features, with identical filtering parameters for each subset to maximize comparative inferences.  Table S6). Qualitatively, population structure between geographically 93 peripheral populations was most pronounced in DNA methylation patterns over CpG islands and 94 gene bodies compared to methylation patterns over regions of unknown function. The first two 95 axes of a principal coordinates analysis (PCoA) on a Euclidean distance matrix summarizing 96 CpG island and gene body methylation (PCoA1 = 6.2%, PCoA2 = 5.0% variation explained) 97 distinctly cluster Alaska from the remaining mainland populations, while the mid-continental 98 populations show no structure ( Figure 2). Similar, but less distinct patterns were seen in the 99 ordination of methylation patterns over unannotated regions of the genome, which showed more 100 distinct structure between the Québec and Manitoba populations ( Figure 2). Analogous 101 population-level trends between the datasets were further confirmed with Pearson and 102 Spearman's rank coefficients of the first PCoA axis (r = 0.91; ρ = 0.88). 103

Differential methylation over genes related to morphology 120
We identified differential methylation directly over CpG islands and gene bodies using beta 121 regressions (n = 329), with population as an explanatory variable (i.e. windows with > 10% 122 overall difference in methylation and p-value < 0.001). Of the 16 gene regions identified, 11 123 were significantly differentiated in the Newfoundland population (Table 1). All differentially 124 methylated regions with associated morphological function were hypermethylated, as well as 125 genes with direct epigenetic function with morphological interactions (i.e. transcription 126 regulation, DNA-binding transcription factors; Table 1). The 11 differentially methylated regions 127 in the insular population have a difference in methylation of 14%-40% compared to mainland 128 populations, which suggests that morphological divergence in the insular population might be 129 regulated by repressed gene expression due to DNA methylation. Additionally, we identified 130 three differentially methylated regions in the Alaskan population related to spectrin, 131 carbohydrate, and ATP binding, suggesting differential expression of genes with metabolic 132 function. 133

Genetic structure driven by insular and climatic divides 134
We examined the relationship between neutral SNPs, environmental variation, geographic 135 Newfoundland. Despite the lack of genetic structure between mainland populations 29 , we see 170 distinct patterns of DNA methylation that differentiate Alaska from mid-and east-continental 171 populations. Furthermore, we identified variation that is associated by trends in winter 172 precipitation and temperature that is not accounted for by geographic distance. 173 The differentiated DNA methylation patterns in the Newfoundland population suggest a 174 molecular pathway for morphological differences between mainland and island populations, as 175 predicted by the Island Rule. Evidence for this adaptive divergence was observed in disparities 176 between methylation in CpG islands/gene bodies and DNA of unknown function and specific 177 genes. We identified specific genes (ZEB1 and HDAC9) that may underlie differences in body 178 size between mainland and Newfoundland lynx. The hypermethylated HDAC9 gene is of  Table S1). An individual sample of completely non-224 methylated lambda phage genomic DNA (200 ng; Sigma-Aldrich -D3654) with a unique 225 barcode was included to assess bisulfite conversion efficiency (Supplemental Table S7). 226 Barcoded samples were then combined into eight pools to ensure consistent reaction 227 environments for the entire library using a QIAquick PCR Purification Kit (Qiagen, Valencia, CA, 228 USA) following manufacturer's instructions and 10 μL of 3M NaAc was added to neutralize pH. 229 We then performed a SPRI size selection on each pool (0.8x volume ratio) with Agencourt®

Bioinformaticsquality checks and SNP calling 245
We assessed sequencing success as well as removed adapter and low-quality reads via 246 FastQC 42 and Cutadapt 43 implemented in TrimGalore! v0.4.4 44 . Individuals were demultiplexed 247 using python scripts 41 . Paired-ends reads for Canada lynx samples were initially aligned to several genomes (Felis catus, Homo sapiens, Lambda Phage) using Bismark 27 to assess 249 mapping efficiency and contamination (Supplemental Table S4). Paired-end reads for 250 downstream analyses were aligned to the domestic cat genome with Bismark, using relaxed 251 mapping parameters (score_min L,0,-0.6) 27 . 252 SNPs were called from indexed BAM files with CGmapTools 28 using a coupled Bayesian 253 wildcard algorithm with a conservative 0.01 error rate and a static 0.001 p-value for calling 254 variant sites, which generated variant call files (VCFs). VCF files were indexed, merged, and 255 filtered using VCFtools 45 for bi-allelic loci with a sequencing depth of at least five, and were 256 shared between at least 50% of the individuals (max-missing 0.5) and a minor allele frequency 257 (maf) of > 0.001. Sites were further filtered by removing any variants out of Hardy-Weinberg 258 equilibrium within populations (p-value < 0.05). A pair-wise Euclidean dissimilarity (distance) 259 matrix was computed on the SNP data using the function daisy within the package cluster 46 260 using R v3.4.2 47 . This dissimilarity matrix was then summarized in a principal coordinates 261 analysis (PCoA) using the dudi.pco function in adegenet 48 . Missing SNP data was imputed by 262 mean allele at a population level. Pairwise FST was calculated using StAMPP (Supplemental 263 Figure S2) and relative FST was measured using a null model Bayesian approach implemented 264 in BayeScan v2.0 30 (Supplemental Figure S2). AMOVA between populations was computed 265 using the poppr.amova function within poppr 49 , and heterozygosity was assessed with 266 adegenet. 267

Bioinformatics -DNA methylation 268
We identified methylated and non-methylated positions by first filtering BAM files for incomplete 269 bisulfite conversion based on reads containing more than three methylated positions in a CHH 270 or CHG context. In the remaining reads, methylated positions in a CpG context with a 271 sequencing depth of at least five were extracted with Bismark 27 , truncating the last two bases of 272 the forward mate-paired reads (R1) and the first two bases of the reverse mate-paired (R2) reads. Methylation polymorphisms in areas of overlap between read pairs were extracted only 274 once. After confirming that our non-directional library contained roughly equal reads for all 275 possible amplified DNA strands, we proceeded to analysis. 276 We first generated a custom CpG island annotation track using hidden Markov models 277 based on CpGo/e implemented in makeCGI 50 . Only islands with a calculated posterior probability 278 greater than 99.5% were retained for analysis, based on CpGo/e. Mapped and extracted 279 methylated sites were then imported into Seqmonk 51 using the generic text importer, and raw 280 data was qualitatively visualized against the annotated domestic cat genome (felCat9.0). We 281 analyzed DNA methylation over CpG islands and gene bodies by creating 5,000-bp running 282 windows directly over and 25,000-bp upstream of gene bodies, combined with windows directly 283 over CpG islands. Each window was assigned a methylated percentage score based on the 284 overall ratio of methylated to non-methylated bases within the feature. We filtered this window-285 set for regions that had at least one CpG and equal representation from each population. This 286 process was repeated for our second subset of analyses that investigated methylation patterns 287 of unannotated regions of the genome. For this analysis, we created 5,000-bp running windows 288 across the entire genome and removed any windows that overlapped with the initial windows 289 over CpG islands and gene bodies by more than 1%. Data was filtered, exported, and 290 summarized in the same way as the first analysis. 291

Sensitivity analyses 292
To determine the implications of missing data and PCoA axis retention thresholds, we 293  Table S8). Overall trends in explanatory effects (adjusted R 2 ) and qualitative 297 inferences (PCoA clustering) were investigated and no change in inferences were determined.
We examined the implications of arbitrary PCoA axis retention by repeating all analyses, but 299 instead using different cumulative variation explained thresholds as response variables. We 300 performed a number of db-RDAs using all axes explaining 30%, 50%, 75%, and 95% 301 cumulative variation as response variables, and results were qualitatively similar regardless of 302 axis retention (Supplemental Figure S8, Table S10). 303

Quantifying environmental associations 304
To determine if patterns of DNA methylation and genomic variability could be explained by 305 macro-scale climatic conditions, geographic distance, or insular divergence, we performed a 306 distanced-based redundancy analysis on the summarized SNP and DNA methylation data. 307 Meaningful axes explaining > 30% of the cumulative variation in the data were used as 308 response variables in a distance-based redundancy analysis (db-RDA) conducted in vegan 52 , 309 using variables that putatively describe the environmental determinants of population structure 310 in Canada lynx. Our covariates included a binary variable of insularity, which identified the 311 Newfoundland population against mainland populations and was used to describe the largely 312 impassable barrier of the Strait of Belle Isle between Newfoundland and mainland Labrador 53 . A 313 variable of geographic distance was included which was simply the first axis of a principal 314 coordinates analysis (PCoA) on a Euclidean distance matrix of latitude and longitude (PCo1 = 315 99.7% of the variation). In addition to the geographic variables, we included a biotic variable of 316 percent tree cover 54 , a randomly generated numerical variable to assess the effect of noise, and 317 a climate variable. For the climate variable, we performed a PCA on climate data to prevent 318 multi-collinearity, which reduced annual temperature ranges, winter precipitation, and minimum 319 coldest temperature to a single PCA axis 26 (PC1 = 85.6% of the variation). Linearity was 320 confirmed between response and explanatory variables, and multi-collinearity between 321 explanatory variables was assessed using the VIF and any variables > 4 were removed 322 (Supplemental Table S12). Step-wise model selection using the function ordistep within vegan 52 323 was performed to isolate the best overall model using a QR decomposition technique based on 324 p-values (Supplemental Table S11). To isolate the individual explanatory power of each 325 variable, we performed partial distanced-based redundancy analyses (p-dbRDAs) on the 326 variables that were identified as significant in the full db-RDA. 327

Differentially methylated regions and gene ontology 328
We identified differentially methylated regions with functional biological correlates by performing 329 beta-regressions on windows over CpG islands and gene bodies for all 95 individuals with 330 percent methylation as the response variable and population as the explanatory variable. Beta 331 regressions are appropriate for proportion or percentage data 55 , and we set an alpha threshold 332 at conservative levels seen in similar studies 34 (p-value < 0.001). Overall differences in percent 333 methylation were calculated, and genes were identified as either hypermethylated or 334 hypomethylated based on their relative degree of methylation compared to other populations. 335 Direct overlap between our differentially methylated regions and the felcat9.0 gene annotations 336 were extracted using Seqmonk 51 . We then identified functional associations by searching 337 UniProt (https://www.uniprot.org) by gene name against the database of genes in the domestic 338 cat genome using the search term "organism: 'Felis catus (Cat) (Felis silvestris catus) (9685)'". 339

Acknowledgements 340
The authors thank Niels C.A.M. Wagemaker for his assistance with library preparation and his 341 generous donation of methylated adapters. We would also like to thank Felix Krueger and 342 Sibelle Torres Vilaça for their assistance with bioinformatics. We would like to thank Joe M.  Boudreau for their invaluable statistical advice and friendly manuscript reviews. We thank the 345 Natural Resources DNA Profiling and Forensic Centre (NRDPFC) for their assistance with 346 extractions and laboratory equipment, and the Centre for Applied Genomics with sequencing 347 Table 1. List of differentially methylated regions that have direct overlap with annotated genes. Differentially methylated regions were identified using beta regressions with a pvalue < 0.001 and when methylation levels differed by more than 10% between the identified population and the others. Hyper-and hypo-methylation is relative to the other three populations. GO annotated functions were retrieved from UniProt (www.uniprot.org).  . Distance-based redundancy analysis (db-RDA) on DNA methylation data over CpG islands and gene bodies, with population delineated by colour. The axes of a principal coordinates analysis (PCoA) were used a response variable to determine biogeographical relationships. The explanatory variables included a distance variable (the first axis of a PCoA on latitude and longitude); insularity (a binary variable distinguishing the Newfoundland island population); and climate (the first axis of a PCA summarizing winter precipitation, annual temperature ranges, and coldest minimum temperature).

Figure 2.
Principal coordinates analysis (PCoA) plots of variation across three molecular marker datasets, with individuals as single circles and populations delineated by colour. All molecular data was summarized with a pair-wise Euclidean dissimilarity matrix. Methylation was summarized with 5,000-bp running windows over CpG islands and gene bodies (n = 329) and over unannotated regions (n = 376). SNP variants were called from bisulfite converted reads and reflect unstructured mainland populations (n = 496). . Visual depiction of partial distance-based redundancy analyses (p-db-RDAs). The effect sizes indicate the independent explanatory effects of each variable on explaining methylation patterns, subtracted from the effect of any other variable. The effect size is adjusted R 2 , (adj. R 2 ) and the test-statistic is a pseudo-F generated using QR decomposition within vegan.