Roles of adenosine and cytosine methylation changes and genetic mutations in adaptation to different temperatures

Epigenetic modifications have been found to be involved in evolution, but the relative contributions of genetic and epigenetic variation in adaptation are unknown. Furthermore, previous studies on the role of epigenetic changes in adaptation have nearly exclusively focused on cytosine methylation in eukaryotes. We collected phenotypic, genetic, and epigenetic data from populations of the bacterium Serratia marcescens that had undergone experimental evolution in contrasting temperatures to investigate the relationship between environment, genetics, epigenetic, and phenotypic traits. The genomic distribution of methylated adenosines (m6A) pointed to their role in regulation of gene expression, while cytosine methylation (m4C) likely has a different role in S. marcescens. We found both environmentally induced and likely spontaneous methylation changes. There was very little indication that methylation changes were under genetic control. Decomposition of phenotypic variance suggested that both genetic and epigenetic changes contributed to phenotypic variance with slightly higher contribution from genetic changes. Overall, our results suggest that while genetic changes likely are responsible for the majority of adaptation, adenosine methylation changes have potential to contribute to adaptation as well.


INTRODUCTION
Epigenetics and adaptation

Introduction
The traditional view of evolution is that adaptation proceeds via DNA sequence changes. 22 However, this view has been challenged in recent years as some epigenetic changes, such as DNA methylation changes, have been found to be heritable. Epigenetic changes that 24 are inherited could potentially affect evolution (Jablonka and Raz, 2009;Day and Bonduriansky, 2011;Danchin et al., 2011;Kronholm and Collins, 2016). While convincing 26 cases of epigenetic inheritance do exist, the role of epigenetic variation in evolution has also been met with skepticism (Charlesworth et al., 2017), the main argument for 28 caution being that despite frequent observations we know very little about the relative contributions of genetic and epigenetic variation to adaptation. 30 Epigenetic variation can be divided into two groups: spontaneous epigenetic variation and induced epigenetic variation (Kronholm, 2017). Spontaneous epigenetic vari-32 ation is analogous to genetic mutations, such that epigenetic changes occur at a certain rate and are random with respect to fitness. Mutation accumulation experiments in 34 plants have shown that cytosine methylation changes do exhibit these kind of changes and they occur at much higher rates than genetic mutations (Becker et al., 2011;2011).
Inter-pulse duration (IPD) ratios were used with the PacBio RS Modification and 156 Motif Analysis protocol to detect modified bases and methylation sequence motifs.
Positions detected as modified by this protocol were labelled as either m6A, m4C or 158 "modified base" if the modification type could not be identified. The estimated fraction of modified copies (methylation fraction) was provided for m6A and m4C bases. The 160 protocol only reports bases for which IPD ratio is significantly different from 1, which makes it highly coverage-dependent for modifications with weak signal such as m4C 162 amd m5C. In order to extract the estimated IPD ratios for all bases and not only the ones detected as modified by this protocol, we also ran ipdSummary separately using 164 all the aligned subreads for each sample.
Genome annotation 166 We used a previously published and annotated genome for Serratia marcescens strain ATCC 13880 (RefSeq entry GCF 000735445.1) to annotate the chromosome sequence 168 of the reference strain used in our experiment. CDS from the RefSeq entry were aligned to the reference genome using blast (Camacho et al., 2009) and for each CDS 170 the best high-scoring segment pair was used to propagate annotation to the reference genome if its length was at least 99% of the CDS length. After annotating CDS, they 172 were assembled into operons using the Operon Mapper server (http://biocomputo.  constraints, such as codon usage and palindrome avoidance (Rocha et al., 1998). In order to test if the target oligonucleotide sequence of the adenosine methylase (5 - meaningful approach, as is done in eukaryotes for m5C (e.g. Hagmann et al. (2015)). 208 We aimed at identifying both methylated positions (m6A) and methylated regions (m4C) of interest, focusing on partially methylated m6A in GATC motifs for the former 210 and using a kernel density (KD) approach for the latter to detect regions where cytosine modification into m4C was more frequent than expected by chance.  , 2006). This competition for access to GATC motifs between Dam and regulatory proteins allows for heritable regulation of gene expression in bacteria lineages (Braaten 220 et al., 1994). To identify GATC loci which were not fully methylated in our dataset, we considered for each GATC locus the estimated fractions of modified adenosines on 222 the plus and minus strand for each strain, as reported by the PacBio pipeline. Loci for which no fraction was reported for a given strain were assigned a value of 0 (i.e. 224 completely unmethylated) for that strain. Assuming that the distribution of modified fractions on the plus and minus strand for fully methylated loci would follow a trun-226 cated bivariate distribution centered around (1, 1) (i.e. both strands fully methylated),

Determination of tetramer composition bias in Serratia
we defined the set of partially methylated GATC loci of interest for downstream analy-228 ses as the loci which presented modified fractions on the plus and minus strands which deviated from the point of full methylation at coordinates (1, 1) more than four times 230 the average quadratic distance to (1, 1), in at least one strain (Supplementary Figure   S2). We checked that low estimated values of methylated fractions were not due to 232 low coverage of the corresponding GATC loci by examining the relationship between coverage bins and estimated methylated fraction (Supplementary Figure S3).

234
Detection of clustered m4C based on kernel density estimates other, slightly less than the average spacing of observed m4C positions), we calculated a kernel density estimate with a Gaussian kernel with a standard deviation of 65 bases.

248
This observed density estimate was then compared with density estimates generated by randomly permuting the 77 478 m4C locations among the 3 059 758 cytosine locations 250 on the reference genome: a kernel density estimate was produced for each permuted dataset at the same grid points and with the same kernel parameters as for the observed 252 density estimate. We performed 100 000 permutations and identified candidate grid points for which the proportion of permuted estimates above the observed estimate 254 was < 0.001, suggesting that the frequency of m4C bases was higher than expected by chance in the genome grid cells centered at those grid points. The advantage of this 256 permutation approach is that it takes into account the local distribution of cytosine bases across the genome, since it is maintained in the permuted datasets. Consecutive 258 candidate grid cells were assembled into segments while single candidate grid cells were discarded, yielding a final number of 167 segments that were used as candidate 260 "clusters" of m4C. For each sequenced strain and each "cluster", we calculated the value of the corresponding m4C epiallele by averaging the methylated fraction of all 262 m4C observed in the cluster region. Positions which were not detected as m4C in a given strain (but were in at least another one) were assigned a methylation fraction of 264 0 for this strain before calculating the average epiallele. We note that this approach could result in underestimating the m4C fraction for cytosines with lower coverage 266 as we observed that average coverage for m4C bases with low estimated methylation fraction tended to be slightly higher than for bases with high methylation fraction 268 (Supplementary Figure S4), which is expected given the relatively weak kinetic signal produced by m4C modification. The association between epigenetic changes and genetic mutations was investigated using the methylated positions (for m6A) or regions (for m4C) of interest identified as 274 described above and the genetic mutations for which the minor allele was present in at least two of the sequenced strains (i.e. genetic mutations present in a single strain were 276 not used). For each genetic mutation, we built a quantile-quantile curve comparing a uniform distribution of p-values on [0, 1] with the distribution of observed p-values 278 from t-tests between the genetic mutation and the epigenetic changes. In order to determine if the genetic mutation under consideration was more strongly associated 280 with epigenetic changes than expected by chance, we then compared this quantilequantile curve with a set of similar quantile-quantile curves obtained from permutated 282 datasets where the clones labels were randomly shuffled. This permutation approach allowed us to account for the effect of our dataset structure (such as allele frequencies 284 and distribution of methylated levels) on the expected distribution of p-values.

Decomposition of phenotypic variance 286
To quantify the relative contributions of genetics and epigenetics to the phenotypic variance, we used random effect models in which the variance component was split into genetic, epigenetic and residual variance (Thomson et al., 2018). The model we used for each trait was: where y is a column vector containing the trait values for each strain centered to a mean of 0 and scaled to a standard deviation of 1, MVN is the multivariate normal 288 distribution parameterized by a vector of means and a variance-covariance matrix, 0 is a column vector of zeros, σ 2 gen , σ 2 epi and σ 2 res are the genetic, epigenetic and residual 290 variances for the trait, G and E are the matrices of genetic and epigenetic similarity between strains, respectively, and I is the identity matrix. The genetic similarity matrix 292 G was built from the 54 variable genetic loci observed in our dataset and the similarity between any two strains was calculated as the proportion of those loci for which the 294 strains shared identical alleles. The epigenetic similarity matrix E was built from the methylation fractions for the partially methylated m6A epiloci in GATC motifs 296 identified previously. First, an Euclidean distance matrix with elements d ij was built from the methylation fraction data. The similarity measure between any two clones 298 was then calculated as 1 − (d ij / (2D)), where D was the average distance between the reference strain and all the evolved strains. This is conceptually equivalent to 300 measuring a phylogenetic similarity between two species as the shared evolutionary distance from the root of the phylogeny to their last common ancestor divided by the 302 average distance from the root of the phylogeny to all observed species.
We run the models using the MCMCglmm package in R (Hadfield, 2010). We used 304 an inverse-Gamma prior for each variance component, with shape 0.5 and scale 0.5 (corresponding to V = 1 and nu = 1 in MCMCglmm parameterization). Four chains were run for 10 000 iterations, of which the first half was discarded as burn-in, with a thinning of 10. Total phenotypic variance was calculated as σ 2 tot = σ 2 gen + σ 2 epi + σ 2 res , 308 and r 2 gen , r 2 epi and r 2 res were calculated as σ 2 gen /σ 2 tot , σ 2 epi /σ 2 tot and σ 2 res /σ 2 tot , respectively.
Calculation of partial variance components. In our experiment, epigenetic similarities could be due to genetic control, and both epigenetic and genetic similarities could be due to the evolutionary treatment. To disentangle the joint and disjoint effects of treatment, genetics and epigenetics on phenotypic variance, we fitted seven mixed models similar to the one described above, built from all possible combinations of the treatment, genetic and epigenetic variance components. The fullest model was: where σ 2 evo is the treatment variance for the trait and T is the design matrix describing the evolutionary treatment corresponding to each clone. The six other models were: For each of those models, we calculated the proportion of the phenotypic variance for each phenotypic trait: r 2 evo,gen,epi , r 2 gen,epi , r 2 evo,epi , r 2 evo,gen , r 2 epi , r 2 gen and r 2 evo . We then calculated the joint and disjoint proportions explained by treatment, genetics and 314 epigenetics using a variation partitioning approach (Legendre and Legendre, 1998) as depicted in the following representation: gen,epi (f + g) = r 2 gen + r 2 evo − r 2 evo,gen (e + g) = r 2 epi + r 2 evo − r 2 evo,epi g = (d + g) + (f + g) + (e + g) − r 2 evo − r 2 gen − r 2 epi + r 2 evo,gen,epi Taking into account the effect of genetic loci associated with epigenetic changes. After having identified the genetic loci potentially associated with epigenetic changes, we re-ran the same phenotypic variance decomposition as presented above but adding in Equations 1, 2 and 3 the fixed effect of one genetic locus (or haplotype) at a time and removing the locus (or haplotype) from the calculation of the genetic similarity matrix. For example, when testing for the effect of haplotype a on phenotypic variance decomposition, Equation 1 would become: where G a is the genetic similarity matrix calculated after removing the haplotype a 318 from the table of genetic variants. Equations 2 and 3 would be modified similarly. The obtained variance decomposition is thus providing the contribution of evolutionary 320 treatment, genetic and epigenetic variations to phenotypic variance after removing the effect of haplotype a on the phenotypes. In this study, we were interested in the overall relationships between evolutionary treatment, genetic, epigenetic and phenotypic datasets to understand how they glob- If strains which are close based on one distance measure also tend to be close based on 332 another one, then it suggests a dependence between the underlying sets of measurements used to calculate those distances (Mantel and Valand, 1970 Table S1). This suggests that the ancestor clone used to initiate the replicated populations in the evolution experiment, and which was itself derived from 350 the reference strain sequenced here, actually contained at least two lineages which were preserved in some populations at 38 • C but in no other evolutionary treatment. We  Table S1). Remarkably, those three 360 genes are all involved in some steps of the biosynthesis of lipopolysaccharide. In the evolution experiment population sizes were on the order of 8 × 10 6 cells per 400 µL cul-362 ture well at plateau, and the fraction transferred to the next generation was 1/10 of the previous culture. This yields a large effective population size N e = 2.6 × 10 6 during the 364 experiment (N e = N 0 g, where N 0 = 8 × 10 5 cells is the bottleneck size and g = 3.3 is

Methylated bases and methylation motifs
The role of methylation in bacteria is still being actively investigated, so we first exam-  To investigate the potential link between adenosine methylation and regulation of gene expression, we searched for evidence of differential usage bias of the 5 -GATC-3 target The 5 -GATC-3 tetramer was more abundant in genes and rarer in promoters than 426 expected by chance, even when taking into account biases in dimer and trimer usage ( Figure 2). . Right side panel, tetramer usage bias taking into account the observed frequencies of dimers and trimers (Markov chain approach). A positive usage bias means that a given tetramer is observed more frequently than expected by chance. The distance between an observation and its projection on the identity line (shown for palindromic tetramers) shows how imbalanced is the usage bias between promoters and genes: observations below the identity line represents tetramers which are rare in promoters compared to genes, and vice-versa.

Genomic methylation profiles
We investigated the profiles of m4C and m6A methylated fractions at the boundaries  As mentioned above, the vast majority of adenosines present in GATC motifs were detected methylated in all sequenced strains (99.7 %). Moreover, these methylated 458 bases had consistently high methylation fractions, with almost all GATC adenosines being close to full methylation. However, we identified 907 GATC adenosines not fully 460 methylated ("low-meth m6A") using our filtering criteria, i.e. 1.2 % of adenosines in GATC, which were located in 458 distinct GATC palindromes. Based on the genomic 462 distribution of all GATC motifs, these low-meth m6A were more frequent than expected in promoter regions (χ 2 = 530.49, df = 1, p < 0.001) and rarer than expected in detected as m4C in at least one sequenced strain, compared to a genome-wide average of 2.5 %. Those m4C segments were not evenly distributed between operon and 472 promoter regions, even when taking into account GC-content genomic distribution: like low-meth m6A positions, m4C segments were more frequent than expected in pro-474 moter regions (χ 2 = 16.5, df = 1, p < 0.001) and rarer than expected in operon regions (χ 2 = 390.7, df = 1, p < 0.001).

476
Association between genetic mutations and epigenetic changes. We found tentative evidence of association between genetic mutations and m6A methylation  and were less variable inside operons compared to outside (p-value < 0.001).

500
In order to determine if methylation changes happened during the evolution experiment, we investigated the methylation changes for the m6A and m4C epiloci of interest 502 across strains by classifying those epiloci into low variability and high variability sets.
The high variability set was further divided into loci with continuous and discontinous

Decomposition of phenotypic variance
To evaluate how much genetic and epigenetic changes contributed to phenotypic variation, we used a random effect model incorporating similarity matrices based on geno-532 types and epigenotypes. Our approach is conceptually equivalent to estimating a genetic heritability and an epigenetic heritability for each phenotypic trait. Comparing 534 the phenotypic variance decomposition between a purely random-effect model (i.e. including haplotype a in the genetic similarity matrix, Figure 6) and a model including  Figure S11), which is consistent with the fact that it was associated 598 with m4C changes but not m6A changes.   Epigenetic changes did not appear to be under genetic control, as only a small proportion of the methylation variation was associated with genetic mutations. While 624 our statistical power to detect association is limited with our data since there is no segregation among the bacterial clones, extensive genetic control would require assum-for genetic mutations to explain the observed epigenetic variation. In the cases where we observed an association between a mutation and epigenetic changes, a single mutation or haplotype was indeed tentatively associated with multiple epigenetic changes.

630
Moreover, there was no relationship between distance to the mutation and the epigenetic changes, indicating that genetic control over long distance is plausible, mediated 632 perhaps by indirect effects of the mutation. However, considering that we cannot distinguish between a mutation inducing a methylation change and a methylation change 634 hitchhiking with an adaptive genetic mutation, it does not seem plausible that even a majority of the observed methylation changes were induced by genetic mutations.

636
Furthermore, in the case of m6A, association was only observed with haplotype a and could be therefore be due to the shared history of those epiloci with haplotype a prior 638 to the initiation of the evolution experiment, if we assume that haplotype a was part of some unexpected standing genetic variation in the ancestor culture at the time of Overall, those results suggests that evolving at lower average temperature imposed 706 stronger selective constraints on the genetic variants in our experimental organism while higher temperature allowed for more diverse genetic trajectories, and conversely 708 that epigenetics were more important for differentiation between the constant and fluctuating conditions at the same average temperature.
is important to note that our experimental design and protocol which included growing 712 the evolved clones in common conditions for a few generations before DNA extraction for sequencing precludes detecting rapidly reset epigenetic changes.

714
In conclusion, we have shown that substantial epigenetic variation in adenosine methylation exists in Serratia marcescens and that some of this variation is likely to 716 have functional consequences and to be adaptive. The role of epigenetic changes may be mediation of initial plastic responses, or just generation of variation as a bet-hedging 718 strategy. Furthermore, at least part of the variation in methylation is likely to be neutral. Genetic variants from unexpected pre-existing standing genetic variation in 720 our experiment seem to be responsible for the majority of divergent adaptation between the 38 • C treatment and the others, but the large shared contribution of genetic and 722 epigenetic variation to phenotype even when taking into account haplotype a suggests that genetics and epigenetics both exert a strong control on bacterial traits.   Supplementary Figure S4: Relationship between coverage and estimated fraction of methylated cytosines (m4C). All cytosines detected as m4C, from all sequenced strains, are included. Data is binned by intervals of methylated fraction of 0.1 width, as indicated on the x axis. There is a correlation between coverage and estimated methylated fraction (Spearman's ρ = -0.17, p < 0.001), indicating that higher coverage is needed to estimate low methylation fractions. However, 90 % of detected m4C have a methylation fraction > 0.4, suggesting that the bias due to this coverage effect is likely to be small.    Vir. cold    Vir. cold  Supplementary Figure S12: Correlogram for distance measures based on treatment, genetic, epigenetic and phenotypic data. EVO, GEN, EPI, PHE: distance measures grouped in evolutionary treatment, genetic, epigenetic and phenotypic-based measures, respectively. In GEN: non-syn., indels resulting in frame shift and non-synonymous SNPs; syn., all other indels and SNPs. In EPI: m6A, all m6A fractions; GATC m6A, m6A fractions in GATC motifs; low meth. m6A, m6A fractions for GATC loci detected as partially methylated (see Methods); m4C, all m4C fractions; clusters m4C, m4C epialleles calculated for m4C clusters based on the kernel density permutation approach. In PHE: phage activ., rate of activation of prophage KSP20; temp.-related, temperaturerelated traits from Ketola et al. (2013); coselected, coselected traits from Ketola et al.  Figure S13: Correlogram (values) for distance measures based on treatment, genetic, epigenetic and phenotypic data. In each cell, the top number is Mantel's R and the bottom number is the corresponding p-value. Cells with content in bold have p < 0.05. EVO, GEN, EPI, PHE: distance measures grouped in evolutionary treatment, genetic, epigenetic and phenotypic-based measures, respectively. In GEN: non-syn., indels resulting in frame shift and non-synonymous SNPs; syn., all other indels and SNPs. In EPI: m6A, all m6A fractions; GATC m6A, m6A fractions in GATC motifs; low meth. m6A, m6A fractions for GATC loci detected as partially methylated (see Methods); m4C, all m4C fractions; clusters m4C, m4C epialleles calculated for m4C clusters based on the kernel density permutation approach. In PHE: phage activ., rate of activation of prophage KSP20; temp.-related, temperature-related traits from Ketola et al. (2013); coselected, coselected traits from Ketola et al. (2013).