Higher temperatures worsen the effects of mutations on protein stability

Understanding whether and how temperature increases alter the effects of mutations on protein stability is crucial for understanding the limits to thermal adaptation by organisms. Currently, it is generally assumed that the stability effects of mutations are independent of temperature. Yet, mutations should become increasingly destabilizing as temperature rises due to the increase in the energy of atoms. Here, by performing an extensive computational analysis on the essential enzyme adenylate kinase in prokaryotes, we show, for the first time, that mutations become more destabilizing with temperature both across and within species. Consistent with these findings, we find that substitution rates of prokaryotes decrease nonlinearly with temperature. Our results suggest that life on Earth likely originated in a moderately thermophilic and thermally fluctuating environment, and indicate that global warming should decrease the per-generation rate of molecular evolution of prokaryotes.


24
Genetic mutations are the primary source of the extensive phenotypic diversity that can be observed 25 today, providing the raw material upon which selection can act. Thus, understanding the factors that 26 influence the effects of mutations, as well as the rate at which they occur, is necessary for predicting 27 the impacts of environmental change on organisms. From a thermodynamic standpoint, a given 28 nonsynonymous mutation (i.e., a mutation that alters the amino acid sequence of a protein) can be 29 classified as beneficial or detrimental based on its effect on the stability of the protein (Zeldovich where the protein population is equally divided into the folded and unfolded states. The two hypotheses make different predictions for the relationship between per-generation substitution rate and temperature. In panel C, the substitution rate has a similar shape to that of the protein stability curve. In contrast, in panel D, the substitution rate is nearly invariant at low temperatures (where mutations have a negligible effect on protein stability), but decreases nonlinearly at higher temperatures.
to be exclusively driven by the decline in protein stability at high temperatures (Puurtinen et al. (shown in purple and mustard colors; Figure 3A). The purple cluster contains structures from 136 bacteria and archaea with both short and long LID domains, but without the two extra -strands, 137 causing these ADKs to form monomers ( Figure 3B, subpanels I, II, and III). In contrast, the mustard 138 cluster contains archaeal ADKs that can form homotrimers ( Figure 3B, subpanel IV). Both clusters 139 contain ADKs from two major archaeal clades: the Euryarchaeota and the TACK superphylum. 140 This suggests that the two clusters are the outcome of a gene duplication that took place in the 141 archaeal lineage, before the split of Euryarchaeota and TACK. The large distance that separates 142 the two clusters, the divergence in secondary and quaternary structure between the two clusters, 143 as well as systematic differences in their site-specific rates of evolution (inset of Figure 3A) are all 144 consistent with this hypothesis. It is worth emphasising here that the activity of the ADK protein is 145 directly linked to organismal fitness (Couñago and Shamoo, 2005;Couñago et al., 2006). Thus, the 146 significant structural differences that can be observed between the two ADK types would be highly 147 unlikely to occur without the presence of two gene copies. One copy would remain functional, 148 whereas the other would be able to evolve under relaxed selection pressure. Given that no species 149 in our dataset expressed both ADK types (according to the UniProt database), either gene copy is also possible that the distribution of the two ADK types across species may be partly the outcome 155 of horizontal gene transfer events. In any case, we henceforth use the terms "ancestral-type" to 156 refer to ADKs from the purple cluster and "archaeal-type" to refer to those from the mustard cluster.   a purely random and gradual (Brownian motion) manner ( Figure 4B-C), likely reflecting the rapid 194 evolution of the ADK sequence (see the inset of Figure 3A). 195 We additionally checked if the temperature-dependent ΔΔ hypothesis also holds within species. nonsynonymous mutations become on average more destabilizing with temperature. Each data point represents the weighted median effect of all possible mutations for a single ADK at a temperature close to those experienced by the species. Mutations are less detrimental to archaeal-type ADKs, suggesting a potential evolutionary benefit for species whose genomes code for this ADK type. All coefficients of the model had 95%

II. Polaromonas naphthalenivorans
Highest Posterior Density (HPD) intervals that did not include zero. B-C: For both ancestral (panel B) and archaeal (panel C) ADK types, ΔΔ ∼ evolves along the species phylogeny, reflecting the evolution of species' thermal strategies (e.g., adaptation to psychrophilic temperatures). Nevertheless, the phylogenetic heritability is lower than that expected by a random walk in the parameter space (Brownian motion), possibly in part due to episodes of convergent evolution or interspecific recombination of the ADK gene (Feil et al., 1996) ln(K · t gen ) = −16.76 − 0.09 · T + 5 · 10 -4 · T 2 Figure 6. The generation time-corrected substitution rate decreases with temperature. This nonlinear monotonic relationship is consistent with our finding that mutational effects are temperature-dependent, becoming increasingly deleterious as temperature increases. Temperature explains 84% of variance on its own, whereas an additional 10% of variance is explained by a phylogenetic random effect on the intercept. detected a strong negative association between generation time-corrected substitution rates and 241 temperature ( Figure 6). In contrast, a systematic effect of cell volume on substitution rate was not 242 found, as the exponent of cell volume was statistically indistinguishable from zero. Our finding of a decline in substitution rates with temperature appears to contradict the results 293 of two previous studies by Gillooly and co-workers (Gillooly et al., 2005(Gillooly et al., , 2007. These studies found 294 that across animals, nucleotide and amino acid substitution rates increase with temperature and 295 decrease with body mass, consistent with a dependence of mutation rate on metabolic rate. The 296 discrepancy between their results and ours probably arises from two main differences in study 297 design. First, Gillooly and co-workers assumed that generation time has the same temperature and 298 body size dependence as mass-specific metabolic rate. Second, because they only examined the 299 substitution rates of animals, their datasets did not cover the entire temperature range, but mainly 300 mesophilic temperatures. 301 Overall, the present study sheds new light on the mechanism by which nonsynonymous muta-

309
Homology modelling 310 We performed homology modelling using the intensive mode of the Phyre2 server which can 311 produce a homology model based on multiple templates available from the PDB. The choice of the 312 most appropriate template(s) is made by taking into account i) the confidence of the alignment 313 between the query and template sequences, and ii) the coverage of the query sequence by the 314 template(s). Any amino acids of the query sequence that are not covered by the template(s) are 315 modelled using an ab initio approach. As the accuracy of ab initio modelling is usually lower, we 316 rejected any homology models for which more than four amino acids had to be modelled ab initio. 317 We also ensured that the sequence identity between each query sequence and its chosen 318 template(s) was sufficiently high. For this, we used Rost's empirical equation (Rost, 1999): ( 2) identical is the percentage of sequence identity and the length of the alignment. Plotting this 320 equation for different values of gives a line that partitions the parameter space into a "safe zone" 321 for homology modelling (i.e., where the resulting structural models are expected to be reliable) and  (Banks et al., 2005). The minimized protein was solvated in a periodic orthorhombic 398 simulation box with Simple Point Charge water molecules (Berendsen et al., 1981). Na + and Cl − 399 ions were added to neutralize the charge of the protein as appropriate, and to generate a NaCl salt 400 concentration of 0.15 M. The resulting system was relaxed using Desmond's standard protocol. 401 We next performed ten independent simulations (with randomised initial atomic velocities) 402 per ADK. These were performed under the NPT ensemble, i.e., with temperature and pressure 403 being maintained constant using the Nosé-Hoover chain thermostat (Martyna et al., 1992) where x, y, and z are the axes of the three-dimensional space. As the entire protein moves in 417 space during the course of a molecular dynamics simulation, an objective comparison of two 418 conformations requires finding the superposition between them that minimizes the RMSD. 419 It is worth stressing that ADK structures contain a large number of amino acids that participate 420 in neither helices nor -strands. Such amino acids are usually highly flexible and their motions 421 are largely random. Therefore, to reduce the influence of highly flexible amino acids on the 422 identification of protein conformations, we excluded such amino acids from the calculation of the 423 RMSD matrix. For this, we converted the trajectories from the Desmond format to DCD format with 424 the VMD program (Humphrey et al., 1996). VMD was also used to generate a Protein Structure File. 425 These two files were then analysed with Carma (Glykos, 2006) Estimation of the relationship between substitution rate and temperature 471 We fitted variants of Equation (1)

I. The Gibbs-Helmholtz equation
If all environmental variables other than temperature (e.g., pH, salt concentration) are kept constant, the relationship between protein stability (Δ ) and temperature ( ) for proteins unfolding according to the two-state model can be described by the Gibbs-Helmholtz equation (Privalov, 1990): Here, m (the "melting temperature", in K units) is the temperature at which Δ = 0 due to heat denaturation. At m , the protein population is equally distributed between the two states (folded and unfolded). Δ m (kcal ⋅ mol −1 ) is the difference in enthalpy between the folded and unfolded states at m . Δ p (kcal ⋅ mol −1 ⋅ K −1 ) is the difference in heat capacity between the two states, which is assumed to be independent of temperature.

II. List of analysed prokaryotic ADKs
Appendix 1-table 1. Prokaryotic ADKs that were included in this study.

Species
UniProt accession ID Thermal group

Acidothermus cellulolyticus A0LRP1 thermophile
Anoxybacillus flavithermus B7GJ88 thermophile Number of aligned amino acids % of sequence identity Appendix 1-figure 2. Percentage of sequence identity versus alignment length for all template sequences of this study. Note that many of our homology models were built using multiple templates and, therefore, multiple data points can correspond to the same homology model. Data points at 100% sequence identity indicate ADKs whose structures were available from the PDB, and for which homology modelling was done only to add any missing amino acids or not at all. interquartile ranges from each boxplot edge. Given that all structures had >94% of their amino acids in the most favoured and additional allowed regions, and at worst just over 3% of amino acids in disallowed regions, their quality can be considered sufficiently high.

IV. Species tree reconstruction
Appendix 1-  1-figure 4. The maximum likelihood phylogeny of the species in our dataset of ADKs, as inferred with IQ-TREE. Each phylum is shown in a different colour. Additionally, the phyla that belong to the TACK superphylum are explicitly marked. In contrast to the ADK gene tree, most nodes of the species tree have sufficiently high statistical support (black and grey circles).

V. Analysis of molecular dynamics trajectories
Appendix 1-figure 7. Illustration of the procedure for obtaining representative ADK conformations, based on the molecular dynamics trajectories of Francisella philomiragia. A: The RMSD matrix among all snapshots is calculated after excluding highly flexible amino acids. B: Conformations are grouped into major structural clusters, visualised here using multidimensional scaling. Note that the reported goodness of fit refers to the ability of two dimensions to properly reflect the RMSD values among data points, and is not a measure of clustering quality. C: From each cluster, an average structure is produced. The flexibility of amino acids for each conformation is shown on a colour scale that ranges from dark orange (highly flexible) to white to dark blue (least flexible). The percentages show the proportion of protein snapshots that belong to each conformation.

VII. The inter-and intraspecific relationships between median mutational effects and temperature
Appendix 1-table 4 lists the models that were fitted to characterise the relationship between ΔΔ ∼ and temperature across species. Models with at least one coefficient whose 95% HPD interval included zero are not shown. To estimate the species-specific relationships between median ΔΔ and temperature, we fitted six mathematical equations to the data (Appendix 1-figure 9). The most appropriate shape for each species (Figure 5) was determined using AICc. It is worth noting that the ADK of the LUCA had the lowest temperature sensitivity across the six species, suggesting a likely evolutionary trend towards increasing temperature sensitvity. However, given that i) the ADK gene is under strong selection and that ii) it evolves rapidly and in a convergent manner across species (Figure 3A), such a trend in temperature sensitivity could be an artefact of the ancestral reconstruction of the LUCA sequence. To understand whether this trend is real, future studies could perform a similar analysis for genes that are characterised by non-convergent and less rapid evolution.

VIII.I Estimated sequence and 3D structure
To reconstruct the ADK sequence of the LUCA, we submitted to the FASTML server (Ashke- Its homology model that we obtained via Phyre2 is shown in Appendix 1-figure 11.
Appendix 1-figure 11. The inferred ADK structure of the LUCA. Helices are shown in orange, whereas -strands are in blue. The LID domain is long, similarly to that of most ancestral-type ADKs.

VIII.II Estimation of the temperature of optimal catalysis ( opt )
In microbes, the temperature of optimal catalysis of a given enzyme ( opt ) has been found to correlate with the temperature of maximum growth rate (

IX. Estimation of cell volume from cell dimension measurements
We estimated the cell volume ( ) of spherical microbes (Appendix 1- figure 13A) as For microbes whose shape was reported as similar to a disc, a filament, or a cylinder (Appendix 1- figure 13B), we used the following equation: Last, for rod-shaped microbes (Appendix 1- figure 13C), we estimated their cell volume as the sum of the volumes of a cylinder and two half-spheres:

X. Estimation of prokaryotic generation times as a function of tem-
perature and/or cell volume

X.I Temperature
The relationship between population growth rate and temperature in ectotherms has the shape of a unimodal curve (the thermal performance curve; Angilletta 2009; Appendix 1- figure 14A). As previously mentioned, the temperature at which the curve peaks is defined as pk , whereas the population growth rate at pk is pk . pk tends to be slightly above the mean environmental temperature (Martin and Huey, 2008; Angilletta, 2009). Calculating the inverse of pk gives an estimate of the minimum generation time ( gen ) of a microbial species or strain.
As Smith et al. (2019) have shown, in prokaryotes, pk tends to increase with pk until ≈ 45°C, after which it levels off. To quantify this relationship in a continuous manner, we fitted a few alternative models to their dataset. More precisely, we specified ln( pk ) as the response variable, whereas the predictor was pk , either using a natural logarithm transformation or a square root transformation. We note that we did not use a Boltzmann-Arrhenius term, even if the Metabolic Theory of Ecology framework also predicts an exponential decrease of gen with temperature (Savage et al., 2004), due to the implicit assumption that the trait being studied (i.e., gen ) is directly governed by the influence of temperature on enzyme kinetics.
Such an assumption has been previously proven invalid (e.g., see The results are shown in Appendix 1-table 5 and Appendix 1-figure 14. Appendix 1-table 5. Alternative models for the relationship of ln( pk ) with pk across prokaryotes.
The model with the lowest DIC is highlighted. C Appendix 1-figure 14. The relationships of maximum population growth rate ( pk ) and minimum generation time ( gen ) with pk . A: The thermal performance curve of population growth rate of the archaeum Natrinema pellirubrum (Robinson et al., 2005). Gold circles stand for the experimental measurements, whereas the black line is the fit of the four-parameter variant of the four-parameter Sharpe-Schoolfield model (Schoolfield et al., 1981). pk is the temperature at which growth rate takes its maximum value, pk . The left vertical axis is in units of s −1 , whereas the right vertical axis is in myr −1 .

Model
B: In prokaryotes, pk increases with the natural logarithm of pk . This explains 13% of variance, whereas species identity leads to 92% of the variance being explained. The equation that is shown has pk in units of s −1 and pk in°C. C: gen decreases as a power law of pk .

X.II Cell volume
We further examined whether cell volume also influences the minimum generation times of prokaryotes. To this end, we estimated cell volumes for as many species in the dataset of Smith et al. (2019) as possible. We then fitted models with MCMCglmm that had ln( pk ) as the response and ln( pk ) and/or ln(cell volume) as fixed effects. Species identity was again specified as a random effect on the intercept, both with and without a phylogenetic correction. Given that the cell volume of most species was available as ranges and not as a single value, we incorporated the uncertainty in our models by fitting them 30 times. For each of the thirty runs, we set each species' volume by sampling from a uniform distribution with bounds equal to the minimum and maximum cell volume of the species. Two independent MCMC chains were run for each model, for two million generations. Samples from the posterior were obtained every thousand generations after the first two hundred thousand generations and were examined for sufficient convergence. Model selection was done according to DIC, averaged across the thirty runs.
Models with phylogenetic correction had higher DIC values than their non-phylogenetic equivalents. Across the latter, the 95% HPD interval of the coefficient of ln(cell volume) almost always included zero. This was the case both when ln(cell volume) was the only fixed effect (Appendix 1- figure 15A) and when it was accompanied by ln( pk ) (Appendix 1- figure   15B). This shows that cell volume does not systematically influence pk across prokaryotes. whereas whiskers indicate the 95% HPD interval. The estimates in panel A were obtained through a regression of ln( pk ) against the natural logarithm of cell volume. In contrast, in panel B, the regression also contained the natural logarithm of pk as another fixed effect.

XI. The relationship between substitution rate and temperature
According to the neutral theory of molecular evolution (Kimura, 1987), the rate of molecular substitution per time ( ) can be predicted by the generation time ( gen ), the mutation rate per generation ( ), and the proportion of selectively neutral mutations ( 0 ): This equation makes the assumption that selectively beneficial mutations are extremely rare and can be ignored, i.e., 1 − 0 stands for the proportion of deleterious mutations.
Temperature influences gen (Appendix 1- figure 14), but also potentially the numerator of Appendix 1-equation (4), given that mutations become more detrimental with temperature as we show in this study. Thus, 0 and possibly should decrease with temperature. In contrast, Gillooly et al. (2005,2007) have argued that, similarly to mass-specific metabolic rate, should increase with temperature and associate (positively in prokaryotes, negatively in protists and metazoans; DeLong et al. 2010) with body size (or cell volume). Such a pattern would be consistent with the "metabolic rate hypothesis" which posits that mutation rate depends on metabolic rate, due to the production of oxidative free radicals as a byproduct of metabolism. Thus, to estimate the potential effects of cell volume ( ) and temperature ( ) on the numerator of Appendix 1-equation (4), the equation can be written as Here, is a species-specific normalisation constant that is independent of cell volume and temperature, is a dimensionless exponent, whereas ( ) a temperature dependence function for which we used either a logarithmic ( ( ) = ⋅ ln( )) or a quadratic equation ( ( ) = ⋅ + ⋅ 2 ). It is worth noting that we did not model temperature as a Boltzmann-Arrhenius function (Gillooly et al., 2007) because the latter implies that substitution rates are directly governed by the effects of temperature on the activity of underlying enzymes.
This assumption has been previously shown to be invalid for metazoans (Lanfear et al.,

2007).
Appendix 1-equation (5) can be simplified by removing and/or if these are found to have no systematic effect on . To understand if and how temperature drives substitution rates, we multiplied both sides of Appendix 1-equation (5) with gen and took the natural logarithm of both sides: ln( ⋅ gen ) = ln( ) + ⋅ ln( ) + ( ).
The mean DIC values of alternative fitted models are shown in Appendix 1-table 6.
Models which included the natural logarithm of cell volume as a fixed effect are not shown, as the coefficient of cell volume always had a 95% HPD interval that included zero.