The ups and downs of amino acid co-evolution: evolutionary Stokes and anti-Stokes shifts

The most fundamental form of epistasis occurs between residues within a protein. Epistatic interactions can have significant consequences for evolutionary dynamics. For example, a substitution to a deleterious amino acid may be compensated for by replacements at other sites which increase its propensity (a function of its average fitness) over time; this is the evolutionary Stokes shift. We discovered that the opposite trend, a decrease in amino acid propensity with time, can also occur via the same epistatic dynamics. We define this novel and pervasive phenomenon as the evolutionary anti-Stokes shift. Our extensive simulations of three natural proteins show that evolutionary Stokes and anti-Stokes shifts occur with similar frequencies and magnitudes across the protein. This highlights that decreasing amino acid propensities, on their own, are not conclusive evidence of adaptive responses to a changing environment. We find that stabilizing substitutions are often permissive (i.e., expand potential evolutionary paths) whereas destabilizing substitutions are restrictive. We show how these dynamics explain the variations in amino acid propensities associated with both evolutionary shifts.

constraints on proteins.
where $N_e$ is the effective population size and $\pi$

Ashenberg et al., 2013), the propensity for certain amino acids relative to other amino acids changes over time. In natural proteins, these variations may be due to global constraints on protein stability,

similar frequencies? To address this, we developed four metrics for quantifying these two phenomena.

(Table S1). The consistency in $P_{\text{anti-Stokes}}$ suggests that protein structures and mutation biases are not major determinants of evolutionary shifts in propensities. The proportions of $P_{\text{anti-Stokes}}$ ranged from 0.46 to 0.57 across metrics M1-3 (Figure 3A). However, the values of $P_{\text{anti-Stokes}}$ estimated using metric M4 were significantly less than 0.5, which is expected because of the more conserva-

suggesting that both phenomena occur with comparable frequencies (Figure 3B

It has long been observed that a site's location in the protein influences its evolutionary dynamics. For

In the absence of selection, all mutations are neutral and are fixed (or lost) by the action of genetic drift, resulting in propensities that vary randomly over time (Wright, 1929). In contrast, our simula-

for the lack of fit (excess of high p-values), is if propensity changes were autocorrelated. Indeed, we observed a substantial negative autocorrelation in the differences in $\pi_a^h(s_x)$ and $\pi_a^h(s_{x+1})$ (Table S4), implying that an increase in propensity tends to be followed by a decrease (and vice versa). This is perhaps not surprising: if the propensity of the resident amino acid decreases, then either the site will substitute away from the current amino acid, or replacements will occur in other parts of the protein that increase the propensity for that amino acid. Alternatively, as the propensity for a resident amino acid increases, there will be fewer ways for it to increase further than for it to decrease (for example, consider the dynamics when propensity is equal to one).
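The negative autocorrelation described above can be illustrated with a minimal sketch. It assumes a plain lag-1 sample autocorrelation of successive propensity changes; the estimator actually used for Table S4 is not specified in this text, and the function names and the example trajectory below are hypothetical.

```python
from statistics import fmean


def propensity_changes(propensities):
    """Successive differences pi(s_{x+1}) - pi(s_x) along one trajectory."""
    return [b - a for a, b in zip(propensities, propensities[1:])]


def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of a sequence (assumed estimator)."""
    mean = fmean(values)
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    numer = sum(c0 * c1 for c0, c1 in zip(centered, centered[1:]))
    return numer / denom


# Hypothetical propensity trajectory of one resident amino acid (made-up values).
trajectory = [0.30, 0.55, 0.40, 0.62, 0.45, 0.70, 0.52]
changes = propensity_changes(trajectory)
r1 = lag1_autocorrelation(changes)
print(f"lag-1 autocorrelation of propensity changes: {r1:.3f}")
```

A negative value of `r1` means that an increase in propensity tends to be followed by a decrease, which is the pattern reported in the text.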

Lastly, we were interested in assessing whether random fluctuations in propensities could result in $P_{\text{Stokes}}$ and $P_{\text{anti-Stokes}}$ comparable to those observed in our simulations. To do this, we simulated 500 bounded random walks (between 0 and 1) of amino acid propensities with step sizes drawn from a normal distribution with mean (µ = 0) and standard deviation (σ = 0.1) estimated from the step sizes

so that the sequence changes from $s_x \to s_{x+1}$, the fitness and propensity landscapes at most other sites in the protein will subsequently change. In Figure 4A, the grey dots represent the change in the propensity of the resident amino acid at each site following a substitution ($\Delta\pi_a^h = \pi_a^h(s_{x+1}) - \pi_a^h(s_x)$).

The red dots represent the change in the propensity of the resident amino acid at the substitution site, and therefore a change in the resident amino acid from . We

whereas senescence is a consequence of an adaptive response to some change in the protein's external environment.

Alternatively, propensity shifts may be viewed as dynamics that arise due to the protein adapting to internal, rather than external, changes. In this sense, neighbouring sites may "compensate for" or "adapt to" a deleterious substitution that occurred at an interacting site. However, our results

product, and therefore all synonymous codons have the same fitness. We assumed a fixed $N_e = 100$. We initiated each simulation at a randomly generated amino acid sequence. Then, we used

where $N = j - i$.
where $\pi_a^h(s)$ is the propensity of amino acid $a$ at site $h$ given background sequence $s$. The entropy is maximized when all amino acids are equally likely, and is minimized (= 0) when only a single amino acid is observed. To determine how the landscapes change in response to changes in the background protein sequence, we compared the entropy before and after the substitution (∆H, the entropy after the substitution minus the entropy before it).

We classified a substitution as permissive if the average ∆H across all sites was positive, and restrictive if the average ∆H was negative.
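The classification rule above can be sketched as follows. This is a minimal illustration that assumes the entropy is the Shannon entropy (natural logarithm) of the 20 amino acid propensities at a site, which matches the stated maximum and minimum; the function names, the renormalization step, and the "neutral" label for an average ∆H of exactly zero are assumptions introduced here, not part of the original method.

```python
import math


def site_entropy(propensities):
    """Shannon entropy (natural log) of one site's amino acid propensities.

    The vector is renormalized to sum to one (an assumption of this sketch).
    """
    total = sum(propensities)
    if total <= 0:
        raise ValueError("propensities must have a positive sum")
    entropy = 0.0
    for p in propensities:
        q = p / total
        if q > 0:
            entropy -= q * math.log(q)
    return entropy


def classify_substitution(before, after):
    """Classify a substitution from per-site propensity vectors.

    `before` and `after` are lists with one propensity vector per site,
    evaluated on the background sequence before and after the substitution.
    Returns the label and the average change in entropy across sites.
    """
    deltas = [site_entropy(a) - site_entropy(b) for b, a in zip(before, after)]
    avg_delta_h = sum(deltas) / len(deltas)
    if avg_delta_h > 0:
        return "permissive", avg_delta_h
    if avg_delta_h < 0:
        return "restrictive", avg_delta_h
    return "neutral", avg_delta_h  # label for the zero case is an assumption


# Made-up two-site example: both landscapes flatten after the substitution.
peaked = [1.0] + [0.0] * 19
uniform = [0.05] * 20
label, avg = classify_substitution([peaked, peaked], [uniform, uniform])
print(label, round(avg, 3))
```

In this toy case every site moves from a single permitted amino acid to a uniform landscape, so the average ∆H equals the maximum possible entropy, ln 20.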

For all results described in this study, we only considered the dynamics when a residue was accepted and subsequently replaced within the time-frame of the simulation. However, we repeated the analyses with the inclusion of partial windows (where, for example, an amino acid is accepted during the simulation but the simulation ends prior to its replacement), which revealed similar results with respect to the proportion of evolutionary Stokes and anti-Stokes shifts (Figure S12).

The rate of amino acid replacement

values, are more likely to occupy the site (have high $\pi_a^h$), and will have a low rate of being replaced.

Conversely, amino acids with low fitness are less likely to be present at the site (low $\pi_a^h$), and will have a high rate of being replaced. Therefore, in addition to amino acid propensities, we looked at the replacement rates over time. We calculate the rate of leaving the resident amino acid at a site $h$ as the sum of the transition rates (using equation (3)) over all sequences that differ from the current sequence by a single nucleotide and have a different amino acid at site $h$.
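The summation described above can be sketched as follows. Equation (3) is not reproduced in this text, so the transition rate is passed in as a caller-supplied placeholder function (`transition_rate`, a hypothetical name). The sketch also assumes the standard genetic code and skips neighbours that introduce a stop codon; how the original simulations treat stop codons is not stated here. Only nucleotide changes within the codon at site $h$ can alter the amino acid at that site, so only those neighbours are enumerated.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def leaving_rate(codons, site, transition_rate):
    """Rate of leaving the resident amino acid at `site`.

    `codons` is a list of codon strings for the current sequence.
    `transition_rate(current, proposed)` is a placeholder for equation (3);
    it receives the current and proposed codon lists and returns a rate.
    """
    resident_codon = codons[site]
    resident_aa = CODON_TABLE[resident_codon]
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == resident_codon[pos]:
                continue
            new_codon = resident_codon[:pos] + base + resident_codon[pos + 1:]
            new_aa = CODON_TABLE[new_codon]
            if new_aa == resident_aa or new_aa == "*":
                continue  # synonymous change, or stop codon (skipped by assumption)
            proposed = codons[:site] + [new_codon] + codons[site + 1:]
            total += transition_rate(codons, proposed)
    return total


# Toy check with a constant rate of 1 per neighbour, which simply counts
# the amino-acid-changing, non-stop single-nucleotide neighbours.
sequence = ["ATG", "TGG"]
print(leaving_rate(sequence, 0, lambda cur, new: 1.0))
print(leaving_rate(sequence, 1, lambda cur, new: 1.0))
```

With a realistic rate function, amino acids with high propensity would be expected to have a low leaving rate, as the text describes.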

Mixed linear model analysis

In order to assess if amino acid propensity shifts were consistent with random fluctuations, we fitted the data to a mixed linear model of the form

where $x \sim N(0, \sigma^2)$ and $\beta \sim N(0, \sigma_\beta^2)$. We tested a null model assuming random shifts in propensities, where $\sigma_\beta^2 = 0$, against an alternative model where $\sigma_\beta^2 > 0$.

All code used to simulate, analyze, and plot data has been uploaded and is freely available from https://github.com/noory3/antiStokes shifts.

The relationship between the stability effect of a substitution (∆∆G) and the resulting average change in landscape uniformity (avg ∆H). Color bar represents the proportion of sites for which the propensity for the resident amino acid decreased ($\Delta\pi_a^h < 0$). Positive avg ∆H values imply that, on average, the landscapes became more uniform. Therefore, the substitution is deemed permissive. Negative avg ∆H values are indicative of restrictive substitutions. Plotted results are based on a single simulation of the 1pek protein. (C) The percentages of different types of substitutions for each of three proteins (1qhw, 2ppn, and 1pek). Percentages are calculated from 500 protein-specific trials