AI-predicted protein deformation encodes energy landscape perturbation

AI algorithms have proven to be excellent predictors of protein structure, but whether and how much these algorithms can capture the underlying physics remains an open question. Here, we aim to test this question using the Alphafold2 (AF) algorithm: We use AF to predict the subtle structural deformation induced by single mutations, quantified by strain, and compare with experimental datasets of corresponding perturbations in folding free energy ΔΔG. Unexpectedly, we find that physical strain alone – without any additional data or computation – correlates almost as well with ΔΔG as state-of-the-art energy-based and machine-learning predictors. This indicates that the AF-predicted structures alone encode fine details about the energy landscape. In particular, the structures encode significant information on stability, enough to estimate (de-)stabilizing effects of mutations, thus paving the way for the development of novel, structure-based stability predictors for protein design and evolution.

AI has ushered in a revolution in structural biology, yet we are still in uncharted waters [1,2].In particular, it is not clear whether AI algorithms that predict protein structure from sequence, such as AlphaFold (AF) [3] or RoseTTAFold [4], owe their unprecedented accuracy to highly sophisticated pattern recognition or these algorithms can capture some of the many-body physics underlying protein folding.Recent studies provide evidence suggesting that AF has learned an effective energy functional that is searched in order to accurately predict the native structure [5], even if it includes uncommon structural motifs [6].
A stringent test for whether an AI algorithm has learned the actual physical energy landscape would be the capacity to probe, from the predicted structure alone, changes in the thermodynamic free energy (∆∆G) due to single mutations.This would indicate that the predicted structure encodes fine details about the physical energy landscape.Besides this fundamental interest, such capacity may be impactful in applications: Of particular importance for protein design and sequence generation [7][8][9][10], and for protein evolution [11][12][13] is the ability to predict whether a given protein sequence will lead to a stablyfolded structure [3,4].AF has been reported to predict folded structures for proteins that are not stable [14][15][16].Despite this, some analyses suggest that AF and RoseTTAFold can be used for predicting stability changes upon mutation [14,[17][18][19][20].All this motivates us to systematically investigate here the question of whether AI-predicted structures can be used to infer changes to free energy landscapes.
To this end, we study a curated subset of 2,499 measurements of stability change (∆∆G) due to single mutations, taken from ThermoMutDB (TMDB) [21].Strain is a simple general measure of deformation in proteins [22][23][24][25][26][27].We find that the deformation upon mutation -as measured by the effective strain [14] correlates well with ∆∆G.We show it is essential to average over an ensemble of multiple AF-predicted structures to get precise estimates of strain due to mutation, and that most of the relevant information is gleaned from the residues within 15 Å of the mutated residue.Our initial motivation was to exam-ine whether the AF structures encode any information about stability.Surprisingly, we found that correlations between strain and ∆∆G compare well against those obtained using state-of-the-art ∆∆G predictors, suggesting that AF predictions are highly informative of stability changes.We propose that new energybased force fields can be developed that may provide a mechanistic understanding of the effects of mutations on stability.Since stability measurements are easier than structure determination, such a development could settle the question of whether AI algorithms have truly learned the physics of protein folding by testing on massive stability datasets [5,6,28].
Strain correlates with ∆∆G in Thermonuclease.-Wefirst examine thermonuclease (NUC, Uniprot ID, P00644; staphylococbus aureus) -the protein that has the highest number of mutants in TMDB (491 after applying controls, Appendix A).NUC consists of a folded region (starting around K88) and an extended disordered region near the N-terminus (Fig. 1A), indicated by the low pLDDT (AF-predicted confidence score) values (Fig. 1A-B).The NUC mutants are all sampled from the folded domain (Fig. 1C).Note that we define ∆∆G such that an increase in ∆G upon mutation is destabilizing.
To measure deformation, we calculate effective strain (ES; Appendix B) per residue, S i , between wildtype (WT) and mutant structures predicted by AF (Fig. 1D).Disordered residues always show high ES (Fig. 1D, dotted lines) due to prediction noise, regardless of mutations [14].We therefore exclude residues whose pLDDT < 70 from the ES calculations (Fig. 1D, solid lines).Even without disordered residues, if we only compare two static structures (Fig. 1D-E, teal), we still see residual ES in regions far from the mutated site.This occurs since regions in proteins with high flexibility tend to have high variability across repeat AF predictions (Supplementary Material [SM] Fig. 1) [29].To achieve a more accurate estimate of deformation due to mutation, we calculate deformation using 'averaged' AF structures (Appendix B) [14].This drastically reduces ES in most regions (which originates chiefly from noise and fluctuations), except for regions near the mutated site (Fig. 1D-E, orange).Using this more precise measure of deformation due to mutation, we find a significant correlation (Spearman's ρ = 0.35) between strain at the mutated site S m and change in stability ∆∆G (Fig. 1F).This correlation is even higher (Spearman's ρ = 0.57) when calculating the sum of strain over all residues within a spherical neighborhood of radius γ = 15 Å around the mutated site, S n (Fig. 1G).See SM Fig. 2 for similar analyses without excluding low pLDDT residues and without using average structures.Our rationale for choosing γ = 15 Å will become clear in a following section.For now, we highlight that, in this particular example (NUC) it appears that AF-predicted deformation correlates quite well with empirical measurements of changes in stability.

Strain correlates with ∆∆G within protein families.-
We expand our analysis of NUC to more protein families, again focusing on the families that have the highest coverage in TMDB.For the second-to-fifth most common proteins, we find correlations between stability change and local deformation ranging from 0.39 ≤ ρ ≤ 0.61 (Fig. 2A).Extending this analysis to the 40 most common proteins reveals that there are insufficient samples to show significant strain-stability correlations in most cases.Nevertheless, in the 16 correlations that are statistically significant, the average Spearman's ρ is 0.54 (Fig. 2B) with an overall range 0.29-0.78.We see that for all cases with sufficient data, there appears to be a consistent correlation between strain and changes in stability.
Determinants of strain-stability correlation.-Tobetter understand why strain is correlated with changes in stability, we examine the correlations between strain at individual residues S i (not necessarily the mutated ones) and ∆∆G, and compare this with the number of neighbors within 10 Å of each residue i.For NUC, we find that mutations in buried regions (those with many neighbors) tend to have an outsize impact on stability (Fig. 2C), as expected, given the standard paradigm of buried residues having low mutation rates [30].In general one expects mutations of buried residues to affect more bonds and therefore inflict larger stability changes.Indeed, across the five most common proteins in the TMDB, we see a clear trend whereby the residues with the highest strain-stability correlations are amongst the most buried within that protein (Fig. 2D).
Evidently, when only a few points are sampled, the resulting correlation is not particularly informative.Also, in a large protein, more points are needed to achieve sufficient sampling.This is demonstrated most clearly in IgG-binding protein G (IgGb, P06654): this protein has a length L = 448, so even though we have 225 samples, many buried residues do not correlate with stability changes (Fig. 2D).This is because many regions have no mutations sampled from them, so deformation remains low no matter how buried the residues are.Another complication arises from the abundance of disordered regions (SM Fig. 3) in which AF appears to have little capacity to predict mutation effects, while well-folded regions are small.As a result, for IgGB the link between number of neighbors and effect on stability is weak.This case highlights the need for a nuanced approach to understanding the relationship between strain and changes in stability.
Range of informative residues.-Wefind that the S n −∆∆G correlation increases with γ, the radius of neighborhood used to calculate S n , up to about 10 ≤ γ ≤ 20 Å, depending on the protein family (Fig. 2E).This optimal length scale of ∼15 Å, could be the outcome of two possible effects: First, we note that the residues that are most informative of stability changes are the most buried ones (Fig. 2C), and not necessarily the mutated residues.Hence, the optimal γ needs to be large enough to include some of these informative, buried residues in the S n calculation.In TMDB, for example (SM Fig. 4), the average distance between buried and mutated residues is 10 Å.This length scale can be rationalized by a simple geometric argument: the average distance to the center of a sphere (the most buried part) of diameter d is r 0 = (3/8)d; the average volume of an amino acid is ∼144 Å 3 [31], so in a folded, approximately spherical domain of ∼100 amino acids, r 0 ∼ 11.3 Å.
Second, when we compare pairs of single AFpredicted structures, which exhibit significant noise unrelated to mutations (SM Fig. 1), strain-stability correlations reach a maximum as a function of γ and then decrease; this decrease is not observed for averaged structures (Fig. 2E).This is consistent with reports that AF strain predictions are usually indistinguishable from noise after about 15 Å, while this range could be extended to 25 Å (and noise reduced by a factor of 4) by averaging over structures [14,22].
In summary, we propose that the reason for the maximum information from strain around a range of 15 Å is due to both the need to increase γ to include buried residues and also because long-range mutation effects may be masked by prediction noise.In this context, we note that conformational dynamics of proteins were reported to exhibit correlations with similar and even longer ranges, as demonstrated by normal mode analysis [22,32,33].However, one needs to remember that displacement and strain -the tensorial derivative of the displacement -may show different correlation lengths [22].Strain correlates with ∆∆G across protein families.-Wefind a moderate correlation (ρ = 0.36) between S n and ∆∆G when comparing all the measurements together.This correlation is expected to be limited due to the inability of strain to differentiate between stabilizing and destabilizing mutations.Since Eq.( 1) measures the absolute relative change in distances, S ≥ 0 by definition, and thus the strain is invariant to reversing the reference and target structures.Indeed, we find a somewhat higher correlation (ρ = 0.41) between S n and the magnitude of stability changes, |∆∆G|.This indicates that more generally, large structural changes lead to large stability changes, independent of the sign of the change.
We do not expect a simple mapping between S n and ∆∆G, given the complexity of protein structures and intramolecular interactions.We obviate protein size effects to an extent by only looking at mutation effects within γ = 15 Å, but there are other protein-specific factors -such as the degree of disorder, protein shape, flexibility, and amino acid packing -that may alter the relationship between strain and ∆∆G for different proteins.Nonetheless, the strain-stability correlations shown here indicate that strain due to mutation contains considerable information about stability changes that may be leveraged in subsequent development of predictors of ∆∆G.
Strain correlates with ∆∆G almost as well as tailored ∆∆G predictors.-Toput the strain-stability correlations in context, we compare them with two stateof-the-art ∆∆G predictors, DDMut and FoldX (Appendix C) [34,35].FoldX predicts ∆G from structure using empirical energy-based potentials; it is used to calculate ∆∆G by first a generating structure for the mutant based on a reference WT structure, which enables calculation of ∆G for both WT and mutant structures.DDMut uses a neural network to predict ∆∆G, using a reference structure and a mutation as input.
We find that S n −|∆∆G| correlations are almost as high as correlations obtained using FoldX on average, but lags behind the more recent algorithm, DDMut (Fig. 4).We were genuinely surprised by this performance since strain is a simple general measure of deformation and not designed specifically for stability, in contrast to FoldX and DDMut.However there is clearly room for improvement: simply by counting the number of neighbors at the mutation site we see a correlation of ρ = 0.30 (Fig. 4D); the mean absolute error obtained from a linear fit is 1.2 kcal/mol for both S n and DDMut (SM Fig. 5).We note that our aim here is not to use strain to predict ∆∆G, but rather to see whether the strain predicted by AF is informative about stability changes.But given the surprisingly high correlation, we speculate that AF-predicted structures can be leveraged to produce even better ∆∆G predictors.
We expected that since FoldX calculates ∆G for a structure, it would give us a more accurate estimate of ∆∆G than strain if we use it on AF-predicted structures.Surprisingly, we find that this method (AF-FoldX) performs worse than strain in this case (ρ = 0.26).Likewise, we find that calculating strain using the structures generated by FoldX results in a lower correlation with ∆∆G (ρ = 0.30, SM Fig. 6).We considered also the possibility that AF-predicted structures are not as accurate as FoldX-generated structures.However, we find that strain calculated using FoldX structures is not correlated with distance from the mutated site (SM Fig. 7), indicating that FoldX is less accurate than AF in predicting the effect of mutations on structure.These results suggest that there is a promising path for generation of new energy-based methods for ∆∆G prediction using AFpredicted structures.Can AF be used to predict stability changes?-We emphasize that our aim here is not to outright develop a ∆∆G predictor but rather to investigate whether AF predictions are informative of stability changes.We have found that a general measure of deformation, strain, correlates quite well with ∆∆G.Although it was not designed to be a ∆∆G-predictor, S n appears to be almost as good at predicting the magnitude of stability changes as state-of-the-art ∆∆G predictors.Of course, this needs to be tested on a larger set of measurements, and more structures are needed for experimental validation of the relationship between strain and stability (SM Sec.1C).We also note that higher correlations have been observed for these predictors on different datasets, so this analysis should be repeated on larger sets of ∆∆G measurements [34,36].Yet, within these limitations, it seems clear that there is sufficient information in AFpredicted structures to make ∆∆G predictions.It stands to reason that new physics-based models can be developed to achieve even better predictions, off the back of AF-predicted structures.Alternatively, these structures could be used to reparameterize existing force fields such as FoldX.The recent explosion of high-throughput measurements [28,37] will certainly lead to more machine-learning and sequencebased approaches.Nevertheless, we feel that physicsbased methods are essential to offer a detailed view into the mechanistic effects of mutations on stability.The breakthrough by AF in structure prediction might offer the key to this future.Strain-energy relation.-Ourpredictions allow us to deduce an effective strain-energy relation by modeling the protein as an elastic spring network (as detailed in SM Section 2) [23,25].Within this framework, mutation can be seen as introducing a point defect that perturbs the network [26], thereby inducing energy change ∆∆G, which includes contributions from entropy, changing topology (breaking and forming bonds), and energy release in pre-stressed frustrated bonds [38].The energy change is quadratic in the strain, with average effective spring constants K ranging between 1-10 kcal/mol/Å 2 (SM Fig. 9).These typical values of K [27] estimate the average curvature of the high-dimensional energy landscape [39].K varies widely in and between protein families, reflecting the anisotropic geometry and heterogeneity of the landscapes.Advice for using AF to predict mutation effects.-Aprevious study examined the same ThermoMutDB dataset, yet they concluded that AF cannot be used to predict stability.This is due to using changes in pLDDT to measure mutation effects, which does not appear to be reliable for this purpose (SM Fig. 10) [14].We recommend using strain as a more robust measure of the effect of mutations on structure, particularly when using averaged structures.We recommend using about 10 to 20 (i.e., 5 models, 2-4 repeats) structures to get averages, as there are diminishing returns on performance gains (SM Fig. 11).Finally, we tested the strain predicted by the recently released Al-phaFold3 [40] in three TMDB mutants by comparing with AF2 predictions (SM Fig. 12-13), finding no major differences, except in loops and disordered regions.
We have studied the correspondence between AlphaFold (AF) predictions of mutation effects on structure (measured using strain) and changes in stability.We find that strain correlates well with ∆∆G, almost as well as state-of-the-art ∆∆G predictors.Altogether, our findings suggest that new algorithms can be developed to extract more information from AF structures to produce accurate physics-based models of stability change upon mutation.Furthermore, quantitative mapping of the energy landscape may pave the way for more realistic modeling of functional conformational dynamics of protein machines, beyond coarse-grained elastic-network models [22,41].
Code used to filter the ThermoMutDB can be found at github.com/jomimc/AF2_Stability_PRL_2024.We used the PDAnalysis python package to calculate strain.All AF2-predicted PDB structures are available at [42], compressed using FoldComp [43].
This work was supported by the Institute for Basic Science, Project Code IBS-R020-D1.
Appendix A: Stability Data.-We select all proteins from ThermoMutDB (TMDB) [21] with sequence length, 50 ≤ L ≤ 500, and measurements for single mutants made within 293 ≤ T ≤ 313 K and 5 ≤ pH ≤ 8; this amounts to 5,078 out of 13,337 measurements.One problem we found with TMDB is that the indices in mutation codes can be associated with either Protein Data Bank (PDB) [44], Uniprot [45], or "unsigned", yet we wanted to match mutations to Uniprot sequences.Hence, if indices were matched to PDB indices that differed from Uniprot indices, we used a custom script to convert the mutation codes to the correct Uniprot indices.This script only works when the mutation code index refers to the "_atom_site.label_seq_id"entry of the mmCIF file; we excluded many cases where the TMDB mutation indices refer to idiosyncratic indexing (i.e., not starting from one) in the "_atom_site.auth_seq_id"entries in the mmCIF files.Out of caution, if no correct matches were found for a protein we excluded all TMDB entries that were related to this protein (i.e., if they have the same value in the 'PDB_wild" column).We excluded "unsigned", since these were ambiguous, and we discovered that some of these were labelled with incorrect Uniprot accession ids.After this procedure we are left with 3,236 measurements.When multiple ∆∆G measurements are available we report the average ∆∆G across all measurements.We leave out a mutant if the standard deviation of the ∆∆G measurements is higher than 1 kT; this occurred in only about 2 % of cases.We examined why some of these cases had such high variance, and found occasional errors where the ∆∆G sign differed from the reference it was taken from.A full list of errors and corrections can be found in SM Section 1D.The final set of ∆∆G measurements includes 2,499 unique mutants.We study correlations within individual wild-type (WT) proteins (and their single mutants), and across the full set of measurements.
Appendix B: Structure Analysis.-Weuse the Colab-Fold implementation of AF to predict protein structures [46].We run all 5 models, without using templates, and using 6 recycles, and minimization using the Amber forcefield.We run 10 repeat predictions for each sequence, for each model, and create averaged structures following [14].We calculate effective strain, ES, which measures the deformation between mutants and WT proteins, for both averaged and non-averaged structures.ES can be described simply as the average relative change of C α − C α distances between neighboring residues; we define neighbors as residues whose C α positions are within 13 Å.We calculate ES as [14], where r ij is the distance vector between C α positions in residue i and neighbor j in a reference structure, r ′ ij is the corresponding distance vector in a target structure which has been aligned to r ij , N i is the set of neighbors, and n i = |N i | is the number of neighbors.We refer to ES measured at the mutated site m as S m .We also calculate the neighborhood sum of strain S n over residues whose distance to the mutation site δ i , is shorter than some threshold γ, We mainly use γ = 15 Å.We typically only include AF-predicted residues in strain calculations if pLDDT > 70, and treat the rest as disordered, except where otherwise noted.Appendix C: ∆∆G predictors.-Weuse two state-ofthe-art methods to predict ∆∆G: FoldX 5 [47], and DDMut [34].We use the API for the DDMut web server to predict ∆∆G.For FoldX, we use the Build-Model command to generate structures of the WT and mutant sequences and ∆∆G predictions.We use five runs as recommended, and report average values of ∆∆G.For each algorithm, we provide the top-ranked (by pLDDT) AF-predicted WT structure as an input, along with a list of all mutants for that protein.

Supplemental Material for "AI-predicted protein deformation encodes energy landscape perturbation"
John M. McBride 1, * and Tsvi Tlusty

PDB Structure Data
A. How many structures are needed to average?
Since we found that correlations between S n and ∆∆G are higher when using averaged AF-predicted structures, a pertinent practical question is, how many structures should one use?We looked at the correlations (both ∆∆G and |∆∆G|) as a function of the number of structures used to create average structures, finding that most of the improvement is seen when using only five structures instead of one (Fig. 11).We suggest that using 20 structures is sufficient to achieve good performance.

B. pLDDT is less informative of stability than strain.
Since AF's predicted confidence measure, pLDDT, is a good indicator of disorder, we also investigated whether changes in pLDDT are informative of changes in stability.We compared changes in pLDDT at the mutated site m, changes in the average pLDDT (across the entire protein), for both single and averaged structures, against both ∆∆G and |∆∆G|.We find poor correlations between changes in pLDDT and ∆∆G (Fig. 10), in agreement with Pak et al.
C. Insufficient PDB data to measure empirical strain-stability correlation.
We sought to repeat our analysis using experimental structures from the Protein Data Bank (PDB).We only found ∼ 100 entries which have both wild-type (WT) and mutant experimental structures.We removed any NMR structures, and only considered pairs with the same oligomeric state and same bound substrates.This left us with 36 pairs of PDB structures, which is too few to study statistical correlations.It is possible that there are sufficient ∆∆G measurements across other data sets, but the limit of unique mutations is determined by the number of possible structure pairs that differ by a single amino acid substitution.
In a previous study (McBride et al., 2023) we only found ∼1,000 of these, so there is some hope that this analysis can be performed if the other datasets have enough ∆∆G measurements.

D. Construction of a reduced ThermoMutDB set
Our original approach was to use the ThermoMutDB subset provided by Pak et al. (2023), however we decided to work exclusively with the original ThermoMutDB data, for two of reasons.We found that the "TMDB_ID" values do not always match with the original ThermoMutDB ID numbers.We also found a case where a mutant (P00644, L89V) was assigned an incorrect mutation position; in this case, both the PDB SEQRES and Uniprot sequences have a lysine at position 89, but the correct mutation code should be L171V.
We found several issues in the ThermoMutDB that led to exclusion or correction of entries.We found that for some entries, the reference protein for which the mutation was defined (and ∆∆G was calculated) was not the WT sequence found in Uniprot; (Consortium, 2018;Isom et al., 2008Isom et al., , 2010;;Merkel et al., 1999) we excluded these.We found duplications of entries, where measurements from one paper were referenced in several other papers, yet both were included; (Robinson et al., 2018) we excluded these entries.We found that one measurement (Y27A) in one source, (Shortle et al., 1990) and all measurements in another source (Carra and Privalov, 1995) were included with the incorrect sign; we corrected these.

Linking spatial deformation and energy using elastic network models
It is instructive to examine the energy landscape within a framework of effective elasticity theory, which allows us to link the scale of free energy changes with the underlying structural changes.The generic relation for a single spring is U = 1 2 K∆r 2 , where K is a spring constant and ∆r is the deviation from the spring's equilibrium length.A generalization to 3D continuum material is a quadratic form for energy density, u = 1 2 k αβ γδ S αβ S γδ , where S is the 3 × 3 strain tensor, k is the 3 × 3 × 3 × 3 elasticity tensor (Landau et al., 1986), and we use Einstein's summation convention.
To obtain a similar effective strain-energy relation for the protein, we model a protein as an elastic network where amino acids are connected by homogeneous springs with elastic constant K between their C α positions.Two C α positions i and j are connected if they are within γ Å from each other, |r i j | < γ.To measure the deformation, we use arXiv:2311.18222v2[q-bio.BM] 19 Jul 2024 our previously-developed metric called the local distance difference (LDD), (McBride et al., 2023) which is the L2norm of the vector of changes in the C α −C α distances between neighbors, Here, r i j and r ′ i j are the vectors between residue i and neighbor j in, respectively, a reference structure and a target structure, and the summation is over the n i neighbors j for which |r i j | < γ.In practice, we average |r i j | and |r ′ i j | over many structures to get an estimate of the equilibrium distance between neighboring residues.LDD is highly correlated with effective strain (McBride et al., 2023), and thus is also correlated with ∆∆G (Table I).Then, the overall elastic energy of the network is simply the sum where the additional 1 2 factor accounts for counting twice each i − j bond.
To get a rough estimate for the typical values of effective spring constant K that describes the energy landscape of the protein, we first compute the LDD by comparing wild-type and mutated AF-predicted structures, exactly in the manner in which the effective strain is computed (see App. B in manuscript).Next, we equate the change in stability ∆∆G to the elastic energy U in Eq. ( 1), from which we obtain an approximation for K, It is essential to note that ∆∆G includes significant contributions from changes in the entropy and the topology of the bond network (bonds may break and reform), which may sometimes decrease the overall energy.Moreover, many bonds in the equilibrium conformation of the protein are "frustrated" (Ferreiro et al., 2014), i.e. shorter or longer than their equilibrium length.Hence, deformation can release some of the elastic energy in these pre-stressed bonds.For all these reasons, the estimated spring constant K from Eq.( 2) is only an effective parameter, which serves as a ballpark figure for the actual strength of elastic interactions between amino acids.
Using this method, we estimate K for each of the 46 protein families for which we have 10 or more ∆∆G values.We calculate the sum of squared LDD i for each protein in a family, fit a straight line between this and ∆∆G using Eq.( 1) with an intercept at zero, and get K from the gradient.As expected, the value of K depends strongly on the topology of the assumed bond network, in which the number of springs is determined by the neighborhood radius γ.Choosing γ ∼ 6 − 7 Å corresponds to bonds between amino acids that are likely to be directly in physical contact.At this range, we find an average spring strength of K ∼ 10 kcal/mol/Å 2 (Fig. 9).Average number of neighbors within 10 Å FIG. 4 The most buried residues tend to be within 15 Å of other residues.For each protein we measure the number of neighbors per residue.Neighbors are defined as the set of residues whose C α distances are within 10 Å. Plot shows average number of neighbors of residues as a function of the distance to the mutated site.Average is calculated for all unique proteins in our reduced ThermoMutDB sample.7 Correlation between distance from mutated site δ i and S i for all proteins in our reduced ThermoMutDB sample.Different distributions are shown for shear calculated using: AF-predicted structures, all residues and pairs of single structures (AF).AF-predicted structures, only residues with pLDDT > 70 and pairs of single structures (AF; pLDDT > 70).AF-predicted structures, all residues and pairs of averaged structures (AF; Ave).AF-predicted structures, only residues with pLDDT > 70 and pairs of averaged structures (AF; pLDDT > 70; Ave).FoldX-predicted structures, all residues and pairs of single structures (FoldX).FoldX-predicted structures, all residues and pairs of averaged structures (FoldX; Ave).

FIG. 1 .
FIG. 1. Strain calculated using AF-predicted structures correlates with ∆∆G.A: AF-predicted structure of thermonuclease from staphylococcus aureus (NUC) -the most common protein in TMDB.Residues are colored according to pLDDT (AF-predicted confidence score).B: pLDDT per residue.C: Distribution of 491 mutation sites along the sequence, and corresponding changes in stability ∆∆G.D: Strain upon mutation (A176G, ∆∆G = 2.4 kcal/mol) per residue; the mutated site is indicated by the black dashed line.The strain calculation either includes (dotted line) or excludes (solid line) residues with pLDDT < 70, and uses either a single pair of structures (green) or pairs of averaged structures (orange).E: Strain as a function of distance from the mutated site, δ i , for single pairs and ensemble-averaged pairs of structures.F-G: Empirical ∆∆G vs. strain at the mutated site, Sm (F), and the sum of strain over all residues within 15 Å of the mutated site, Sn (G).Solid line shows the median; Spearman's ρ, Pearson's r, and corresponding p values are shown; circles are shaded by density.

FIG. 2 .
FIG. 2. When and why doesSn correlate with ∆∆G? A: Change in stability ∆∆G against strain near mutated residue Sn for four proteins (the second to fifth most common in TMDB; the most common one, NUC, is shown in Fig.1); Spearman's ρ, Pearson's r, and corresponding p values are shown; black line is the median; circles are shaded by density.B: Sn−∆∆G correlation for each of the 40 most common proteins in TMDB as a function of the number of samples; results with p < 0.05 are shown by dark circles.C: S i −∆∆G (Pearson's) correlation for each residue i as a function of the number of neighbors (for NUC only).D: Distributions of numbers of neighbors, grouped by S i −∆∆G Spearman's correlation, for the five most common proteins.E: Sn−∆∆G (Pearson's) correlation for each residue i as a function of the neighborhood threshold value γ for calculating Sn, for both pairs of single structures and averaged structures, for five proteins.

FIG. 3 .
FIG. 3. Sn−∆∆G correlations across proteins.Strain near mutated site Sn vs. ∆∆G (A) and magnitude of stability change |∆∆G| (B) for all mutations in our reduced TMDB sample of 2,499 unique mutants., Pearson's r, and corresponding p values are shown.

FIG. 4 .
FIG. 4.Comparison with state-of-the-art algorithms.A-C: Correlation (Spearman's ρ; linear correlations are shown in SM Fig.8) between strain Sn and magnitude of stability change |∆∆G|, vs. correlation between ∆∆G and predictions of DDMut (A), FoldX (B), and FoldX using AF-predicted structures (C); separate points are shown for each of the 40 most-common proteins.D: Correlations (as above) for the full sample of 2,499 unique mutants.Correlation between number of neighbors within 10 Å of the mutated site is shown as "Buried".

FIG. 1 FIG. 3
FIG. 1 AF predictions are more variable in regions with low pLDDT.Strain per residue S i vs. pLDDT per residue for: (A) Strain calculated between repeat AF predictions of the same protein sequence; blue dots are shown for different pairs of single structures; black circles indicate means across multiple pairs.(B) Strain calculated for pairs of averaged structures, whose sequences differ by a single mutation; residues with pLDDT < 70 are excluded.Pearson's correlation coefficient and p value are shown.

FIG. 5 FIG. 6
FIG.5Mean absolute error of linear fits to ∆∆G using various methods.
FIG.7Correlation between distance from mutated site δ i and S i for all proteins in our reduced ThermoMutDB sample.Different distributions are shown for shear calculated using: AF-predicted structures, all residues and pairs of single structures (AF).AF-predicted structures, only residues with pLDDT > 70 and pairs of single structures (AF; pLDDT > 70).AF-predicted structures, all residues and pairs of averaged structures (AF; Ave).AF-predicted structures, only residues with pLDDT > 70 and pairs of averaged structures (AF; pLDDT > 70; Ave).FoldX-predicted structures, all residues and pairs of single structures (FoldX).FoldX-predicted structures, all residues and pairs of averaged structures (FoldX; Ave).

FIG. 8
FIG.8Comparison with state-of-the-art algorithms.A-C: Correlation (Pearson's r) between strain S n and magnitude of stability change |∆∆G|, vs. correlation between ∆∆G and predictions of DDMut (A), FoldX (B), and FoldX using AF-predicted structures (C); separate points are shown for each of the 40 most-common proteins.D: Correlations (as above) for the full sample of 2,499 unique mutants.Correlation between number of neighbors within 10 Å of the mutated site is shown as "Buried".

FIG. 9 FIG. 10 FIG. 11 FIG. 12
FIG.9Computation of effective spring constant from stability changes.Distribution of homogeneous spring constant K inferred from empirical ∆∆G and predicted deformation using an elastic network model (Eq.(2)).Different values of K are obtained for different elastic network topologies, defined by the neighborhood cutoff radius γ Å.
Choosing Larger neighborhood radii γ ∼ 10 − 11 Å, as commonly assumed in elastic network models (McBride et al., 2022), yields lower average values of K ∼ 1 kcal/mol/Å 2 .The effective K values vary broadly, with a standard error of about one order of magnitude, reflecting the diverse, intricate geometry of protein energy landscapes.Even in a single protein family, the highly anisotropic landscape exhibits a wide distribution of spring constants.Certain directions in the space of conformations are very flat as the corresponding deformation hardly changes ∆∆G, while other directions are very steep with large ∆∆G changes (McBride et al., 2022).

TABLE I
Comparison of linear and rank-order correlations between ∆∆G and either Effective Strain or LDD (summation over residues near the mutated site), for main text Figures1-3.