Structure-based prediction of Ras-effector binding affinities and design of 1 ‘branchegetic’ interface mutations

Ras is a central cellular hub protein controlling multiple cell fates. How Ras interacts with a 16 variety of potential effector proteins is relatively unexplored, with only some key effectors 17 characterized in great detail. Here, we have used homology modelling based on X-ray and 18 AlphaFold2 templates to build structural models for 54 Ras-effectors complexes. These models were used to estimate binding affinities using a supervised learning regressor. Furthermore, we 20 systematically introduced Ras ‘branch-pruning’ (or branchegetic) mutations to identify 200 21 interface mutations that affect the binding energy with at least one of the model structures. The 22 impacts of these branchegetic mutants were integrated into a mathematical model to assess the potential for rewiring interactions at the Ras hub on a systems level. These findings have 24 provided a quantitative understanding of Ras-effector interfaces and their impact on systems 25 properties of a key cellular hub.


Introduction
effector interactions, where mutations are introduced into Ras that differentially impact binding 82 to effectors 21 . For example, introducing a mutation can result in a steric clash in the interface 83 formed with one effector, but not with another; hence the interaction with one effector is broken 84 while intact for another. 85 In this work, we first generate homology models of all Ras-effector (RBD) interactions 86 and predict affinities of the RBDs in complex with Ras•GTP. We then use the generated model 87 structures to employ a systematic branchegetic strategy that explores the impact of Ras interface 88 mutants on binding to all effectors. Altogether, our results contribute to a quantitative and Ras consists of the two switch regions, switch 1 and switch 2. One of the main structural features 98 of the interface between Ras and RBDs is the formation of an intra-molecular β-sheet between 99 2 on RAS and 2 on the RBD. This interaction is a highly conserved structural feature across 100 all available complex structures, with deviations in orientation of less than 1 Å ( Figure 1B, see 101 Methods for MAE). 102 Analyzing the energetic profile of the interface in silico using the FoldX force field 103 shows that there are also some recurring hotspots in the interface (highlighted in Figure 1C).

104
I36, D38 and Y40 are well characterized as important residues for the interaction between Ras 105 and effector domains. Additionally, for the structures 3DDC and 1LFD, the mutation E31K was 106 introduced to stabilize the complex for crystallization. Our analysis confirms that this interface 107 mutation has indeed been favorable for the complex formation. Finally, the energetic In order to study these interface features in a more diverse set of structures, homology models 118 of the complex between RAS and RBD were constructed for all proteins containing an RBD in 119 the human proteome. 120 The homology modelling pipeline is based on the already existing complex models 121 (Table S1). Also, with the recent release of AlphaFold2 14 and the accompanying AlphaFold 122 Protein Structure Database 15 , the RBD domains of all potential effectors were extracted from 123 that database and used. The structures of RBDs are predicted with good confidence by 124 AlphaFold2 and our analysis indicates that AlphaFold2 is reliable at predicting the RBD fold 125 ( Figure S1). Additionally, AlphaFold2 complex modelling was attempted for all potential 126 complex structures and models which AlphaFold2 were confident in (by Predicted Alignment 127 Error (PAE)) and where the -sheet alignment of the interface was within a tolerance to what 128 has been observed in crystal structures, were used as templates as well ( Figure S2 and S3, 129 compare MAE Figure 1). There are two kinds of templates to use: 1) complex templates which 130 comprised of the already experimentally determined complex structures of Ras and RBDs, as 131 well as "good" AlphaFold2 predicted complex templates (Table S1), and 2) templates of the 132 RBDs alone which were extracted from the AlphaFold2 Protein Structure Database (Table S2). 133 An overview over the pipeline is depicted in Figure 2. Homology modelling was performed 134 using homelette 22 with modeller and altMod [23][24][25] . Evaluation of predicted structures was 135 performed using QMEAN, MolProbity and SOAP potentials [26][27][28][29] . The top 300 models for each 136 target for each source of complex templates (experimental or AlphaFold2) were selected by 137 combining the different scores. Subsequently, analysis using FoldX (interaction energy and 138 alanine scan) was performed.

139
In order to evaluate our approach, we generated a validation set, in which we created 140 models for the structures already solved by X-ray crystallography without using information 141 from the specific structure. Models generated in this validation set were compared to the 142 underlying ground truth by assessing their correlation of the in silico alanine scan results to 143 those of the crystal structures of interest. Using this ground truth, different methods to select 144 representative models from the hundreds of structures were assessed. In particular, we evaluated 145 the hyperparameters of an unsupervised learning pipeline comprised of different feature 146 selection and dimensionality reduction strategies followed by clustering with the OPTICS 147 algorithm (see Methods for more details about the hyperparameter space). After clustering, 148 three representative structures were chosen. Based on the performance in the validation set, the 149 optimal set of hyperparameters was chosen (Table S3,   Analogous to how we characterized the interfaces of the experimentally determined 153 complex structures before, we performed the same analysis on the complex models. The overall 154 FoldX interaction energies for the models are diverse, indicating that maybe some of the 155 complexes are energetically unfavorable and would not form ( Figure S5B). In general, the 156 binding energies are lower than what would be observed in crystal structures, which is to be 157 expected. There are one or two outliers with regards to FoldX binding energy, namely RASSF8 158 and PIK3C2B. In particular, RASSF8 is also showing an uncommon hotspot profile, with 159 multiple unfavorable hotspots that are only appearing for this set of structures ( Figure S5A).

160
Based on this behavior, RASSF8 is excluded from further analysis.

161
The analysis of hotspots confirmed the already established hotspots. I36, Y40, D38 and 162 E37 are the most commonly observed hotspots ( Figure 3AB, Figure S5A). Interestingly, while 163 I36, Y40, and E37 are exclusively favorable to the interaction with the effector protein, D38 164 seems to be also unfavorable in some of the structures ( Figure 3B). The energetic diversity of 165 the hotspot D38 was further investigated in the models. For this, two models were picked for 166 which D38 was a favorable hotspot in the alanine scan analysis ( Figure 3C: RASSF1, Figure   167 3D: RGL3), and two models were picked for which it was unfavorable ( Figure 3E: ARAP2, 168 Figure 3F: RAPGEF3). Next, neighboring amino acids were analyzed. For favorable 169 interactions, we were able to observe positively charged amino acids. On RASSF1, we find 170 K216 and H217, whereas on RGL3, there is R630. These amino acids probably form strong 171 interactions with the negatively charged D38 on Ras. In contrast, for the models where D38 172 comes up as an unfavorable hotspot in the interface, we observe an uncharged, mostly 173 hydrophobic neighborhood.  (Table S5). From a combination of different regressors, feature selection procedures, and 185 hyperparameters, the best approach was determined using a cross validation strategy (see 186 methods). A support vector machine-based regressor (see hyperparameters in Table 4) 187 performed best in cross-validation with an R2 score of 0.53. Then, the performance of the best 188 approach was evaluated in an out-of-sample test set, where it achieved an R2 score of 0.77. The Having Ras-effector structural models available allowed us to analyze the individual 206 contributions of switch 1 and switch 2 to binding using the results from alanine scanning 207 (similarly as done before for the X-ray Ras-effector structures). We find that generally most contributions increase to 6 to 9 kcal/mol with also increasing switch 2 contributions (1 to 2 215 kcal/mol). We also observed negative switch contributions (mainly for switch 1), indicating 216 that these proteins are either not well predicted or non-binders. Indeed, all structures with 217 negative switch energies are predicted to be weak binders.

219
Branch pruning analysis using Ras-effector model structures 220 Next, we were interested in exploring surface mutations on Ras that would selectively influence 221 the binding to some, but not all effectors. Both enhancing and inhibiting mutations are of 222 interest. This could enable the engineering of the Ras effector system to respond to stimuli in 223 different ways and to study selective sets of effectors. We previously reported a framework for 224 the identification and evaluation of so called 'branch pruning' mutations 21 . Since our protein 225 is interacting with a many different effectors at the same time through the same interface it will 226 be quite unlikely to identify mutations that selectively target only one protein. Instead, it is more 227 likely that we will identify mutations that enhance in interaction with some proteins while 228 inhibiting some others. indicated. This is probably the best point to disrupt the system. Y40, interestingly, while being 234 a ubiquitous hotspot, is not a good spot for engineering the interface because mutations seem 235 to affect protein stability (compare with Figure S7A). Also of interest is that we detect both 236 mutations that increase and disrupt binding ( Figure S7B).  Next, we explored to which extend, for a specific effector, it is possible to modulate its 257 binding. For this, we visualized the possible changes to the relative amount of effector bound 258 to Ras across all systems tested ( Figure 7C). On the side of proteins that can be negatively  Finally, we investigated whether there are recurring states that the modelled system 266 assumes and whether these states are dependent on Ras-GTP load, the tissue, or interface 267 mutation. To this end, we applied uniform manifold approximation and projection (UMAP) to 268 our systems to transform a high-dimensional space of absolute and relative effector binding into 269 a 2D space. Then, we used OPTICS to identify areas of high density in this 2D plane and 270 assigned them into 24 distinct clusters as well as outliers outside the clusters ( Figure 8A). For 271 each cluster, we picked three of the systems at random and visualized their relative effector 272 binding ( Figure S8). Each of these clusters belongs to a different "state" that the Ras-effector investigated for Ras-GTP load, tissue composition, and interface mutation status ( Figure 8BC). 278 We find that there are different ways a distinct state of the system can come together: for the 279 state analyzed in Figure 8B, we can see that it is composed of many different tissues, but only 280 a handful of interface mutations, most of them D38 mutations. This indicates that this state can 281 be reached from many different tissues by a specific, recurring set of mutations. In contrast, the 282 state analyzed in Figure 8C is entirely composed of a single tissue (lymph node) with many 283 different mutations, indicated that this state can only be reached by a specific tissue. In this work we have shown a complete structural reconstruction of Ras and the RBDs of its 295 effectors based on state-of-the-art technology. We used these structural models to investigate 296 the workings of the interface between Ras and its effectors, as well as to search for and identify 297 potential branch pruning mutations on Ras that would alter the behavior of the underlying 298 system. We analyzed the effects on the steady state of the Ras effector system.  This development was crucial for the quality of this study. While a normal homology modelling 302 pipeline would have been successful, the inclusion of both aspects of AlphaFold2 models was 303 essential for the quality of the results. On the one hand, finding additional potential complex 304 templates diversified the possible configurations of the interface that we could use to generate 305 our models. On the other hand, having high quality templates of the RBDs enabled us to 306 improve the structural predictions. Interestingly, with all the advancements that AlphaFold2 307 brought, this combined AlphaFold2/homology modelling approach yielded more consistent and 308 better results for this particular question.

309
There is a growing body of literature about how Ras dimerizes or forms multimers, or 310 interacts with the membrane to modulate the signaling. All these aspects have been deliberately 311 left out for this approach. The essence of the interaction of Ras and its effectors is the binary 312 protein-protein interaction between the Ras switch regions and the RBD. This common feature 313 was the focus of this work, and we believe that other factors such as dimer/multimerization of 314 Ras, the composition of the membrane, etc., are only modulators for this interaction.

315
By creating a complete structural system, we were able to investigate and understand    Structures were downloaded from the PDB. In the case where multiple models were available, 385 the best one by MolProbity score was selected 27 28 . The PDB files were processed using pdb-386 tools so that all GTP and Mg2+ annotations were in the expected format 38 . The list of used 387 template structures can be found in Table S1. whereas a negative G value indicates that the mutation to alanine improved the interaction 396 between the two proteins. The standard error for G values in FoldX is around +-0.8 kcal/mol.

397
In order to identify the most important residues for the respective interaction, a G cut-off of  (Table S1). When the MAE is calculated for X-ray complex templates, the 420 comparison with itself is removed from the calculation. templates of experimental origin, TMalign 43 was used to generate a structure-based sequence 455 alignment based on the RBD in the complex template and the single RBD of the target structure.

456
Then, with those two templates as inputs, models were generated. For complex templates of in 457 silico origin, structure-based sequence alignments were generated with TMalign 43 as described.

458
As an additional template a HRAS single structure (5P21 3 ) was used in the modelling process 459 since the AlphaFold2-generated complex templates, in contrast to the complex templates of For an observation X with i…m being a collection of evaluation criteria and n the total number 466 of observations. A structure with high borda score is a structure that performs well across all 467 metrices.

468
For each of the different sources of complex templates (experimental or in silico), 300 models 469 were selected in a first selection step based on the combined score. These 600 models per target 470 were then further analyzed using FoldX. In silico interaction energies were determined and in 471 silico alanine scan was performed (see Interface Characterization).

472
In addition to generating models for unknown targets, a set of validation models based 473 on the experimentally solved models were generated as well. For each of the seven PDB 474 complex structures, 300 models were generated. Inputs were restricted so that the structure to 475 be modelled would not be used as a complex template, but only the remaining six 476 experimentally derived complex templates. For each validation target, 300 models were 477 selected as described above.  (Table 3).

500
The experimental measurements of the dissociation constant between Ras-effector 501 complexes were collected from several publications (see Table S5, also available as 502 supplementary data). Effectors that were experimentally determined as non-binders were 503 removed from the data set due to uncertainty how to encode them with the varying technical 504 limitations on detectable binding energies at the time of their publication. Also, the models 505 generated for RASSF3 seem to be outliers with regards to FoldX interaction energy (see Figure   506 S5B) and were therefore removed from the prediction. A test set was manually chosen from the 507 available experimental measurements to cover the full spectrum of experimentally determined 508 interaction energies. At the end, this gave us a training set with 51 observations (17 * 3 models) 509 and a testing set with 12 observations (4 * 3 models).

510
Supervised learning was performed in scikit-learn 58 using different combinations of 511 feature selection algorithms and regressors. Feature selection was performed using either f-512 regression or mutual information as implemented by scikit-learn. Regression was evaluated for 513 different algorithms with various hyperparameter spaces (see Table S4). All combination of 514 feature selection and regression were evaluated using Group-K-Fold cross validation within the 515 training data, with k=5 and the groups corresponding to the three structural models chosen for 516 each target. Models were evaluated using R2 score. The best performing combination was 517 trained on the whole training data and evaluated against the test data. Finally, this model was 518 used to predict the binding energies for the complex structures without experimental data. pooled together and then multiplied with a loading factor to take into account the balance 556 between active (GTP bound) and inactive (GDP bound) Ras. This loading factor was set to 0.2 557 for a normal RAS system, and 0.9 for a system hyperactivated by an oncogenic hotspot 558 mutation.

559
The binding affinities were taken from the predicted binding affinities (see