An expanded classification of active, inactive, and druggable RAS conformations

RAS (KRAS, NRAS, and HRAS) proteins have widespread command of cellular circuitry and are high-priority drug targets in cancers and other diseases. Effectively targeting RAS proteins requires an exact understanding of their active, inactive, and druggable conformations, and the structural impact of mutations. Here we define an expanded classification of RAS conformations by clustering all 699 available human KRAS, NRAS, and HRAS structures in the Protein Data Bank (PDB) by the arrangement of their catalytic switch 1 (SW1) and switch 2 (SW2) loops. This enabled us to clearly define the geometry of closely related RAS conformations, many of which were not previously described. We determined the catalytic impact of the most common RAS mutations and identified several novel druggable RAS conformations. Our study expands the topography of characterized RAS conformations and will help inform future structure-guided RAS drug design.

After removing SW1 and SW2 loops with incomplete modeling or poor electron density, we arrived at 487 SW1 (70.0% of 699 structures) and 412 SW2 (58.9% of 699 structures) loops for conformational clustering (Supplementary Fig. 2, column 2). In our analysis, we used the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm 43 with a distance metric that locates the maximum backbone dihedral difference upon pairwise comparison of loop residues (previously implemented in refs. 44-47 for other proteins). DBSCAN finds major clusters and removes outliers (labeling them as "noise"). We first separated RAS structures by their nucleotide state (0P, 2P, and 3P) and subsequently clustered the conformations of SW1 and SW2 for each nucleotide state using DBSCAN. We then assigned a small number of poorly or incompletely modeled loops to the clusters obtained from DBSCAN by using a nearest neighbors (NN) approach (Supplementary Fig. 2).
The results for the SW1 and SW2 conformational clustering (with NN assignments added in) are displayed as Ramachandran maps per residue of each cluster compared to noise in Fig. 1h, i. We identified three SW1 and nine SW2 conformations, each of which was found across multiple RAS isoforms, PDB entries, and crystal forms (CFs; entries with the same space group and very similar unit cell dimensions and angles) (Supplementary Table 1, including the mean dihedral distance and loop α carbon atom root-mean-square deviation for each conformation). Overall, we were able to conformationally cluster 81% (N=395 out of 487) of SW1 and 58.8% (N=242 of 412) of SW2 loops that passed the completeness and electron density checks (Supplementary Fig. 2, columns 4 versus 3).

SW1 and SW2 Conformational Clusters.
For clarity and brevity in our classification, we named each SW1 and SW2 conformational cluster by its loop name and nucleotide state and then added further designations as needed (Supplementary Fig. 3). The correlation of the SW1 and SW2 conformations to each other is provided in Table 1 and reported throughout the text.
Fig. 2 | SW1 and SW2 conformational clusters. a, SW1 conformations. In the SW1.0P conformation, the central Y32 residue in SW1 is ~12-13 Å from the active site. In the SW1.2P conformation, SW1 is "closed" and interacts with the nucleotide through the backbone atoms of residues 28-32. In the GTP-bound conformation, SW1.3P, further interactions are made with the nucleotide involving the side chains of residues Y32 and T35. b, Y32(OH):3P(O1G) distance distribution for SW1.3P substates. c, SW1.3P substates within HRAS bound to GppNHp. SW2 conformations within, d, 0P, e, 2P, and, f, 3P states. In the SW2.0P-GEF conformation, residues 58-60 of SW2 are pulled towards the nucleotide site, and the side chains of residues Q61 and Y71 form an intra-SW2 hydrogen bond (not displayed), which is not seen in other SW2 conformations. In all SW2.2P conformations, except for SW2.2P-SP12, Y71 is exposed to the solvent; the opposite trend is observed in all SW2.3P conformations, where Y71 is buried in the hydrophobic core of the protein, except in SW2.3P-T where it is exposed.
For the SW1 conformations, there was a one-to-one correspondence with the nucleotide state, and we therefore labeled these conformations SW1.0P, SW1.2P, and SW1.3P (Fig. 1h, Supplementary Fig. 3a, and Supplementary Table 1). These SW1 conformations are visualized in Fig. 2a and can be differentiated by the position of residue Y32 in SW1, which was the original method for classifying these conformations 2. Two SW1 conformations were removed from our clustering because they occur too infrequently in the PDB, but are displayed in Supplementary Fig. 4; these include structures labeled by some authors as the GTP-bound "state 1" 27,29,30 (N=6) and a "non-canonical" GDP-bound "β' or Mg-free" 12,48 (N=4). Further, we do not specify whether SW1.3P is GTP-bound state 1 or state 2, since multiple structures in this cluster have been assigned to both of these states in the literature (namely 5P21, 1CTQ, 1JAH, and 1WQ1; see refs. 25,33,49).
For the SW2 conformations, we extended the nomenclature based on nucleotide state because of the greater complexity of patterns observed (Fig. 1i, Supplementary Fig. 3b, and Supplementary Table 1). There were nine SW2 conformations in total, including the previously described R state (SW2.3P-R) and T state (SW2.3P-T), and the SW2 conformation found in nucleotide-free structures (which we named SW2.0P-GEF for its binding to GEFs); and six previously unclassified druggable conformations, which we named by their associated bound protein (only SW2.2P-Binder) or inhibitor site (SP12 or SP2) and, in some cases, an indicator of cluster size order (A or B). The SW2 conformations are visualized by nucleotide state in Fig. 2d-f with residue Y71 displayed, because we later demonstrate that the position of this residue relates to RAS druggability.
As expected, the nucleotide-free conformations, SW1.0P and SW2.0P-GEF, only exist in structures bound to the GEF CDC25 domain of SOS1, in which SW1 is held open by a SOS1 region called the "helical hairpin" 6 (Fig. 3a, b and Table 2). All three SW1.3P substates bind to signaling effectors (Fig. 3c and Table 2), and we confirmed that only the SW2.3P-R conformation is found in these complexes (Fig. 3d and Table 2). Surprisingly, we discovered that the GEF REM domain of SOS1 preferably associates with SW1.3P-WaterHB (Fig. 3e and Table 2) and binds to the SW2.3P-R conformation (Fig. 3f and Table 2). Furthermore, we found that the GAP NF1 interacts with both SW1.3P-WaterHB (Fig. 3g, left and Table 2) and SW1.3P-NoHB (Fig. 3g, right and Table 2), with SW1.3P-WaterHB precluding the catalytic GAP "arginine finger" 7 from the active site and SW1.3P-NoHB enabling its direct interaction with GTP. This observation was similarly made in two previous studies, but not connected to the previously described GTP-bound (our SW1.3P) substates: one which identified these substates in RAS-NF1 complexes and called them the ground and transition states, respectively 10; and another which found them across monomeric RAS structures and called them Tyr32in and Tyr32out, respectively 50.
Fig. 5a and Supplementary Table 2). Altogether, the SW1.3P structures most commonly form the α4α5 homodimer (52%; N=73 of 140 dimers), with SW1.2P forming this complex less commonly (14%; N=20 of 140 dimers), and the remainder found in structures assigned to noise (34%; N=47 of 140 dimers) (Supplementary Fig. 5a and Supplementary Table 2). Both GTP-bound and GDP-bound α4α5 homodimers were also observed in NMR experiments (PDB: 6W4E and 6W4F, respectively) 19. Surprisingly, we found that the active SW2.3P-R and the inactive SW2.3P-T conformations are the most common SW2 conformations (at approximately equal rates) that form the α4α5 homodimer (Supplementary Fig. 5b and Supplementary Table 2), contrary to the expectation that only active, GTP-bound RAS would form this complex. Of note, the SW2.3P-R conformation co-occurs with all SW1.3P substates, while 95.5% of the SW2.3P-T conformations (N=21 of 22 structures in the cluster) are found in conjunction with the hydrolytically incompetent SW1.3P-DirectHB substate (Table 1). This SW1-SW2 pairing suggests that RAS proteins may first occupy SW2.3P-T to prevent inactivation, then α4α5 homodimerize, and subsequently transition to SW2.3P-R to bind to signaling effectors 32-35.
SW1 and SW2 conformations possessing druggable pockets. RAS proteins are notoriously difficult to drug because of their conformational variability and lack of deep surface pockets 3,4. However, through NMR experiments and other techniques, druggable pockets have been identified in certain RAS conformations 55-58. Therefore, we analyzed the available RAS structures in the PDB for druggable pockets with the Fpocket software 59, to associate the presence of inhibitor-bound and -unbound pockets with the identified SW1 and SW2 conformations.
We first obtained pocket descriptors for observed inhibitor-bound sites on RAS structures, including their pocket volumes and druggability scores. Out of the 699 available structures, 177 were bound to inhibitors: 48% at the SW1/SW2 pocket (SP12) site (N=85), 46% at the SW2 pocket (SP2) site (N=81), and the remaining 6% at other or multiple sites, which included the SP12 and SP2 sites as well as the base or center of the nucleotide site, a site near residue P110, or an allosteric site at the C-terminal end of the protein. We subsequently focused our analysis on the most targeted pockets, SP12 and SP2. With Fpocket, we were able to detect and calculate pocket descriptors for 93% of SP12 and 90% of SP2 inhibitor-bound sites ( Fig. 4a, b). We then used Fpocket to predict potentially druggable pockets in inhibitor unbound structures and classified which of these predictions were found at the SP12 or SP2 sites based on similarity of their residue contacts. In all, we identified 203 SP12 and 215 SP2 inhibitor-unbound sites, which translated to more than 70% of these sites existing without inhibitors present in complex.
Structural impact of G12D and G12V mutations on intrinsic hydrolysis. Mutations have been shown to impact RAS conformational preferences. Therefore, we sought to use our classification of RAS structures in the PDB to test a hypothesis proposed by Marcus and Mattos regarding the structural impact of G12D and G12V mutations 32. By observing the placement of residue Y32 in a few WT, G12D, and G12V HRAS structures, Marcus and Mattos proposed that G12V mutations sterically push Y32 into a hydrolytically incompetent position by, a, WT, b, G12D, and, c, G12V forms.

Discussion
Since the first HRAS structure was experimentally solved in 1990 26,70, researchers have focused on characterizing the RAS conformational landscape by examining the structural arrangements of the SW1 and SW2 loops. In this study, we used an extended dataset (699 KRAS, NRAS, and HRAS structures), and an approach that differs from previous studies (which analyzed 121 entries at most) 40,41, to create a data-driven classification of three SW1 and nine SW2 RAS conformations. This approach can be used to automatically conformationally classify and annotate the molecular contents of additional RAS structures as they are experimentally solved and provides a clear and consistent method for comparing WT and mutated structures across various biological and inhibitory contexts. To facilitate future analyses of RAS structures, we have created a web database presenting our analysis of RAS structures in the PDB, which includes a page for classifying user-inputted structures (http://dunbrack.fccc.edu/rascore/).
One uncertainty faced in defining a RAS conformational classification was identifying which GTP-bound SW1 conformations are state 1 and state 2. These SW1 conformations were discovered in the early 2000s with the observation of two peaks in the ³¹P NMR spectra for the GTP α and γ phosphates 52. Later studies found that mutations in residues Y32 and T35, as well as the common G12V mutation, cause a shift to the state 1-associated peaks, while other mutations, such as G12D, and the presence of the signaling effector, RAF1, cause a shift to state 2 71-73. Ten years after the state 1 and state 2 conformations were described, researchers experimentally solved a potential state 1 structure using a T35S mutant construct 29, and later for WT, G12V, Q61L, and other mutations 27,30. However, the previously identified state 1 structures occurred too infrequently in our analysis to unambiguously name them as the actual state 1 conformation. Moreover, as alluded to by Mattos and colleagues 33,49, the identified GTP-bound substates (our SW1.3P-WaterHB, SW1.3P-DirectHB, and SW1.3P-NoHB) could also explain the split state 1 and state 2 peaks in NMR spectra. Considering the NMR studies described above, and that we found G12D and G12V mutated structures prefer the SW1.3P-WaterHB and SW1.3P-DirectHB substates, respectively, we propose that the SW1.3P-WaterHB substate is state 2 and that state 1 is either the SW1.3P-DirectHB or SW1.3P-NoHB substates, the other non-clustered state 1 structures, SW1-disordered structures, or some mixture of these structural configurations.
In contrast to other studies 40,41, we associated each SW1 and SW2 conformation with RAS interactions involving proteins and small molecule inhibitors. This analysis helped confirm previously held hypotheses about RAS conformations in a large dataset and uncovered some new hidden trends. For example, it has been hypothesized that RAS preferentially binds to signaling effector proteins and the GEF REM domain of SOS1 when its SW1 conformation is "GTP-bound" (our SW1.3P) and to signaling effectors when the SW2 conformation is in the "R state" (our SW2.3P-R) 15,34,35. We found these hypotheses to be true, but further discovered that all SW1.3P substates (based on their hydrogen bonding of Y32 to GTP) and the SW2.3P-R conformation bind to signaling effectors, while the GEF REM domain of SOS1 preferentially binds to the SW1.3P-WaterHB substate and the SW2.3P-R conformation. Under the same SW1.3P substate classification, we clarified that the previously described GAP-binding conformations, namely the ground and transition states 10 and Tyr32in and Tyr32out states 50, are actually the SW1.3P-WaterHB and SW1.3P-NoHB substates, respectively. Similarly, we determined that the inactive SW2 conformations "T state" 32-34 and "state 2*" 35 are in actuality identical structural arrangements and that both belong to our SW2.3P-T conformation. Importantly, by comparing all inactive SW2.3P-T structures with their active counterparts in cluster SW2.3P-R, we confirmed the hypothesis that, unlike SW2.3P-R, SW2.3P-T does not bind to signaling effector proteins. We also found that both these conformations can form the RAS α4α5 homodimer complex required for activation of dimeric signaling effectors, such as RAF1. Furthermore, we confirmed that both GTP-bound and GDP-bound SW1 structures (here SW1.3P and SW1.2P) can form the α4α5 homodimer, as previously shown through NMR experiments of KRAS 19, but we demonstrated as well that all three SW1.3P substates and all RAS isoforms can complex as an α4α5 homodimer, at least in crystals.
A major value of this study is the definition of a comprehensive set of RAS conformations that are known targets for small molecule or designed protein inhibitors. Six out of seven of these druggable SW2 conformations are newly characterized (all except for SW2.3P-R); these include: GTP-bound SW2.3P-SP12-A and SW2.3P-SP12-B, and GDP-bound SW2.2P-SP12, SW2.2P-SP2-A, SW2.2P-SP2-B, and SW2.2P-Binder. We associated each of these conformations with its preference for binding inhibitors with certain chemistries, which is information researchers can use to select appropriate structural templates for structure-guided drug design. One overall finding from our analysis of these druggable conformations was that all of them exist in the absence of inhibitors, indicating that these structural arrangements may occur naturally within a biological context and are not solely the product of drug binding, which was an uncertainty prior to this study. In addition, we found that the SP2 inhibitor site is only present in structures with Y71 exposed to solvent, while the SP12 inhibitor site appears in structures with Y71 buried in the protein core. Although this trend for Y71 was previously described for a select few SP12 inhibitor-bound structures 8,38, the consistency of this finding among many inhibitor-bound structures suggests it is an essential determinant of SP2 and SP12 druggability.
While this study has expanded our understanding of RAS conformations, it only marks the beginning of mapping the RAS conformational landscape. We hope that the RAS conformational classification system described here will be paired with further structure-activity relationship data to create machine learning models for RAS drug discovery. The pharmaceutical industry has over six times as many RAS inhibitor-bound structures as there are available in the PDB 74 , and analysis of these structures using the conformational clustering approach developed here would likely help identify further druggable RAS conformations. Most importantly, having all RAS structures in the PDB consistently annotated for their molecular contents and conformation will enable simple utilization of this growing structural dataset for informing future RAS drug discovery and studies examining RAS mutations.

Methods
In this study, we conformationally clustered the available human KRAS, NRAS, and HRAS structures in the PDB and determined structural features associated with each conformation. Here we describe the methods for preparing the available RAS structures, annotating their molecular contents, clustering SW1 and SW2 conformations, and performing further interrogative structural analyses.
Software Utilized. All analyses were performed using various packages in Python with versions provided in our code in GitHub (https://github.com/mitch-parker/rascore). BioPython and PyMOL were used for structure and sequence calculations and visualizations. Pandas and Numpy were used for dataset preparation and querying. SciKit Learn was used for clustering. RDKit was used for chemical searching and visualizations. Matplotlib and Seaborn were used for plotting.
Preparing Available RAS Structures. PDB entries containing human KRAS (all are 4B isoform), NRAS, and HRAS were identified by SWISS-PROT 75 identifier in the pdbaa file (December 1, 2021) in the PISCES webserver 76. For each PDB entry, the asymmetric unit and all biological assemblies were downloaded and renumbered according to the UniProt 77 scheme using PDBrenum 78. In addition, electron density of individual atom (EDIA) scores (a per atom measure of model quality) 79 for each PDB entry were downloaded from the ProteinPlus webserver 80. Since some PDB entries contain multiple RAS polypeptide chains, we separated each RAS chain of the asymmetric unit (only first model for NMR) with its corresponding bound ligands and/or proteins. Ligands were labeled as biological, pharmacological, or chemical compounds, metal ions, residue modifications, or membrane components using a custom dictionary prepared by considering annotations from BioLiP 81 and FireDB 82, which is included in our code in GitHub. Subsequently, ligands were assigned to a RAS chain if they (1) had the same chain label (only biological and chemical compounds, metal ions, or residue modifications); (2) had more than 5 residue contacts within 4 Å of the chain (only pharmacological compounds); or (3) linked the chain to a nanodisc (only membrane components). Proteins were assigned to a RAS chain if they had more than 5 Cβ contacts within 12 Å and 1 atom contact within 5 Å, or if they had more than 5 atom contacts within 5 Å of the chain, except for the protein component of nanodiscs, which was included irrespective of the number of contacts. Bound protein assignments were checked against the biological assemblies and discrepancies were corrected. We treated each RAS chain as a unique RAS structure in subsequent analyses.
Annotating RAS Structures. We annotated RAS structures by various molecular contents, many of which are not reported in PDB entries. Mutation status was identified by comparison of the sequence in the PDB entry and human UniProt 77 sequences for KRAS (P01116-1 and -2), NRAS (P01111), and HRAS (P01112). Pharmacological compounds were further classified by binding site based on the presence of one or more predefined residue contacts within 4 Å of the structure: 12, 96, or 99 for SP2; 5, 39, or 54 for SP12; 85, 118, and 119 for the base of the nucleotide site; 29, 30, and 32 for the center of the nucleotide site; 106, 108, and 110 for the P110 site; and 4, 49, and 164 for the allosteric site. The SMILES strings used in searching for inhibitor chemistries (performed with RDKit) are included as a Supplementary Note. Bound proteins were labeled by Pfam 83 based on SWISS-PROT 75 identifier and further classified as an effector (Pfams: RBD, RA, PI3K_rbd), GEF (Pfam: RasGEF), GAP (Pfam: RasGAP), or other. Any bound protein without a SWISS-PROT identifier was classified as a designed protein "binder" or as RAS linked to a nanodisc. To identify the α4α5 homodimer, we used the protocol employed in the ProtCID web server 53, requiring an average Q-score greater than 0.3 to the α4α5 homodimer found in PDB: 3K8Y. In addition, each PDB entry was assigned to a crystal form using a method previously described and implemented in ProtCID 53,84.
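The binding-site rule described above can be illustrated with a short Python sketch. The residue numbers are those listed in the text; the dictionary name, the function name, and the choice to apply the "one or more contacts" rule uniformly to all six sites are assumptions made for illustration and are not taken from the rascore code.

```python
# Site-defining residues as listed in the text. SITE_RESIDUES and
# classify_binding_sites are hypothetical names used only for this sketch.
SITE_RESIDUES = {
    "SP2": {12, 96, 99},
    "SP12": {5, 39, 54},
    "nucleotide base": {85, 118, 119},
    "nucleotide center": {29, 30, 32},
    "P110": {106, 108, 110},
    "allosteric": {4, 49, 164},
}


def classify_binding_sites(contact_residues):
    """Return every site with at least one defining residue among the contacts.

    contact_residues: RAS residue numbers found within 4 Å of the compound
    (the contact search itself is not shown here).
    """
    contacts = set(contact_residues)
    # Assumption: a single shared residue is enough to assign a site.
    return [site for site, residues in SITE_RESIDUES.items() if contacts & residues]
```

For instance, a compound contacting residues 12, 60, and 96 would be labeled SP2 under this rule, while a compound touching defining residues of two sites would be reported at both, which corresponds to the "multiple sites" category mentioned in the results.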
Conformationally Clustering SW1 and SW2. RAS structures were first separated by nucleotide state: 0P, 2P, or 3P. Within each nucleotide state, we clustered the completely modeled SW1 and SW2 loops possessing carbonyl (O) atom EDIA scores greater than 0.4 (i.e., well modeled) using the DBSCAN algorithm with a dihedral-based distance metric. DBSCAN finds major clusters and removes outliers 43, which is ideal for conformationally clustering structural datasets since they usually contain several outliers that were poorly modeled or solved under rare experimental conditions. Variations of our conformational clustering algorithm have been described in our previous works 44-47. In this study, we used a distance metric that locates the maximum angular difference (d) upon pairwise comparison of the backbone dihedral angle values phi (φ), psi (ψ), and omega (ω) for residues 1 through n of compared loops i versus j, where $d(\theta_i, \theta_j) = 2(1 - \cos(\theta_j - \theta_i))$:
$$D_{max}(i,j) = \max\big(d(\phi_i^1, \phi_j^1),\ d(\psi_i^1, \psi_j^1),\ d(\omega_i^1, \omega_j^1),\ \ldots,\ d(\phi_i^n, \phi_j^n),\ d(\psi_i^n, \psi_j^n),\ d(\omega_i^n, \omega_j^n)\big)$$
For SW1, we calculated Dmax for residues 25-40. For SW2, we calculated Dmax for residues 56-76 and included chi1 (χ1) of residue 71 in the calculation, since its position indicates SW2 flexibility 85,86 and because we found that it affects the SP2 and SP12 sites. These residue ranges were selected because they both quantitatively and qualitatively encompass the extent of SW1 and SW2 conformational variability across RAS structures in the PDB. Like a previous study 44, we ran DBSCAN across a grid of parameters and applied a set of quality control filters to generate a robust consensus clustering. This procedure was necessary since a single setting of DBSCAN does not identify all possible clusters due to their varying shapes and sizes. We elaborate on our conformational clustering pipeline as a Supplementary Note.
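To make the distance metric concrete, the following Python sketch computes d and Dmax for loops represented as arrays of dihedral angles in degrees and passes the resulting matrix to scikit-learn's DBSCAN as a precomputed metric. The function names, the array layout (one row per loop, one column per dihedral), and the single ε and minimum-samples setting are assumptions for illustration; the study itself runs a grid of settings and applies further filters, as described in the Supplementary Note.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dihedral_distance(theta_i, theta_j):
    """d = 2(1 - cos(theta_j - theta_i)) for angles in degrees; ranges from 0 to 4."""
    delta = np.radians(np.asarray(theta_j, dtype=float) - np.asarray(theta_i, dtype=float))
    return 2.0 * (1.0 - np.cos(delta))


def max_dihedral_distance(loop_i, loop_j):
    """D_max between two loops given as equal-length sequences of dihedrals."""
    return float(np.max(dihedral_distance(loop_i, loop_j)))


def pairwise_dmax(loops):
    """Matrix of D_max values; loops has shape (n_loops, n_dihedrals)."""
    loops = np.asarray(loops, dtype=float)
    diff = np.radians(loops[:, None, :] - loops[None, :, :])
    return (2.0 * (1.0 - np.cos(diff))).max(axis=2)


def cluster_loops(loops, eps=0.5, min_samples=3):
    """Single DBSCAN run on the precomputed D_max matrix; label -1 means noise.

    eps and min_samples are illustrative defaults, not values from the study.
    """
    dist = pairwise_dmax(loops)
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(dist)
```

Because the metric is built on the cosine of the angular difference, it is periodic: angles of 180° and -180° are at distance 0, and two dihedrals 180° apart are at the maximum distance of 4. Taking the maximum over all dihedrals means a single large backbone difference is enough to separate two loops.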
Detecting Hydrogen Bonds. Conformational substates were defined by the hydrogen bond (HB) interaction between the Y32 OH atom and the γ phosphate of GTP or its analogs: direct (DirectHB) or water-mediated (WaterHB). DirectHB and WaterHB cutoffs were based on a previous analysis of protein structures in the PDB 87. DirectHB was defined as a 2.0-3.2 Å donor-acceptor distance with 90-180° carbon-donor-acceptor and carbon-acceptor-donor angles. WaterHB was defined as 2.0-3.0 Å donor-water and acceptor-water distances with 80-140° carbon-water-acceptor and carbon-water-donor angles. In the absence of a bridging water, WaterHB was defined as a 2.8-5.6 Å donor-acceptor distance, which was arrived at using the Law of Cosines with the previously specified cutoffs and examining the distance distributions of WaterHB across RAS structures in the PDB.
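As a rough check of how a donor-acceptor range can follow from the water-mediated cutoffs, the sketch below applies the Law of Cosines to a donor-water-acceptor triangle. It assumes, purely for illustration, that the 80-140° range can be used for the angle at the water between the donor and the acceptor; the text defines its angle cutoffs on carbon-water-acceptor and carbon-water-donor geometry, so this is an approximation and not the authors' derivation.

```python
import math


def third_side(a, b, gamma_deg):
    """Law of Cosines: length of the side opposite the angle gamma between sides a and b."""
    gamma = math.radians(gamma_deg)
    return math.sqrt(a * a + b * b - 2.0 * a * b * math.cos(gamma))


# Assumption: 80-140 degrees is taken as the donor-water-acceptor angle range.
upper = third_side(3.0, 3.0, 140.0)  # longest water distances, widest angle
lower = third_side(2.0, 2.0, 80.0)   # shortest water distances, narrowest angle
```

Under this assumption the upper bound evaluates to about 5.64 Å, in line with the 5.6 Å cutoff, whereas the lower bound evaluates to about 2.57 Å, below the stated 2.8 Å. That difference is compatible with the statement that the final range was also informed by the observed WaterHB distance distributions.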
Analyzing Druggable Pockets. Pocket descriptors (i.e., pocket volumes and druggability scores) for inhibitor-bound SP12 and SP2 sites were calculated with the Fpocket software 59. Fpocket was then used to predict pockets in inhibitor-unbound structures. Predicted pockets were assigned to the SP12 or SP2 sites if the average Simpson similarity of their residue contacts to bound inhibitors at those respective sites was greater than 0.6.
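The pocket assignment rule can be sketched in Python as follows. The Simpson similarity is taken as the overlap of two sets divided by the size of the smaller set, matching its definition in the Supplementary Note; the function names, the input layout (a dictionary of reference contact sets per site), and the choice to return the best-scoring site when more than one passes the cutoff are assumptions for illustration.

```python
def simpson_similarity(a, b):
    """Shared elements divided by the size of the smaller set (0.0 if either is empty)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def assign_pocket(pocket_residues, reference_contacts, cutoff=0.6):
    """Assign a predicted pocket to the site with the highest mean similarity above cutoff.

    reference_contacts: dict mapping a site name to a list of residue-contact
    sets taken from inhibitor-bound structures (hypothetical input layout).
    Returns None when no site exceeds the cutoff.
    """
    best_site, best_score = None, cutoff
    for site, contact_sets in reference_contacts.items():
        if not contact_sets:
            continue
        mean_score = sum(
            simpson_similarity(pocket_residues, contacts) for contacts in contact_sets
        ) / len(contact_sets)
        if mean_score > best_score:
            best_site, best_score = site, mean_score
    return best_site
```

Dividing by the smaller set makes the score tolerant of size differences: a small predicted pocket fully contained in a larger inhibitor footprint still scores 1.0, which suits comparisons between pockets detected with and without a bound compound.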

Data Availability
All data are available as a static table in Supplementary Data 1. In addition, we have created a web database called "rascore" that presents a continually updated dataset of annotated and conformationally classified RAS structures from the PDB. Further, the rascore database includes a page for conformationally classifying user inputted structures. The link to our rascore database and accompanying code for conformationally classifying RAS structures via the command line can be found in GitHub (https://github.com/mitch-parker/rascore).

Code Availability
All open-source code can be obtained from GitHub (https://github.com/mitch-parker/rascore) under the MIT license. Software used in this study is listed in Methods, with more extensive details provided in GitHub, such as package versions and computational environment setup.

Supplementary Notes
Conformationally Clustering SW1 and SW2. We clustered well-modeled SW1 and SW2 loops using the DBSCAN algorithm with a dihedral-based distance metric. DBSCAN finds major clusters and removes outliers 1. Similar to a previous study 2, we ran DBSCAN across a grid of parameters and applied a set of quality control filters to generate a robust consensus clustering. This procedure was necessary since a single setting of DBSCAN cannot identify all possible clusters due to their varying shapes, densities, and sizes. Below, we list and explain the steps involved in our conformational clustering pipeline:
1. Run DBSCAN on Well Modeled Loops. Different parameters of DBSCAN can produce slightly divergent clustering results with merging, splitting, or disappearance of clusters. Therefore, we ran DBSCAN across a grid of parameters, with ε of D=0.1-1.6 (~20-80°) in steps of 0.1 and minimum samples of 3-15 in steps of 1, and then took the consensus of these clustering results. The ε range covers the smallest regional subdivision of the Ramachandran map that residues with similar dihedrals can belong to 3. We found this range to be ideal for conformational clustering in a past study 2.
2. Find Passing Clusters Across Runs. Not all DBSCAN clustering runs produce ideal separation of clusters. Therefore, we used two quality control filters to remove non-optimal clusters across runs before performing the consensus procedure: (a) mean silhouette score and (b) maximum dihedral distance:
   a. Silhouette score is a measure incorporating the similarity of an object to members of its own cluster (cohesion) and difference from other clusters (separation) 4. The score can range from -1 (poor match to cluster) to 1 (good match to cluster). We removed clusters with a mean silhouette score less than 0.6.
   b. Some larger DBSCAN parameters can merge similar conformational clusters that are separate conformations. Therefore, we removed clusters with a maximum dihedral distance greater than D=3.75 (~150°). Clusters with points this far apart tend to be a mix of two Ramachandran regions and are therefore unsuitable for our purposes.
3. Get Union of Similar Clusters Across Runs. Upon removal of poor clusters through quality filters, we found the union of clusters across runs with a Simpson similarity score greater than 0.9. The Simpson similarity score is the number of points two clusters have in common divided by the size of the smaller cluster. In most cases in our DBSCAN runs, a cluster at one value of ε is a subset of another cluster (Simpson score of 1.0) at larger ε, as outlying points of the cluster are incorporated.
4. Merge Clusters with Close Loop Cα-RMSD. The maximum dihedral metric is highly sensitive to peptide flips that an author may accidentally model but that do not signify a different conformation. This often happens at low resolution when the electron density can be modeled in two different ways. Therefore, we merged clusters with a loop Cα-RMSD less than 1.2 Å, which we found to be an appropriate cutoff through trial and error, by testing a range of 0.5-2.0 Å and visualizing the outputted results.
5. Prune Cluster Members. At certain DBSCAN settings, some bordering structures that visually appear to be outliers can find their way into clusters. Consequently, we created a step for pruning cluster members with a nearest neighbor dihedral distance greater than D=0.45 (~40°), which covers half of the smallest regional subdivision of the Ramachandran map, and loop Cα-RMSD greater than 1.2 Å.
6. Remove Small Clusters. Since a wide range of DBSCAN parameters are traversed during clustering, some small conformational clusters can be identified that either (a) have no functional or binding corollaries or (b) are duplicates from a single study or set of experimental conditions. To filter out these unmeaningful conformations, we removed clusters possessing fewer than seven chains or found in fewer than five PDB entries.
7. Classify Poorly Modeled Loops. Since the poorly modeled loops were not included in clustering, we used a reversal of the pruning approach to assign them to clusters. We only classified poorly modeled loops if their NN dihedral distance was less than 0.45 (~40°) or loop Cα-RMSD was less than 1.2 Å in reference to a single conformational cluster. This approach can be used to conformationally classify the additional RAS structures that will be experimentally solved and deposited to the PDB, or ones produced through computational simulations.
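Steps 1 to 3 of the pipeline above could be sketched in Python roughly as follows, starting from a precomputed matrix of maximum dihedral distances. This is a minimal illustration and not the rascore implementation: the function names are hypothetical, noise points are excluded before silhouette scoring, runs that yield fewer than two clusters are skipped because the silhouette score is undefined for them, and the union step uses a simple greedy merge. All three choices are assumptions. Steps 4 to 7 (Cα-RMSD merging, pruning, size filtering, and nearest neighbor assignment) are not shown.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_samples


def passing_clusters(dist, eps_grid, min_samples_grid,
                     min_silhouette=0.6, max_dist=3.75):
    """Steps 1-2: run DBSCAN over a grid and keep clusters passing both filters.

    dist: square matrix of maximum dihedral distances with a zero diagonal.
    Returns a set of frozensets of point indices.
    """
    dist = np.asarray(dist, dtype=float)
    passing = set()
    for eps in eps_grid:
        for min_samples in min_samples_grid:
            labels = DBSCAN(eps=eps, min_samples=min_samples,
                            metric="precomputed").fit_predict(dist)
            kept = np.flatnonzero(labels != -1)  # assumption: drop noise first
            cluster_ids = np.unique(labels[kept])
            if len(cluster_ids) < 2:
                continue  # assumption: skip runs where silhouette is undefined
            sub = dist[np.ix_(kept, kept)]
            scores = silhouette_samples(sub, labels[kept], metric="precomputed")
            for cid in cluster_ids:
                mask = labels[kept] == cid
                members = kept[mask]
                spread = dist[np.ix_(members, members)].max()
                if scores[mask].mean() >= min_silhouette and spread <= max_dist:
                    passing.add(frozenset(members.tolist()))
    return passing


def merge_by_simpson(clusters, cutoff=0.9):
    """Step 3: greedily take the union of clusters with Simpson similarity > cutoff."""
    merged = []
    for cluster in sorted(clusters, key=len, reverse=True):
        for i, existing in enumerate(merged):
            overlap = len(cluster & existing) / min(len(cluster), len(existing))
            if overlap > cutoff:
                merged[i] = existing | cluster
                break
        else:
            merged.append(set(cluster))
    return merged
```

With the grid given in step 1, a call might look like `merge_by_simpson(passing_clusters(dist, np.arange(0.1, 1.61, 0.1), range(3, 16)))`. Storing each passing cluster as a frozenset of indices lets identical clusters recovered at different settings collapse automatically before the similarity-based union.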
Inhibitor Chemistry Search. We used the following canonical SMILES strings retrieved from PubChem 5 in searching for inhibitor chemistry with RDKit:
Pocket volumes and druggability scores for SW1 conformations with SP12 inhibitor sites that are bound, a and b, respectively, and unbound, c and d, respectively; for SW1 conformations with SP2 inhibitor sites that are bound, e and f, respectively, and unbound, g and h, respectively; for SW2 conformations with SP12 inhibitor sites that are bound, i and j, respectively, and unbound, k and l, respectively; for SW2 conformations with SP2 inhibitor sites that are bound, m and n, respectively, and unbound, o and p, respectively. Only conformations with more than three structures with pockets are included.

Supplementary Data
Supplementary Data 1 | Available human KRAS, NRAS, and HRAS structures in the PDB annotated by SW1 and SW2 conformations and molecular contents.