Towards simple kinetic models of functional dynamics for a kinase subfamily

Kinases are ubiquitous enzymes involved in the regulation of critical cellular pathways. However, in silico modelling of the conformational ensembles of these enzymes is difficult due to inherent limitations and the cost of computational approaches. Recent algorithmic advances combined with homology modelling and parallel simulations have enabled researchers to address this computational sampling bottleneck. Here, we present the results of molecular dynamics studies for seven Src family kinase (SFK) members: Fyn, Lyn, Lck, Hck, Fgr, Yes and Blk. We present a sequence invariant extension to Markov state models, which allows us to quantitatively compare the structural ensembles of the seven kinases. Our findings indicate that in the absence of their regulatory partners, SFK members have similar in silico dynamics with active state populations ranging from 4 to 40% and activation timescales in the hundreds of microseconds. Furthermore, we observe several potentially druggable intermediate states, including a pocket next to the adenosine triphosphate binding site that could potentially be targeted via a small-molecule inhibitor. Molecular dynamics simulations for seven members of the Src kinase family have now revealed a conserved step-wise deactivation process, potentially druggable intermediate states, and quantitatively similar thermodynamics and kinetics across the entire family.


Introduction
Protein kinases are important pharmaceutical targets 1,2 due to their regulatory roles in biochemical pathways in eukaryotic cells 3 . They control a slew of signaling cascades by catalyzing the transfer of γ-phosphate from adenosine triphosphate (ATP) to target substrates 4 . Kinases are thought to account for up to 2% of all encoded genes 5 . Given their physiological role, kinase functionality within cells is tightly monitored. 4,6 Mutations, truncation, and overexpression of kinases have been phenotypically linked to various kinds of cancers [7][8][9] , viral infections 10 , and other diseases 11 . The kinetics and thermodynamics of several prototypical kinases such as protein kinase A (PKA) 12 , Src [13][14][15][16] , Abl 6,17,18 , EGFR 19 , and Aurora 7 kinases have been extensively studied by diverse experimental techniques including crystallography 14,20 , small angle X-ray scattering(SAXS) 16 , nuclear magnetic resonance(NMR) 2 .
Members of the Src family of tyrosine kinases (SFK) regulate signal transduction of cellular receptors 9 . They are ubiquitously expressed with several established oncogenic truncations and mutations 9 . For example, Fyn kinase is required for proper cell growth 21 . Aberrant Fyn signaling has also been implicated in the pathogenesis of Alzheimer's disease 11,21 and Dengue virus RNA replication 10 .
While different kinases have been studied via a range of experimental methods, there is a large degree of variance in the kinase sequences studied and the methods employed, which makes it difficult to systemically compare these observables. However, to effectively and precisely target kinases via small molecule inhibitors requires inference of the differences and similarities between their structural, thermodynamics, and kinetic behavior at the ensemble level. Understanding of the SFK's solution structural ensemble is lacking. For example, human Fgr and Blk have no reported crystal structures, only a single crystal structure has been reported for Fyn 22 , and two for the human Lyn kinase domain. On the other hand, Hck and Src each have close to or more than ten structures available. Such heterogeneity makes it difficult to systemically compare SFK despite their similarities. Members of SFK share a common architecture which includes several regulatory domains 20,23 , connected to the catalytic domain via a linker region. The catalytic domain is functionally active 24 in the absence of its regulatory partners, and form primary drug targets 14 . Therefore, we chose to focus our efforts on the catalytic domain. The SFK catalytic domain (Figure 1) is primarily composed of a beta-sheet-rich N-lobe and an alpha helical C-lobe. The lobes are connected via the Hinge (Figure 1, yellow/gold) and the Activation loop (Figure 1, A-loop, red). The A-loop contains a highly conserved Asp-Phe-Gly (DFG) motif that can be either pointing towards the ATP binding site 3 (DFG in) or rotated away (DFG out). Since all of our present simulations were run in the presence of ATP, we only observed dynamics within the DFG in state. The adenine and ribose rings make hydrogen bonds and van der Waals contacts with the Hinge residues while the phosphate moieties chelate two magnesium ions, 3 which in turn are ligated by multiple residues within the binding pocket. The N-lobe also contains a conserved helix called the catalytic helix ( Figure 1 ,C-helix, orange). Sequence and structural alignment 25,26 have further highlighted the presence of two non-contiguous structural motifs called the regulatory (R) and catalytic (C) spines, which are required for stabilizing the active state. Sequence alignment has also revealed a conserved glycine-rich loop, or P-loop ( Figure 1, green), which connects the β1 and β2 strands 14,27 . The loop typically has the Gly-X-Gly-X-X-Gly sequence where X indicates a random amino acid. The P-loop interacts with the phosphate groups on ATP 25 to position the gamma phosphate for catalysis 28 .
While the deactivation pathways for Src 1, 24,29 , and other select kinases 2,6,18,19 have been extensively studied in the last few years, it remains to be seen whether the deactivation pathways proposed within these prototypical systems can be extended to other kinases. Computational studies of SFK dynamics is challenging due to the prohibitive time-cost of running and analyzing milliseconds of atomistic simulations 30 . Although a few large-scale molecular dynamics (MD) studies 1, 15 have been performed, these have generally relied on biased reaction coordinates for mapping the thermodynamic landscape of kinases 1,31-33 . While these methods could potentially allow for faster converged sampling, there is an inherent risk of biasing the simulation results towards unphysical transition paths 34 . However, given recent work in the development of reaction coordinate free Accelerated Molecular Dynamics (AMD) 35,36 combined with the ability to run many unbiased parallel simulations on distributed computed platforms 37,38 , we can now reach relevant timescales 39 to sample the kinase conformational landscape. These parallel trajectories can be stitched together using the Markov state model (MSM) framework to provide additional insight. Figure 1: Starting structures for the Fyn simulations were generated from a Src-based homology model corresponding to the inactive state and a crystal structure of Fyn in its active state 22 . In the inactive state (a), the A-loop(red) is folded forming a salt bridge between Glu314 and Arg413. The C-helix (orange) is rotated outwards, and the Hinge (yellow) is open. In the active state (b), the Hinge closes by forming backbone hydrogen bond between Met345 and Gly348, the Chelix rotates inwards to form new contact between Glu314 and Lys299 and complete the hydrophobic R-spine (purple), and the A-loop is unfolded. Src-like/Inactive Gly348 Asp408 Glu314 Hinge rendered for the sake of clarity. Secondary structure was assigned using dictionary of secondary structure in proteins (DSSP) 40 . The color scheme shown here is extended unto the rest of the paper.
Previous MD studies on other kinases 9,13,16,25,27,32,41,42 point to a large conformational change that requires breaking of a conserved Glu-Lys salt bridge, rotation of the C-helix, folding of the Activation loop (A-loop), and misalignment of regulatory and catalytic spines to form an inactive state. However, the consistency of the transition paths across the kinome and the differences in the accessible energy landscapes for different kinases remain unanswered. Differences in pathways and energy landscapes are central to the rational design of selective type I,II and III inhibitors 2,43 that select for potentially unique intermediate or inactive kinase states rather than the putative active crystal conformation.
To answer these questions within the context of the SFKs, we started by sampling the entire deactivation pathway for Fyn starting from a single crystal structure 22 Figure 3) of aggregate MD data on the Folding@home distributed computing platform 37 .The aggregate simulation time was necessary to accurately capture slow conformational transitions (C-helix rotation, A-loop folding), and their relative kinetics (~ 100 ) and thermodynamics across the 7 sequences.
To analyze this data, we created a multiple sequence extension to MSMs 44-47 ( see Methods for details), utilizing state-of-the art dimensionality reduction 48 , model selection 49 , and model interpretation 50 procedures (Supporting Figures 1-2) from Machine Learning (ML) literature. The multi-sequence extension to MSMs allowed us to explicitly compare the thermodynamics and kinetics across the 7 sequences. Our simulations reveal that in the absence of their regulatory domains, the members of the SFK catalytic subunit have active populations ranging from 4-40% and exchange timescales on the order of hundreds of microseconds. We identified several catalytically inactive intermediate states and describe the transition pathways that separate active from inactive states in an effort to gain insight into the atomistic underpinnings of the SFK kinase domain regulatory mechanism. Furthermore, we hope an enhanced picture that connects structural with functional details can set the stage for inhibitor design strategies that i) can take advantage of intermediate states with druggable pockets, and ii) also allow allosteric inhibition and inducible pocket strategies to be conceived. We will begin our paper by first delineating the common activation pathway across the sub-family, focusing on Fyn, before delving into the rest of the sub-family. We are choosing to focus on single sequence because our individual analysis indicated that several members had qualitatively similar activation dynamics.

Results: ATP bound Fyn's samples a kinome wide "DFG in" conformational landscape
In order to quantify the structural heterogeneity within our ATP bound Fyn MD dataset, we compared our results ( Figure 2) against the kinome-wide classifications scheme of Möbitz 51 . In that work, the author classified several thousand kinase crystal structures based upon the positioning of conserved DFG (Asp408-Gly410) and C-helix Glu314 residues. Our simulations ( Figure 2a) predict that the ATP bound "DFG in" Fyn kinase samples a kinome wide distribution ( Figure 2b). However, the "DFG out" (Figure 2 bottom panel) state is inaccessible due to the presence of ATP. This result, combined with our previous work on the apo BTK 42 catalytic domain, suggests that protein kinases likely possess a conserved conformational landscape, modulated via the exact sequence, posttranslational modifications, and the presence of nucleotides, binding partners and substrates. It also shows how MD is increasingly capable of predicting pharmacologically relevant unseen states that have only been previously observed in related members of the super-family.

Simulations suggest a step-wise activation pathway across the Src sub-family
In order to understand the universality of the kinase activation pathway, we began by simulating the atomistic dynamics of Fyn kinase's catalytic domain. We connected the active Fyn PDB (RCSB id 2DQ7) 22 to a homology model 52  accelerated molecular dynamics (AMD) to find seed structures followed by an aggregate 1500 microseconds of sampling on the Folding@home 37 distributed computing platform. Since neither AMD nor MD requires pre-specifying a reaction coordinate 35,53 , we do not expect homology modeling of the inactive state to bias our dynamics. The resulting trajectories were analyzed by time-structure based independent component analysis (tICA) 48 and MSMs 44,45 . tICA is a dimensionality reduction technique designed to find a set of minimally overlapping linear combinations of the input features, such as dihedrals or inter-residue distances, that de-correlate the slowest 48 . The dominant components, called tICs, are one dimensional projections of large scale and slow protein conformational change (Supporting Figure 4). tICA is used as a dimensionality reduction step in MSMs for defining a kinetically motivated clustering metric. The clustering metric defines a state space which is then used as input for the MSM. MSMs are models of dynamic processes defined over a set of protein states and the transition probabilities connecting them 44,45 . In particular, the intermediate state (Figure 3a I1, Supporting Figure 7) observed within our ensemble is different from either of the starting active (2DQ7) and inactive homology model states. In this state, the Hinge is closed, the C-helix is rotated outwards but the Aloop is relatively unstructured and samples a range of conformations 1 . The outward rotation of the C-helix opens up a pocket between the ATP binding site and the C-helix (Supporting Figure 10) that presents itself as a putatively druggable site for the design novel allosteric Fyn inhibitors (Supporting Figure 10) -not too dissimilar from previous observations that were made in the context of Src 1 .
The activation process is accompanied by the alignment of two hydrophobic spines 25,26 , the Catalytic spine (C-spine; Val285, Ala297, Leu397, Ile396, Val398, Leu350, Leu455, Leu459, and ATP) and the Regulatory spine (R-spine; Leu329, Met318, Phe409, His388). These noncontiguous structural motifs are aligned in the kinase active state. The C-spine is broken by the movement of ATP's adenine and ribose rings towards the protein core 15 while the Rspine is disrupted by movement of the C-helix away from the core (Figure 1 Figure 9 for the complete distribution). The active state population within our model is consistent with the hypothesis that the kinase regulatory domains (SH2 and SH3) are required to regulate SFK 13,27 kinases, and their removal leads to a higher active population. The appreciable active population (Figure 3) within our model also helps to potentially explain the experimental phosphorylation patterns of SFKs. SFKs require phosphorylation of a key tyrosine residue (Tyr420) in the activation loop (A-loop) for activation 27,55 . While we were unable to find phosphorylation data for Fyn for a direct comparison to our model, phosphorylation was shown to have a minimal effect upon Src, featuring an increase in activity of only about 1.5-2.5x [58][59][60] . In contrast, phosphorylation increases the active population in non-SFK extracellular signal regulated kinase 2 (ERK2) by three orders of magnitude 61 . We predict Fyn to behave similarly to Src given their 85% sequence identity and the results of our un-phosphorylated simulations. This increase in activity is consistent with a thermodynamic landscape featuring an appreciably active minimum that is robust to phosphorylation (characterized by population shifts of less than an order of magnitude or 1 kcal/mol). These small shifts in active population are expected to result in relatively minor increases in observed chemical activity.
To gain insight into Fyn's activation kinetics, we calculated the distribution for the median time required to "travel" to a set of active states (Figure 3, Supporting Figure 7) starting from the Src-like inactive states (Supporting Figure 7). According to our model, the median value for Fyn's mean first passage time to either of its active or inactive states is on the hundreds of microsecond timescale (Figure 4b, Supporting Figure 9). Such exceptionally high timescales make it difficult, even with millisecond scale simulations, to provide reliable kinetic estimates. Accordingly, we can only conservatively estimate that the exchange timescales are on the micro to millisecond timescale. Similar to previous experimental work on protein kinase A 62 , ERK2 61 , p38 62 , and Src 63 , the concerted conformational change predicted by our simulations is likely to display NMR signature. In particular, our MSM is highly consistent with the Src study 63 where the authors experimentally showed that Src's catalytic domain exists in at least two conformational states in solution. The major and minor conformational state exchange on the micro to millisecond timescales and are related via hinge motion. Similar to their experimental data, our simulations also predict a two state hinge behavior with the hinge formed states having a higher free energy (<1-2kcal/mol) relative to the hinge broken states. The hinge formed states can be further divided into the active and intermediate. We predict that similar experiments on Fyn can be used to probe the multi-state behavior for the the salient features of our activation model, including allosteric coupling of the hinge residues to the rest of the kinase.

High sequence identity leads to similar in-silico functional dynamics
In order to build a single model for Src sub-family, we ran new simulations for several SFK members. To that end, we extracted several hundred structures from our Fyn simulation set, built homology models of the human sequences for six SFK members (Lyn, Lck, Hck, Fgr, Yes, and Blk), and docked ATP + magnesium ions into each ATP binding site (Figure 1). This large pool of configurations allowed us to seed simulations from high and low free energy regions in the kinase state space, enabling faster convergence of sampling 44,64 . We furthermore validated the active and inactive homology models by comparing them to known crystallographic states (Supporting Figure 11). Based on their high sequence similarity with Fyn (Supporting Note 1, Figure 4a), we conjectured that these systems would be expected to display similar dynamics.
A set of MD simulations was carried out for each of these kinase systems, resulting in six sets of trajectories with an aggregate sampling of 4 ms. A separate MSM was built for each of the six kinase domains, utilizing an identical set of clustering definition. This single domain decomposition allowed us to explicitly compare the thermodynamics and kinetics of Fyn, Lyn, Lck, Hck, Fgr, Yes, and Blk, the results of which are distilled into Figure 4.
Collectively, within the 1kcal/mol restraints of modern force fields, our simulations suggest that the free energy differences between active and inactive ensembles fall within 1-2 kcal/mol across all sequences (Figure 4a and Supporting figure 7-8), with active state populations ranging from 2 /.1 * % for Lyn to 13 -+3 % for Hck to 39 *+ ,-% for Fyn. As before, the sub-and superscript represent 25 th and 75 th percentiles as calculated via 200 rounds of bootstrapping (see Figure 4 and Supporting Figure 9 for the complete distribution). The active states of both Fyn and Hck kinase domains are characterized by elevated populations, which is qualitatively consistent with experimental evidence that both fulllength kinases display higher specific activities when compared to Lyn 65 .
A kinetics analysis for all six kinases (Figure 4b) shows that similar to Fyn, all six SFK members display activation timescales (Figure 4b, blue boxes) on the order of milliseconds and deactivation timescales (Figure 4b, green boxes) on the order of 100-1000 µs. While the distributions show overlap, for most of the simulated sequences, deactivation was faster than activation except for Fyn and Hck (Figure 4b). The asymmetric timescales are consistent with the presence of a slightly higher free-energy active state (Figure 4a). The higher free energy state indicates that phosphorylation might be required for complete kinase activation. The slower Fyn and Hck deactivation is consistent with their higher observed experimental activities. Therefore, while we can predict that SFK will exchange on the hundred microsecond to millisecond timescale, more precise kinetic predictions will likely require simulations on the order of tens to hundreds of milliseconds.  Figure 7 for the exact definition of the active and inactive states, and Supporting Figure 9 for the future statistical validation. ATP and Mg have not been rendered for sake of clarity.

Conclusions:
Unbiased, multi-millisecond MD dynamics of seven SFK members presented here have provided us with detailed atomistic insight into their collective activation mechanisms, revealing the prominence of Hinge motion for activation. Our simulations indicate that SFK sample a conserved free energy landscape which has been previously only observed at the kinome level 51,66 . Furthermore, the activation process further correlates with dampening of dynamics within the P-loop, in addition to large scale conformational changes involving the C-helix and A-loop. Our Fyn MSM suggests that the catalytic domain, in the absence of its We present an extension to traditional MSMs built on top of a set of sequence invariant feature space. The joint phase space-decomposition allows for a comparative analysis of the kinetics and thermodynamics across the entire sub-family. Our calculations predict that the active state is within 1-2 kcal/mol of the inactive ensemble, and that activation is slower than deactivation across the entire sub-family. The transitions between active and inactive states proceed through a number of intermediate conformations that offer new opportunities for the design of kinase subfamily specific inhibitors.
While we have chosen to look at the conserved dynamics within the current dataset, an interesting complementary extension might involve looking at the dynamics of the nonconserved residues in an effort to elucidate differences within the SFK family. More intriguingly, our methodology raises opportunities for creating a single kinome-wide model, allowing researches to compare and contrast the thermodynamic and kinetic effects of sequence changes, oncogenic mutations, and PTMs to a given baseline. Such unbiased comparative structural modeling can aid in the creation of new, specific and potent kinase inhibitors or perhaps even creating personalized medicines conditioned upon the patients' genome.

Methods: Initial Setup and homology modeling:
The crystallographic coordinates of Fyn kinase were downloaded from the protein databank (id: 2DQ7) 22 . The staurosporine drug and crystallographic waters were deleted from the pdb file leaving just the protein coordinates. The structure was mutated to the human WT sequence using Modeller (ver. 9.1). VMD (ver. 1.9) 67 was used to add adenosine tri phosphate(ATP) to the structures using the Src 1 structure as the template. Modeller 52 was further used to build the Fyn's inactive structure (with ATP) using the inactive c-Src structure(id: 2SRC) 1,13,16 as a template. Hydrogen atoms were added to both the active and inactive structure using the t-leap module contained with the Amber tools (ver. 14) suit. [68][69][70] Both the systems were solvated in a water box with 10 Å padding on all side. Chloride and sodium ions were added to neutralize the charge and set up the final ionic concentration to 150mM. The Amber99sb-ildn 71 force field was used to model protein dynamics in conjunction with the TIP3P 72 water model and Amber ATP and Mg parameters 73 .

Simulation Details:
All production runs for the initial accelerated dynamics 35 and regular molecular dynamics simulations were run at a pressure of 1 atm and temperature of 300K. The Langevin integrator with a friction coefficient of 1/ps and a 2fs time step were used. A Monte Carlo barostat with an interval of 25 frames was used to maintain the pressure at 1ATM. Frames were saved every 100ps unless otherwise specified. Long-range electrostatics were dealt with using the Particle mesh Ewald 74 algorithm with a 8Å cutoff for the accelerated molecular dynamics simulation and a 10Å cutoff for the production runs on Folding@home 37 . All simulations were equilibrated for 1ns before production runs unless specified otherwise.

Accelerated Molecular Dynamics Simulations:
To obtain a large number of diverse initial starting structures, accelerated molecular dynamics (AMD) 35 was run on both the active and the inactive starting structures until partial deactivation and activation were observed respectively (~6 μs) as monitored via progress on the two dimensional reaction coordinate defined using the partial unfolding of the A-loop and swinging of the C-helix. For the accelerated dynamics, a boost was applied to the dihedrals of the system with an energy threshold of 3900-4500kcal/mol and an alpha parameter value of 160-210. 1005 structures were randomly pulled from the resulting trajectories to seed simulation on the Folding@Home 37 distributed computing platform.

Production MD Simulations:
The 1005 randomly pulled structures from the AMD trajectories were re-solvated and minimized as described above using the Amber Tools suite 68 . The resulting Amber topology and input coordinates were read into OpenMM( ver. 6.3-7.1) 75 for production runs on FAH. 1.53ms of combined dynamics were obtained in three stages (project id 9101-9102, 9162) consisting of approximately equal sampling. The later stages involved sampling from the lowly populated states.

SFK homology modeling and Production MD simulation:
For Lyn, Lck, Hck, Fgr, Yes, and Blk kinase simulations, we sampled 1000 structures from the Fyn kinase dataset and homology modeled the human sequence onto those structures. ATP, Mg ions, water and counter ions were added as before. We used the same protein force field and water models, and equilibrated the models using the protocol described above. For production MD simulations, we again employed a similar protocol but saved trajectory frames every 200ps to save disk space.

Markov state model:
Before model building, all 7 datasets were subsampled to 1ns to allow for faster analysis.
Building an MSM requires identification of metastable kinetically close conformational states. This splitting of the accessible phase space is followed by counting the transitions between those states as observed in our trajectories at a Markovian (memory free) lag time. After sampling, a total of 13343 trajectories (across 7 sequences) were vectorized using a common set of 1081 protein contacts. This was done by calculating all pair wise contacts for residues W264, E265 , G280, F282, G283, V285, W286, V296, K299, M306,  F311, L312, E314, A315, M318, K319, L321, L326, L329, A331, E336, H388, R389, D390,  L391, D408, F409, G410, L411, A412, R413, I415, D417, E419, Y420, G425, K427, F428,  K431, E436, G441, K446, S451, T461, P466, C491, and P511. The residue numbering here corresponds to Fyn's and we used equivalent contact distances across all proteins. The Supporting Information contains a sequence alignment that was used for this purpose. Our method is completely generic and can be extended to many proteins. For new set of systems, we recommend first performing a sequence alignment and then using a conserved set of residues to define the feature basis. Care must be taken if some sequences have insertions or deletions since that would preclude the use of certain backbone dihedrals as features. We note that the open source software MSMBuilder implements a range of utility functions that make this task easier.
The featurized trajectories were then transformed using time structure independent component analysis (tICA) 48 . tICA seeks to find a set of linear combinations of features that de-correlate the slowest (at a certain lag time) whilst minimizing their overlap. The transformed dataset was clustered using the mini batch K-means. It is worth noting that we built a single tICA and K-Means model for all 7 sequences allowing us to quantitatively compare thermodynamics and kinetics. We use a tICA lagtime of 300ns to reduce the dimensionality of the data from 1081 contacts to 5tICs, which were also scaled using kinetic mapping 76 . For the clustering step, we used the mini batch K-Means algorithm with the number of clusters set to 300. Based upon previous work 1, 38 and the convergence of the implied timescales plot for 300 state model, we chose a lag time of 30ns (Supporting Figure  1) for the MSM. The Markov transition matrix was fit via Maximum likelihood estimation (MLE) with reversibility and ergodicity constraints. This procedure discarded less than 1% of the data for any given sequence indicating converged sampling. See Supporting Figure 1 and Figure 9 for further validation.

Error analysis:
To obtain error bars for the MFPTs, timescales, and equilibrium populations, 200 rounds of bootstrapping were performed for each simulated protein sequence. The bootstrapping was performed over the trajectories so that for each bootstrap sample, we randomly picked N trajectories (with replacement) where N is the total number of trajectories for that sequence. We kept the final state labeling, but fit a new MSM to this resampled trajectory set. This leads to a new estimate of the thermodynamics and kinetics. Repeating this 200 times allows us to estimate error bars for the predicted kinetics and thermodynamics. We also validated our results by restricting each trajectory to between 75 and 100% of its final length (Supporting Figure 9) and re-computing all reported thermodynamic and kinetic statistics and found them to be consistent.
The trajectories were featurized and analyzed using the MDTraj (ver. 1.7-v1.8) 77 package while tICA dimensionality reduction and Markov modeling were performed using MSMbuilder (ver. 3.5+) 78 . Most of the analysis was performed within the IPython scientific environment 79 with extensive use of the MSMExplorer, matplotlib (ver. 1.0+) 80 , and scikitlearn libraries 81 . All protein images were generated using visual molecular dynamics (VMD, ver. 1.9+) 67 and all protein surfaces were rendered using SURF 82 as implemented in VMD.

Model interpretation:
The models were primarily analyzed using techniques laid out in previous papers 83,50 . To further query the model, we inferred the dominant deactivation pathways using Transition path theory (TPT) 84,85 . TPT requires specifying the starting and ending states to find the interconnecting pathways. All states with a formed Hinge(1 st tIC value > 1.5) and Glu314-Lys299 bridge (< 4.45Å) were considered active. For the inactive states, we picked all states whose all heavy atom RMSD was with 4 Å of the homology modeled Src-like inactive state. We also queried the model by generating long trajectories from the MSM. Starting from the inactive state, we propagated the dynamics at a lag time of 30ns using a random number generator and the Markovian transition model as our sampling scheme. At each "timestep", we randomly picked a frame assigned to that state to report the instantaneous observable. Given correct parameterization of the underlying transition matrix, this method produces dynamics comparable to running a single long MD trajectory. The transition pathways were analyzed using a combination of inspection using visual molecular dynamics (VMD) 67 , random forest classifiers 50 , and tICA projections 48,83 . The slowly de-correlating tICA vectors were used to find the degrees of freedom involved in the dynamics 83 . Spectral decomposition of the MSM transition matrix was used to estimate the equilibrium populations and dynamical processes connecting those Markov states. The second eigenvector corresponding to the slowest dynamical process correlates very highly with the active to inactive process (Supporting Figure 2) with an associated timescale of 100μs (Supporting Figure 1), consistent with previous theoretical work 1 .