Bimodal evolution of Src and Abl kinase substrate specificity revealed using mammalian cell extract as substrate pool

The specificity of phosphorylation by protein kinases is essential to the integrity of biological signal transduction. While peptide sequence specificity for individual kinases has been examined previously, here we explore the evolutionary progression that has led to the modern substrate specificity of two non-receptor tyrosine kinases, Abl and Src. To efficiently determine the substrate specificity of modern and reconstructed ancestral kinases, we developed a method using mammalian cell lysate as the substrate pool, thereby representing the naturally occurring substrate proteins. We find that the oldest tyrosine kinase ancestor was a promiscuous enzyme that evolved through a more specific last common ancestor into a specific human Abl. In contrast, the parallel pathway to human Src involved a loss of substrate specificity, leading to general promiscuity. These results add a new facet to our understanding of the evolution of signaling pathways, with both subfunctionalization and neofunctionalization along the evolutionary trajectories.

on the NRTKs' SH2/SH3 domains, which complex with either phosphotyrosines or poly-prolines, respectively [19][20][21][22]. The differences in the active sites of NRTK kinase domains result in specificity, whereby only a subset of substrates can bind and thus be phosphorylated.

Unlike the serine/threonine family of kinases, NRTKs possess relatively promiscuous active-site peptide specificities with a broad range of potential substrates [5]. In high-throughput substrate screens, catalytic domains of NRTK members phosphorylated hundreds of distinct peptide sequences, highlighting the promiscuity of these kinases. Nevertheless, comparisons within the family show unique sequence preferences and a consequent range of substrate selectivity. For two of the most well-studied members, Abl and Src, narrow and broad selectivities are reported, respectively [13,17]. Because substrates bind at the active site in an elongated fashion, the primary peptide sequence largely dictates the description of selectivity [5]. Abl has a clear preference for hydrophobic residues flanking the tyrosine of interest (I/L/V-1, A+1, and P+3) [14,23]. In contrast, little sequence selectivity is observed for Src other than the relatively weaker preferences for a bulky aliphatic residue (I/V/L-1) at the residue preceding the phosphoacceptor, a phenylalanine three residues away from the phosphoacceptor (F+3), and a negatively charged residue on the N-terminal side of the phosphoacceptor (D/E-4, -3, -2) [17,24,25].

As Src and Abl are sister clades within the NRTK phylogeny, their distinct sequence preferences raise the question: how did peptide selectivity arise throughout evolution? Herein we answer this question using ancestral sequence reconstruction (ASR) for the catalytic domains. ASR uses modern sequences and an evolutionary model

oldest ancestor only has ~65% similarity with Src (Figure 1B).
Src and Abl display lower sequence selectivity than other kinases such as Aurora A and B-RAF serine/threonine kinases [5,24,32], and, consequently, obtaining an accurate description of the primary sequence determinants for each tyrosine kinase is a greater statistical challenge [33]. Comparison of ancestral and modern kinases requires a comprehensive library of substrates since ancestral kinases likely refined their sequence preferences over time. To ensure biological relevance, the peptide library should ideally be composed of naturally occurring proteins. To construct such a library, we took advantage of the diversity of sequences present in mammalian whole cell lysate (HEK293 cell line) [34,35]. After endogenous kinases are covalently inhibited, the proteome of mammalian cells presents a convenient substrate library containing thousands of potential protein substrates. Here we use this comprehensive library to examine the evolution of substrate selectivity in Abl/Src tyrosine kinases. We find that kinase substrate preferences evolved in a complex manner involving two different modes: a promiscuous progenitor specialized into the modern specific Abl (subfunctionalization), whereas evolution of Src involved relaxing selectivity via a specific ancestral intermediate.

In the data set where Src was added to the cell lysate, 8208 unique phosphorylated sequences were identified (Figure 2A).
These peptides include the characteristic preferences for large aliphatic residues directly preceding the phosphotyrosine (V-1), negatively charged residues in multiple positions N-terminal to the phosphotyrosine (D/E-3, -2), and a glycine following the phosphorylation site (G+1). An inclination for other aliphatic residues preceding the phosphotyrosine (I/P/T-1) is also seen (Figure 2B). Notably, a preference for proline at the -1 position identified in our data was not observed previously. For Abl, specificity for large aliphatic residues preceding the phosphotyrosine is found (I/V-1), as well as the canonical proline at the +3 position (Figure 2B). Additionally, Abl exhibits a high preference for proline at the -2 position, which had not been identified previously.
We can compare the results obtained here with the substrate specificity that is observed when only natural substrates are considered. PhosphoSitePlus is a database that annotates all known in vivo and in vitro phosphorylation sites that a given kinase phosphorylates [24]. Overall, our results confirm the previously found descriptions of both Src's and Abl's substrate specificities based on substrates in the PhosphoSitePlus database, but additionally identify a few new preferences (Figure 2B,C). Our HEK293 lysate has a much larger number of phosphorylated substrates than the PhosphoSitePlus database, which allows us to ascertain the residues dictating phosphorylation specificity with greater accuracy and statistical significance than is possible with the PhosphoSitePlus data (Figure 2B,C). Some of the differences may be due to the larger number of substrates in our whole cell lysate. However, many of the observed discrepancies likely result from differences in experimental design. PhosphoSitePlus is based on in vivo substrates for the full-length kinase, whereas we are interested in the intrinsic specificity of the kinase domain. In our experiments, we do not have the full-length kinases and, therefore, we only find substrates that are selected by the kinase domain itself. In contrast, phosphorylation within the cellular framework, as reported by PhosphoSitePlus, is strongly determined by regulation and co-localization events, and intrinsic kinase domain specificity plays a relatively smaller role. Indeed, Shah et al. studied the specificity of NRTK kinase domains with a high-throughput, cell-surface based experiment and found similar discrepancies between PhosphoSitePlus based logos and their experimentally determined sequence determinants [17].

Evolution of specificity between Src and Abl
Having now established the accuracy and statistics of our methodology on the modern kinases, we next chose to determine the sequence specificity of three resurrected ancestral kinases (Figure 1A). Anc-AS and Anc-S1 were previously resurrected for investigating the mechanism of Abl selectivity for Gleevec [29], while the newly resurrected

of Src and Abl, Anc-AS, phosphorylated the fewest substrates (2495), which was comparable to Abl (3073). The relative dearth of substrates for Anc-AS hinted that this ancestor might be more specific than the promiscuous Src, which phosphorylated a total of 8208 substrates. In contrast, the ancestors preceding (Anc-AST) and following (Anc-S1) the common ancestor of Src and Abl each phosphorylated a significantly greater

(Anc-AS) possessed entropy akin to Abl, albeit with a higher entropy at +3 and lower entropy at positions -2 and +5. The two additional ancestors, Anc-AST and Anc-S1, both exhibited 'hybrid' specificity with higher entropy than Abl, but less than Src. Notably, only Src lacks specificity at the +3 position.
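The per-position entropy comparison above can be illustrated with a minimal Python sketch. It assumes plain Shannon entropy (in bits) computed over equal-length substrate windows; the exact entropy definition used in this study (for example, whether a background correction is applied) is not stated in this section, and the function name `positional_entropy` is hypothetical.

```python
import math
from collections import Counter


def positional_entropy(sequences):
    """Return the Shannon entropy (bits) of the residue distribution at each position.

    Assumption: all sequences are aligned windows of equal length, centred on
    the phosphoacceptor tyrosine. Low entropy means few residues dominate at
    that position (high specificity); high entropy means little preference.
    """
    if not sequences:
        return []
    width = len(sequences[0])
    if any(len(seq) != width for seq in sequences):
        raise ValueError("all sequences must have the same length")

    entropies = []
    for i in range(width):
        counts = Counter(seq[i] for seq in sequences)
        total = sum(counts.values())
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies
```

Under this definition, a position at which every substrate carries the same residue has zero entropy, while a position with no preference among the 20 amino acids approaches log2(20), about 4.32 bits.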

To analyze each enzyme's positional specificity in more detail, specificity heat maps were created to illustrate the relative specificity for each amino acid at every position in the 15-residue window (shown as a 20x15 matrix of positional normalized amino acid log probabilities, Figure 3C) [17,38]. Qualitatively, Abl's specificity is apparent from the high-intensity signals for both preferred residues (red, P+3 and I-1) and unfavorable residues (blue, S-1, P+1, and D/E+3), while Src has more white space and overall less intense signal. We note that under substrate-saturating conditions the enzyme would phosphorylate even the less favorable substrates, which could result in an apparent low specificity. To ensure that the observed promiscuity is not due to substrate saturation, experiments with Src were repeated with a much shorter incubation period of the cell extract and the kinase (10 minutes versus 4 hours). In this control, less phosphorylation was observed (Figure 1 C,D); however, the same primary sequence determinants were found (

AS node (P-2 and A/V-1) or in the final transition to Abl (A+1 and P+3). In contrast, the pathway from the more specific Anc-AS to Src involves a corresponding loss of specificity for each of these residues, with Anc-S1 possessing intermediary preferences (Figure 4A).

The evolution of the few positions favored by Src followed a different trend. The preference for P-1 appears late, only in Src. Such recent evolution is in agreement with the lack of preference for proline at position -1 of its close homolog Lck [17]. Other Src sequence preferences were already present in the oldest ancestor, then lost in Anc-AS and regained in Anc-S1 and Src (Figure 4B).

The increased promiscuity of Src is revealed in overall lower log probability values than those of Abl's specific residues.
Despite these differences in substrate specificity, there are multiple positions where all modern and ancestral kinases prefer the same residues, most of which are well-known features of NRTKs (e.g., I-1, D/E-3, and S+2) (Figure 4C). As these characteristics are observed in all ancestral and modern proteins in our study and are common among most NRTKs, they likely represent the oldest features of substrate specificity for the NRTK family.
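The normalized log probabilities discussed in this section (a 20x15 matrix, one row per amino acid and one column per position in the 15-residue window) can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used in the study: the normalization is assumed to be a log2 ratio of the substrate frequency to the background frequency at each position, a pseudocount is added to avoid taking the logarithm of zero, and the names `position_frequencies` and `log_enrichment` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(sequences, pseudocount=1.0):
    """Per-position amino acid frequencies, smoothed with a pseudocount (an assumption)."""
    width = len(sequences[0])
    if any(len(seq) != width for seq in sequences):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for i in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in sequences:
            if seq[i] in counts:  # non-standard residue symbols are ignored
                counts[seq[i]] += 1
        total = sum(counts.values())
        freqs.append({aa: c / total for aa, c in counts.items()})
    return freqs


def log_enrichment(substrates, background, pseudocount=1.0):
    """Return {amino acid: [log2(f_substrate / f_background) per position]}.

    Positive values mark residues enriched among substrates relative to the
    background; negative values mark disfavored residues.
    """
    fg = position_frequencies(substrates, pseudocount)
    bg = position_frequencies(background, pseudocount)
    if len(fg) != len(bg):
        raise ValueError("substrate and background windows differ in length")
    return {
        aa: [math.log2(fg[i][aa] / bg[i][aa]) for i in range(len(fg))]
        for aa in AMINO_ACIDS
    }
```

With 15-residue windows centred on the phosphotyrosine, index 7 is the phosphoacceptor and index 10 is the +3 position, so a P+3 preference would appear as a positive value at `log_enrichment(...)["P"][10]`.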

Validation of evolutionary trends of primary sequence determinants via enzyme kinetics
Having determined the primary sequence determinants for the ancestral and modern kinases, we performed in vitro peptide turnover experiments to relate these bulk specificity experiments to quantitative enzymatic parameters. Compressing thousands of substrates into residue-by-residue descriptions (e.g., a preference for P+3) is compelling, but how these preferences relate to enzymatic properties remains unclear. We therefore measured the Michaelis-Menten kinetics of four distinct peptide substrates with each of the five ancestral and modern kinases.
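For reference, the Michaelis-Menten rate law behind such measurements, and its low-substrate limit in which the ratio kcat/KM sets the rate, can be sketched in Python as below. The function names and all numerical values in the usage note are hypothetical and are not measurements from this study.

```python
def michaelis_menten_rate(substrate, enzyme, kcat, km):
    """Initial rate v = kcat * [E] * [S] / (KM + [S]).

    Concentrations and KM must share one unit (e.g., micromolar); kcat is in 1/s.
    """
    return kcat * enzyme * substrate / (km + substrate)


def low_substrate_rate(substrate, enzyme, kcat, km):
    """Limiting form for [S] << KM: v is approximately (kcat / KM) * [E] * [S]."""
    return (kcat / km) * enzyme * substrate


def specificity_ratio(kcat_a, km_a, kcat_b, km_b):
    """Ratio of kcat/KM for substrate A over substrate B.

    For two substrates competing for the same enzyme at equal concentration,
    this ratio gives the relative rates at which they are turned over.
    """
    return (kcat_a / km_a) / (kcat_b / km_b)
```

As a hypothetical example, two peptides with the same kcat of 2 per second but KM values of 50 and 500 micromolar differ tenfold in kcat/KM, so at low concentrations the first is phosphorylated about ten times faster even though both reach the same maximal rate at saturation.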

Previous microarray specificity experiments had determined optimized substrates for Src and Abl, known as Srctide and Abltide, respectively [13,18,25]. While both substrates are ideal for their respective modern kinases, each also has residues favored by the opposite kinase, which allow it to be phosphorylated to a certain degree by each kinase. Therefore, we designed modified versions of Srctide and Abltide, called Srctide2 and Abltide2, to test our evolutionary trends (Figure 5A,B). Srctide2 was intended to be favored by both Src and Anc-S1, with changes made to Srctide to include residues that occurred more frequently in substrates for these two kinases (D-2, A+2, and I+3). Abltide2 was designed to be preferred by both Abl and Anc-AS by mutating the alanine at position -2 into a proline. P-2 is favored by all the kinases, except for Src (Figure 5A,B).

As substrates in signaling cascades are generally present at low concentrations in vivo, kcat/KM likely represents a more fundamentally important parameter than kcat for substrate specificity. As can be seen from the measured Michaelis-Menten curves (Figure 5C), the measured differences in the kinetics corroborate the evolutionary trends found before but suggest additional features for substrate specificities. Starting with a promiscuous Anc-AST, Anc-AS becomes more selective, particularly for Abltide2, the substrate with P-2 (which was identified as a highly preferred residue for Anc-AS and Abl from our phosphoproteomics data, Figure 5B). Moving to Abl, the specificity for Abltide and Abltide2 further increases, as seen in large increases in kcat/KM. This is primarily due to the strong preference for P+3 (Figure 5B), which is present in both Abltide and Abltide2.

At higher concentrations of these two well-optimized peptides we observe partial inhibition, due to the negative cooperativity of ATP and peptide substrate found for Abl [39].

Following the evolutionary branch towards Src, Anc-S1 becomes more promiscuous, mainly due to its ability to phosphorylate Src-preferred substrates in addition to the Abltide substrates. Furthermore, the strong preference for I+3 observed in the proteomics data (Figure 5B) is directly recapitulated by the preference for Srctide2 (Figure 5C). The strong preference for Srctide and Srctide2 over the Abltide substrates only appears in Src, primarily due to a complete loss of preference for P+3 leading to poor activity for the Abltide substrates, combined with a subtle preference for F+3.

member, Lck (I412 and I450 in Lck). Interestingly, when comparing the Anc-S1 substrate preference to that of Lck, as investigated by Shah et al. [17], we see a high similarity in the +3 position preferences. Both Anc-S1 and Lck show a strong preference for L+3 and P+3, suggesting that Anc-S1 is more Lck-like, and that the substitution to the less bulky A437 in Src causes its preference for F+3 (Figure 5-figure supplement 1). These structural differences between Src and the ancestors explain why Srctide is less effectively phosphorylated by other kinases. Anc-S1 was unable to effectively phosphorylate Srctide, potentially due to F+3; its substitution to the less bulky isoleucine in Srctide2 is favored by Anc-S1. We elect to steer clear of additional structural explanations for other detected specificities, as these are just coarse models of kinase/substrate complexes, and more collective, long-range effects often underlie such specificity changes. For example, in an appealing study of the evolution of CMGC kinases, Howard et al. identified a key residue for imparting specificity at the +1 position [28]. Tests of their hypothesis via mutations in the corresponding modern kinases resulted in partial changes in specificities. Since the authors were unable to achieve a full swap in specificity, they concluded that there must be additional residues in play that are not readily apparent by looking at the differences in active-site residues.

The different trajectories we find in the evolution of Src and Abl substrate specificity

promiscuous descendants (Anc-S1 followed by modern Src). In fact, the particular lineage leading from Anc-AST to modern Src appears to involve both mechanisms.

In the toxin-antitoxin signaling systems of bacteria, Aakre et al. [57]

Specificity Calculations
The results from MaxQuant [73] were analyzed with an in-house script written in Python. For each kinase, a set of substrate sequences was generated from the phosphorylated peptides found in at least one of the three trials. To generate the set of substrates, the substrate peptides were first extended or shortened to 7 residues on each side of the phosphorylation site. If the sequence was too close to the beginning or end of a protein, it was rejected immediately. Next, if the sequence was already in the set of substrate sequences or was found in the control experiment, where no kinase was added, it was rejected. Lastly, the localization probability had to be greater than or equal to 70% and the MS intensity had to be greater than 0. A background dataset was generated by applying the same rules to tyrosine-containing peptides from samples that were not enriched for phosphorylation. The background sequence logo was generated using WebLogo [75].
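The filtering rules described above can be sketched in Python as follows. This is a minimal illustration, not the in-house script itself: the `PhosphoSite` record, its field names, and the function names are hypothetical, and the parsing of MaxQuant output into such records is assumed to have happened elsewhere. The filters are applied in a different order than in the text, which does not change the resulting set.

```python
from dataclasses import dataclass

FLANK = 7                # residues kept on each side of the phosphotyrosine
MIN_LOCALIZATION = 0.70  # minimum localization probability (70%)


@dataclass
class PhosphoSite:
    """Hypothetical record for one identified phosphosite."""
    protein_sequence: str
    site_index: int          # 0-based index of the phosphotyrosine in the protein
    localization_prob: float
    intensity: float


def window(site, flank=FLANK):
    """Return the (2 * flank + 1)-residue window around the site.

    Returns None when the site lies too close to either protein terminus.
    """
    start = site.site_index - flank
    end = site.site_index + flank + 1
    if start < 0 or end > len(site.protein_sequence):
        return None
    return site.protein_sequence[start:end]


def build_substrate_set(sites, control_windows, min_loc=MIN_LOCALIZATION, flank=FLANK):
    """Apply the rules in the text: window, termini, duplicates, control, quality."""
    substrates = set()
    for site in sites:
        if site.localization_prob < min_loc or site.intensity <= 0:
            continue  # fails the localization or intensity threshold
        w = window(site, flank)
        if w is None or w in control_windows:
            continue  # too close to a terminus, or seen without added kinase
        substrates.add(w)  # a set discards duplicate windows automatically
    return substrates
```

Passing the windows found in the no-kinase control as `control_windows` removes sites phosphorylated independently of the added kinase. The same function applied to tyrosine-containing peptides from non-enriched samples would produce the background set.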

Homology Model of Bound Peptide
An initial homology model was created from a crystal structure of Abl bound to an ATP-peptide conjugate (PDB: 2G2I). Using PyRosetta [78], the initial structure was mutated to the sequence for either Src or Anc-S1. The bound peptide was then mutated to the sequence of Srctide. The backbone of the protein and peptide was constrained before running the Fast Relax protocol using the ref2015 score function.

The background used in their study was all tyrosines in the human proteome. Only P+3 was found to be statistically significant.

Evolutionary trajectories of sequence specificity show both subfunctionalization and neofunctionalization. The normalized log probability of an amino acid occurring in a pool of substrates demonstrates the evolutionary progression for (A) Abl-specific residues, (B) Src-specific residues, and (C) residue determinants common to all kinases in this study.

[Figure panels not reproduced. Recoverable labels: panel title "Src Preferred"; axes "Evolutionary Trajectory" (Anc-AS, Anc-S1, Abl, Src) and "Normalized Log Probability".]

) was used to build homology models of Src and Anc-S1 bound to Srctide. The bound peptide in PDB 2G2I already contained F+3, but the rest of the Srctide sequence was modelled in. A Fast Relax protocol was run in Rosetta with full constraints on the backbone. The zoom-in indicates how the L475I and A437L substitutions in Anc-S1 could prevent F+3 from binding in the same pocket as in Src.