Reprogramming protein kinase substrate specificity through synthetic mutations

Protein kinase specificity is largely imparted through substrate binding pocket motifs. Missense mutations in these regions are frequently associated with human disease, and in some cases can alter substrate specificity. However, current efforts at decoding the influence of mutations on substrate specificity have been focused on disease-associated mutations. Here, we adapted the Proteomic Peptide Library (ProPeL) approach for determining kinase specificity to the task of exploring structure-function relationships in kinase specificity by interrogating the effects of synthetic mutation. We established a specificity model for the wild-type DYRK1A kinase with unprecedented resolution. Using existing crystallographic and sequence homology data, we rationally designed mutations that precisely reprogrammed the DYRK1A kinase at the P+1 position to mimic the substrate preferences of a related kinase, CK II. This study illustrates a new synthetic biological approach to reprogram kinase specificity by design, and a powerful new paradigm to investigate structure-function relationships underpinning kinase substrate specificity.


Through their role in the covalent transfer of phosphate from a donor ATP molecule to a phosphoacceptor serine, threonine or tyrosine in a substrate protein, protein kinases in eukaryotes play key roles in cellular signal transduction, and function as gatekeepers for important events such as cell cycle checkpoints, apoptosis, and the immune response (1,2).

There are several levels of specificity that allow an individual protein kinase to navigate the daunting number of potential substrates, target the correct subset of proteins, and the correct residues within the appropriate protein for phosphorylation. Beyond temporal and spatial co-localization, protein kinases also attain substrate specificity through pattern recognition of distinctive residues proximal to the phosphoacceptor residue (the "P-site"). This pattern is referred to as a kinase specificity motif (or simply "motif"), and is a model of substrates that are compatible with the kinase's substrate binding pocket and can thus be phosphorylated. Motifs are primarily inferred from known physiological substrates (3), and are sometimes modeled as a string of allowable residues, as a position weight matrix, or as a combination of these. The presumed motif is a well-established starting point for in silico prediction of putative substrates (4); however, for nearly all protein kinases, the number of known substrates is very small, resulting in poorly defined, low-resolution motif models.
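As an illustration of the position weight matrix representation mentioned above, the following is a minimal sketch that builds a log-odds matrix from aligned, P-site-centered sequences. The function names, the pseudocount, the uniform background, and the toy 5-mers are all assumptions made for the example; this is not the pLogo or motif-x computation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(aligned, background, pseudocount=1.0):
    """Return one dict per position mapping residue -> log2(observed / background)."""
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all sequences must have the same length")
    pwm = []
    for pos in range(width):
        counts = Counter(seq[pos] for seq in aligned)
        # Only the 20 standard residues contribute to the column total.
        n = sum(counts.get(aa, 0) for aa in AMINO_ACIDS)
        total = n + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background[aa])
        pwm.append(column)
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds for a sequence of the same width as the matrix."""
    return sum(column[aa] for column, aa in zip(pwm, seq))

# Hypothetical toy data: 5-mers centered on a serine P-site (index 2).
toy_substrates = ["RASPV", "RGSPL", "KASPA", "RLSAV"]
uniform_background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
toy_pwm = build_pwm(toy_substrates, uniform_background)
print(round(toy_pwm[3]["P"], 2))  # P+1 proline column entry
```

In practice the background frequencies would come from the proteome in which the substrates were observed rather than from a uniform distribution.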

Recently, Creixell and colleagues demonstrated several cancer mutations within kinase domains that modulated catalytic activity, and in some cases altered substrate specificity (5). These results, along with previous work that traced evolutionary changes in substrate specificity to amino acid substitutions (6), and the identification of potential specificity-determining positions (7, 8), suggest that a thorough investigation of amino acid structure-function relationships will be necessary to achieve a principled understanding of kinase specificity. At present, these studies have largely evaluated individual, naturally occurring kinase mutations. In this work, we sought to explore the potential for rational reprogramming of kinase substrate specificity through multiple directed synthetic mutations.

Here, we used the Proteomic Peptide Library (ProPeL) method (9) to accurately measure the specific motifs of both wild-type and mutated kinases (Fig. 1). Using this approach, first a heterologous kinase of interest is expressed in E. coli. The kinase phosphorylates bacterial proteins consistent with its endogenous kinase specificity motif. The extremely low activity of serine/threonine/tyrosine kinases and phosphatases in E. coli (10) allows for a high signal-to-noise ratio, and the absence of confounding human kinase cascades ensures a direct link between expressed kinase and observed phosphorylation event. After cell lysis and proteolysis, the resulting phosphopeptides are identified by tandem mass spectrometry. This can provide hundreds to thousands of kinase-specific phosphopeptides from which a high-resolution motif model is generated. In this case, the motif model is a position weight matrix with constant residues at one or more positions, which are easily visualized using the pLogo (11) graphical representation. That these bacterial substrates are not physiological is irrelevant; the identified motif can be used to accurately model kinase substrate specificity, and predict human substrates (9). Here, we have repeatedly utilized ProPeL to generate and compare motifs for wild-type and synthetic mutant kinases.

We chose the Down's syndrome-associated Dual specificity tyrosine-phosphorylation-regulated kinase 1A (DYRK1A) to act as a model kinase. Although the number of known human DYRK1A substrates is low (only 31 (12)), the specificity motif for wild-type DYRK1A has been partially characterized as including basophilic determinants, and a preference in the P+1 position for proline (13,14), where P+n denotes the nth residue towards the C-terminus of the phosphoacceptor P-site, and P-n denotes the nth residue towards the N-terminus.
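The P+n / P-n notation maps directly onto offsets from the central residue of a P-site-centered window. A minimal sketch, assuming an odd-length window with the phosphoacceptor in the middle (the helper name and the example 15mer are hypothetical):

```python
def residue_at(window: str, offset: int) -> str:
    """Return the residue at P+offset (offset > 0) or P-|offset| (offset < 0)."""
    if len(window) % 2 == 0:
        raise ValueError("window must have odd length so the P-site is central")
    center = len(window) // 2
    index = center + offset
    if not 0 <= index < len(window):
        raise IndexError("offset falls outside the window")
    return window[index]

# Hypothetical 15mer with the P-site serine at the center (index 7).
window = "AAAARAASPVAAAAA"
print(residue_at(window, 0), residue_at(window, +1), residue_at(window, -3))
```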

Mechanistically, the region of the substrate binding pocket spanning the conserved DFG and APE residues within the kinase sub-domains VII-VIII is termed the "activation segment", and has been implicated through X-ray crystallography to confer substrate specificity by interacting with the amino acids flanking the substrate's P-site (reviewed in Kannan and Neuwald, 2004, (15)). DYRK1A is a member of the CMGC (CDK/MAPK/GSK3/CLK) kinase family, and it has been suggested that the P+1 proline specificity typical of this family is imparted by a hydrogen bond with a CMGC-conserved arginine in the activation segment (Fig. 2A, (15, 16)).

CK II instead codes for lysine (residue K198, (15)). Therefore, we predicted that the mutant DYRK1A R328K (mimicking CK II at the CMGC arginine position) would re-position the lysine side-chain ε-amino group to allow for an electrostatic interaction with substrates containing a P+1 acidic residue.

In this work, we demonstrate the ability to generate kinase specificity models of unprecedented resolution using the ProPeL method. Using existing structural data and sequence homology, we successfully engineered the DYRK1A kinase to exhibit an unnatural substrate specificity, using both individual and multiple directed mutations. Overall, this study illustrates the effects of synthetic activation segment mutations upon substrate specificity, and introduces a new approach for the rational creation of designer kinases.

High-resolution determination of wild-type DYRK1A substrate specificity

Before attempting to reprogram DYRK1A, we first needed to create a sufficiently high-resolution model of wild-type DYRK1A substrate specificity to serve as a reference. We created bacterial

Using the ProPeL method, we identified 6,059 unique DYRK1A phosphorylation sites (3,089 pSer, 2,412 pThr, and 558 pTyr) on bacterial proteins. Note that this data set is an order of magnitude larger than that of the human kinase with the largest number of known natural substrates (CDK2, with 514 substrates (12)). Therefore, DYRK1A WT ProPeL data results in

Our data confirm the recent suggestion in the literature that DYRK1A can phosphorylate substrates with alternative residues in the P+1 position (14); beyond a strong P+1 preference for proline, DYRK1A WT also efficiently phosphorylates substrates with hydrophobic residues (particularly valine and alanine), or arginine in the P+1 position. While these residues are statistically significant in the P+1 position for serine P-sites, they fail to occur at statistical significance for threonine P-sites (Fig. S4B). Threonine P-site substrate specificity, therefore, may be more dependent on the P+1 proline than are serine P-site substrates. There is also a strong, previously unreported hydrophobic cluster present at P+2 for both serine and threonine substrates. DYRK1A does not exhibit a significant phosphotyrosine motif (Fig. S4C); however, our 558 unique tyrosine phosphorylation sites in E. coli clearly indicate that DYRK1A is capable of phosphorylating tyrosine substrates in trans, and that phosphotyrosine activity is not restricted to autophosphorylation, as previously thought (13, 18).

Using the pLogo tool and an internal version of the motif-x program (19, 20), we evaluated dependence between motif positions. It is important to note that while we identify tryptic peptides by tandem mass spectrometry, the kinase-substrate interaction that produced the phosphorylation event occurred in the context of full-length substrate proteins. Therefore, we are able to map tryptic fragments back to the known E. coli proteome and extend sequences beyond the detected tryptic fragment, allowing us to analyze the presence (or absence) of multiple upstream basic residues. This analysis revealed that although there is a strong correlation between P+1 proline and upstream basic residues (Fig. S5A, S5B), there is no significant correlation between multiple upstream basic residues (Fig. S5C to S5F). Therefore, the optimal motif sequence is actually RxxS*P, and not RRRRxS*P, which is the broad motif without respect to any interdependent substrate residues. Substrates conforming to RxS*P, RxxxS*P, and RxxxxS*P with single arginines are thus also favored, but less so than those with RxxS*P. We note that multiple arginines are actually not favored for substrate recognition, although they do not appear to be clearly disfavored either. The complete list of statistically significant motif classes for DYRK1A (and all kinases within this study) identified by motif-x can be found in Table S2.
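The mapping step described above can be sketched as follows: a detected tryptic peptide is located in its parent protein, the sequence is extended to a P-site-centered 15mer, and the window is tested against the RxxS*P class. This is only an illustration; the function name, the toy protein, and the use of the first exact match are assumptions, and sites too close to either terminus to yield a full 15mer are simply returned as None.

```python
import re

def extend_to_15mer(protein, peptide, site_in_peptide, flank=7):
    """Return the P-site-centered window, or None if it cannot be built."""
    start = protein.find(peptide)  # first exact occurrence only (simplification)
    if start == -1:
        return None
    site = start + site_in_peptide
    if site - flank < 0 or site + flank >= len(protein):
        return None  # too close to the N- or C-terminus for a full window
    return protein[site - flank: site + flank + 1]

# RxxS*P on a 15mer: arginine at P-3 (index 4), S/T at index 7, proline at P+1.
RXXSP = re.compile(r"^.{4}R.{2}[ST]P.{6}$")

# Hypothetical protein; the tryptic peptide AASPVLEDWK begins after the P-3 arginine.
protein = "MSTLLNAGKRAASPVLEDWKTTFGH"
window = extend_to_15mer(protein, "AASPVLEDWK", 2)
print(window, bool(RXXSP.match(window)))
```

Note that in this toy case the P-3 arginine lies outside the detected tryptic peptide, so it only becomes visible after extension against the protein sequence.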

To verify the specificity of our DYRK1A model amongst other known kinases, we performed an in silico analysis using our high-resolution DYRK1A motif. We scored known human DYRK1A substrates, an equivalent number of substrates randomly selected from the human proteome, as well as known substrates for other kinases from the remaining serine/threonine kinase families (12). Our motif was able to accurately discriminate known DYRK1A substrates from random substrates and also performed well in discriminating against non-DYRK1A kinase substrates (Fig. S6).
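One common way to quantify how well a set of motif scores separates two substrate groups is the area under the ROC curve. The text does not state which metric underlies Fig. S6, so the sketch below is an assumption offered for illustration only, and the scores are invented.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical motif scores for known substrates versus random sites.
known = [0.9, 0.4]
random_sites = [0.5, 0.1]
print(auc(known, random_sites))  # 0.75
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation.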

As introduced earlier, DYRK1A P+1 substrate preference is hypothesized to be imparted by a hydrogen bond between the side-chain nitrogen of the CMGC-conserved arginine (DYRK1A R328) and the main-chain oxygen of a non-glycine residue (DYRK1A Q323) undergoing torsional strain (Fig. 2A, (15, 16)). This hydrogen bond should thus neutralize the main-chain oxygen's dipole [...]

[...] background data set. The resulting "differential pLogos" display residues that are over- and underrepresented in the respective mutant DYRK1A substrate pool relative to the wild-type kinase (DYRK1A WT) substrates, rather than the background proteome.
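The idea of scoring a mutant substrate pool against the wild-type pool can be sketched with a binomial log-odds statistic of the kind pLogo is built on. This is a simplified illustration under stated assumptions: the function names, the pseudocount, and the toy P+1 columns are hypothetical, and the exact statistic should be taken from the pLogo reference (11) rather than from this sketch.

```python
import math

def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_sum(log_terms):
    """Numerically stable log of a sum of exponentials."""
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def binomial_log_odds(k, n, p):
    """-log10[Pr(X >= k) / Pr(X <= k)]; positive means over-represented."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    upper = log_sum([log_binom_pmf(i, n, p) for i in range(k, n + 1)])
    lower = log_sum([log_binom_pmf(i, n, p) for i in range(0, k + 1)])
    return -(upper - lower) / math.log(10)

def differential_score(mutant_column, wild_type_column, residue):
    """Score a residue in the mutant column using wild-type frequencies as background."""
    k = mutant_column.count(residue)
    n = len(mutant_column)
    # Pseudocount keeps the background probability strictly inside (0, 1).
    p = (wild_type_column.count(residue) + 0.5) / (len(wild_type_column) + 1.0)
    return binomial_log_odds(k, n, p)

# Hypothetical P+1 columns: the mutant favors alanine, the wild type proline.
mutant_p1 = "A" * 30 + "P" * 10 + "V" * 10
wild_type_p1 = "P" * 30 + "A" * 10 + "V" * 10
for aa in "APV":
    print(aa, round(differential_score(mutant_p1, wild_type_p1, aa), 2))
```

With these toy columns, alanine scores positive, proline negative, and valine stays near zero, which is the qualitative pattern a differential logo is meant to expose.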

As already noted, DYRK1A Q323G differs from wild-type by favoring proline less than DYRK1A WT at P+1 and shifting P+1 preference to alanine, and in the differential pLogo, that shift is abundantly clear (Fig. 3D and Fig. S7D). Next, the differential pLogo for DYRK1A R328K (Fig. 3E and Fig. S7E) indicates no shift to disfavor proline at P+1 relative to DYRK1A WT, which further confirms the standard pLogo results. A striking feature of the differential pLogo (that is [...] increases slightly but not uniformly in the mutants (note the more significant P-2 arginine in differential pLogos Fig. 3E and Fig. S7D, S7E, but not in Fig. 3D, Fig. 3F nor Fig. S7F). This suggests that in addition to reprogramming the P+1 substrate preference, there may be some

Over the course of many mass spectrometry runs, many different phosphoenrichment strategies were evaluated. Ultimately, we concluded that the most efficient sample preparation was a simple TiO2 enrichment step, as described below. However, data from the other methods were collected and accumulated for the DYRK1A WT pLogo, and as such are summarized below. All mutant DYRK1A data were obtained using simple bulk TiO2 enrichment.

Phosphopeptide enrichment using bulk TiO2 beads (Titansphere 5 µm, GL Sciences) was modified from Kettenbach and Gerber [16]. Beads were conditioned in bulk using Binding Buffer [...]

[...] to be extended to a P-site centered 15mer due to proximity to either the N- or C-terminus was unable to be scored.

Table S1. Mass spectrometry data.