STK32A is a dual-specificity AGC kinase with a preference for acidic substrates

The STK32 kinases are a small subfamily of three uncharacterised serine/threonine kinases from the AGC kinase family whose functional role is so far unknown. Here, we analyse the consensus peptide sequence for STK32A phosphorylation, showing that STK32A is directed towards acidic substrate sequences and exhibits dual-specificity for serine/threonine and tyrosine residues. A crystal structure of STK32A reveals an overall structure typical of the AGC protein kinase family but with significant and unique features including an altered binding mode of the hydrophobic motif to the N-terminal lobe of the kinase domain, and a novel alpha-helix in between the turn motif and the hydrophobic motif. The crystal structure combined with phylogenetic analysis reveals the evolutionary conservation of the acidic substrate preference. In vitro binding assays demonstrated that the STK32 kinases bind significant numbers of clinically used kinase inhibitors.


Introduction
The protein kinases STK32A, STK32B and STK32C, also known as YANK1, YANK2 and YANK3 (Yet Another Novel Kinase), are adjacent to the well-studied Aurora kinases on the kinase phylogenetic tree [1] but by contrast very little is known about their functions. Currently the available published data largely comprises genetic associations, cellular localisation studies, and mouse knockout data.
STK32A RNA is highly expressed in brain and endocrine tissues, while STK32A protein is localized to the centrosome [2]. STK32B is highly expressed in kidney and localized to microtubules, while STK32C protein is ubiquitously expressed [2]. STK32A knockout mice show increased monocytes and circulating LDL cholesterol, while STK32C knockout mice show abnormal locomotor behaviour [3].
Despite their poor functional annotation, single nucleotide polymorphisms (SNPs) point to important physiological functions of these kinases. For instance, polymorphisms in STK32A were linked to coeliac disease [4,5]. A methylation site within the STK32A gene was linked to smoking [6], while in a genome-wide association study (GWAS) the PPP2R2B-STK32A-DPYSL3 locus was found to be a susceptibility locus for lung cancer [7]. In an RNAseq study identifying genes regulated by the Wnt3a/β-catenin pathway, STK32A was slightly upregulated (1.8x) in response to Wnt3 treatment in hippocampal neurons [8]. In one study STK32B protein levels were found to be upregulated in more aggressive breast cancers [9], while another study found them to be downregulated in oral squamous cell carcinoma [10]. Driver mutations in various cancers in both STK32A and STK32B have been identified [11], including a G35E missense mutation on the glycine-rich phosphate-binding loop of STK32B detected in melanoma and an S89F missense mutation of STK32A, also in melanoma.
Of significant interest as the only study looking at chemical inhibition of these kinases, by examining the effects of a large number of compounds on the THP-1 cell line, STK32C was identified as a potential anti-target for small molecule kinase inhibitor development, due to a possible relationship with cytotoxicity [12]. STK32B and STK32C have both been linked in genetic studies to several neurological disorders.
In a genome-wide association study (GWAS) for essential tremor in a European population a single nucleotide polymorphism (SNP) in STK32B was associated with essential tremor [13], an observation replicated in a Chinese population [14]. The European study also demonstrated increased expression of STK32B in the cerebellar cortex in essential tremor patients [13]. Some patients with Ellis-van Creveld syndrome have a deletion of an entire 520-kb region including STK32B in addition to EVC and EVC2; it is unknown to what, if any, extent the loss of STK32B contributes to the syndrome [15]. In the same genomic region of chromosome 4p16, an association between SNPs in STK32B and EVC with on-syndromic oral clefts was observed [16]. In a study of monozygotic twins a differentially methylated position in STK32C was found to be associated with adolescent depression [17]. An SNP within STK32C was also linked to risk of psychiatric disorders in Down Syndrome patients [18].
As well as being largely unstudied, the human STK32s are not close in sequence to other human kinases. For example, STK32A has 70% and 67% sequence identity over the kinase domain to STK32B and STK32C respectively, but only 36% sequence identity to the next closest homologues (PRKACG or RSK2). To initiate the functional characterisation of these proteins we determined the X-ray crystal structure of STK32A and a solution-phase envelope using small angle X-ray scattering (SAXS), identified a consensus substrate motif for STK32A, and analysed the binding of a set of kinase inhibitors, including some that have been used in the clinic.

Conservation and evolution of STK32A, STK32B and STK32C
Sequence analysis identified homologues of STK32A in fungi, including Magnaporthe oryzae (rice blast fungus), Neurospora crassa (red bread mould) and Schizosaccharomyces pombe (fission yeast), indicating it is conserved across opisthokonts ( Figure 1A). STK32B likely evolved more recently, with homologues indicated in bilateral animals including nematodes (Caenorhabditis elegans), jawless vertebrates (such as Eptatretus burgeri (hagfish) and Petromyzon marinus (Lamprey)) and some insects (for example Drosophila melanogaster (fruit fly) and Acyrthosiphon pisum (pea aphid)). STK32C appears to be the newest gene, present only in bony vertebrates. The classification of Manning et al. placed the STK32s adjacent to the Aurora kinases on the phylogenetic tree [1], but unlike Aurora A/B/C the STK32 kinases are genuine AGC kinases, having the full Cterminal extension typical of the AGC family. This C-terminal region contains the hydrophobic motif (φ-X-X-φ-S/T-φ, φ=hydrophobic), an important regulatory element known to stabilise the active conformation of AGC kinases. The serine or threonine residue within the canonical hydrophobic motif is a site of activating phosphorylation in most AGC kinases, however in the STK32 family this site corresponds to an asparagine in vertebrates and in some bilateral animals such as Drosophila, or an aspartate in fungi ( Figure 1B). N-terminal to the hydrophobic motif is the turn motif which is well-conserved from humans to nematodes ( Figure 1B). In fungi, however, the potential phosphorylatable serine residue in the turn motif (highlighted in bold in Figure 1B) is exchanged for an acidic glutamate residue, which may mimic the phosphorylated residue, and in fission yeast (S.p. PPK33) the entire C-terminal extension is truncated prior to the turn motif. Between the turn motif and the hydrophobic motif secondary structure prediction indicates the presence of a long α-helical element that is unique to the STK32s and well conserved across the entire family from fungi to humans (residues 346-370 in human STK32A), which here we have termed the "HF motif helix" ( Figure 1B).
As well as these AGC kinase-specific motifs, the STK32 kinases contain the typical sequence motifs of active protein kinases, such as a glycine-rich phosphate-binding loop ("G-rich loop"), the HRD (His-Arg-Asp) motif vital for phosphate transfer, and the APE (Ala-Pro-Glu) motif that secures the C-terminal end of the activation segment. The DFG (Asp-Phe-Gly) motif, required for proper positioning of ATP for phosphate transfer, is also present in the STK32 family, although the glycine is substituted for a less conformationally flexible asparagine residue which is remarkably conserved.
While red bread mould (N.c. NCU07062) shows relatively high sequence conservation to the STK32 family across the C-terminal half of the kinase domain, the turn motif and hydrophobic motif, the N-terminus is truncated so that the αC helix and glycine-rich phosphate-binding loop are completely absent suggesting that kinase activity may have been lost. The glycine-rich loop is also absent from some vertebrates, for example Tasmanian devil (S. harrisii STK32A/B/C), fugu (T. rubripes STK32C) and mainland tiger snake (N. scutatus STK32A/C), which otherwise share high similarity (60-80%) with the canonical human STK32 sequences. Human STK32A shares 69% and 65% sequence identity with STK32B and STK32C, respectively, and the key motifs are highly conserved. A notable difference is the presence of a 67-residue N-terminal extension in STK32C isoform 1 that is rich in proline, alanine, arginine and serine, therefore likely to be highly flexible, which is not present in any human STK32A or STK32B isoform. Several of the serine residues in this extension are predicted to be phospho-sites. Several other isoforms of both human STK32B and STK32C have been described with N-terminal truncations that remove the glycine-rich loop, similar to the homologues described above, and it may be that these isoforms are incapable of ATP binding and catalysis.
One feature that appears to be specific to STK32 kinases over the rest of the human kinome is a highly basic region immediately following the turn motif. In vertebrates this region (residues 324-333 in human STK32A) consists of the conserved sequence HKKKKRLA(K/R)(X/-)(K/R), whereas in more primitive organisms this region is less tightly conserved but generally contains a higher proportion of arginine and fewer lysine residues ( Figure 1B). The function of this basic patch is currently unknown.

STK32A is an active dual-specificity Ser/Thr and Tyr kinase that preferentially phosphorylates acidic substrates
Most AGC kinases have been reported to preferentially phosphorylate substrates containing basic residues such as arginine or lysine N-terminal to the serine/threonine phosphorylation site, and even share multiple common substrates [19]. Several crystal structures of AGC kinases in complex with their substrate peptides exist, for example AKT2 in complex with a GSK3 peptide motif [20], therefore it is possible to predict the residues in STK32A that may be involved in substrate binding through sequence homology ( Figure 1B, blue stars). The residues in the AKT2 substrate binding cleft that directly contact the substrate (annotated in blue in Figure 1B) appear to be relatively well conserved in STK32A/B/C and therefore based on the sequence alone it would have been a reasonable hypothesis that the STK32 family also preferentially phosphorylates basic substrates, similar to other AGC kinases. STK32A kinase domain protein was expressed in insect cells using baculoviruses. Expression yields were good and the protein was efficiently purified by nickel-affinity chromatography followed by size exclusion chromatography (SEC). The protein eluted from the SEC column at the expected volume for a monomer. The resultant protein was auto-phosphorylated following expression, but lambda phosphatase treatment enabled the production of unphosphorylated protein for use in assays and structural studies (Figure 2A).
The ability of purified, dephosphorylated, STK32A kinase domain to auto-phosphorylate in vitro was assessed by mass spectrometry. Auto-phosphorylation occurred more rapidly in the presence of manganese than magnesium; after one hour incubation with 5 mM ATP, STK32A was autophosphorylated on up to five residues in the presence of Mn 2+ , compared to a single site in the presence of Mg 2+ ( Figure 2B). The sites of phosphorylation were mapped following digestion of the protein using elastase which indicated that STK32A auto-phosphorylates at multiple sites: S33, Y49, T163, S227, T229, S230, S231, S320, S354 and T383 (Supplementary Figure 1). These sites include phosphorylation on the typical AGC kinase turn motif site (S320) and close to the hydrophobic motif at S354. A cluster of serine and threonine residues predicted to be situated on the αF-αG loop, a structural element usually important for interaction with substrates, was also found to be autophosphorylated in vitro (S227, T229, S230, S231).
We incubated purified STK32A kinase domain with peptides derived from proteolysed and stringently dephosphorylated HeLa cell lysates and analysed the resulting phosphorylation patterns by mass spectrometry [21]. This revealed a set of consensus substrate recognition sequences for STK32A ( Figure 2C). The data revealed that STK32A phosphorylates serine or threonine residues, and has a general preference for acidic residues (aspartate or glutamate) at position P-4, P+1, P+2 and P+3, and no other significant preferences.
Based on these consensus substrate recognition motifs, two synthetic peptides were designed to test the ability of STK32A to phosphorylate a specific substrate with an acidic residue in positions P-4, P+1, P+2 and P+3. These peptides were designated "STK32tide-1, corresponding to the amino acid sequence ASEALVSEEDAD, and "STK32tide-2", corresponding to ADELLSEVEAKK. After incubation of 250 µM of each peptide for 18 hours with STK32A and ATP, species indicative of the phosphorylated STK32tide-1/2 sequences were observed by LC-MS ( Figure 2D and Supplementary Figure 1A). Given that phosphorylated products were only detectable after a relatively long incubation time, the phosphorylation of these peptides by STK32A appears to be quite inefficient, but nevertheless confirms that STK32A is able to phosphorylate in vitro even peptides with multiple acidic residues across positions -4 and +1 to +3. Similar to the observation that auto-phosphorylation proceeds faster in the presence of manganese than magnesium, the peptides were also phosphorylated more rapidly in the presence of manganese (data not shown), substantiating a general preference for manganese as the counter-ion.
Next, a small in-house library of kinase substrate peptides was tested against STK32A at a concentration of 500 µM. CDC25Ctide, designed to correspond to residues 193-205 of M-phase inducer phosphatase 3 (also known as dual specificity phosphatase CDC25C), covering the known phosphorylation site of PLK1 and PLK3 (ELMEFSLKDQEAK), was found to be weakly phosphorylated by STK32A after a long incubation time ( Supplementary Figure 2A), similar to the designed STK32tide peptides, and contains an acidic residue in the P+3 position. Caseintide, a synthetic peptide corresponding to the acidic sequence KKEKFQSEEQQQ and designed to mimic part of bovine betacasein, was phosphorylated by STK32A after only 2 hours, and contains acidic residues in positions P-4, P+1 and P+2 ( Figure 2E, bottom). Also after 2 hours a species corresponding to phosphorylated p38MAPKtide was detected ( Figure 2E, top). The sequence of p38MAPKtide corresponds to part of the activation loop of p38α (MAPK14) (AGAGLARHTDDEMTGYVA) and contains three potential phosphorylation sites (corresponding to p38α residues T175, T180 and Y182). Based on the consensus sequence data we hypothesized that the two most likely STK32A phosphorylation sites in this peptide corresponded to T175 (containing acidic residues in positions P+1, P+2 and P+3) and T180 (containing an acidic residue in position P-4). Interestingly, however, mass spectrometry analysis revealed that sites T180 and Y182 were phosphorylated by STK32A, and there was no evidence of T175 phosphorylation (Supplementary Figure 2B and Supplementary Table 1). Several peptides were detected that contained dual phosphorylation both at the T180 and Y182 sites. This indicated that STK32A is a dual specificity serine/threonine and tyrosine kinase.
After seeing the ability of STK32A to phosphorylate both caseintide and the p38α synthetic peptides, we tested whether STK32A was also capable of phosphorylating the corresponding fulllength proteins in vitro. A sample of dephosphorylated casein extracted from bovine milk, containing an approximate ratio of 5:3:1 of beta-casein to alpha-S1-casein to kappa-casein, was purchased. After 2 hours incubation of the casein extract in the presence of STK32A, ATP and a mixture of manganese/magnesium chloride, up to two additions of 80 Da mass to beta-casein was evident, indicating STK32A was capable of phosphorylating full-length beta-casein ( Figure 2F).
Conversely, STK32A did not appear to be capable of phosphorylating full-length p38 in vitro.
Recombinantly expressed p38α (MAPK14) was mixed with STK32A, ATP and either manganese chloride or a mixture of manganese and magnesium chloride, both in the presence and absence of the specific p38α inhibitor Skepinone-L. After 24 hours in the absence of Skepinone-L, there was evidence of phosphorylation of p38α, but not when Skepinone-L was added, indicating that the observed activity was likely due to auto-phosphorylation by p38α itself rather than STK32A (Supplementary Figure 2C). STK32A was capable of auto-phosphorylation in the presence of Skepinone-L, confirming that Skepinone-L did not affect the activity of STK32A (Supplementary Figure 2D). Therefore, although STK32A was able to phosphorylate the synthetic peptide derived from the p38α activation loop, full-length p38α did not appear to be a substrate for STK32A in vitro.
The crystal structure of human STK32A demonstrates canonical AGC-kinase as well as

STK32-specific structural features
Crystallisation trials were conducted using STK32A (residues P9-D390) pre-mixed with the nonspecific kinase inhibitor staurosporine. It was necessary to use a relatively high STK32A concentration of 27 mg/mL (0.6 mM) to obtain optimal crystals. STK32A:staurosporine crystallised in space group P32 with six STK32A molecules in the asymmetric unit, each bound to one molecule of staurosporine. Each of the six molecules in the asymmetric unit superimposed well, with minor differences around the flexible C-terminal extension turn motif (described in more detail below).
Data collection and refinement statistics are shown in Table 1. In the final model, refined to a resolution of 2.29 Å, residues 15-373 were resolved in the electron density, except for a flexible loop containing the basic patch (residues 323-332) which was disordered.
The model revealed that STK32A shares canonical structural features of protein kinases, with an N-terminal lobe consisting of five beta-strands and two alpha-helices, and a mostly helical Cterminal lobe. The ATP binding site, occupied by the inhibitor staurosporine is between the N-lobe and the C-lobe ( Figure 3A). STK32A also retains the canonical feature of AGC kinases, a C-terminal extension that wraps around the N-terminal kinase lobe, secured by the interaction of the hydrophobic motif (HF motif) with the groove next to the αC-helix, stabilising the αC-helix in an active conformation ( Figure 3B). Figure 1B). The lack of the canonical phosphorylation site and the presence of the HF bound to the N-lobe showed that unlike some other AGC kinases STK32A/B/C do not require an activating phosphorylation on the C-terminal tail to form an active conformation. During the activation process of many AGC kinases such as SGKs, S6Ks, RSKs or PKCs, the phosphorylated hydrophobic motif binds to the PIF pocket for PDK1, promoting phosphorylation of the activation loop by PDK1 [19]; it is currently unknown whether PDK1 is involved in the activation of STK32A/B/C. The binding mode of the HF motif to the N-lobe of STK32A is significantly different to that of other AGC kinases. Compared to the binding of example canonical HF motifs such as PKCβ2 or PKCι the binding positions of the key hydrophobic interactions are altered. In PKCβ2 F656 is the first hydrophobic residue of the HF motif and slots into a conserved pocket on the αC helix to help position this important element into the active conformation ( Figure 3B). The first HF motif hydrophobic residue in STK32A is F359, which however binds in the equivalent location to PKCβ2 F659, the second hydrophobic residue of the HF motif. Phosphorylated S660 in PKCβ2 points away from the kinase N-lobe ( Figure 3B) and in STK32A this position is occupied by I361 ( Figure 3B).
Situated just after the HF motif are a series of residues that form polar interactions with αA and αC to secure the C-terminal region across the top of the N-lobe. STK32A K366 is well conserved across the STK32 family and forms a hydrogen bond to D17 from the N-terminus. Nearby, R369 interacts with the backbone carbonyl of E14 and R364 forms a hydrogen bond with the backbone of L78 and with the side chain of E79, further securing this region in position ( Figure 3C).
Two conformations of the C-terminal tail around the turn motif are evident in the crystal structure: half of the molecules in the asymmetric unit (Chains D,E,F) favour one conformation where residues P309-S320 form an ordered α-helix that packs against the N-lobe β-sheet formed of strands β1/β2/β3, and the other molecules (Chains A,B,C) form a loop that packs against the αC and HF-motif helices and activation loop of a nearby symmetry-related monomer (Supplementary Figure   3A). Part of the basic turn motif, comprising multiple lysine residues, is unresolved in the electron density (a.a. 323-333) in all chains, indicating the high flexibility of this region. Other typical AGC kinases such as PKCβ2 (PDB 2I0E) and PKCι (PDB 1ZRZ) contain a positively charged binding pocket for the phosphorylated turn motif, normally required for full kinase activity [22], but this appears to be lacking in STK32A, which instead contains acidic or neutral residues at the equivalent site ( Figure   3D). It is possible that STK32s do not contain a true turn motif, although we have mapped one autophosphorylation site of STK32A to S320 at the usual turn motif site (Supplementary Figure 1). It seems plausible that the phosphorylated turn motif binds in an alternative binding pocket in STK32s, for example the basic patch immediately on top of the glycine-rich loop, adjacent to the normal turn motif binding site in other AGC kinases. There is a possibility that the basic sequence (HKKKKRLAKK) in STK32s immediately following the traditional turn motif ( Figure 1) interacts with the acidic patch observed on the N-lobe ( Figure 3D) as a further means of securing the C-terminus in position, however this region was disordered in the crystal structure and, therefore, appeared to be highly flexible. Upon turn motif phosphorylation in STK32s it is possible that this region would adopt a more ordered conformation.
At the ATP binding site, the relatively small gatekeeper residue (V100) affords larger than average capacity at the back of the binding pocket, potentially explaining the ability of STK32A to bind inhibitors designed for gatekeeper mutant kinases (see inhibitor binding analysis below).
Otherwise STK32A maintains the typical set of hydrophobic residues in β1 and β2 of the N-lobe required to bind the adenosine moiety of ATP (I29, G30, V37 and A50), as well as the highly conserved residues at the hinge and C-lobe (L103, I152, L153, L154) ( Figure 3E) that form the base of the adenosine binding pocket in the structures of most active protein kinases [23]. Highly conserved polar residues that stabilise the γ-phosphate of ATP and divalent cation are also conserved in STK32A (K52, E71, D146, K148, N151, D164) ( Figure 3E). Other conserved residues found throughout the structural core of active-conformation kinases, including the AGC kinases AKT2 and PKA [23] are also retained in STK32A, although larger variation is evident compared to AKT2/PKA (Supplementary Table 3).
Interestingly, the key sites that have been shown to be responsible for the interaction of AKT2 with peptide substrates seem to be conserved in STK32A (Figure 1), at first glance indicating that STK32A should be capable of interacting with basic substrates similar to AKT2 and other AGC kinases, however inspection of the residues surrounding the substrate binding groove in the crystal structure indicates a potential explanation for the observed preference for acidic substrates:  Figure 3C) and therefore all may bind acidic substrates in a similar manner to STK32A.

Solution phase analysis of STK32A
We performed a Small Angle X-ray Scattering (SAXS) analysis of the same unphosphorylated STK32A protein that was crystallised to assess the solution phase behaviour of the protein, and also compared this to the auto-phosphorylated protein (Figure 4, Supplementary Table 3). The proteins were passed through a size-exclusion chromatography column immediately before SAXS analysis.
The proteins eluted as a single, sharp peak with consistent radius of gyration (Rg) calculated across the peak for both unphosphorylated and phosphorylated protein ( Figure 4A, 4B). In general, the solution scattering behaviour of both samples was very similar as seen in the similar scattering curves, Guinier plots and overlapping real space p(r) distributions of each ( Figure 4C-E). A plateau in the Porod-Debye plot ( Figure 4F) and a Porod exponent approaching 4 indicated that the proteins were relatively compact and rigid, as expected for a globular protein kinase domain. The normalised Kratky plots for both samples had a peak maximum approaching that expected by Guinier's approximation (√3 with magnitude 1.04) and returning to baseline at higher values of q x Rg, indicating that they are generally globular and folded, but the slight shift of the peak to the right indicated that they are slightly elongated ( Figure 4G). Measurements revealed that the protein was monomeric in solution, both in its unphosphorylated form and following auto-phosphorylation. A mass of 47-51 kDa was determined by SAXS by the QR method [24] for the unphosphorylated protein, compared to an intact mass of 45.4 kDa observed by denaturing LC-MS (Supplementary Table 3). A small mass increase was detectable by SAXS following auto-phosphorylation, consistent with addition of multiple phosphate groups, as observed by LC-MS (Supplementary Table 3). The experimental SAXS data was compared to calculated SAXS curves based on the STK32A crystal structure which had missing loops modelled (Model 1), and resulted in an excellent fit (Chi 2 = 1.18), indicating that the conformation observed for each inhibitor-bound monomer in the crystal structure was not very dissimilar to the conformation in solution ( Figure 4H). Ab initio dummy residue models generated using the SAXS data could also be superimposed well with the envelope of the crystal structure ( Figure 4I, 4J). The ab initio model calculated for the phosphorylated protein sample retained a generally similar overall shape to that of the unphosphorylated protein ( Figure   4K). Altogether, the data indicates that STK32A functions as a monomer in both its unphosphorylated and auto-phosphorylated state.

STK32A binds multiple clinically-used kinase inhibitors
Differential scanning fluorimetry (DSF) measurements were used to assess inhibitor binding to STK32A by measuring the increase in melting temperature (ΔTm) of STK32A in the presence of a set of kinase inhibitors many of which have been used in clinical trials. STK32A bound several broadspectrum type-I kinase inhibitors such as Staurosporine, TAE684, AZD7762 and K252a, but also many clinically-tested type-I inhibitors including the ALK inhibitor Ceritinib (LDK378), the BRAF inhibitor Dabrafenib (GSK2118436), the PAK4 inhibitor PF-03758309, the SYK inhibitor PRT062607, the p38 MAPK inhibitor TAK 715, and the Aurora A/B/C inhibitor Danusertib ( Figure 5A). Interestingly, the compound 1NM-PP1, developed to target "analogue-sensitive" kinases [25] such as v-Src-as1 that harbours an I338G gatekeeper mutation, bound STK32A with ΔTm = 8°C, the third highest thermal stabilisation observed with this set of inhibitors ( Figure 5B). STK32A also bound several other inhibitors designed to target analogue sensitive kinases, such as PP-121, 2,3-dMB-PP1 and 1-Na-PP1 ( Figure 5B). Several crystal structures exist for human and parasitic kinases in complex with similar small molecules that shed light on the likely binding mode of 1NM-PP1 in STK32A (for example 4LGH, 5W8R, 3I7B, 3NCG, 3MA6). In a crystal structure of analogue sensitive Src (PDB 4LGH), the pyrazolopyrimidine core binds to the kinase hinge and the naphthalene group extends into the buried back pocket of the ATP site. In STK32A, there is a small gatekeeper residue (Val100), conserved across the family (Supplementary Figure 3C), that leaves a small pocket at the back of the ATP-binding site that is unoccupied in the structure with Staurosporine ( Figure 5C, circled), but potentially allows it to accommodate the bulky naphthalene group of 1NM-PP1 ( Figure 5D). Out of 420 kinase sequences analysed, most human kinases contain a large or branched hydrophobic residue as the gatekeeper (Met (40%), Leu (17%), Phe (15%)), whereas only 22% contain a small residue (such as Thr, Ser, Val, Ala, Gly) and out of these valine was found to be the gatekeeper in only 3% of cases, predominantly consisting of tyrosine kinases, therefore the STK32 family is relatively unusual in this respect [26]. A similar binding mode is observed for PP-121 in Src (PDB 3EN4), however PP-121 is also able to form additional polar interactions between the nitrogen atoms of the pyrrolo-pyridine substituent and the side chain of Glu310 and the backbone of Asp404 in the back pocket of Src, residues that are conserved in STK32A (Glu71 and Asp164) further explaining the ability of STK32A to bind PP-121 relatively strongly (ΔTm = 6°C).

Discussion
The data shows that STK32A is an acidophilic, dual-specificity AGC-family kinase.
Serine/threonine kinases that have been found to have a preference for acidic substrates are relatively rare, but include the casein kinases (CKs) and polo-like kinases (PLKs), all of which are able to use casein as a substrate. Similarly STK32A is able to phosphorylate casein in vitro; both our designed synthetic substrate corresponding to part of the bovine beta-casein peptide sequence as well as a sample of beta-casein extracted from bovine milk were phosphorylated by STK32A.
Inspection of crystal structures of acidophilic kinases sheds light on their preference for negatively charged residues surrounding the substrate phosphorylation site, for example CK2α is known to phosphorylate a large number of substrates with acidic residues C-terminal to the phosphorylation site [27] and examination of crystal structures of CK2α reveals a large patch of highly positive charge on the surface close to the ATP-binding site that drives this preference (Supplementary Figure 4A). Similar patches of positive charge can be seen in the PLKs and other casein kinases as well as in our structure of STK32A (Supplementary Figure 4A). In CK2, a phosphorylated residue can substitute for acidic glutamate or aspartate residues in the substrate. The tyrosine kinase epidermal growth factor receptor (EGFR), which has also been found to be acidophilic in substrate preference in the P-3 to P-1 region, is similarly capable of binding primed, phosphorylated substrates [27,28]. EGFR contains a large basic patch across the substrate binding groove that could complement the binding of peptides with acidic residues N-terminal to the phosphorylation site, and a crystal structure solved with a primed phosphorylated substrate shows that the large phosphorylated tyrosine in the P+1 position is able to also occupy this pocket in a binding mode distinct from non-primed substrates (Supplementary Figure 4B) [29]. The phospho-mapping data for STK32A indicates that it may also be capable of binding primed substrates since there were found to be doubly-phosphorylated peptides situated both on auto-phosphorylated STK32A (S228/T229) and on the p38 synthetic peptide (residues equivalent to T180/Y182 in full-length p38), with sites within one or two amino acids of each other ( Supplementary Figures 1 and 2). STK32B and STK32C have 69% and 65% sequence identity, respectively, to STK32A. In particular, the residues lining the substrate binding groove of STK32A are highly conserved in STK32B/C, including Arg109, Arg221 and Arg304. Therefore, the general preference of STK32A for acidic substrates is expected to be conserved in STK32B/C. However, the αG helix and the central portion of the activation segment, also important for substrate recognition, are not well conserved (Supplementary Figure 3C) and there will be a range of different substrates for the three proteins.
The expression patterns and subcellular localizations of STK32A, STK32B and STK32C will significantly affect the available substrates. Data in the Human Protein Atlas suggests that STK32A is located to the centrosome, while STK32B is mainly localised to microtubules and vesicles [30], while preliminary localisation studies show that with high level ectopic expression the STK32 proteins remain primarily cytosolic ( Supplementary Figures 5 and 6).
The ATP-binding site is well conserved within the STK32 family (Supplementary Figure 3C), therefore a similar pattern of inhibitor binding is likely. In particular, the small valine gatekeeper residue is conserved and therefore inhibitors such as 1NM-PP1 and PP-121 ( Figure 5B) are also expected to bind to STK32B and STK32C as well as STK32A. This may offer a convenient route to obtaining selective inhibitors of the STK32 family. We also observed that many well-known kinase inhibitors bind STK32A. Given that there is currently no information on the mechanistic roles of the STK32 kinases in the cell it is unclear if any of the observable phenotypes with these inhibitors are due in part to STK32 inhibition.

Methods
Cloning DNA for residues 9-390 of human STK32A isoform 1 (NCBI reference NP_001106195.1) was PCR amplified and subcloned into baculovirus transfer vector pFB-LIC-Bse by ligation independent cloning. The resulting construct expressed the desired protein with an N-terminal hexahistidine purification tag and TEV (tobacco etch virus) protease tag cleavage site (extension MGHHHHHHSSGVDLGTENLYFQ*SM where * represents the TEV protease digestion site). The construct was verified by DNA sequencing.

Protein Expression and Purification
The expression construct was transformed into E. coli DH10Bac competent cells, which were

Consensus substrate identification
Peptides derived from HeLa cell lysate digests that were phosphorylated by STK32A were identified as in published procedures [21].

Crystallisation and data collection
The fractions containing pure STK32A were pooled, and staurosporine was added. The sample was concentrated to 36 mg/mL, diluted to 5 mL volume with GF Buffer and then re-concentrated to 36 mg/mL. This sample was then mixed 3:1 with peptide SEALVSEEDAD (at 34.4 mM in 80 mM Tris·HCl pH7.8) to give a final STK32A concentration of 27 mg/mL and a final peptide concentration of 8.6 mM. This sample was used for crystallisation. Note that although included in the crystallisation, the peptide was not visible in the electron density for the structure.
Crystals were obtained using the sitting drop vapour diffusion method at 4 °C. Crystals of grew from a mixture of 50 nL STK32A:staurosporine and 100 nL of a well solution containing 0.2M sodium acetate, 20% PEG 3350 and 10% ethylene glycol. Crystals were equilibrated into reservoir solution plus 25% ethylene glycol before freezing in liquid nitrogen. The data was collected at 100K at the Diamond Synchrotron, beamline I04-1. Data collection statistics can be found in Table 1. Protein concentrations were measured by UV absorbance, using the calculated molecular weights and estimated extinction coefficients using a NanoDrop spectrophotometer.

Structure determination
The diffraction data was indexed and integrated using MOSFLM [31] and scaled using SCALA [32]. The structure was solved by molecular replacement using PHASER [33] and an ensemble search model created using PHENIX [34] from the kinase domain structures of human PKCiota (PDB 3A8X), cAMP-dependent protein kinase catalytic subunit alpha (PDB 3L9N) and p90 ribosomal S6 kinase 2 (PDB 3G51). There were six molecules of YANK1 in the asymmetric unit. The model was built using Coot [35] and refined with REFMAC5 [36] for 300 cycles of jellybody refinement prior to further refinement in PHENIX [34]. Refinement statistics are given in Table 1. Coordinates were deposited in the PDB under accession code 4FR4.

HPLC-SAXS
Sample details, structural parameters and modelling results are given in Supplementary Table   1. Purified, unphosphorylated, STK32A (as used for the crystal structure determination) was concentrated to 10.5 mg/mL (~230 µM) and flash frozen in liquid nitrogen. For preparation of phosphorylated STK32A, adenosine triphosphate (ATP) and magnesium chloride were added to a final concentration of 5 mM in 80 µL of STK32A at 9.5 mg/mL (210 µM). The reaction was monitored by denaturing LC-MS and then the sample was flash frozen. Inline HPLC-SAXS was performed at Diamond beamline b21. The samples were thawed and centrifuged and 45 µL injected onto a Shodex KW-403 column pre-equilibrated in buffer A (25 mM HEPES pH 7.5, 200 mM NaCl, 0.5 mM TCEP, 2.5% glycerol, 1% sucrose), connected to an Agilent 1200 HPLC system. Data was processed and analysed using ScÅtter software [37]. Buffer subtraction was performed separately for a range of stable Rg values across each intensity peak using the SEC flow-through prior to elution of the protein.
Radius of gyration (Rg) was determined using both Guinier fitting and by analysis of the P(r) distribution. Maximum particle dimension (dmax) and volume of correlation (Vc) were also determined. Molecular mass of the proteins was determined using the QR method [24]. The missing loops from the STK32A crystal structure (PDB 4FR4) were modelled using MODELLER [38] and then FoXS [39] was used to calculate scattering curves from the generated model and compare them to the experimental data. The reduced experimental SAXS data were input into Gasbor [40] and a dummy residue model generated. SUPCOMB [41] was then run to superpose the final models against the crystal structure.

Inhibitor screening
A library of kinase inhibitors was screened at 10 µM concentration against the purified STK32A protein at 2 µM by differential scanning fluorimetry (DSF), using published procedures [42].
Compounds that significantly increased the melting temperature of the protein were considered hits.

Cell Lines
SKOv3, HeLa, MCF7 and HEK-293T cell lines were obtained from the American Tissue Type

Gateway cloning
Lenti-viral and GFP vectors were generated using the Gateway System (Invitrogen) according to manufacturer's instructions. STK32A, STK32B and STK32C cDNAs were subcloned into pDONR221 vector (Invitrogen) according to manufacturer's instruction and then transferred to pLX302 lentiviral destination vector (addgene) or pDEST-CMV-N-EGFP vector (pDEST-CMV-N-EGFP was a gift from Robin Ketteler (Addgene plasmid # 122842) using recombination utilising the LR-Clonase

Transfections
GFP plasmid transfections were conducted using Fugene HD (Promega) according to the manufacturer's instructions. Briefly, plasmid DNA and Fugene HD were diluted in Optimem (Invitrogen) in a 1:3 ratio (DNA:Fugene) and incubated for 20 minutes. Cells were transfected and grown at 37°C and 5% CO2 for 48 h or 72 h before they were analysed on the Zeiss Observer Z1 Microscope using a 40x oil objective or a 20x objective. Images were analysed on ImageJ software.

Lenti-virus mediated stable cell lines generation
Packaging cells (HEK-293T) were co-transfected using Fugene HD (Promega) according to the

Immunofluorescence
Cells were grown to 80% confluence on cover slips in 12-well plates prior to fixation with 0.5ml 4% methanol-free formaldehyde (ThermoFisher Scientific) for 4 minutes at room temperature.
Permeabilisation was achieved using 0.5 mL 100% ice-cold ethanol and incubation at -20°C overnight. Ethanol was removed and 1ml wash buffer (1x TBS, 0.2% Triton X100 and 0.04% SDS, Sigma) was added to the wells for 5 minutes. Cells were blocked in 1ml blocking buffer (3% bovine serum albumin diluted in 1xTBS) for 1 hour at room temperature. Cover slips were incubated with 60μl primary antibody (rabbit anti-V5-tag (Abcam) diluted in blocking buffer for 1 hour at room temperature. Three washes were performed with 1ml wash buffer for 5 minutes each. Secondary antibodies (Alexa Fluor 488 donkey anti-rabbit, Invitrogen) were diluted in blocking buffer and 60μl was added to each cover slip for 1 hour at room temperature in the dark. Cells were washed as before prior to mounting the coverslips with mounting medium containing DAPI (Vectashield) on microscope slides (ThermoScientific) sealed with nail varnish at the edges. Slides were stored at 4°C in the dark until analysis. Images were analysed on the Zeiss Observer Z1 Microscope using a 40x oil objective or a 20x objective. Images were analysed on ImageJ software.