Revealing the principles of inter- and intra-domain regulation in a signaling enzyme via scanning mutagenesis

Multi-domain enzymes can be regulated by both inter-domain interactions and structural features intrinsic to the catalytic domain. The tyrosine phosphatase SHP2 is a quintessential example of a multi-domain protein that is regulated by inter-domain interactions. This enzyme has a protein tyrosine phosphatase (PTP) domain and two phosphotyrosine-recognition domains (N-SH2 and C-SH2) that regulate phosphatase activity through autoinhibitory interactions. SHP2 is canonically activated by phosphoprotein binding to the SH2 domains, which causes large inter-domain rearrangements, but autoinhibition can also be disrupted by disease-associated mutations. Many details of the SHP2 activation mechanism are still unclear, the physiologically-relevant active conformations remain elusive, and hundreds of human variants of SHP2 have not been functionally characterized. Here, we perform deep mutational scanning on both full-length SHP2 and its isolated PTP domain to examine mutational effects on inter-domain regulation and catalytic activity. Our experiments provide a comprehensive map of SHP2 mutational sensitivity, both in the presence and absence of inter-domain regulation. Coupled with molecular dynamics simulations, our investigation reveals novel structural features that govern the stability of the autoinhibited and active states of SHP2. Our analysis also identifies key residues beyond the SHP2 active site that control PTP domain dynamics and intrinsic catalytic activity. This work expands our understanding of SHP2 regulation and provides new insights into SHP2 pathogenicity.


Data availability
Deep sequencing data and molecular dynamics trajectory files will be made available via the following Dryad repository: https://doi.org/10.5061/dryad.83bk3jb18.

SHP2 mutagenesis library preparation
Saturation mutagenesis libraries of SHP2 were prepared with the Mutagenesis by Integrated TilEs (MITE) method 1. The 1782 bp (593 AA + stop codon) full-length SHP2 gene was first codon-optimized for yeast. The full-length sequence was then divided into 15 separate tiles, each spanning around 40 amino acids (Table S1). Two saturation mutagenesis oligo pools of alternating tiles were designed and acquired from Twist Bioscience. Each single amino acid substitution was encoded by a single oligo, so there is no codon degeneracy in the library. The oligo sequences include invariant overhang sequences on each end, designed for Gibson assembly to replace the wild-type sequence of each tile in a pET-28 plasmid containing the yeast-optimized human SHP2FL coding sequence. Individual tiles were amplified from their oligo pool using PCR primers that anneal to these overhang sequences. Backbone amplification primers complementary to the tile amplification primers were used to amplify the Gibson backbone DNA for each tile from the wild-type pET-28 yeast-optimized SHP2FL plasmid. Tile mutagenesis library inserts were then cloned into their corresponding backbones using Gibson assembly to generate 15 separate plasmid libraries, each containing all single mutants of one tile region. For each tile, the full-length SHP2 construct was PCR amplified with primers annealing just outside the SHP2 gene on the plasmid and carrying overhangs for homologous recombination into the yeast expression plasmid PWJ1781. For tiles 7-13 in the PTP domain, isolated PTP domain constructs spanning residues 235-539 were also amplified with homologous recombination overhangs.

Yeast expression plasmids construction and transformation
For expression of the kinase and phosphatase libraries in yeast, we used the galactose-inducible expression plasmid PWJ1781 2. For the purposes of double transformation in yeast, we switched the LEU2 marker to a URA3 marker to generate PWJ1781-URA. The v-SrcFL and c-SrcKD DNA sequences were then integrated into PWJ1781-URA by Gibson assembly to make the kinase expression plasmids PWJ-1781-URA-c-SrcKD and PWJ-1781-URA-v-SrcFL. SHP2 expression plasmids were constructed by homologous recombination in yeast for both the SHP2FL and SHP2PTP constructs. In both cases, inserts bearing each mutagenized tile were integrated into PWJ1781 separately. PWJ1781 was first digested with HpaI to yield linear backbone DNA. Then, digested backbone and amplified insert at a 1:2 molar ratio were co-transformed into yeast strain YPH499 with the LiAc/PEG/ssDNA method 3. Specifically, 2 μg total of the two DNA pieces was transformed into ~3 x 10^8 YPH499 cells. The transformed cells were then grown in 500 mL synthetic complete media without leucine, supplemented with 4% glucose, at 30 °C with shaking for ~40 hours to reach a high density (OD600 ~6-10, 6-10 x 10^7 cells/mL). The cells from each transformation were collected as the stock for one tile. Cells from each stock were directly subjected to kinase transformation. In the kinase plasmid transformation, 2 μg of PWJ-1781-URA-c-SrcKD or PWJ-1781-URA-v-SrcFL was transformed into ~6 x 10^7 YPH499 cells bearing the phosphatase plasmid with the LiAc/PEG/ssDNA method. The doubly transformed cells were grown in 100 mL of synthetic complete media without leucine and uracil, supplemented with 4% glucose, at 30 °C with shaking for ~28 hours to reach a high density (OD600 ~4, 4 x 10^7 cells/mL). Each transformation yielded a cell stock transformed with either the v-SrcFL or the c-SrcKD expression plasmid and either the SHP2FL or the SHP2PTP plasmid bearing one mutagenized tile. Each stock was then subjected to outgrowth and selection individually.

Tile library selection with the yeast growth assay
Each cell stock bearing one kinase expression plasmid and a SHP2 library with one variable tile was inoculated into synthetic complete media without uracil and leucine, supplemented with glycerol lactate (2% lactic acid, 3% glycerol, 0.05% glucose in media), at a starting OD600 = 0.1. The culture was grown at 30 °C with shaking for ~16 hours and then used to inoculate synthetic complete media without uracil and leucine, supplemented with 4% galactose, for co-expression and selection. The rest of the cells grown in glycerol lactate media were harvested as unselected samples. For each outgrowth culture, two parallel selection cultures were made with a starting OD600 = 0.05. The cultures were grown at 30 °C with shaking for 24 hours, and the cells were then harvested as post-selection samples.

Deep sequencing
For the cell stocks of each tile before and after selection, plasmid DNA was extracted from ~1 x 10^8 cells with the Zymoprep Yeast Plasmid Miniprep II kit. Each mutagenized tile DNA library was PCR amplified from its corresponding SHP2 plasmid with tile-specific primers bearing overhangs for the addition of Illumina sequencing adapters. The PCR mix was used directly as template for another round of PCR amplification with Illumina barcoding primers to append Illumina sequencing adapters and 5' and 3' indices (D700 and D500 series primers). The PCR products were gel purified and quantified with the QuantiFluor dsDNA System (Promega). Samples were then pooled at ratios such that all mutants were theoretically equally represented, and the pooled libraries were sequenced on a MiSeq using V2 300 cycle reagent kits. Each sequencing run contained no more than 24 samples to ensure sufficient read counts.
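
One way to arrive at pooling ratios of this kind is sketched below. It assumes that amplicons are of similar length, so that DNA mass is roughly proportional to the number of molecules, and it splits a target mass across samples in proportion to the number of variants each contains. The function name, sample names, variant counts, and concentrations are hypothetical and serve only as an illustration; this is not the pooling procedure described above.

```python
def pooling_volumes(samples, total_ng):
    """Split total_ng of DNA across samples in proportion to variant count.

    samples maps a sample name to (n_variants, conc_ng_per_ul).
    Returns a dict mapping each sample name to a volume in microliters.
    """
    total_variants = sum(n for n, _ in samples.values())
    volumes = {}
    for name, (n_variants, conc_ng_per_ul) in samples.items():
        # Mass share is proportional to the number of variants in the sample
        mass_ng = total_ng * n_variants / total_variants
        volumes[name] = mass_ng / conc_ng_per_ul
    return volumes


# Hypothetical example: two samples with different sizes and concentrations
example = {
    "tile_A_unselected": (800, 20.0),
    "tile_B_unselected": (400, 10.0),
}
vols = pooling_volumes(example, total_ng=120.0)
```

With these made-up numbers, the larger library receives twice the DNA mass of the smaller one, so each variant is expected to receive a similar share of reads.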

Sequencing data analysis
Paired-end sequencing reads were first merged with FLASH 4, followed by trimming with Cutadapt to remove constant sequences outside of the mutagenized libraries 5. Read counts for each mutant in the libraries were then calculated using in-house Python scripts (https://github.com/nshahlab/2024_Jianget-al_SHP2-DMS). For each sequenced library, the frequency of each mutant (f_mut) was first calculated as the ratio of the mutant's read count (n_mut) to the total reads (n_total) in the library (equation 1). Enrichment (E_mut) was then calculated by dividing the frequency after selection (f_selected) by the frequency before selection (f_unselected) (equation 2). The enrichment scores (Score_mut) presented in the heat maps are log10-transformed enrichments normalized to WT (equation 3).

$$ f_{mut} = \frac{n_{mut}}{n_{total}} \qquad (1) $$

$$ E_{mut} = \frac{f_{selected}}{f_{unselected}} \qquad (2) $$

$$ Score_{mut} = \log_{10}\left(\frac{E_{mut}}{E_{WT}}\right) \qquad (3) $$
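
The scoring described above (frequency, enrichment, then log10 enrichment relative to WT) can be sketched in a few lines of Python. This is a minimal illustration and not the in-house scripts referenced above. The function names and the read counts are hypothetical, and skipping variants with zero reads in either sample is an assumption made here for simplicity.

```python
import math


def frequencies(counts):
    """Frequency of each variant: read count divided by total reads."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}


def enrichment_scores(unselected, selected, wt="WT"):
    """log10 of each variant's enrichment, normalized to the WT enrichment."""
    f_un = frequencies(unselected)
    f_sel = frequencies(selected)
    # Assumption: variants with zero reads in either sample are skipped
    enrich = {
        v: f_sel[v] / f_un[v]
        for v in f_un
        if f_un[v] > 0 and f_sel.get(v, 0) > 0
    }
    e_wt = enrich[wt]
    return {v: math.log10(e / e_wt) for v, e in enrich.items()}


# Hypothetical read counts for one tile, before and after selection
unselected = {"WT": 1000, "mutA": 100, "mutB": 100}
selected = {"WT": 1000, "mutA": 400, "mutB": 10}
scores = enrichment_scores(unselected, selected)
```

In this toy example, a variant that gains fourfold relative to WT receives a positive score, and one that drops tenfold relative to WT receives a score of -1.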

Purification of full-length SHP2 proteins
A pET28-His-TEV plasmid encoding the human SHP2FL sequence (human cDNA) was used for QuikChange mutagenesis to generate SHP2FL mutants 6. For purification of wild-type and mutant SHP2FL constructs, plasmids were first transformed into chemically competent BL21(DE3) cells and grown on LB agar plates supplemented with 50 μg/mL kanamycin. Colonies scraped off the plates were then inoculated into 100 mL LB with kanamycin and grown at 37 °C to OD600 = 1. 50 mL of this culture was used to inoculate 1 L LB cultures with kanamycin at a starting OD600 = 0.1, and the 1 L cultures were incubated at 37 °C until their OD600 reached 0.5. IPTG (0.5 mM) was added to induce expression, and the expression cultures were grown overnight at 18 °C. The cultures were then spun down at 4000 x g for 30 min and resuspended in lysis buffer (50 mM Tris pH 8.0, 300 mM NaCl, 20 mM imidazole, 10% glycerol, and freshly added 2 mM β-mercaptoethanol). The cell suspensions were lysed by sonication (Fisherbrand Sonic Dismembrator) and spun down at 14,000 x g for 45 minutes. The His-tagged SHP2FL constructs were extracted from the supernatant with a 5 mL Ni-NTA column (Cytiva). The column was subsequently washed with 50 mL lysis buffer and 50 mL wash buffer (50 mM Tris pH 8.5, 50 mM NaCl, 20 mM imidazole, 10% glycerol, and freshly added 2 mM β-mercaptoethanol), followed by elution of the tagged SHP2 with a mixture of 25 mL wash buffer + 25 mL elution buffer (50 mM Tris pH 8.5, 50 mM NaCl, 500 mM imidazole, 10% glycerol) directly onto a 5 mL HiTrap Q anion exchange column (Cytiva). The Q column was washed once with 40 mL anion exchange buffer A (50 mM Tris pH 8.5, 50 mM NaCl, 1 mM TCEP), and the protein was eluted with a salt gradient between Anion A buffer and Anion B buffer (50 mM Tris pH 8.5, 1 M NaCl, 1 mM TCEP). The eluted protein fractions were collected and cleaved with 0.10 mg/mL His6-tagged TEV protease at 4 °C overnight to remove the His tag. The cleavage mixture was passed through a 2 mL Ni-NTA gravity column (ThermoFisher) to remove uncleaved protein and TEV protease, and the flow-through was concentrated to less than 1 mL. Finally, the concentrated protein solution was loaded onto a Superdex 200 16/600 gel filtration column (Cytiva) equilibrated with SEC buffer (20 mM HEPES pH 7.5, 150 mM NaCl, and 10% glycerol) for size exclusion purification. Pure fractions were pooled, concentrated, and flash frozen in liquid N2 for long-term storage at −80 °C.

Purification of SHP2 SH2 domains
pET28-His6-SUMO-NSH2-Avi and pET28-His6-SUMO-CSH2-Avi plasmids from our lab were used as templates for wild-type and mutant SH2 domain purification 6. The Avi tags were first removed in one cloning step, and the resulting pET28-His6-SUMO-N/C-SH2 plasmids were subjected to QuikChange mutagenesis to generate the desired SH2 mutants. For purification of wild-type and mutant SH2 constructs, plasmids were first transformed into chemically competent BL21(DE3) cells and grown on LB agar plates supplemented with 50 μg/mL kanamycin. Colonies scraped off the plates were then inoculated into 100 mL LB with kanamycin and grown at 37 °C to OD600 = 1. 50 mL of this culture was used to inoculate 1 L LB cultures with kanamycin at a starting OD600 = 0.1, and the 1 L cultures were incubated at 37 °C until their OD600 reached 0.5. IPTG (0.5 mM) was added to induce expression, and the expression cultures were grown overnight at 18 °C. The cultures were then spun down at 4000 x g for 30 min and resuspended in lysis buffer (50 mM Tris pH 7.5, 300 mM NaCl, 20 mM imidazole, 10% glycerol, and freshly added 2 mM β-mercaptoethanol). The cell suspensions were lysed by sonication (Fisherbrand Sonic Dismembrator) and spun down at 14,000 x g for 45 minutes. The His-tagged SH2 constructs were extracted from the supernatant with a 5 mL Ni-NTA column (Cytiva). The column was subsequently washed with 50 mL lysis buffer and 50 mL wash buffer (50 mM Tris pH 7.5, 50 mM NaCl, 20 mM imidazole, 10% glycerol, and freshly added 2 mM β-mercaptoethanol), followed by elution of the tagged SH2 with a mixture of 25 mL wash buffer + 25 mL elution buffer (50 mM Tris pH 7.5, 50 mM NaCl, 500 mM imidazole, 10% glycerol) directly onto a 5 mL HiTrap Q anion exchange column (Cytiva). The Q column was washed once with 40 mL anion exchange buffer A (50 mM Tris pH 7.5, 50 mM NaCl, 1 mM TCEP), and the protein was eluted with a salt gradient between Anion A buffer and Anion B buffer (50 mM Tris pH 7.5, 1 M NaCl, 1 mM TCEP). The eluted protein fractions were collected and cleaved with 0.05 mg/mL His6-tagged Ulp1 protease at 4 °C overnight to remove the His6-SUMO tag. The cleavage mixture was passed through a 2 mL Ni-NTA gravity column (ThermoFisher) to remove uncleaved protein and Ulp1 protease, and the flow-through was concentrated to less than 1 mL. Finally, the concentrated protein solution was loaded onto a Superdex 75 16/600 gel filtration column (Cytiva) equilibrated with SH2-SEC buffer (20 mM HEPES pH 7.4, 150 mM NaCl, and 10% glycerol) for size exclusion purification. Pure fractions were pooled, concentrated, and flash frozen in liquid N2 for long-term storage at −80 °C.

SHP2 basal activity measurements
Basal activities of wild-type and mutant SHP2 were measured against the fluorogenic substrate 6,8-difluoro-4-methylumbelliferyl phosphate (DiFMUP). Initial DiFMUP dephosphorylation rates by SHP2 variants were measured at 37 °C. Reactions were done in black polystyrene flat-bottom half-area 96-well plates at a working volume of 50 μL. Initial rates for each protein were measured in a set of 3 replicates. In each replicate, the reaction mix contained a fixed concentration of protein (see below) and a DiFMUP concentration series of 4000, 2000, 1000, 500, 250, 125, 62.5 and 31.25 μM. On each plate, the fluorescence of the dephosphorylation product DiFMU at a concentration series of 200, 100, 50, 25, 12.5, 6.25, 3.125 and 0 µM was measured as a standard curve to convert fluorescence values to product concentrations. Reactions were started by the addition of the protein, and emitted fluorescence at 455 nm was measured every 25 seconds for 50 min with a BioTek Synergy Neo2 multi-mode reader. Fluorescence values were converted into DiFMU concentrations, and initial rates were determined from the slope of the first 5 minutes of the reaction curves. The initial rates were fitted to the Michaelis-Menten equation using GraphPad Prism to determine kcat and KM values 7.

Melting temperature measurements via differential scanning fluorimetry (DSF)
Melting temperature measurements were conducted in DSF buffer (20 mM HEPES pH 7.5, 50 mM NaCl, 0.4% DMSO) on MicroAmp Fast Optical 96-well Reaction plates (Applied Biosystems, # 4346906) at working volumes of 20 µL. The mixtures contained 10 µM protein and 25x SYPRO Orange Protein Gel Stain (Thermo Fisher, catalog no. S-6650). Melting curves were measured on an Applied Biosystems Step-One Plus RT-PCR thermocycler between 25 °C and 95 °C with a gradient of +0.5 °C per minute (excitation: 472 nm; emission: 570 nm). Fluorescence reads at each temperature were analyzed using DSFworld, and melting temperatures were calculated with dRFU 8.
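
As a rough illustration of a derivative-based melting temperature, the sketch below takes Tm as the temperature at which the fluorescence rises most steeply. It assumes that dRFU refers to the maximum of the first derivative of fluorescence with respect to temperature; it is not the DSFworld implementation. The function name and the synthetic sigmoidal curve are hypothetical.

```python
import math


def melting_temperature(temps, rfu):
    """Return the temperature with the largest dRFU/dT (central difference)."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (rfu[i + 1] - rfu[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


# Synthetic melting curve: a sigmoid centered at 55 °C, sampled every 0.5 °C
# between 25 °C and 95 °C (hypothetical data, not a measurement)
temps = [25.0 + 0.5 * i for i in range(141)]
rfu = [1.0 / (1.0 + math.exp(-(t - 55.0) / 2.0)) for t in temps]
tm = melting_temperature(temps, rfu)
```

Real melting curves are noisy and often decline after the transition, so in practice the derivative would usually be smoothed before the maximum is located.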

EGF stimulation experiments
2.2 x 10^6 HEK 293 cells were seeded in a 10 cm plate. The next day, cells were transfected overnight with 5 μg Gab1 and 5 μg SHP2 using 30 μg of polyethyleneimine in 1 mL DMEM. Transfection media was refreshed and cells were serum-starved for 24 hours in DMEM. Cells were harvested by scraping and washed 3 times in 1 mL PBS at room temperature. Prior to stimulation, an aliquot of cells was taken as an unstimulated control (t = 0). Cells were then resuspended in 25 ng/mL EGF in pre-heated PBS and placed in a 37 °C heat block. Aliquots were taken at 2, 10 and 30 minutes, placed on ice, and spun down in a 4 °C tabletop centrifuge at 1000 x g for 5 minutes. Supernatant was aspirated and cells were lysed in 75 μL lysis buffer (20 mM Tris pH 8.0, 137 mM NaCl, 2 mM EDTA, 10% glycerol, 0.5% NP-40, with freshly added phosphatase and protease inhibitors) for 25 minutes on ice. Lysates were spun down for 15 minutes at 17,000 x g in a 4 °C tabletop centrifuge. Supernatant was used in a bicinchoninic acid (BCA) assay to determine protein concentration. 15 μg of total protein was loaded on a 12% acrylamide gel and transferred to a 0.45 micron nitrocellulose membrane using the StandardSD protocol on the Bio-Rad Trans-Blot Turbo. Membranes were blocked for 1 hour at room temperature using 5% bovine serum albumin (BSA) in TBS. Membranes were incubated with primary antibodies overnight at 4 °C (Erk 1:1000, p-Erk 1:2000, Vinculin 1:1000, Myc 1:5000, FLAG 1:5000) in 5% BSA in TBST. Membranes were washed 3 times in 5 mL TBST for 5 minutes each. Secondary antibodies were incubated in 5% BSA in TBST for 1 hour at room temperature (1:10,000). Membranes were imaged on a LiCor Odyssey and bands were quantified using ImageStudio.

Ras dephosphorylation assay (Rassay)
0.8 x 10^6 HEK 293 cells were seeded in a 6 cm plate. The next day, cells were transfected with Ras; Ras and Src; or Ras, Src and SHP2, to a total of 3 μg DNA (a pEF vector with no open reading frame was used to make up the difference between conditions) in 300 μL DMEM with 9 μg of polyethyleneimine. Transfection medium was removed the next morning and replaced with warm DMEM with 10% FBS. 36 hours after transfection, cells were harvested by scraping and washed 3 times in 1 mL cold PBS. Cells were lysed in 150 μL lysis buffer (20 mM Tris pH 8.0, 137 mM NaCl, 2 mM EDTA, 10% glycerol, 0.5% NP-40, with freshly added phosphatase and protease inhibitors) for 25 minutes on ice. Lysates were spun down for 15 minutes at 17,000 x g in a 4 °C tabletop centrifuge. Supernatant was used in a BCA assay to determine protein concentration. 80 μg of protein was used in an immunoprecipitation (IP) with 5 μg of packed Pierce anti-HA magnetic beads (Fisher, #88836) in a total volume of 350 μL. Samples were incubated at 4 °C overnight while rotating. The next morning, beads were washed 3 times on a magnetic rack using 1 mL of lysis buffer, and finally resuspended in 65 μL of 1x Laemmli buffer. All samples were boiled at 100 °C for 8 minutes, and 15 μg of total protein (total cell lysate, TCL) or 15 μL of each sample (IP) was loaded onto a 12% acrylamide gel. Proteins were transferred to a 0.45 micron nitrocellulose membrane using the StandardSD protocol on the Bio-Rad Trans-Blot Turbo. Membranes were blocked for 1 hour at room temperature using 5% BSA in TBS. Membranes were incubated with primary antibodies for 2 hours at room temperature (TCL: Src 1:1000, β-actin 1:5000, Myc 1:5000, HA 1:1000; IP: HA 1:1000, pTyr 1:2000) in 5% BSA in TBST. Membranes were washed 3 times in 5 mL TBST for 5 minutes each. Secondary antibodies were incubated in 5% BSA in TBST for 1 hour at room temperature (1:10,000). Membranes were imaged on a LiCor Odyssey and bands were quantified using ImageStudio. The specific antibodies used for this study are listed in the

Sequence alignments and conservation score calculations
A list of 137 metazoan SHP2 sequences was compiled through a series of BLAST searches in diverse organisms spanning the metazoan clade, using human SHP2 (Uniprot Q06124) as the query sequence 9. For each organism tested, if there were multiple hits without clear annotation, off-target proteins were generally SHP1, which could be readily identified by comparison with the human SHP1 sequence. Thus, all sequences used for this alignment are unambiguously SHP2 orthologs. Typically, the isoform that most closely matched the canonical human isoform was used. A multiple sequence alignment was prepared using T-COFFEE with the default settings and used without any further manual curation 10. For the human classical PTP domain alignment, we first obtained a list of all known proteins containing this domain 11. Protein sequences were obtained from Uniprot, using the annotated domain boundaries 12. An alignment of all 49 domains, derived from 38 unique proteins, was made using T-COFFEE 10. Conservation at each position in both alignments was calculated as the Jensen-Shannon divergence, using source code from the Capra group 13. In both cases, conservation was only calculated for positions in the alignment where human SHP2 had occupancy. Calculated conservation scores and corresponding average enrichments from deep mutational scanning are provided in Table S4.
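
To make the conservation metric concrete, the sketch below computes a Jensen-Shannon divergence for a single alignment column against a background amino acid distribution. It is a simplified illustration and not the Capra group code: it assumes a uniform background, ignores gaps, and applies no sequence weighting or pseudocounts, all of which are simplifications introduced here. The function names and example columns are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(column, background=None):
    """Jensen-Shannon divergence between a column and a background distribution.

    column is a string of residues at one alignment position; gap characters
    and other non-standard symbols are ignored (a simplifying assumption).
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column contains only gaps
    p = [counts[aa] / total for aa in AMINO_ACIDS]
    q = background or [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


conserved = js_divergence("CCCCCCCC")    # invariant position
variable = js_divergence("ACDEFGHIKL-")  # many different residues, one gap
```

With a base-2 logarithm the score lies between 0 and 1, and an invariant column scores higher than a variable one.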

Preparation of Structural Models for Simulations
We built and simulated full-length SHP2 in the open, active state and in the closed, autoinhibited state. For both conformations we considered the wild-type protein, as well as the protein with Glu 76 mutated to Lys. For the simulations of the protein in the autoinhibited state, we used the crystal structure 4DGP as the starting structure 14. Missing residues were modeled as follows. Residues 1-3 at the N-terminal end were built using PyMOL 15. Residues 235-245 were taken from a model of full-length SHP2 predicted by AlphaFold2 16, since this region is missing in all crystal structures. Residues 293-303 were taken from the crystal structure 6CRF 17 and residues 314-232 were taken from the crystal structure 4RDD 18. For simulations of the protein with the E76K mutation, Glu 76 was mutated to Lys using PyMOL. For the protein in the open, active state, we used the crystal structure 6CRF as the starting structure 17. In this structure Glu 76 is mutated to Lys, so for the wild-type simulations, this residue was mutated to Glu using PyMOL. Missing residues 89-93, 140-145, 154-166 and 203-209 were taken from 4DGP 14. Residues 237-244 were taken from a model generated by AlphaFold2 16. Residues 313-324 were taken from 4RDD 18, and missing C-terminal residues 526-528 were built using PyMOL. Cys 459 was deprotonated in all systems. The N- and C-termini were capped with acetyl and amide groups, respectively, in all systems.

Simulation Protocol
Each system was solvated with TIP3P water 19, and ions were added such that the final ionic strength of the system was 100 mM, using the tleap package in AmberTools22 20. The energy of each system was minimized first for 5000 steps while holding the protein atoms and crystalline waters fixed, followed by minimization for 5000 steps while allowing all the atoms to move. Following minimization, three individual trajectories were generated for each system, each with distinct initial velocities. The temperature of each system was raised in two stages: first to 100 K over 1 ns and then to 300 K over 1 ns. The protein atoms and crystalline waters were held fixed during the heating stage. Each system was then equilibrated for 2 ns, followed by production runs. Three production trajectories, each 2.5 µs long, were generated for each system. All equilibration runs and production runs were performed at constant temperature (300 K) and pressure (1 bar). The simulations were carried out with the Amber package 21 using the ff14SB force field for proteins 22 and an integration timestep of 2 fs. The Particle Mesh Ewald approximation was used to calculate long-range electrostatic energies 23. All bonds between hydrogens and heavy atoms were constrained with the SHAKE algorithm 24. The Langevin thermostat was used to control the temperature with a collision frequency of 1 ps^-1. Pressure was controlled while maintaining periodic boundary conditions.

Analyses
MD trajectories were compiled from the raw data using the CPPTRAJ module of AmberTools22 20. Structures were extracted from the trajectories in both 1 ns and 10 ns increments for analysis and visualization. All measurements and calculations were done using the PDB module in Biopython 25. For most calculations reported in the main text, trajectories were sampled every 10 ns. The measurements from all three replicates of each system were combined to determine the reported distributions. In cases where distance calculations involved a redundant atom (e.g. distances between two possible nitrogens and two possible oxygens in a Glu/Arg ion pair), all combinations of distance measurements were calculated, then the shortest distance at each frame was determined and used for the distribution plots. For visualization, trajectories were sampled every 10 ns. All structure visualization and rendering in this study was done using PyMOL 15.
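
The handling of redundant atoms described above can be sketched as follows: for each frame, every pairing between the two atom groups is measured and the shortest distance is kept. To stay self-contained, this sketch works on plain coordinate tuples rather than Biopython objects. The function names and coordinates are hypothetical, and this is an illustration of the idea rather than the analysis code used in this study.

```python
import math
from itertools import product


def min_pair_distance(group_a, group_b):
    """Shortest distance between any atom in group_a and any atom in group_b."""
    return min(math.dist(a, b) for a, b in product(group_a, group_b))


def min_distance_per_frame(frames):
    """Apply min_pair_distance to each (group_a, group_b) frame in order."""
    return [min_pair_distance(a, b) for a, b in frames]


# Hypothetical coordinates (in angstroms) for two frames: the two
# carboxylate oxygens of a Glu and two guanidinium nitrogens of an Arg
frames = [
    ([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [(0.0, 3.0, 0.0), (2.0, 4.0, 0.0)]),
    ([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [(5.0, 0.0, 0.0), (8.0, 0.0, 0.0)]),
]
distances = min_distance_per_frame(frames)
```

The per-frame minima from all replicates can then be pooled into a single list to build a distribution.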

Figure S2. Scanning mutagenesis and selection assays with SHP2FL.
Figure S3. Hotspot residues at the N-SH2/PTP interface.
Figure S4. Destabilization of the N-SH2 domain by core mutations.
Figure S5. Mutational sensitivity and dynamics at the C-SH2/PTP interface.
Figure S6. Unique mutational effects at linker and C-SH2 residues.
Figure S7. Scanning mutagenesis and selection assays with SHP2PTP.
Figure S8. Mutational effects at I282, Q506, and Q510.
Figure S9. WPD loop motions and conformational constraints in SHP2.
Figure S10. Mutational sensitivity at allosteric inhibitor binding sites in SHP2.
Supplementary References