Identification of phosphosites that alter protein thermal stability

Proteomics has enabled the cataloguing of 100,000s of protein phosphorylation sites 1, however we lack methods to systematically annotate their function. Phosphorylation has numerous biological functions, yet biochemically all involve changes in protein structure and interactions. These biochemical changes can be recapitulated by measuring the difference in stability between the protein and the phosphoprotein. Building on recent work, we present a method to infer phosphosite functionality by reliably measuring such differences at the proteomic scale.

To minimize technical noise derived from sample preparation, peptide samples should be labeled and mixed prior to phosphopeptide enrichment (Supplementary Discussion and accompanying manuscript 3 ). Because scaling-up isobaric chemical labeling increases reagent costs substantially, we have developed an alternative approach to identify phosphosites that alter thermal stability, that we call Dali (Fig. 1c). Dali applies the Proteome Integral Stability Alteration (PISA) method 4 , a simplified version of thermal proteome profiling 5 , in which the soluble protein from the different temperature points are combined to provide an estimation of the area under the protein melting curve. To reliably compare phosphopeptides to proteins, we normalize each measurement to a 30 o C treated proteome reference that is labeled with heavy lysine, obtaining a relative stability (R s ) measurement for phosphopeptides and proteins. This 30 o C reference is mixed in with the temperature gradient treated samples prior to protein digestion, and it is present during phosphopeptide enrichment and mass spectrometry (MS) measurement of peptides and phosphopeptides.
We applied Dali to the S. cerevisiae proteome and obtained reproducible R s measurements for proteins (average R 2 = 0.76) and phosphopeptides (average R 2 = 0.65) ( Supplementary Fig. 1a). In contrast to the Huang et al. dataset, we find that the stability of phosphopeptides correlates well with the stability of their respective proteins (R 2 =0.79 for mean R s comparisons) (Fig. 1d), suggesting that most phosphosites do not alter protein stability as also observed by Potel et al. 3 . As expected, the stability of non-modified peptides present in the phosphopeptide-enriched samples also correlated well with their proteins (R 2 = 0.90 for mean R s comparisons), indicating that R s measurements in the phosphopeptide samples and protein samples can be reliably compared ( Supplementary Fig. 1c). Finally, our analysis yielded 71 phosphopeptide isoforms out of 2,345 (3%) with significantly different thermal stability than the unmodified protein (Fig. 1e, Dataset S2). We detected several non-modified peptides with significant differences in stability, yet this set constituted a much smaller fraction than found in Huang et al. (Dataset S3). Many of these peptides (7 out of 16) were cases where the protein is known to be post-translationally processed via cleavage (e.g. RPS31 6 ) or splicing (e.g. VMA1 7 ), resulting in proteins and/or proteoforms of different thermal stability as our method measured ( Supplementary Fig. 2).
Among phosphosites that decreased protein thermal stability, we identified four sites located at protein interfaces (Ser56 on PUP2, Ser59 on ARO8, Ser79 on TPI1 and Ser201 on GAPDH) (Fig. 2a, Supplementary Fig. 3) that may act by disrupting protein-protein interactions. For example, PUP2 is the alpha 5 subunit of the 20S proteasome, and Ser56 is a known Cdc28 substrate 8 located at the protein interaction interface with PRE6, the 20S proteasome alpha 4 subunit (Fig. 2a). The stability measured for the phosphopeptide spanning Ser56 is significantly lower than the stability of PUP2, which is similar to other proteins in the 20S proteasome, suggesting Ser56 phosphorylation may dissociate PUP2 from the 20S proteasome.
We identified stabilizing phosphosites that may play a role in the protein translation process. For example, we found that phosphorylation at Ser38 on ribosomal protein RPL12/uL11 significantly increased protein stability (Fig. 2b). This phosphosite is an evolutionarily-conserved Cdc28 substrate 8 that is regulated during the cell cycle 9 and has been reported to be depleted in polysomes and influence mitotic translation 10 . Due to RPL12 location at the ribosome P-stalk and the proximity of residue Ser38 to elongation factor 2, we hypothesize that this phosphosite may modulate the interaction with EF-2 to aid ribosomal translocation during protein synthesis, and the change in conformation and binding may stabilize RPL12. We also identified a stabilizing phosphorylation on NEW1 at Thr1191 (delta R s = 1.23) (Fig. 2c). NEW1 is a translation factor that binds to the ribosome at a position analogous to eEF3 and fine-tunes the efficiency of translation termination 11 . The identified phosphosite fits the CK2 consensus motif, is located within the acidic C-terminal sequence of NEW1, and is highly conserved. A T1191A mutant has growth defects 12 suggesting that phosphorylation is important for NEW1 function.
We were able to measure many key glycolysis proteins identifying phosphosites that may modulate enzyme kinetics. For example, we measured the stability for six phosphosites on PGK1, of which only Thr331 showed significantly decreased stability (Fig. 2d). This observation agrees with the predicted stability effects of phosphomimetic substitutions on PGK1 13 (Fig. 2d). We identified a stabilizing phosphosite at Ser149 in the three GAPDH isozymes TDH1, TDH2 and TDH3 (Fig 2e). Ser149 is adjacent to catalytic Cys150 and to the binding sites of substrates glyceraldehyde-3-phosphate (G3P) and inorganic phosphate. Interestingly, Ser149 phosphorylation would occupy the inorganic phosphate binding site (Fig 2e). Additionally, it has been recently reported that a TDH3 S149A mutant exhibits a growth defect with doxorubicin compared to wild-type and decreases TDH3 activity to a greater extent than a TDH3 knockout 14 . Our results raise the possibility that S149 phosphorylation may increase the stability of apo-GAPDH, the GAPDH-G3P reaction intermediate and aid phosphate transfer by enhancing product release.
In this communication, we have outlined a novel proteomic method that enables robust thermal stability comparison between proteins and phosphorylated proteoforms. Our method identified 3% phosphosites in the S. cerevisiae proteome that significantly changed protein melting behavior, with several examples potentially altering protein conformation and interactions. Additional experiments will be needed to precisely characterize the function of these phosphosites. One limitation of this method is that the sensitivity to detect changes in stability is lower for proteins with extreme (low or high) melting temperature, which can be circumvented by performing the experiment using different temperature gradients. Our method can be extended to other model organisms and cell culture systems, as well as to other post-translational modifications, expanding the proteomic toolkit to functionally annotate dynamic protein modifications at scale.

Yeast strains
All yeast experiments were conducted on the Saccharomyces cerevisiae haploid strain BY4741 ( MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0 ), a direct descendant from FY2, which is itself a direct descendant of S288C.

S. cerevisiae growth, stable isotope labeling, and cell harvest
Two overnight yeast cultures were grown at 30 o C in synthetic complete media (SCM) containing 6.7g/L yeast nitrogen base, 2g/L of synthetic complete mix minus lysine, 2% glucose, and supplemented with either regular lysine (light culture) or 2 H 4 -lysine (heavy culture) at 0.872 mM final concentration. These cultures were used to seed three 50mL cultures of each light and heavy at OD 600 0.15, which were grown at 30 o C and 45mL were harvested at OD 600 ~ 1 by centrifugation at 7,000 x g for 10min. Yeast pellets were washed by resuspension in 1.5mL ice-cold sterile water and centrifugation in 2mL screw cap tubes at 21,000 x g for 10min; and then snap-frozen in liquid nitrogen and stored at -80 o C.

Cell lysis and protein extract temperature treatment
Frozen yeast cell pellets were resuspended in 700μL of non-denaturing lysis buffer (50mM HEPES pH 7.0, 75mM NaCl) containing 0.5X protease inhibitors (Pierce) and phosphatase inhibitors (50mM β-glycerophosphate, 10mM sodium pyrophosphate, 50mM of NaF, 1mM sodium orthovanadate) on ice. Cells were lysed by bead beating with 0.5mm zirconia/silica beads for 4 cycles of 60sec of mechanical agitation followed by 90sec rest on ice. Lysates were clarified by sequential centrifugation, first at 1,200 x g for 1min to remove the beads and then at 21,000 x g for 10min at 4 o C to remove cell debris. To bring all protein extracts to the same concentration, extract volumes were adjusted to 1 OD 600 unit from a 45mL culture in 1mL.
Each cell extract was aliquoted into 2 strips of 8 PCR tubes each (1x8 for the temperature gradient and 1x8 for the 30 o C) dispensing 50μL of protein extract per tube. All samples were initially equilibrated to 30 o C for 5 min. Temperature gradient samples were subjected to 45.6 o C, 46.8 o C, 48.3 o C, 50 o C, 52 o C, 53.6 o C, 54.9 o C, and 57 o C, one tube to each temperature, for 5min. In parallel, controls were subjected to an additional 30 o C temperature treatment for 5min. All samples were cooled down to room temperature for 10min. For each replicate, temperature gradient samples were all pooled into one tube and 30 o C controls were pooled into a separate tube prior to centrifugation at 21,000 x g for 30min at 4 o C. The soluble protein fractions for the temperature gradient and 30 o C controls were combined 2:1, three replicates with the temperature gradient labeled heavy and the 30 o C controls labeled light, and three additional replicates with the labels swapped. We generated additional controls where heavy and light 30 o C controls were combined to assess potential differences in protein expression due to the different labeling. Protein concentration was measured by the BCA assay.

Protein reduction, alkylation, LysC digestion, and desalting
Samples were diluted 2-fold with a buffer containing 8M urea, 50mM HEPES pH 8.9, 75mM NaCl, 1mM sodium orthovanadate, 50mM β-glycerophosphate, 10mM sodium pyrophosphate, 50mM NaF. Protein samples were subjected to reduction with 7.5mM dithiothreitol (DTT) for 30min at 55 o C and alkylation with iodoacetamide (22.5mM) for 30min at room temperature in the dark with agitation. The alkylation reaction was quenched with an additional 7.5mM DTT at room temperature for 30min with agitation. The pH was adjusted to 8.5 with 1M Tris pH 8.9. Lysyl endopeptidase (LysC; Wako Chemicals) was added at a 1:100 enzyme to protein ratio and protein samples were incubated overnight with agitation at room temperature. LysC digestion was quenched by addition of trifluoroacetic acid (TFA) to a final concentration of 1% and pH ~2-3 and the digests were stored at -80 o C.

Fe 3+ -NTA IMAC phosphopeptide enrichment
Phosphopeptide enrichment was conducted by immobilized iron cation affinity chromatography in batch mode and automated in a 96-well format on a KingFisher magnetic particle processor as we described 15 . For each sample, ~200μg peptides were solubilized in 70 μL 0.1% TFA, 80 % acetonitrile and incubated with 80 μL of a 5% slurry of magnetic Fe-NTA beads (Cube Biotech) in the same solvent for 30min. Beads were washed three times with 150μL 0.1% TFA, 80% acetonitrile and phosphopeptides were eluted with 50μL 50% acetonitrile, 0.37M ammonium hydroxide. Eluates were acidified with 30μL 10% formic acid, 75% acetonitrile and filtered over two-layer C18 extraction disks (Empore) packed in 200μL pipette tip, which had been previously conditioned with 50μL100% methanol, 50μL 100% acetonitrile and 50μL 70% acetonitrile, 0.25% acetic acid. Filtered peptides were collected in a mass spectrometer vial and the peptides in the extraction disk were further eluted with 50μL 70% acetonitrile, 0.25% acetic acid and collected into the same mass spectrometry vial. Phosphopeptide-enriched samples were dried by vacuum centrifugation, solubilized in 3% acetonitrile, 4% formic acid, and one third of each sample was analyzed by LC-MS/MS.

Liquid chromatography coupled to tandem mass spectrometry
Peptide samples were analyzed by nLC-MS/MS on a nanoAcquity UPLC (Waters) coupled to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher, San Jose, USA). Samples were loaded on a 100μm x 3-cm trap column packed with 3μm C18 beads (Dr. Maisch), separated on a 100μm x 30-cm capillary analytical column, packed with 1.9μm C18 beads (Dr. Maisch) and set at 50 o C, using a 90-min reversed-phase gradient of acetonitrile in 0.125% formic acid, and online analyzed by mass spectrometry using data-dependent acquisition. Each cycle consisted of 3 sec where one full MS1 scan was acquired on the orbitrap at 120,000 resolution from 300 to 1575 m/z using an AGC of 7e5 and maximum injection time of 50ms followed by MS/MS dependent scans on most intense precursor m/z ions (only considering z = 2 to 5) until exhausting the 3sec cycle time, using 1.6 m/z isolation window, HCD fragmentation at 28 normalized collision energy, and acquired at 15,000 resolution on the orbitrap with an AGC of 5e4 (peptide samples) and AGC of 1e5 (phosphopeptide samples) with a maximum injection time of 22ms. Dynamic exclusion was enabled to exclude fragmented precursors from repeated MS/MS selection for 30sec. To increase coverage, phosphopeptide samples were injected twice, and the data from the two technical replicates were combined.

Database searching, peptide quantification, phosphosite localization, and R s calculation
MS data files for proteome samples were analyzed with MaxQuant 16 (v.1.6.7.0) to obtain peptide identifications and quantifications, using the following parameters: protein sequence database S.cerevisiae downloaded from SGD in July 2014, LysC enzyme specificity (cleavage Ct to K), maximum of 2 missed cleavages, mass tolerance of 20ppm for MS1 and 20ppm for MS2, fixed modification of carbamidomethyl on cysteines, variable modifications of oxidation on methionines and acetylation on protein N-termini. Lysine residues were only allowed to be all light or all 2 H 4 -Lys within the same peptide. Phosphoproteome samples were processed in MaxQuant similarly as above, with additional variable modification of phosphorylation on serine, threonine, and tyrosine residues. All searches were combined for MaxQuant filtering set to 1% FDR at the level of peptide spectral matches and protein.
Quantification values for heavy and light peptide features were extracted from the evidence.txt file. Quantification values for features corresponding to the same peptide sequence (e.g. same peptide identified at multiple charge states or fractions) were summed up. Phosphopeptide quantification features were aggregated to the phosphopeptide isoform level by summing features corresponding to the same peptide sequence (e.g. same peptide identified at multiple charge states or replicate injections) as well as overlapping peptide sequences sharing the 6 same combination of modifications. For each phosphopeptide isoform we required the maximum localization probability to be greater than 75% for at least one site.
R s values were calculated as log 2 ratios of the quantification value for the temperature gradient treated divided by the respective quantification for the 30 o C control. Peptide R s distributions were median normalized to 0, and the same correction value derived for each replicate was applied to normalize the corresponding phosphopeptide isoform R s distributions. Peptides and phosphopeptide isoforms with the 5% highest R s standard deviation across replicates were excluded from the analysis. Protein R s values were calculated as the median of peptide R s for that protein, requiring a minimum of 2 peptides per protein, and each peptide observed in at least two replicates.
To identify phosphopeptide isoforms that have different R s than their unmodified protein counterpart, we performed a t-test comparing phosphopeptide isoform R s values (n=6) compared to protein R s values (n=6) and assuming unequal variance. Phosphopeptide isoforms and protein counterparts were required to be observed in at least two replicates. P-values were corrected for multiple hypothesis testing using the Benjamini-Hochberg method 17 . All data analysis was conducted using R.

Structure visualization and bioinformatics
Protein complex annotations were extracted from the CYC2008 resource 18 . Protein structure coordinates were downloaded from PDB and visualized and manipulated with PyMOL 19 . For PUP2 interface analysis, we extracted 20S proteasome protein structure from PDB 1RYP 20 . Protein interface structures for ARO8, TPI1, and GAPDH were extracted from PDB (4JE5 21 , 1NEY 22 , 3PYM 23 respectively). To assess the stabilizing effect of S149 phosphorylation at the catalytic site of GAPDH, we aligned crystal structures of GAPDH with bound G3P (1NQO 24 ) and inorganic phosphates (1GYP 25 ) to a NAD-bound yeast GAPDH structure (3PYM 23 ). Data on sequence conservation, protein interfaces, and predicted stability effects of mutations (ΔΔG pred ) were obtained from the mutfunc resource 13 .

Reanalysis of the Huang et al. data
Supplementary data from Huang et al. 2 was used to calculate the correlation between T m for phosphopeptides and proteins and learn about their statistical parameters. For data re-analysis, all MS files from the study were downloaded from MassIVE data repository (dataset identifier: MSV000083786), converted to open format mzXML files, and database searched with Comet 26 (v.2018.01.4) to obtain peptide and phosphopeptide identifications, with the exception of "Bulk_6_2" which failed to convert. Database search parameters were: human protein sequence database from UniProt (UP000005640), mass tolerance of 50 ppm for precursor m/z and 0.2 Da for fragment ions, trypsin enzyme specificity (cleavage Ct to K, R, except for KP, RP), maximum of 2 missed cleavages, fixed modification of carbamidomethyl on cysteines and TMT10 (+229.1629) on lysines and peptide N-termini, and variable modifications of oxidation on methionines and acetylation on protein N-termini. Phosphorylation samples included variable modification of phosphorylation at serine, threonine, and tyrosine. Search results were filtered to a PSM 1% FDR with Percolator 27 . Phosphosite localization was conducted with an in-house C++ implementation of Ascore 28 and sites with Ascore > 13 were considered confidently localized (P < 0.05). TMT10 reporter ion intensities were extracted from MS/MS scans using in-house TMT quantification software.
We attempted to replicate the analysis by Huang et al. by following the method description provided in their manuscript. Biological and technical replicates were treated equally. For each replicate, TMT reporter ion intensities for all peptide spectral matches from the proteome files were summed to the protein level, and TMT reporter ion intensities for phosphopeptide spectral matches were summed to the phosphopeptide isoform level. In addition, we used the same strategy to aggregate to peptide-level the TMT signals for PSMs mapping to the same unmodified peptide observed in the phosphopeptide-enriched samples. We implemented the TPP package in R to fit melting curves for proteins, phosphopeptide isoforms, and unmodified peptides in the phosphorylation-enriched sample. To recapitulate the reported results we had to conduct the fitting for all samples together (Supplementary Discussion). Melting curves were filtered for fitting R 2 > 0.8. T-tests were conducted by comparing T m values for phosphopeptide isoforms or unmodified peptides observed in the phosphopeptide-enriched samples to the unmodified protein T m values, assuming equal variances and without multiple hypothesis correction (as implemented by Huang et al.). Of note, our reanalysis revealed that one of the phosphoproteome technical injections for biological replicate 5 was instead a repeated MS analysis of biological replicate 4.

Data availability
The mass spectrometry proteomics data generated for this manuscript have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD016750. Reviewer username is reviewer30568@ebi.ac.uk; password 7aQ1gAbU.  Shown at the right is the structure of the 20S proteasome with PUP2 in blue, Ser56 in red, and PRE6 in grey. b, R s values for RPL12 and RPL12 S38 phosphopeptide isoform. c, R s values for NEW1 and phosphopeptide isoform containing NEW1 T1191. d, R s values for PGK1 and all measured PGK1 phosphopeptide isoforms, with significantly destabilizing phosphosite S331 shown in red. ΔΔG pred for all glutamic acid phosphomimetic substitutions were obtained from mutfunc, with ΔΔG pred > 2 considered likely destabilizing. e, GAPDH S149 phosphopeptide is shared across all GAPDH paralogs (TDH1, TDH2, and TDH3). Boxplot shows R s values and distributions for peptides unique to one isoform (TDH1, TDH2, TDH3), peptides shared among all GAPDH isoforms (all), all peptides for TDH3, and the S149 phosphopeptide isoform. Bottom panel shows localization of S149 on GAPDH structure near the binding site of the enzyme substrate. All boxplots show results from 6 biological replicates. Peptides derived from the proteome samples are colored in gray and significant unmodified peptides found in the phosphopeptide-enriched sample are in red. b, Similar plot to (a) for RPS31, which is cleaved to generate ubiquitin (1-76 amino acid segment) and 40S ribosomal protein S31 (77-152 amino acid segment) proteins 6 . Figure 3. Examples of phosphosites that alter protein thermal stability and are located at protein interfaces. R s boxplots for a, ARO8 S59, b, TPI1 S79, and c, GAPDH S201 phosphopeptide isoforms and their protein counterparts. ARO8 S59, TPI1 S79, and GAPDH S201 reside at dimerization interfaces as shown in the structures to the right (PDB accession: 4JE5, 1NEY, and 3PYM respectively). Phosphomimetic mutations ARO8 S59E and TPI1 S79E are predicted to disrupt protein interfaces (ΔΔG pred = 3.78 and ΔΔG pred =8.04 respectively). Additionally, TPI1 S79E mutation is predicted to alter protein conformational stability (ΔΔG pred = 2.39). ΔΔG pred > 2 is predicted to be destabilizing 12 . Figure 4. Phosphosites significantly altering protein thermal stability using two different statistical settings. Volcano plots showing ΔT m for mean phosphopeptide isoform to mean protein counterpart in the x-axis, and the t-test probability in the y-axis. a, Huang et al. implementation shows a p-value because multiple hypothesis correction was not applied. Significant phosphopeptide isoforms (blue) are defined by p-value < 0.05 . b, Our proposed analysis consolidates data from MS reanalysis prior to statistical testing, which is performed assuming unequal variances between phosphopeptide isoform and proteins and corrects p-values for multiple hypothesis testing. Significant phosphopeptide isoforms (blue) are defined by q-value < 0.05.   ∆T m (phosphoisoform-protein) (ºC)