Abstract
The current emergence of the novel coronavirus pandemic caused by SARS-CoV-2 demands the development of new therapeutic strategies to prevent a rapid rise in mortality. The coronavirus spike (S) protein, which facilitates viral attachment, entry and membrane fusion, is heavily glycosylated and plays a critical role in eliciting the host immune response. The spike protein comprises two protein subunits (S1 and S2), which together possess 22 potential N-glycosylation sites. Herein, we report the mapping of glycosylation on spike protein subunits S1 and S2 expressed in human cells through high resolution mass spectrometry. We have characterized the quantitative N-glycosylation profile of the spike protein and, interestingly, observed two unexpected O-glycosylation modifications on the receptor binding domain (RBD) of spike protein subunit S1. Even though O-glycosylation has been predicted on the spike protein of SARS-CoV-2, this is the first report of experimental data for both the site of O-glycosylation and the identity of the O-glycans attached on subunit S1. Our data on the N- and O-glycosylation are strengthened by extensive manual interpretation of each glycopeptide spectrum, in addition to the use of bioinformatics tools, to confirm the complexity of glycosylation in the spike protein. The elucidation of the glycan repertoire on the spike protein provides insights for viral binding studies and, more importantly, propels research towards the development of a suitable vaccine candidate.
Introduction
The current major health crisis is caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that rapidly spread globally within weeks in early 2020. This highly transmissible infectious disease causes a respiratory illness named COVID-19.1, 2 As of the 31st of March 2020, 750,890 cases of COVID-19 and 36,405 COVID-19-related deaths have been confirmed globally by the World Health Organization (WHO).3
To date, no specific medical treatments or vaccines for COVID-19 have been approved.4, 5 Therefore, the scientific community is expending great effort in compiling data regarding the virus, as well as the respiratory illness caused by it, to find effective ways of dealing with this health crisis.
The pathogenic SARS-CoV-2 enters human target cells via its viral transmembrane spike (S) glycoprotein. The spike protein is a trimeric class I fusion protein, and consists of two subunits, namely S1 and S2. The S1 subunit facilitates the attachment of the virus, and subsequently the S2 subunit allows for the fusion of the viral and human cellular membranes.6-8 The entry receptor for SARS-CoV-2 has been identified as the human angiotensin-converting enzyme 2 (hACE2), and recent studies determined a high binding affinity to hACE2.6, 7, 9 Given its literal key role, the S protein is one of the major targets for the development of specific medical treatments or vaccines: Neutralizing antibodies targeting the spike proteins of SARS-CoV-2 could prevent the virus from binding to the hACE2 entry receptor and therefore from entering the host cell.9
Each monomer of the S protein is highly glycosylated with 22 predicted N-linked glycosylation sites. Furthermore, four O-glycosylation sites were also predicted.10 Cryo-electron microscopy (Cryo-EM) provides evidence for the existence of 14–16 N-glycans on 22 potential sites in the SARS-CoV-2 S protein.7, 11 The glycosylation pattern of the spike protein is a crucial characteristic to be considered regarding steric hindrance, chemical properties and even as a potential target for mutation in the future. The N-glycans on S protein play important roles in proper protein folding and priming by host proteases. Since glycans can shield the amino acid residues and other epitopes from cells and antibody recognition, glycosylation can enable the coronavirus to evade both the innate and adaptive immune responses.7, 12 Elucidating the glycosylation of the viral S protein can aid in understanding viral binding with receptors, fusion, entry, replication and also in designing suitable antigens for vaccine development.13-15 Strategies for vaccine development aim to elicit such adaptive immunity through an antibody response at the sites of viral entry.16
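Predicted N-linked sites such as those mentioned above are commonly derived from the canonical N-X-S/T sequon (X being any residue except proline). The following is a minimal sketch of such a scan, assuming the canonical sequon rule; the function name and the toy sequence are hypothetical and are not taken from this study, and the sequence shown is not the spike protein sequence.

```python
import re

# Canonical N-linked sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead keeps the match zero-width after Asn, so overlapping
# sequons (e.g. "NNTS") are each reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence, offset=1):
    """Return positions of Asn residues that sit in an N-X-S/T sequon.

    `offset` is the residue number of the first character (1 for a
    full-length protein; larger for a truncated construct).
    """
    return [m.start() + offset for m in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence used only for illustration.
toy_sequence = "MKNGTAANPSLLNNTSQ"
print(find_sequons(toy_sequence))
```

A sequon only marks a potential site; whether it is actually occupied has to be determined experimentally, as in the glycoproteomic analysis reported here.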
Here, we report the site-specific quantitative N-linked and O-linked glycan profiling on SARS-CoV-2 subunit S1 and S2 protein through glycoproteomics using high resolution LC-MS/MS. We used recombinant SARS-CoV-2 subunits S1 and S2 expressed in human HEK 293 cells and observed partial N-glycan occupancy on 17 out of 22 N-glycosylation sites. We found that the remaining five N-glycosylation sites were unoccupied. Remarkably, we have unambiguously identified two unexpected O-glycosylation sites at the receptor binding domain (RBD) of subunit S1. O-glycosylation on the spike protein of SARS-CoV-2 is predicted in several recent reports, and most of these predictions are for sites in proximity to the furin cleavage site (S1/S2), as similar sites are O-glycosylated in SARS-CoV-1.10 However, we observed O-glycosylation at two sites on the RBD of spike protein subunit S1, and this is the first report of evidence for such glycan modification at a crucial viral attachment location. Site-specific analysis of N- and O-glycosylation of the SARS-CoV-2 spike protein provides a basic understanding of the viral structure, which is crucial for the identification of immunogens for vaccine design. This in turn has the potential to lead to future therapeutic intervention or prevention of COVID-19.
Materials and methods
Dithiothreitol (DTT) and iodoacetamide (IAA) were purchased from Sigma Aldrich (St. Louis, MO). Sequencing-grade modified trypsin and chymotrypsin were purchased from Promega (Madison, WI). All other reagents were purchased from Sigma Aldrich unless indicated otherwise. Data analysis was performed using Byonic 3.5 software and manually using Xcalibur 4.2. The SARS-CoV-2 spike protein subunit 1 (Cat No. 230-20407) and subunit 2 (Cat No. 230-20408) were purchased from RayBiotech (Atlanta, GA).
Protease digestion and extraction of peptides from SDS-PAGE
The protein subunits S1 and S2, supplied as HEK 293 culture supernatants, were fractionated on separate lanes using SDS-PAGE. The gel was stained with Coomassie dye and the bands corresponding to subunit 1 (200 to 100 kDa) and subunit 2 (150 to 80 kDa) were cut into smaller pieces (approx. 1 mm squares) and transferred to clean tubes. The gel pieces were de-stained by adding 100 µL acetonitrile (ACN):50 mM NH4HCO3 (1:1) and incubating at room temperature (RT) for about 30 min. Tubes were centrifuged, the supernatant was discarded, and 100 µL ACN was added before incubation for 30 min. The proteins in the gel pieces were reduced by adding 350 µL 25 mM DTT and incubating at 60 °C for 30 min. The tubes were cooled to RT and the supernatant removed. The gel pieces were washed with 500 µL of ACN, 350 µL 90 mM IAA was added, and the mixture was incubated at RT for 20 min in the dark. Proteins were digested by adding sequencing-grade trypsin and/or chymotrypsin (we performed digests with both enzymes individually and as a cocktail) in digestion buffer (50 mM NH4HCO3) for 18 h at 37 °C separately. The peptides were extracted from the gel by addition of 1:2 H2O:ACN containing 5% formic acid (500 µL), and the released peptides were speed-dried. The samples were reconstituted in aqueous 0.1% formic acid for LC-MS/MS experiments.
Data acquisition of protein digest samples using nano-LC-MS/MS
The glycoprotein digests were analyzed on an Orbitrap Fusion Tribrid mass spectrometer equipped with a nanospray ion source and connected to a Dionex binary solvent system (Waltham, MA). Pre-packed nano-LC columns of 15 cm length with 75 µm internal diameter (id), filled with 3 µm C18 material (reverse phase), were used for chromatographic separation of samples. The precursor ion scan was acquired at 120,000 resolution in the Orbitrap analyzer, and precursors within a time frame of 3 s were selected for subsequent MS/MS fragmentation in the Orbitrap analyzer at 15,000 resolution. The LC-MS/MS runs of each digest were conducted for both 72 min and 180 min in order to separate the glycopeptides. The threshold for triggering an MS/MS event was set to 1000 counts, and monoisotopic precursor selection was enabled. MS/MS fragmentation was conducted with an HCD (Higher-energy Collisional Dissociation) product-triggered CID (Collision-Induced Dissociation) (HCDpdCID) program. Charge state screening was enabled, and precursors with unknown charge state or a charge state of +1 were excluded (positive ion mode). Dynamic exclusion was enabled (exclusion duration of 30 s).
Data analysis of glycoproteins
The LC-MS/MS spectra of tryptic, chymotryptic and combined tryptic/chymotryptic digests of glycoproteins were searched against the .fasta sequence of spike protein S1 and S2 subunits using the Byonic software by choosing appropriate peptide cleavage sites (semi-specific cleavage option enabled). Oxidation of methionine, deamidation of asparagine and glutamine, and the masses of possible common human N-glycans and O-glycans were used as variable modifications. The LC-MS/MS spectra were also analyzed manually for the glycopeptides with the support of the Xcalibur software. The HCDpdCID MS2 spectra of glycopeptides were evaluated for the glycan neutral loss pattern, oxonium ions and glycopeptide fragmentations to assign the sequence and the presence of glycans in the glycopeptides.
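The oxonium ion evaluation described above can be illustrated with a short sketch. It assumes commonly used monoisotopic m/z values for singly charged glycan oxonium ions and a user-chosen mass tolerance; the function name, the 20 ppm default and the example peak list are hypothetical and do not reproduce the settings or data of this study.

```python
# Commonly used m/z values of singly charged glycan oxonium ions.
OXONIUM_IONS = {
    "HexNAc fragment": 138.0550,
    "HexNAc-H2O": 186.0761,
    "HexNAc": 204.0867,
    "NeuAc-H2O": 274.0921,
    "NeuAc": 292.1027,
    "HexNAcHex": 366.1395,
}


def match_oxonium_ions(peaks, tol_ppm=20.0):
    """Return {ion name: observed m/z} for oxonium ions found in `peaks`.

    `peaks` is a list of (m/z, intensity) tuples from one MS2 spectrum.
    When several peaks fall inside the tolerance window of one ion, the
    most intense peak is reported.
    """
    found = {}
    for name, theoretical in OXONIUM_IONS.items():
        window = theoretical * tol_ppm / 1e6
        hits = [(intensity, mz) for mz, intensity in peaks
                if abs(mz - theoretical) <= window]
        if hits:
            found[name] = max(hits)[1]
    return found


# Hypothetical peak list for illustration only.
example_peaks = [(138.0551, 5.0e4), (204.0869, 2.0e5),
                 (366.1398, 8.0e4), (500.2500, 1.0e4)]
print(match_oxonium_ions(example_peaks))
```

In this toy spectrum the HexNAc-related ions are matched and no sialic acid ion is found, which would be consistent with a non-sialylated glycopeptide; real assignments additionally rely on the neutral loss pattern and peptide fragments.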
Results and Discussion
Studies over the past two decades have shown that glycosylation on the protein antigens can play crucial roles in the adaptive immune response. Thus, it is obvious that the glycosylation on the protein antigen is relevant for the development of vaccines, and it is widely accepted that the lack of information about the glycosylation sites hampers the design of such vaccines.17
Mapping N-glycosylation on SARS-CoV-2 spike protein
We procured culture supernatants of HEK 293 cells expressing SARS-CoV-2 subunit 1 and subunit 2 separately. The proteins were expressed with a His tag, spanning Val16 to Gln690 for subunit 1 and Met697 to Pro1213 for subunit 2. According to the manufacturer, SDS-PAGE of the proteins showed a higher molecular weight than the predicted 75 and 60 kDa, respectively, due to glycosylation. Since the proteins were unpurified, we fractionated them through SDS-PAGE on separate lanes and cut the bands corresponding to subunit 1 and subunit 2. The gels were stained with Coomassie dye, and gel bands were cut into small pieces, de-stained, reduced, alkylated and subjected to in-gel protease digestion. We employed trypsin, chymotrypsin, and trypsin and chymotrypsin in combination to generate glycopeptides that contain a single N-linked glycan site. The glycopeptides were further analyzed by high resolution LC-MS/MS, using a glycan oxonium ion product-dependent, HCD-triggered CID program. The LC-MS/MS data were analyzed using Byonic software, each detected spectrum was manually validated, and false detections were eliminated.
We identified the glycan compositions at 17 out of the 22 predicted N-glycosylation sites of the SARS-CoV-2 S1 and S2 proteins and found the remaining five sites unoccupied (Figure 2, 3). We observed both high mannose and complex-type glycans across the N-glycosylation sites but found no hybrid-type N-glycans. We quantified the relative intensities of glycans at each site by comparing the area under the curve of each glycopeptide peak on the LC-MS chromatogram. A recent preprint investigated the N-glycosylation on SARS-CoV-2 and reported a prevalence of hybrid-type glycans.18 In contrast, we observed a combination of high mannose and complex-type, but no hybrid-type, glycans on most of the sites. We discovered predominantly highly processed sialylated complex-type glycans on sites N165, N282, N801, and N1098 (Figure 3, 4). The highly sialylated glycans at N234 and N282, which are at the RBD, can act as determinants in viral binding with hACE2 receptors.6, 7, 19 Similar to one recent report, we observed Man5GlcNAc2 as a predominant structure across all sites.18 However, we observed a significant number of unoccupied sites on both S1 and S2 subunits. Sites N17, N603, N1134, N1158 and N1173 were completely unoccupied, although further studies with higher concentration and purity of proteins are required to validate this finding (Figure 1, 2). On subunit S2, the detection of N-glycosylation at sites N709, N717, and N1134 was ambiguous because the quality of the MS/MS spectra was not satisfactory, and we are currently evaluating the possibility of other post-translational modifications adjacent to these sites.
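The relative quantification by area under the curve described above amounts to normalizing the peak areas of all glycoforms observed at one site. A minimal sketch follows, assuming that the areas of all charge states of a glycopeptide have already been summed and that the unoccupied peptide is treated as one more species; the function name and the peak areas are hypothetical and are not data from this study.

```python
def relative_abundance(peak_areas):
    """Convert per-glycoform peak areas at one site into percentages.

    `peak_areas` maps a glycoform label to the area under its
    extracted-ion chromatogram peak (arbitrary units).
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {label: 100.0 * area / total for label, area in peak_areas.items()}


# Hypothetical peak areas for one glycosylation site.
site_areas = {
    "Man5GlcNAc2": 6.0e6,
    "HexNAc4Hex5NeuAc2": 3.0e6,
    "Unoccupied": 1.0e6,
}
for label, percent in relative_abundance(site_areas).items():
    print(f"{label}: {percent:.1f}%")
```

This kind of normalization assumes comparable ionization efficiency across glycoforms of the same peptide, which is an approximation; it yields relative rather than absolute abundances.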
Identification of O-Glycosylation on SARS-CoV-2 spike protein
O-glycans, which are involved in protein stability and function, have been observed on some viral proteins and have been suggested to play roles in the biological activity of viral proteins.10, 20 A comparative study of human SARS-CoV-2 S protein with other coronavirus S proteins has shown that Ser673, Thr678, and Ser686 are conserved O-glycosylation locations, and SARS-CoV-2 S1 protein was suggested to be O-glycosylated at these locations.10 Although it is unclear what function these predicted O-linked glycans perform, they have been suggested to create a ‘mucin-like domain’ which could shield SARS-CoV-2 spike protein epitopes or key residues.20 Since some viruses can utilize mucin-like domains as glycan shields for immunoevasion, researchers have highlighted the importance of experimental studies for the determination of predicted O-linked glycosylation sites.10, 20 We evaluated O-glycosylation site prediction using the widely accepted NetOGlyc 4.0 server.21 However, we did not find any strong prediction of O-glycosylation except at the sites Ser673, Thr678 and Ser686.
Nevertheless, we have identified O-glycosylation at sites Thr323 and Ser325 on the S1 subunit of SARS-CoV-2 spike protein through high resolution mass spectrometry glycoproteomic profiling (Figure 5, 6). Since O-glycosylation at Thr323 and Ser325 on the spike protein has not been reported before and is not indicated by cryo-electron microscopy data of SARS-CoV-2 S protein, we evaluated the detected O-glycopeptide manually.7 We observed very strong evidence for the presence of O-glycosylation at sites Thr323 and Ser325 from the b and y ions of the peptide 320VQPTESIVR328, which were detected with high mass accuracy. Upon manual validation of the fragment ions of O-glycopeptide 320VQPTESIVR328 we observed that Thr323 is the predominantly occupied site. This conclusion was based on the b (b2, m/z 228.13) and y (y1, m/z 175.12; y2, m/z 274.19; y4, m/z 474.30; y5, m/z 603.34; and y7+glycan, m/z 1748.77) ions we detected upon fragmentation of the glycopeptide (Figure 6). In addition, neutral losses and the detection of oxonium ions also confirmed the presence of glycosylation on these peptides. Core-1 mucin type O-glycans such as HexNAc, HexNAcHex and HexNAcHexNeuAc2 were observed on site Thr323. Interestingly, Ser325 was found to be occupied with HexNAcHexNeuAc only when Thr323 was glycosylated with the same HexNAcHexNeuAc glycan (Figure 5). Intriguingly, possible O-glycosylation at Thr323 of SARS-CoV-2 subunit 1 glycoprotein has been predicted by computational analysis in a very recent preprint report.22 Our observation of O-glycosylation at Thr323 is further supported by the presence of proline at position 322, considering the well-established fact that proline residues occur more frequently adjacent to O-glycosylation sites.23
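The fragment ion values quoted above can be checked against theoretical masses. The sketch below assumes standard monoisotopic residue and monosaccharide masses, singly charged fragments, and the usual b/y ion nomenclature; the function names are hypothetical and this is not the software used for the manual validation in this study.

```python
# Standard monoisotopic residue masses (Da) for the residues in VQPTESIVR.
RESIDUE_MASS = {
    "V": 99.06841, "Q": 128.05858, "P": 97.05276, "T": 101.04768,
    "E": 129.04259, "S": 87.03203, "I": 113.08406, "R": 156.10111,
}
# Standard monoisotopic monosaccharide residue masses (Da).
GLYCAN_UNIT_MASS = {"HexNAc": 203.07937, "Hex": 162.05282, "NeuAc": 291.09542}
WATER = 18.010565
PROTON = 1.007276


def glycan_mass(composition):
    """Mass added by a glycan given as {unit: count}."""
    return sum(GLYCAN_UNIT_MASS[unit] * n for unit, n in composition.items())


def b_ion(peptide, n):
    """m/z of the singly charged b ion holding the first n residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[:n]) + PROTON


def y_ion(peptide, n, glycan=None):
    """m/z of the singly charged y ion holding the last n residues,
    optionally carrying an intact glycan."""
    mz = sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER + PROTON
    if glycan:
        mz += glycan_mass(glycan)
    return mz


peptide = "VQPTESIVR"
print(f"b2 (VQ): {b_ion(peptide, 2):.2f}")
for n in (1, 2, 4, 5):
    print(f"y{n}: {y_ion(peptide, n):.2f}")
disialyl_core1 = {"HexNAc": 1, "Hex": 1, "NeuAc": 2}
print(f"y7 + HexNAcHexNeuAc2: {y_ion(peptide, 7, disialyl_core1):.2f}")
```

Under these assumptions, m/z 228.13 matches the two-residue N-terminal fragment VQ, and m/z 1748.77 matches y7 (PTESIVR) carrying an intact HexNAcHexNeuAc2 glycan. Because y5 (ESIVR) is observed without a glycan while y7 carries one, and Pro322 cannot be O-glycosylated, the glycan is placed on Thr323.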
Cryo-EM studies on SARS-CoV-2 indicate that the binding of S protein to the hACE2 receptor primarily involves extensive polar residue interactions between the RBD and the peptidase domain of hACE2.6, 7 The S protein RBD located in the S1 subunit of SARS-CoV-2 undergoes a hinge-like dynamic movement that enhances the capture of hACE2 by the RBD, displaying 10- to 20-fold higher affinity for the human ACE2 receptor than SARS-CoV-1, which partially explains the higher transmissibility of the new virus.11, 24 The residues Thr323 and Ser325 are located at the RBD of the S1 subunit of SARS-CoV-2, and thus the O-glycosylation at this location could play a critical role in viral binding with hACE2 receptors.6, 10 Our observation will pave the way for future studies to understand the implications of O-glycosylation at the RBD of S1 protein for viral attachment to hACE2 receptors.
Two very recent preprints reporting N-glycosylation on the spike protein of SARS-CoV-2 showed different glycosylation profiles, and both reports differed from our results although all three studies utilized recombinant S protein from an HEK 293-based expression system.18, 25 This highlights the caution that must be exercised in determining the glycosylation of viral antigens generated from various sources, as changes in glycosylation pattern can influence the efficacy of potential vaccine candidates.26
Our comprehensive N- and O-glycosylation characterization of SARS-CoV-2 spike protein expressed in a human cell system provides insights into site-specific N- and O-glycan decoration on the trimeric spike protein. We have employed an extensive manual interpretation strategy for the assignment of each glycopeptide structure in order to eliminate the possibility of ambiguous software-based annotation. We are currently working on elucidating other potential post-translational modifications on SARS-CoV-2 spike protein, as understanding the protein modifications in detail is important to guide future research on disease interventions involving the spike protein. Detailed glycan analysis is important for the development of glycoprotein-based vaccine candidates as a means to correlate structural variation with immunogenicity. Glycosylation can serve as a measure to evaluate antigen quality, as various expression systems and production processes are employed in vaccine manufacture. The understanding of complex sialylated N-glycans and sialylated mucin type O-glycans, particularly in the RBD of the spike protein of SARS-CoV-2, provides basic knowledge useful for elucidating the pathology of viral infection and for future therapeutic possibilities, as well as for the design of suitable immunogens for vaccine development.
Contributions
A.S. and P.A. conceived of the paper; A.S., N.S. and A.G. contributed equally and performed experiments; everyone contributed toward writing the paper; P.A. monitored the project.
Competing interests
The authors certify that they have no competing interests.
Supplementary material
Annotated MS/MS spectra are incorporated as a separate supplementary material file.
Acknowledgements
Financial support from the US National Institutes of Health (S10OD018530) is gratefully acknowledged. This work was also supported in part by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, under Award DE-SC0015662 to DOE - Center for Plant and Microbial Complex Carbohydrates at the Complex Carbohydrate Research Center, USA.