ABSTRACT
The global pandemic of severe acute pneumonia syndrome (COVID-19) caused by SARS-CoV-2 urgently calls for prevention and intervention strategies. The densely glycosylated spike (S) protein highly exposed on the viral surface is a determinant for virus binding and invasion into host cells as well as elicitation of a protective host immune response. Herein, we characterized the site-specific N-glycosylation of SARS-CoV-2 S protein using stepped collision energy (SCE) mass spectrometry (MS). Following digestion with two complementary proteases to cover all potential N-glycosylation sequons and integrated N-glycoproteomics analysis, we revealed the N-glycosylation profile of SARS-CoV-2 S proteins at the levels of intact N-glycopeptides and glycosites, along with the glycan composition and site-specific number of glycans. All 22 potential canonical N-glycosites were identified in S protein protomer. Of those, 18 N-glycosites were conserved between SARS-CoV and SARS-CoV-2 S proteins. Nearly all glycosites were preserved among the 753 SARS-CoV-2 genome sequences available in the public influenza database Global Initiative on Sharing All Influenza Data. By comparison, insect cell-expressed SARS-CoV-2 S protein contained 38 N-glycans, which were primarily assigned to the high-mannose type N-glycans, whereas the human cell-produced protein possessed up to 140 N-glycans largely belonging to the complex type. In particular, two N-glycosites located in the structurally exposed receptor-binding domain of S protein exhibited a relatively conserved N-glycan composition in human cells. This N-glycosylation profiling and determination of differences between distinct expression systems could shed light on the infection mechanism and promote development of vaccines and targeted drugs.
Introduction
In the last month of 2019, a novel coronavirus (SARS-CoV-2) emerged and rapidly spread, developing into an epidemic of severe acute pneumonia syndrome (COVID-19), which engulfed the city of Wuhan and Hubei Province of China. The virus quickly swept through the entirety of China, and the outbreak subsequently spread across Asia and the rest of the world within a couple of months. Similar to SARS-CoV and MERS-CoV, which emerged in 2002 and 2012, respectively, SARS-CoV-2 is highly transmissible from infected individuals, even those without symptoms, to healthy humans and can cause lethal respiratory symptoms1–3. The World Health Organization has declared the spread of SARS-CoV-2 a Public Health Emergency of International Concern as over 160 countries have reported confirmed cases. From SARS-CoV to SARS-CoV-2, the periodic outbreak of coronavirus infection in humans urgently calls for prevention and intervention measures. However, there are no approved vaccines or effective antiviral drugs for either SARS-CoV or SARS-CoV-2.
Decoding the critical component and molecular characteristics of the virus is the key to developing a cure strategy. SARS-CoV-2 is a single-stranded RNA virus. RNA sequencing revealed that SARS-CoV-2 belongs to the beta-coronavirus genus and is most closely related to SARS-CoV, with a genome size of approximately 30 kb encoding 15 non-structural proteins, 4 structural proteins, and 8 auxiliary proteins4,5. The structural proteins of mature SARS-CoV-2 include spike (S) protein, envelope (E) protein, membrane (M) protein, and nucleocapsid (N) protein2. Theoretically, all structural proteins can serve as antigens for vaccine development or targets for anti-viral treatment. Of these proteins, the transmembrane S protein protruding from the virus surface is highly exposed and responsible for invasion into host cells, which has attracted the special attention of researchers. S protein is homotrimeric and is highly glycosylated on the virus surface, allowing for binding to the angiotensin converting enzyme II (ACE2) receptor on host cells to promote the fusion of viral and host cellular membranes6,7. Given its indispensable role in virus entry and infectivity, S protein is a promising target for vaccine design and drug discovery to block the interaction between the virus and host cells. S protein has been revealed as a crucial antigen for raising neutralizing antibodies and eliciting protective humoral as well as cellular immunity upon infection or vaccination with S protein-based vaccines8–11.
Typically, S protein has an ectodomain linked to a single-pass transmembrane anchor and a short C-terminal intracellular tail12. The ectodomain comprises a receptor-binding S1 subunit and a membrane-fusion S2 subunit. Following attachment to the host cell surface via S1, SARS-CoV-2 S protein is cleaved at the S1/S2 boundary sites together with the junction site and the S2’ site by host cellular proteases for S protein priming, consequently mediating membrane fusion driven by S2 and making way for the viral genetic materials to enter the host cell10. Based on cryoelectron microscopy (Cryo-EM) observations, recognition of the ACE2 receptor by S protein primarily involves extensive polar residue interactions between the SARS-CoV-2 receptor binding domain (RBD) and the ACE2 peptidase domain7,13. The S protein RBD is located in the S1 subunit and undergoes a hinge-like dynamic movement to capture the receptor through interactions between three clusters of residues in the RBD and ACE2. Compared to S protein of SARS-CoV, that of SARS-CoV-2 displays up to 10–20-fold higher affinity for the human ACE2 receptor, which partially explains the higher transmissibility of this new virus7,13.
In addition to the structural information and core amino acid residues for receptor binding, SARS-CoV-2 S protein possesses 22 potential N-linked glycosylation motifs (N-X-S/T, X≠P) in each monomer. The N-glycans on S protein play a pivotal role in proper protein folding and protein priming by host proteases. Importantly, glycosylation is an underlying mechanism by which coronaviruses evade both the innate and adaptive immune responses, as the glycans might shield the amino acid residues from cell and antibody recognition10,11,14. Cryo-EM observations provided evidence of the existence of glycans on 14–16 of 22 potential sites in SARS-CoV-2 S protein7,10. However, these glycosites and glycans need to be experimentally identified in detail. Glycosylation analysis via glycopeptides can provide insight into the N-glycan microheterogeneity of a specific site, as variation in site-specific glycosylation levels can be greater than that at the protein level15. Therefore, further identification of site-specific N-glycosylation information of SARS-CoV-2 S protein, including intact N-glycopeptides, glycosites, glycan composition, and the site-specific number of glycans, could be meaningful to obtain a deeper understanding of the mechanism of virus invasion, providing guidance for vaccine design and antiviral therapeutics development10,16.
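The canonical motif described above (N-X-S/T, X≠P) can be located in a protein sequence with a short pattern search. The following is a minimal Python sketch for illustration, not the procedure used in this study; the function name `find_sequons` and the toy sequence are hypothetical and were chosen only to show the behavior on overlapping and proline-containing motifs.

```python
import re
from typing import List

# N-X-S/T with X != P. The lookahead keeps matches zero-width after the Asn,
# so overlapping sequons (e.g. "NNST") are both reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str, first_residue: int = 1) -> List[int]:
    """Return the residue number of every Asn that starts an N-X-S/T (X != P) motif.

    `first_residue` is the residue number of sequence[0], so that a fragment
    (e.g. an ectodomain construct) can keep the numbering of the full protein.
    """
    return [m.start() + first_residue for m in SEQUON.finditer(sequence)]


# Toy sequence for illustration only (NOT the S protein sequence).
toy = "MKNATGNPSLLNNSTQNGC"
print(find_sequons(toy))  # N-P-S and N-G-C are not reported
```

Note that N-X-C motifs, such as the three non-canonical sites discussed later, are deliberately excluded by this pattern.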
Herein, we characterized the site-specific N-glycosylation of recombinant SARS-CoV-2 S proteins by combined analysis of intact and deglycosylated N-glycopeptides using tandem mass spectrometry (MS/MS). Based on this integrated method, we identified 22 potential canonical N-glycosites and their corresponding N-glycans from the recombinant ectodomain (residues 16–1213) expressed in insect cells. For comparison, glycosylation of the recombinant S1 subunit (residues 16–685) expressed in human cells was resolved in parallel. All of these glycosites were found to be highly conserved among 753 SARS-CoV-2 genome sequences from the Global Initiative on Sharing All Influenza Data (GISAID) database. These detailed glycosylation profiles decoded from MS/MS analysis are complementary to those observed from Cryo-EM and might help in the development of vaccines and therapeutic drugs. The raw MS data are publicly accessible at ProteomeXchange (ProteomeXchange.com) under accession number PXD018068.
Materials and Methods
Materials and chemicals
Dithiothreitol (DTT), iodoacetamide (IAA), formic acid (FA), trifluoroacetic acid (TFA), Tris base, and urea were purchased from Sigma (St. Louis, MO, USA). Acetonitrile (ACN) was purchased from Merck (Darmstadt, Germany). The zwitterionic hydrophilic interaction liquid chromatography (Zic-HILIC) materials were purchased from Fresh Bioscience (Shanghai, China). Commercially available recombinant SARS-CoV-2 S protein (S1+S2 ECD, His tag) expressed in insect cells via baculovirus and S protein (S1, His tag) expressed in human embryonic kidney (HEK293) cells were purchased from Sino Biological (Beijing, China). Sequencing-grade trypsin and Glu-C were obtained from Enzyme & Spectrum (Beijing, China). The quantitative colorimetric peptide assay kit was purchased from Thermo Fisher Scientific (Waltham, MA, USA). Deionized water was prepared using a Milli-Q system (Millipore, Bedford, MA, USA). All other chemicals and reagents of the best available grade were purchased from Sigma-Aldrich or Thermo Fisher Scientific.
Protein digestion
The recombinant S proteins were proteolyzed using an in-solution protease digestion protocol. In brief, 50 μg of protein in a tube was denatured for 10 min at 95 °C. After reduction with DTT (20 mM) for 45 min at 56 °C and alkylation with IAA (50 mM) for 1 h at 25 °C in the dark, 2 μg of protease (trypsin and/or Glu-C) was added to the tube and incubated for 16 h at 37 °C. After desalting using a pipette tip packed with a C18 membrane, the peptide concentration was determined using a peptide assay kit based on the absorbance measured at 480 nm. The peptide mixtures (intact N-glycopeptides before enrichment) were freeze-dried for further analysis.
Selective enrichment of intact N-glycopeptides
Intact N-glycopeptides were enriched with Zic-HILIC materials (Fresh Bioscience, Shanghai, China). Specifically, 20 μg of peptides was resuspended in 100 μL of 80% ACN/0.2% TFA solution, and 2 mg of processed Zic-HILIC was added to the peptide solution and rotated for 2 h at 37 °C. Finally, the mixture was transferred to a 200-μL pipette tip packed with the C8 membrane and washed twice with 80% ACN/0.2% TFA. After enrichment, intact N-glycopeptides were eluted three times with 70 μL of 0.1% TFA and dried using SpeedVac for further analysis.
Deglycosylation
Enriched intact N-glycopeptides were digested using 1 U PNGase F dissolved in 50 μL of 50 mM NH4HCO3 for 2 h at 37 °C. The reaction was terminated by the addition of 0.1% FA. The deglycosylated peptides were dried using SpeedVac for further analysis.
Liquid chromatography-MS/MS analysis
All samples were analyzed by SCE-higher-energy collisional dissociation (HCD)-MS/MS using an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific). In brief, intact N-glycopeptides before or after enrichment and deglycosylated peptides were dissolved in 0.1% FA and separated on a column (ReproSil-Pur C18-AQ, 1.9 μm, 75 μm inner diameter, length 20 cm; Dr Maisch) over a 78-min gradient (buffer A, 0.1% FA in water; buffer B, 0.1% FA in 80% ACN) at a flow rate of 300 nL/min. MS1 was analyzed with a scan range (m/z) of 800–2000 (intact N-glycopeptides before or after enrichment) or 350–1550 (deglycosylated peptides) at an Orbitrap resolution of 120,000. The RF lens, AGC target, maximum injection time, and exclusion duration were 30%, 2.0e4, 100 ms, and 15 s, respectively. MS2 was analyzed with an isolation window (m/z) of 2 at an Orbitrap resolution of 15,000. The AGC target, maximum injection time, and HCD type were standard, 250 ms, and 30%, respectively. The stepped collision mode was turned on with an energy difference of ±10%.
Data analysis
The raw data files were searched against the SARS-CoV-2 S protein sequence using Byonic software (version 3.6.0, Protein Metrics, Inc.)17 with the mass tolerance for precursors and fragment ions set at ±10 ppm and ±20 ppm, respectively. Two missed cleavage sites were allowed for trypsin and/or Glu-C digestion. The fixed modification was carbamidomethyl (C), and variable modifications included oxidation (M), acetyl (protein N-term), and deamidation (N). In addition, 38 insect N-glycans or 182 human N-glycans were specified as N-glycan modifications for intact N-glycopeptides before or after enrichment. We then checked the protein database options, including the decoy database. All other parameters were set at the default values, and protein groups were filtered to a 1% false discovery rate based on the number of hits obtained for searches against these databases. Stricter quality control methods for intact N-glycopeptides and peptide identification were implemented, requiring a score of no less than 200 and at least 7 amino acids to be identified. Furthermore, all of these peptide-spectrum matches (PSMs) and glycopeptide-spectrum matches (GPSMs) were examined manually and filtered using the following standard criteria: PSMs were accepted if there were at least 3 b/y ions in the peptide backbone, and GPSMs were accepted if there were at least two glycan oxonium ions and at least 3 b/y ions in the peptide backbone. N-glycosite conservation analysis was performed using R software packages. Model building based on the Cryo-EM structure (PDB: 6VSB) of SARS-CoV-2 S protein was performed using PyMOL.
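The acceptance criteria stated above (score of at least 200, at least 7 amino acids, at least 3 b/y ions, and at least two oxonium ions for glycopeptides) can be written as a simple filter. The sketch below is a hedged Python illustration only; the `SpectrumMatch` record, its field names, and the example values are hypothetical and do not correspond to the Byonic output format or to any spectrum in this study.

```python
from dataclasses import dataclass


@dataclass
class SpectrumMatch:
    """Hypothetical record for one PSM or GPSM (field names are assumptions)."""
    peptide: str            # peptide backbone sequence
    score: float            # search-engine score
    by_ions: int            # matched b/y ions on the peptide backbone
    oxonium_ions: int = 0   # matched glycan oxonium ions
    is_glyco: bool = False  # True for an intact N-glycopeptide (GPSM)


def passes_filter(m: SpectrumMatch,
                  min_score: float = 200.0,
                  min_length: int = 7,
                  min_by: int = 3,
                  min_oxonium: int = 2) -> bool:
    """Apply the thresholds described in the text to a single match."""
    if m.score < min_score or len(m.peptide) < min_length:
        return False
    if m.by_ions < min_by:
        return False
    # The oxonium-ion requirement applies to glycopeptide matches only.
    if m.is_glyco and m.oxonium_ions < min_oxonium:
        return False
    return True


# Made-up matches, for illustration only.
matches = [
    SpectrumMatch("AGNLTSEGVK", 412.0, by_ions=6, oxonium_ions=3, is_glyco=True),
    SpectrumMatch("AGNLTSEGVK", 350.0, by_ions=5, oxonium_ions=1, is_glyco=True),
    SpectrumMatch("VLGAEAPK", 150.0, by_ions=4),
]
print([passes_filter(m) for m in matches])
```

Such a filter only mirrors the numeric thresholds; the manual inspection of spectra described above is not captured by it.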
Results and Discussion
Strategy for site-specific N-glycosylation characterization
Previous studies have revealed that glycosylated coronavirus S protein plays a critical role in the induction of neutralizing antibodies and protective immunity. However, the glycans decorated on S protein might also shield the protein surface and lead to virus evasion from the immune system8,11,14. Herein, we aimed to decode the detailed site-specific N-glycosylation profile of SARS-CoV-2 S protein. A commercial S protein ectodomain expressed by the baculovirus expression vector in insect cells was first used to analyze the glycosylation patterns, since the baculovirus vector can express a large protein without cleavage of the native S protein by host proteases10,18. The recombinant SARS-CoV-2 S ectodomain contains 1209 amino acids (residues 16–1213) that were translated from the complete genome (GenBank: MN908947.3)19 and 22 putative N-glycosites (motif N-X-S/T, X≠P). Analysis of the theoretical enzymatic peptides showed that trypsin (hydrolyzing proteins at K and R) alone did not produce a sufficient number of appropriate peptides to cover all potential N-glycosites. The missing potential N-glycosites could be recovered by introducing the endoproteinase Glu-C (hydrolyzing proteins at E in ammonium bicarbonate solution)20. Hence, we took advantage of this complementary trypsin and Glu-C digestion approach (Figure 1). Since the N-glycan compositions of S protein expressed in insect cells would be different from those of the native S protein expressed in human host cells, despite insect cells mimicking the process of mammalian cell glycosylation18, the recombinant SARS-CoV-2 S protein S1 subunit expressed in human cells was obtained for comparison. S2 subunit N-glycosylation sequons are conserved among SARS-CoV-2 and SARS-CoV and have been confirmed in previous studies10,11. The S1 subunit contains 681 amino acids (residues 16–685) and 13 potential N-glycosites. Analysis of the theoretical enzymatic peptides showed that trypsin digestion alone would produce a sufficient number of appropriate peptides to cover all potential N-glycosites on the S1 subunit.
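The theoretical digestion analysis described in this section can be sketched as an in silico digest followed by a check of which sequons fall on peptides of a usable length. The Python code below is a simplified illustration under stated assumptions, not the analysis performed here: the function names, the length window, and the toy sequence are hypothetical, the residue set for each protease is passed in as an argument (Glu-C specificity depends on the buffer), and refinements such as blocked cleavage before proline are ignored.

```python
import re
from typing import List, Set, Tuple

SEQUON = re.compile(r"N(?=[^P][ST])")  # N-X-S/T, X != P


def digest(sequence: str, cleave_after: str, missed: int = 0) -> List[Tuple[int, int]]:
    """Cleave C-terminal to any residue in `cleave_after`.

    Returns (start, end) index pairs (0-based, end exclusive), including
    peptides that span up to `missed` missed cleavage sites.
    """
    cuts = [0]
    cuts += [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in cleave_after]
    cuts.append(len(sequence))
    peptides = []
    for i in range(len(cuts) - 1):
        for j in range(i + 1, min(i + 2 + missed, len(cuts))):
            peptides.append((cuts[i], cuts[j]))
    return peptides


def covered_sites(sequence: str, cleave_after: str,
                  min_len: int = 7, max_len: int = 30, missed: int = 2) -> Set[int]:
    """1-based Asn positions whose whole sequon lies on a peptide of usable length.

    min_len=7 and missed=2 follow the search settings in the text;
    max_len=30 is an assumed cut-off for good fragmentation.
    """
    usable = [(s, e) for s, e in digest(sequence, cleave_after, missed)
              if min_len <= e - s <= max_len]
    sites = [m.start() for m in SEQUON.finditer(sequence)]
    return {p + 1 for p in sites if any(s <= p and p + 3 <= e for s, e in usable)}


# Toy sequence (NOT the S protein) with two sequons, at N3 and N16.
toy = "GANLTSAKVLGAEAPNGSLLAEVVGAGLPAAK"
tryptic = covered_sites(toy, "KR", min_len=5, max_len=12, missed=0)
gluc = covered_sites(toy, "E", min_len=5, max_len=12, missed=0)
print(sorted(tryptic), sorted(gluc), sorted(tryptic | gluc))
```

In this toy case each protease alone reaches only one of the two sites, while the union of both digests reaches both, which is the rationale for a complementary digestion.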
In general, the relative content of N-glycosylated peptides in a glycoprotein is low; hence, enrichment of intact N-glycopeptides is necessary21. For this purpose, we used Zic-HILIC materials to enrich intact glycopeptides through hydrophilic interactions owing to their high selectivity and reproducibility22. However, due to the microheterogeneity (different glycans attached to the same glycosite) and macroheterogeneity (glycosite occupancy) of glycosylation23, there are no materials available that can capture all glycopeptides without preference24,25. For these reasons, site-specific glycosylation was determined based on a combined analysis of the intact N-glycopeptides before and after enrichment and the deglycosylated peptides following enrichment using SCE-HCD-MS/MS26,27 (Figure 1). Analysis of intact N-glycopeptides before enrichment can retrieve the missing intact N-glycopeptides from Zic-HILIC materials, while detection of deglycosylated peptides can simultaneously confirm the N-glycosites. Therefore, integration of complementary digestion and N-glycoproteomics analysis from three levels is a promising approach to comprehensively and confidently profile the site-specific N-glycosylation of recombinant SARS-CoV-2 S proteins.
N-glycosite profiling of recombinant SARS-CoV-2 S proteins
S protein produced by the baculovirus insect cell expression system contains 22 potential N-glycosites. Using our integrated analysis method described above, 20 N-glycosites were assigned unambiguously with high-quality (score ≥ 200) spectral evidence (Figure 2A, Table S1 and Table S2). Two N-glycosites (N17 and N1134) were ambiguously assigned with relatively lower spectral scores (score < 200) (Figure S1). However, the N-glycosite N1134 has been reported in the Cryo-EM structure of SARS-CoV-2 S protein10. In addition, three non-canonical motifs of N-glycosites (N164, N334, and N536) involving N-X-C sequons were not N-glycosylated. Before enrichment, 11 N-glycosites from trypsin-digested peptides and 9 N-glycosites from Glu-C-digested peptides were assigned unambiguously, whereas hydrophilic enrichment resulted in an increase of these glycosites to 14 and 11, respectively (Table S1).
To further assess the necessity for enrichment, we compared the spectra of two intact N-glycopeptides (N61 and N74) before and after enrichment. Without interference from the non-glycosylated peptides, the intact N-glycopeptides had more fragment ions assigned to N-glycosites after enrichment (Figure S2). As an exception, the intact N-glycopeptide containing the N-glycosite N17 was missed after enrichment, presumably because of the selectivity of Zic-HILIC (Table S1).
Complementary digestion with trypsin and Glu-C promoted the confident identification of two N-glycosites (N709 and N717) on an intact N-glycopeptide (Figure S3). The introduction of Glu-C digestion resulted in the production of a short intact N-glycopeptide containing 23 amino acids, which is more suitable for achieving good fragmentation than the long peptide of 48 amino acids obtained from trypsin digestion (Figure S3). Similarly, the N-glycosylation analysis strategy combining intact N-glycopeptides with deglycosylated peptides improved the identification of N-glycosite N234, which was ambiguously assigned in the spectrum of the intact N-glycopeptides alone (Figure S4).
For the recombinant protein S1 subunit expressed in human cells, 12 out of 13 N-glycosites were assigned unambiguously with high-quality spectral evidence. One N-glycosite (N17) was assigned ambiguously with a relatively low spectral score (Figure 2B, Table S2 and Table S3). The relatively weak spectral evidence for two N-glycosites (N17 and N1134) suggests low-frequency glycosylation at these ambiguous sites, since deglycosylation failed to improve the identification of either site. Finally, using this strategy, we profiled all 22 potential N-glycosites of S protein. These sites were preferentially distributed in the S1 subunit of the N-terminus and in the S2 subunit of the C-terminus, including two sites in the RBD (Figure 2A). To visualize N-glycosylation on the protein structure, all of the experimentally determined N-glycosites were manually marked on the surface of trimeric S protein following refinement of the recently reported SARS-CoV-2 S protein Cryo-EM structure (PDB: 6VSB) (Figure 2C)7.
Based on these findings, we further analyzed the conservation of the glycosites among 753 SARS-CoV-2 genome sequences from the GISAID database. After removal of redundant sequences of S protein at the amino acid residue level, we found a very low frequency of alterations at 38 residue sites spanning the full length of S protein among 145 protein variants, with the exception of the substitution D614G, which was found at relatively high frequency in 47 variants (Table S4). However, nearly all of the 22 N-glycosylated sequons were conserved in S protein, except for loss of the N717 glycosite due to the T719A substitution in only one S protein variant. Compared to SARS-CoV S protein, 18 of the 22 N-glycosites were found to be conserved in SARS-CoV-2 S protein, indicating the importance of glycosylation for the virus. Four newly arisen N-glycosites (N17, N74, N149, and N657) are located in the SARS-CoV-2 S protein S1 subunit away from the RBD. Moreover, four confirmed N-glycosites (N29, N73, N109, and N357) in SARS-CoV S protein were missing in SARS-CoV-2 S, one of which (N357) lies in the RBD11,28.
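The loss of the N717 glycosite through the T719A substitution follows directly from the sequon definition: replacing the S/T at position +2 removes the motif even though the asparagine itself is unchanged. The conservation analysis in this study was performed in R; the Python sketch below only illustrates the logic. The function names are hypothetical, and the eight-residue fragment is an illustrative window laid out so that the asparagine falls at position 717, not a sequence verified against the GenBank record.

```python
import re
from typing import Set

SEQUON = re.compile(r"N(?=[^P][ST])")  # N-X-S/T, X != P


def sequon_sites(seq: str, first_residue: int = 1) -> Set[int]:
    """Residue numbers of all Asn residues that start a canonical sequon."""
    return {m.start() + first_residue for m in SEQUON.finditer(seq)}


def apply_substitution(seq: str, sub: str, first_residue: int = 1) -> str:
    """Apply a substitution written as <ref><position><alt>, e.g. 'T719A'."""
    ref, pos, alt = sub[0], int(sub[1:-1]), sub[-1]
    idx = pos - first_residue
    if not 0 <= idx < len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[idx] != ref:
        raise ValueError(f"expected {ref} at {pos}, found {seq[idx]}")
    return seq[:idx] + alt + seq[idx + 1:]


def lost_sites(seq: str, sub: str, first_residue: int = 1) -> Set[int]:
    """Sequons present in the reference but absent after the substitution."""
    before = sequon_sites(seq, first_residue)
    after = sequon_sites(apply_substitution(seq, sub, first_residue), first_residue)
    return before - after


# Illustrative window numbered from residue 715 (assumed layout, see lead-in).
fragment = "PTNFTISV"
print(lost_sites(fragment, "T719A", first_residue=715))
print(lost_sites(fragment, "I720V", first_residue=715))
```

A substitution outside the three sequon positions, such as the second call, leaves the set of sites unchanged.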
Intact N-glycopeptides of recombinant SARS-CoV-2 S proteins
Precise characterization of intact N-glycopeptides is critical for understanding biological functions29. Although intact N-glycopeptide analysis is more challenging than analysis of separate N-glycosites or N-glycans, it can provide more comprehensive information, including N-glycosites, N-glycan compositions, and the number of N-glycans30–32. The potential N-glycopeptides in the S protein sequence are shown in Figure S5. Comparison of the intact N-glycopeptide spectra to the total spectra showed that the average enrichment efficiency of the Zic-HILIC materials reached up to 97%. Ultimately, 646 non-redundant intact N-glycopeptides were identified from SARS-CoV-2 S proteins (Table S1), and 410 non-redundant intact N-glycopeptides were identified from the recombinant S1 subunit (Table S3). Representative and high-quality spectra of intact N-glycopeptides are shown in Figure S6. The number of intact N-glycopeptides and N-glycans significantly increased after glycopeptide enrichment (Figure 3A and Figure 3B).
Regarding the N-glycan composition, N-glycopeptides of S protein expressed in insect cells carried smaller and less complex N-glycans than those of the S1 subunit produced in human cells. Both recombinant products contained the common N-acetylglucosamine as a canonical N-glycan characteristic (Figure 3C and 3D). N-glycopeptides of S protein expressed in insect cells were decorated with 38 N-glycans, with the majority preferentially comprising paucimannose- and fucose-type oligosaccharides (Figure 3C and Table S1). By contrast, N-glycopeptides of the S1 subunit expressed in human cells carried up to 140 N-glycans, mainly containing fucose-type and unique sialic acid-type oligosaccharides (Figure 3D and Table S3). Returning to the glycosite level, most of the N-glycosites in S protein were modified with 17–35 types of N-glycans, with a high proportion of high-mannose N-glycans and a lower proportion of hybrid N-glycans (Figure 3E). For the S1 subunit, three N-glycosites (N122, N282, and N657) were surprisingly decorated with markedly heterogeneous N-glycans of up to 113 types, including a high proportion of complex N-glycans and a small proportion of hybrid or high-mannose N-glycans (Figure 3F). These results showed that the two S proteins expressed in different cells displayed different N-glycosylation patterns with a distinctive N-glycan composition (microheterogeneity) and different numbers of N-glycans at the same site, along with distinct site occupancies in intact glycopeptides (macroheterogeneity) (Figure S7).
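The site-level numbers above (types of N-glycans per glycosite) amount to counting the distinct glycan compositions observed at each site across all identified intact N-glycopeptides. A minimal Python sketch of that tally is shown below; the function name, the composition strings, and the example identifications are made up for illustration and are not taken from Tables S1 or S3.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def glycans_per_site(identifications: Iterable[Tuple[int, str]]) -> Dict[int, int]:
    """Count distinct glycan compositions per glycosite (microheterogeneity).

    Each identification is a (glycosite, glycan composition) pair; repeated
    observations of the same composition at a site are counted once.
    """
    per_site: Dict[int, Set[str]] = defaultdict(set)
    for site, glycan in identifications:
        per_site[site].add(glycan)
    return {site: len(glycans) for site, glycans in sorted(per_site.items())}


# Made-up identifications, for illustration only.
example = [
    (331, "HexNAc(2)Hex(5)"),
    (331, "HexNAc(2)Hex(5)"),        # duplicate observation, counted once
    (331, "HexNAc(4)Hex(5)Fuc(1)"),
    (343, "HexNAc(2)Hex(3)Fuc(1)"),
]
print(glycans_per_site(example))
print(len({glycan for _, glycan in example}))  # distinct glycans over all sites
```

The per-site counts and the overall count answer different questions: the first describes heterogeneity at one position, the second the size of the glycan repertoire of the whole protein.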
The glycosylation patterns of proteins expressed in insect cells have been found to be more immunogenic than those produced in human cells33,34, and antigen production from mammalian cells does not always induce a strong humoral immune response35. The complex and highly heterogeneous N-glycans modified on SARS-CoV-2 S protein expressed in human cells may be related to the differential immune response, which could possibly be caused by epitope masking by the glycan shield, although this hypothesis requires further testing and clarification. Therefore, the less complex N-glycans covering S protein expressed in insect cells might favor the development of vaccines to elicit neutralizing antibodies against SARS-CoV-2 virus.
Two potential N-glycosites (N331 and N343) in the RBD were confirmed in the N-glycopeptides from both insect cell-expressed S protein and human cell-produced S1 subunit (Figure 3E and 3F). The two sites are located close together on the same glycopeptide (Figure 4). Intriguingly, the composition of the N-glycans at these sites in human cells exhibited relatively high conservation compared to most other N-glycosites (Figure 3F). However, the N-glycan composition was more variable in the RBD expressed in insect cells (Figure 3E). These results imply that N-glycosylation might be associated with receptor binding, since the recognition of ACE2 by the RBD mainly depends on polar residue interactions13,14. In addition, complex N-glycans are also ligands for galectins, which can engage different glycoproteins and regulate immune cell infiltration and activation upon virus infection36–38. The RBDs of SARS-CoV-2 decorated with distinct N-glycans in different expression systems could be candidates for SARS-CoV-2 vaccine design as an alternative to full-length S protein, which can lead to undesired immunopotentiation with respect to increased infectivity and eosinophilic infiltration9. Based on these results, a large-scale intact N-glycopeptide database of recombinant SARS-CoV-2 S proteins was developed. Nevertheless, the implications of S protein site-specific N-glycosylation (including N-glycosites and N-glycans) for receptor binding, viral infectivity, and immunogenicity should be further investigated.
Conclusions
A comprehensive analysis of site-specific N-glycosylation of SARS-CoV-2 S protein was performed at the levels of intact N-glycopeptides, glycosites, glycan composition, and site-specific numbers of N-glycans. By taking advantage of two complementary protease digestion systems and N-glycoproteomics analysis through enrichment and deglycosylation, we provided a global and site-specific profile of N-glycosylation on SARS-CoV-2 S proteins, revealing extensive heterogeneity in N-glycan composition and site occupancy. Almost all of these glycosites were conserved among the 753 published SARS-CoV-2 genome sequences. In particular, the two N-glycosites in the S protein RBD showed a relatively conserved N-glycan composition when produced in human cells compared with insect cells, suggesting the potential impact of N-glycosylation on receptor binding. Overall, our data indicate that N-glycosylation profiling and identifying differences among distinct expression systems might help to elucidate the infection mechanism toward development of an effective vaccine and targeted drugs.
ASSOCIATED CONTENT
Supporting Information
Figure S1, Spectra of intact N-glycopeptides with ambiguously assigned N-glycosites; Figure S2, Comparison of the spectra of intact N-glycopeptides before (A) and after (B) enrichment; Figure S3, Comparison of the spectra of intact N-glycopeptides from trypsin digestion (A) and Glu-C digestion (B); Figure S4, Comparison of the spectra of intact N-glycopeptides (A) and deglycopeptides (B); Figure S5, Amino acid sequence alignment of recombinant SARS-CoV-2 S proteins expressed in insect cells (A) and human cells (B). Yellow background: putative sequence containing a signal sequence; Red: potential N-glycosites; Red bold: identified N-glycosites; Green: theoretical cleavage sites of trypsin; Blue: theoretical cleavage sites of Glu-C; Figure S6, Representative and high-quality spectra of intact N-glycopeptides and deglycosylated peptides; Figure S7, Microheterogeneity and macroheterogeneity of the N-linked glycopeptides of S protein; Table S1, Site-specific N-glycosylation characterization of recombinant SARS-CoV-2 S protein expressed in insect cells; Table S2, Glycoproteomic identification results of recombinant SARS-CoV-2 spike protein using the combination of trypsin and Glu-C digestion; Table S3, Site-specific N-glycosylation characterization of recombinant SARS-CoV-2 S protein expressed in human cells; Table S4, Mutation frequency of SARS-CoV-2 spike protein.
AUTHOR INFORMATION
Author Contributions
#Y. Zhang and W. Zhao contributed equally to this work.
Notes
The authors declare no competing financial interests.
Supplementary Figures
Supplementary Figure S1. Spectra of intact N-glycopeptides with ambiguously assigned N-glycosites
Supplementary Figure S2. Comparison of the spectra of intact N-glycopeptides before (A) and after (B) enrichment
Supplementary Figure S3. Comparison of the spectra of intact N-glycopeptides from trypsin digestion (A) and Glu-C digestion (B)
Supplementary Figure S4. Comparison of the spectra of intact N-glycopeptides (A) and deglycopeptides (B)
Supplementary Figure S5. Amino acid sequence alignment of recombinant SARS-CoV-2 spike proteins expressed in insect cells (A) and human cells (B). Yellow background: putative sequence containing signal sequence; Red: potential N-glycosites; Red bold: identified N-glycosites; Green: theoretical cleavage sites of trypsin; Blue: theoretical cleavage sites of Glu-C
Supplementary Figure S6. Representative and high-quality spectra of intact N-glycopeptides and deglycosylated peptides
Supplementary Figure S7. Microheterogeneity and macroheterogeneity of the N-linked glycopeptides of S protein
ACKNOWLEDGMENT
This work was funded by grants from the National Natural Science Foundation of China (grant number 31901038), the 1.3.5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (ZYGD18014, CJQ), and the Chengdu Science and Technology Department Foundation (grant number 2020-YF05-00240-SN).