Computational analysis of SARS-CoV-2 S1 protein O-glycosylation and phosphorylation modifications and identifying potential target positions against CD209L-mannose interaction to inhibit initial binding of the virus

Covid-19 outbreak is still threatening the public health. Therefore, in the middle of the pandemic, all kind of knowledge on SARS-CoV-2 may help us to find the solution. Determining the 3D structures of the proteins involved in host-pathogen interactions are of great importance in the fight against infection. Besides, post-translational modifications of the protein on 3D structure should be revealed in order to understand the protein function since these modifications are responsible for the host-pathogen interaction. Based on these, we predicted O-glycosylation and phosphorylation positions using full amino acid sequence of S1 protein. Candidate positions were further analyzed with enzyme binding activity, solvent accessibility, surface area parameters and the positions determined with high accuracy rate were used to design full 3D glycoprotein structure of the S1 protein using carbohydrate force field. In addition, the interaction between the C-type lectin and α-mannose residues was examined and carbohydrate recognition positions were predicted. We suggest these positions as a potential target for the inhibition of the initial binding of SARS-CoV-2 S1 protein to the host cell.


Introduction
SARS-CoV-2 emerged in Wuhan, China in December 2019 and spread quickly across the world.It resulted in 375,498 confirmed cases and 16,362 deaths as to World Health Organization data as of 25 th March 2020.Therefore, in the middle of the pandemic, all kind of knowledge on SARS-CoV-2 may help us to find the solution.
The coronavirus spike protein (S) plays a key role in the early steps of viral infection.S protein comprises of two sub-units S1 and S2 which are responsible for the binding of the host cell receptor and fusion of the cellular membrane, respectively [1,2,3].S protein also contains furin cleavage site at the boundary between the S1-S2 sub-units which mediates the membrane fusion and virus infectivity [1,2].The S1 protein mediates an initial high affinity interaction with a cellular receptor.It has been suggested that different domains within a single S protein could bind multiple alternative receptors.Although ACE2 (angiotensin-converting enzyme 2) is known to be the SARS-CoV receptor [2,4], CD209L, a C-type lectin that binds to high-mannose glycans on glycoproteins, has also been found to be as an alternative receptor for SARS-CoV [5].However, recent studies have mainly focused on the S protein-ACE2 interaction, there is a paucity of information on the SARS-CoV-2 S1 protein-CD209L lectin interaction.It has been found that CD209L lectin binds to SARS-CoV, human coronavirus 229E, Ebola, Hepatitis C, HIV, Influenza virus glycoproteins and may mediate the endocytosis of pathogens [6][7][8][9][10][11]. Thus, identification of the carbohydrate-lectin interaction sites on the 3D structure for understanding of the initial binding of virus to a host cell is crucial.Also, as post-translational modifications regulate the host-pathogen interaction [12], identifying SARS-CoV-2 S1 protein modifications may help us to inhibit initial binding of the virus.3D crystal structure of SARS-CoV-2 S protein with N-glycosylation positions was identified but, sites such as furin cleavage and Oglycosylation are missing.In addition, the detailed structure of N-glycan units was not included in this model [13].As glycan structures typically exist in solution or on proteins, it is a big challenge to characterize the 3D structure of glycoproteins experimentally.However, computational structural biology allows us to generate 3D protein and glycoprotein modelling with high accuracy rate using all amino acid sequence [14][15][16].Herein, we predicted Oglycosylation and phosphorylation positions using full amino acid sequence of S1 protein.
Candidate positions were further analyzed with enzyme binding activity, solvent accessibility, surface area parameters and the positions determined with high accuracy rate were used to design full 3D glycoprotein structure of the S1 protein using carbohydrate force field.In addition, the interaction between the C-type lectin and α-mannose residues was examined and carbohydrate recognition positions were predicted.We suggest these positions as a potential target for the inhibition of the initial binding of SARS-CoV-2 to the host cell.

Results and Discussion
Determining the 3D structures of the proteins involved in host-pathogen interactions are of great importance in the fight against infection.Besides, post-translational modifications of the protein on 3D structure should be revealed in order to understand the protein function since these modifications are responsible for the host-pathogen interaction.Therefore, researchers primarily have focused on the S protein structure of SARS-CoV-2 virus and revealed the 3D structure with N-glycosylation positions.But, the detailed structure of the N-glycan units in these sites and the O-glycosylation positions are not included in the current model [13].Also, it is known that the initial binding of other coronaviruses to host cell occurs via the alternative receptor C-type lectin CD209L besides ACE2 protein [5].Taken together, herein, we first demonstrated SARS-CoV-2 S1 protein high mannose N-glycan structure and its interaction with CD209L.When 3D glycoprotein structure of S1 protein analysed, N-glycosylation positions were found to be located mainly on the N-Terminal Domain (NTD) (Figure 1).Since the Receptor Binding Domain (RBD) is known to be involved in ACE2 binding, we suggest that N-glycan positions on NTD may associated with the binding of C-type lectin.C-type lectins interact with α-linked mannose residues on high mannose N-glycan structures [17].There has been a structural study on C-type lectin-glycan interaction but terminal sugar is not mannose here [18].Thus, in this study, the interaction between the C-type lectin and α-D-mannose was analysed and interface residues were identified.ZDOCK docking and PDBePISA interface interaction analysis showed in two models with high accuracy rate that mannose sugar interacts with C-type lectin at Met282, Lys307, and Ser345 positions with hydrogen bonding (Fig. 2 and Table 1).We suggest these positions as the α-D-mannose recognizing sites having function on the lectin-sugar binding.
O-glycans are known to be involved in protein stability and function [19].The presence of Oglycans in some viral proteins has been demonstrated and suggested that glycans may play a role on biological activity of viral proteins.In the comparative study on human SARS-CoV-2 and other coronaviruses S proteins have showed that Ser673, Thr678, and Ser686 are conserved O-glycosylated positions and have suggested that SARS-CoV-2 S1 protein may show Oglycosylation at these positions [20].In this study, S1 protein was found to be O-GalNAcylated at Thr632, Thr678 and O-β-GlcNAcylated at Thr323, Thr638, Ser686 positions (Table 2).
When compared our results with the previous study, SARS-CoV-2 S1 protein found not to be O-glycosylated at Thr673, but O-GalNAcylated at Thr678 and O-β-GlcNAcylated at Ser686.Besides, we found additional O-glycosylation positions at Thr323, Thr632, and Thr638 on the S1 protein.On the 3D glycoprotein structure, Thr323 was found to be located at RBD and Thr678 and Ser686 located near FRS (Fig. 1 and 3).Since O-glycans are involved in proteinprotein interactions [19], we suggest that O-glycans at Thr323 may play a role on binding to ACE2 and O-glycans at Thr678, Thr686 may responsible for furin protease enzyme binding.It has been known that O-glycans are responsible for the protein stability and creating mucin like domain as glycan shields involved in immunoevasion [21].Based on this, we suggest that the O-glycosylation positions found may be involved in the stability of the S1 protein and immunoevasion of virus to the host cell.
Phosphorylation of viral proteins are catalyzed by host cell enzymes like glycosylation modifications [12,22].In this study, phosphorylation modification of SARS-CoV-2 S1 protein was also examined and 36 positions of Ser/Thr residues were found to be phosphorylated on almost all domains of the protein (Table 3).Davidson et al. have shown that S1 protein is phosphorylated at 7 positions [23].Amongst them, Ser459, Ser637, Ser640 positions were found to be correlated with our results.Phosphorylation modification regulates the protein activity and function in cooperation with glycosylation [24,25].When we examined the phosphorylation and glycosylation locations of S1 protein on the 3D structure; the phosphorylation positions at Ser161 and Ser162 were found to be located near glycosylation position Asn165.Likewise, Thr109 with Asn234; Ser680 with Thr678 and FRS; Thr604 with Asn603; Thr630 with Thr632; Thr637 and Thr640 with Thr638; Ser659 with Asn657 were found to be located close on the 3D structure (Fig. 3).So, we suggest these sites as a critical sites for S1 protein activity.

Conclusion
Covid-19 outbreak is still threatening the public health.At this point, inhibiting the initial binding of virus to the host cell is crucial in order to find treatment.Since post-translational modifications regulate the host-pathogen interaction, identifying SARS-CoV-2 S1 protein modifications may help us to inhibit initial binding of the virus.Therefore, we focused on the CD209L lectin-α-D-mannose interaction of S1 protein and suggested Met282, Lys307, and Ser345 positions as targets for the initial binding.Also, we found that phosphorylation and glycosylation positions where located at close sites may be critical for S1 protein activity.Therefore, we suggest that, these positions can be targeted for drug development against Covid-19.

Prediction of O-Glycosylation and Phosphorylation Positions
The potential O-glycosylation (O-GalNAc) and O-β-GlcNAc sites of S1 protein were analysed via NetOGlyc 4.0 [26] and YinOYang 1.2 Server [27], respectively.The threshold was chosen as 0.5 for both to predict high potential sites.The potential glycosylation positions which are found on NetOGlyc and YinOYang were analysed with the ISOGlyP server for the enzyme binding activity [28].ISOGlyP was used to calculate all potential positions with ppGalNAc transferase isoforms which showed high enzyme binding activity and calculates the enhancement value product (EVP) values as an indication of glycosylation rates [29].The potential phosphorylation positions of S1 protein was analysed with NetPhos 3.1 Server [30].NetSurfP v1.1 was used to assess the surface and solvent accessibility of predicted Ser and Thr positions with glycosylation and phosphorylation positions [31].The relative solvent accessibility of potential glycosylation positions was analysed using Poly-View 2D-SABLE protein structure prediction server [32].

Glycoprotein Building on 3D Structure using Carbohydrate Force Field
3D structure models involving full amino acid sequence of S1 protein was taken with QHD4341 ID from I-TASSER (Iterative Threading ASSEmbly Refinement) [33].The structure model has been generated by the C-I-TASSER pipeline, which utilizes deep convolutional neural-network based contact-map predictions to guide the I-TASSER fragment assembly simulations.GLYCAM-Web Server (AMBER carbohydrate force field) was used to screen highly reliable glycosylation sites on the protein 3D structure.GLYCAM-Web is dedicated to simplifying the prediction of three-dimensional structures of carbohydrates and macromolecular structures involving carbohydrates.Glycoprotein Builder runs SASA (Solvent Accessible Surface Area) prediction and finds the most appropriate glycosylation sites for adding glycan units on the 3D structure using molecular dynamics simulation [34,35].Glycan units skeleton were chosen as Core 1 type for O-glycan units and the common shape of high mannose type for N-glycan units.
Core-1 type is the most common O-glycan motif on proteins.The common shape of high mannose type is a conserved motif that plays an important role on the N-glycan formation [17].
Additionally, all potential phosphorylation sites were analysed for SASA parameters on the 3D protein structure using GLYCAM and the positions with high accuracy rates were chosen as phosphorylation positions.

Lectin-Carbohydrate Docking
In order to find out the CD209L lectinmannose interaction, ZDOCK docking server was used [36].CD209L lectin 3D structure was taken from PDB (ID: 1XPH) [37] and α-D-mannose structure was taken from Glyco3D database [38].PDBePISA server was used for the exploration of macromolecular interfaces between the receptor-ligand [39].All structures were visualized with PyMOL.

Figure 3 .
Figure 3. N-, O-glycosylation and Ser/Thr phosphorylation positions of SARS-CoV-2 S1 protein on the 3D structure.The glycosylation and phosphorylation positions are shown in grey and yellow, respectively.NTD: N-Terminal Domain, RBD: Receptor Binding Domain, FRS: Furin Recognizing Site

Table 1 .
CD209L lectin  α-D-Mannose interaction sites analysis by PDBePISA.ZDOCK docking analysis showed that top 2 models have lectin-mannose interactions by hydrogen bonds on 3 potential positions.

Table 3 .
The predicted phosphorylation positions of SARS-CoV-2 S1 protein.All positions were analysed with the kinase binding, surface, and solvent accessibility parameters.The kinase specificity and protein localizations were also given in the table.