ABSTRACT
The Covid19 pandemic caused by SARS-CoV-2 has created panic in most nations. Statistical and epidemiological data from its predecessor SARS-CoV-1 imply that the virus is sensitive to temperature. However, we still lack a molecular level understanding of this sensitivity. The Spike protein is the outermost structural protein of the virus; it interacts with the human receptor ACE2 and enables entry into the respiratory system. It is also one of the largest viral proteins with primary exposure to external environmental conditions. In this study, we performed all atom molecular dynamics simulations to study the effect of temperature on the structure of the Spike protein. After 200 ns of simulation at different temperatures, we came across some interesting phenomena exhibited by the protein. We found that the receptor binding motif of the virus is sensitive to the surrounding temperature and behaves differently at altered temperatures. Bioinformatics and structural studies hinted that the N-terminal Domain of the Spike protein is capable of binding to receptors and should not be overlooked. We also observed that at higher temperatures, the Spike protein adopts a more confined conformation. The study would not only prove beneficial for understanding the fundamental nature of the virus, but will also support the development of vaccines and therapeutics.
INTRODUCTION
Severe Acute Respiratory Syndrome Coronavirus 2, or SARS-CoV-2, attacks the cells of the human respiratory system. Recent studies have found that the virus also interacts with the cells of the digestive system, renal system, liver, pancreas, eyes and brain [1]. It is known to cause severe sickness and is fatal in many cases [2]. It is believed that the virus originated from the Wuhan fish market of China and was transmitted from bats to humans. It then gradually spread across almost all nations through aerial transmission, resulting in one of the worst known global pandemics of this century [3].
SARS-CoV-2 is one of the seven coronaviruses that affect the human population. The other known coronaviruses include HCoV-229E, HCoV-OC43, SARS-CoV, HCoV-NL63, HCoV-HKU1 and MERS-CoV [4,5]. The infections they cause range from the common cold to SARS, MERS or Covid19 [5]. These viruses have been observed to affect the human population predominantly during a particular season. For instance, the 2002 SARS infections began during the cold winter month of November and, after eight months, the number of reported cases became almost negligible [5]. Statistics show that countries with hot and humid weather conditions had fewer cases of SARS [6]. However, MERS-CoV, which was identified in Middle East regions, affected individuals during the summer [5]. Thus, the disease epidemiology suggests that each virus is prominent only in certain climatic conditions.
The viability of SARS-CoV-2 was measured on different surfaces by Chin et al., who found that the virus droplets survived at 4°C but were quickly deactivated at an elevated temperature of 50°C [7]. Smooth surfaces, plastics and iron show greater viability of the virus than paper, tissue, wood or cloth. Surgical masks had detectable virus even on the 7th day [8, 9]. Soaps and disinfectants, which disintegrate the virus membrane and structural proteins, are a potent example of how the modulation of external conditions can affect virus viability. Statistical reports by Cai et al. and several others have shown that tropical countries like Malaysia, Indonesia or Thailand, with high temperature and high relative humidity, did not have major community outbreaks of SARS [6, 10-11]. Although viruses cannot be killed like bacteria by autoclaving, temperature sensitivity of viruses has been reported several times in the past. Seasonal Rhinoviruses could not replicate at 37°C, whereas 33-35°C is ideal for their survival in the nasal cavity [12]. Influenza was found to be effective at a temperature around 37°C, whereas higher temperatures of 41°C resulted in clumping of viruses on cell surfaces [13, 14, 15]. Similarly, the viability of the SARS virus, which persisted for 5 days at temperatures of 22-25°C and 40-50% humidity, was lost when the temperature was raised to 38°C and the humidity to 95% [6].
When the virus is exposed to different temperature conditions, the initial interactions with the atmosphere occur through the structural proteins. There are four major structural proteins present on the virus: the Spike glycoprotein, the Envelope protein, the Membrane protein and the Nucleocapsid. Each of the proteins performs specific functions in receptor binding, viral assembly and genome release [16]. One of the first and largest structural proteins of the Coronavirus is the Spike glycoprotein [17]. The protein exists as a homotrimer in which each monomer consists of 1273 amino acid residues (Figure 1) and the monomers are intertwined with each other. Each monomer has two domains, namely S1 and S2 [18]. The S1 and S2 domains are cleaved at a furin site by a host cell protease [18, 19]. The S1 domain lies predominantly above the lipid bilayer. The S2 domain, which is a class I transmembrane domain, travels across the bilayer and ends towards the inner side of the lipid membrane [18]. Figure 1 shows the two domains of the Spike glycoprotein.
The S1 domain consists mostly of beta pleated sheets. It can be further divided into the Receptor Binding Domain (RBD) and the N-terminal Domain (NTD). The RBD binds to Angiotensin Converting Enzyme 2 (ACE2) on the host cells [20]. It lies on the top of the complex, where around 14 residues from the RBD bind to the ACE2 receptor on the host cell [21, 22]. The NTD is the outermost domain; it is relatively more exposed and lies on the three sides, giving a triangular shape to the protein when viewed from the top (Figure 1). The NTD has a galectin fold and is known to bind to sugar moieties [21]. The S2 domain, on the other hand, is a transmembrane region with strong interchain bonding between the residues. It is mostly α-helical and forms a triangle when viewed from the bottom, though the top and bottom triangles do not overlap.
Temperature is a very significant variable for proteins because proteins respond differently to high and low temperature conditions. Many proteins have high thermal stability while others can unfold or even denature at high temperatures [23, 24]. In November 2019, when the first outbreak of Covid19 was reported, the temperature in Wuhan, China was around 17°C in the morning and 8°C at night. Tropical countries where the disease persisted had temperatures of over 40°C [25, 26]. Although statistical and experimental evidence shows that temperature influences the activity and virulence of the virus, we still lack an understanding of the molecular level changes that take place in the virus under different weather conditions. To date, there is no concrete evidence on whether atmospheric conditions actually influence the structure of the virus. Here, using all atom molecular dynamics (MD) simulations, we explore the dynamics of the Spike glycoprotein of SARS-CoV-2 at different temperatures. This is the first molecular study of the environmental influence on the protein structure. The results suggest that the S1 domain is more flexible than S2. In the S1 domain, we observed the sensitivity of the receptor binding motif to different temperatures. We also found that the N-terminal domain of the protein has the potential to bind to different human receptors. The study will not only help us understand the nature of the virus but is also useful for designing effective therapeutic strategies.
RESULTS AND DISCUSSION
The crystal structure of the Spike glycoprotein was found to have around 900 missing residues. Thus, for our study we considered the complete model of the trimeric Spike protein generated by Zhang et al., which has a Template Modeling score of 0.6 [27]. The model was devoid of the N-acetyl glucosamine (NAG) sugar moieties which are known to bind and stabilize the protein. The envelope lipid bilayer was not considered in this work, to avoid a large system size in the atomistic simulations. After initial minimization and equilibration, we generated five different systems with temperatures ranging from 10°C to 50°C. This was done to maintain the uniformity of the simulations, so that temperature was the only variable that differed between them. One more system, at 70°C, was also generated for comparison (Table S1). The production run of 200 ns was carried out under NPT conditions.
A. Spike glycoproteins are sensitive to temperature
After performing 200 ns of classical molecular dynamics simulations, the root mean square deviation (RMSD) of the trajectory, with respect to the starting structure, was calculated to check whether the systems had attained stability. Figure S1 shows the complete RMSD of all the systems at different temperatures. It can be seen that stability was attained within the first 50 ns of the simulation time, indicating that the systems are well equilibrated. The RMSD values lie between 0.6 and 0.7 nm for all the systems, with an exception at 40°C, where a marginally higher RMSD was seen after 100 ns of simulation time. At 20°C and 30°C, a small rise in the RMSD curves was observed after 100 ns of simulation time. This implies that the Spike protein was more stable at 10°C and 50°C.
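The RMSD check described above can be illustrated with a minimal sketch. It assumes that every frame has already been superimposed on the starting structure (analysis tools normally perform this least-squares fit before computing RMSD) and that all atoms are weighted equally. The function names and the toy coordinates are hypothetical and are not taken from the study.

```python
import math


def rmsd(coords, reference):
    """RMSD between two pre-superimposed coordinate sets (same units as input).

    Assumption: both sets list the same atoms in the same order, and the
    frame has already been fitted onto the reference structure.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x, y, z), (rx, ry, rz) in zip(coords, reference):
        total += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(total / len(coords))


def rmsd_series(trajectory, reference):
    """RMSD of every frame in a trajectory with respect to the reference."""
    return [rmsd(frame, reference) for frame in trajectory]


# Hypothetical toy data: two atoms, two frames, coordinates in nm.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to the reference
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # both atoms shifted by 1 nm along z
]
print(rmsd_series(trajectory, reference))  # prints [0.0, 1.0]
```

A curve that levels off, as reported here within the first 50 ns, corresponds to such a series fluctuating around a constant value.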
Since the protein comprises two distinct domains, S1 and S2, we checked the RMSD of the S1 and S2 domains individually, with respect to the starting structure, to understand the cause of the higher RMSD values observed at 20°C, 30°C and 40°C (Figure 2). The RMSD values of the S1 domain at 20°C, 30°C and 40°C were found to be around 0.7 nm, nearly 0.5 nm more than in the simulations at 10°C and 50°C. A similar trend was observed in the RMSD of the S2 domain, but the difference in values was only 0.15 nm. Although in this study we have not considered the bilayer lipid membrane of the SARS-CoV-2 envelope inside which the Spike glycoprotein resides, the S2 domain shows remarkable stability in its RMSD values (Figure 2). The stability of the S2 domain can be attributed to the strong interchain interactions within the highly α-helical S2 domain.
Since the Spike is a homotrimer, the S1 domain of each individual chain was also checked to account for the difference in fluctuations. Figure 2(c)-(e) shows the RMSD of the S1 domain of chains A, B and C at different temperatures. In chain A, the RMSD is clearly high at 30°C and 40°C. At 50°C, however, the fluctuations are negligible and the system is very stable. Similarly, chain B was stable at 10°C, 20°C and 50°C. In the S1 domain of chain C, the RMSD values were found to be quite low throughout the simulation time at all temperatures except 40°C. The above data indicate that the protein chains, especially the S1 domains, are quite flexible at temperatures of 20-40°C in comparison to the low temperature of 10°C or the high temperature of 50°C. Irrespective of the absence of the bilayer membrane, the stalk of the Spike protein remains stable at the different temperature conditions.
B. Domain flexibility of S1 is more pronounced
In order to identify the regions of the Spike protein that cause the deviations in RMSD, we plotted the root mean square fluctuation (RMSF) of the CA atoms of the S1 and S2 domains separately (Figure 3) at different temperatures. Each plot shows the RMSF of the individual chains at different temperatures. The RMSF of the individual chains of the S1 domain shows that residues 1-333, which constitute the N-Terminal Domain (NTD) of S1, fluctuate more than the Receptor Binding Domain (RBD), which spans residues 334-680. In the NTD, three distinct peaks could be seen, at residues 85-90, 100-200 and 240-260. The first peak in the NTD was observed around residues 85-90, which constitute the β4-β5 loop directed inwards to the S2 domain (Figure S2). The peak was found to be highest in chain A at 40°C (∼0.8 nm); at other temperatures, however, all the chains have approximately 0.5 nm RMS fluctuation of their CA atoms. Residues 100-200 constitute the solvent exposed β sheet (β6-β12) of the NTD of the S1 domain (Figure S2). The crystal structure (PDB: 6VXX) shows as many as three glycosylated groups adjacent to this region of the protein (Figure S3) [28]. Thus, the lack of stabilizing sugar moieties in the simulated complex increases the flexibility of the region around residues 100-200. Residues 240-260 form a solvent exposed loop around β14-β15. No glycan binding sites were observed for this loop in the crystal structure. The RBD contains a Receptor Binding Motif (RBM), spanning residues 458-506, that shows flexibility in all the systems. The lowest flexibility was observed at 10°C. At 30°C, peaks were found for a larger number of residues. This indicates differential flexibility of the binding motif at different temperatures.
The RMSF of the S2 domain, on the other hand, shows marked stability compared to the S1 domain (Figure S4). This is in good agreement with our earlier observations of the RMSD of the S2 domain. The S2 domain is a triple helical coil, and its coiled-coil motif, which is further supported by three shorter helices, contributes to the stability of the domain [29]. However, the C-terminal residues 1125-1273 show greater flexibility than the rest of the domain. It should be noted that the C-terminal region of the Spike glycoprotein is exposed towards the inner side of the envelope bilayer and does not participate in the interchain interactions. It also has a more relaxed packing than the rest of S2 [28, 30].
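The per-residue RMSF used in this section can be sketched as follows. The sketch assumes that each frame holds one CA coordinate per residue and that the frames have already been fitted to a common reference, so that overall rotation and translation do not contribute. The function name and the toy trajectory are hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF over a trajectory of pre-fitted frames.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
    one per CA atom, in the same order in every frame.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one frame")
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from its average position.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msf))
    return values


# Hypothetical toy data: atom 0 is rigid, atom 1 moves between x = 1 and x = 3.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy_trajectory))  # prints [0.0, 1.0]
```

Peaks in such a profile, like those reported for residues 85-90, 100-200 and 240-260, mark residues whose positions vary most around their time average.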
C. NTD of the Spike protein could act as receptor binding domain
Although the NTD does not bind directly to the receptors, it plays a very important role in viral entry. The NTD of Mouse hepatitis coronavirus has evolved to bind to the CEACAM1a receptor [31]. The fold of the NTD is known as the galectin fold and is capable of binding to glycans [31]. Glycans play an important role in activation, regulation and immunological response [32]. Studies on mice, in which vaccines were developed against the NTD of the Spike protein, showed that the NTD can be a potential therapeutic target [33]. Earlier studies of Bovine coronavirus and the Bovine hemagglutinin-esterase enzyme indicated a close evolutionary link between the virus and the host proteins. The similarity facilitates its attachment to the host cells [34]. We performed Multiple Sequence Alignment (MSA) of the SARS-CoV-2 NTD with the human proteome to find similarity between the NTD and human proteins.
Figure 4 shows the phylogenetic tree based on fast minimum evolution. The tree was constructed from the results of protein Blast (Blastp), where the maximum allowed difference was 0.85 (details discussed in Materials and Methods). The results show similarity with EPHRIN proteins, which bind to EPHRIN receptors, with Leucine aminopeptidase, an insulin-regulated aminopeptidase that prepares antigenic epitopes for cross-presentation in dendritic cells, and to some extent with the human GABA receptor chain A. The EPHRIN receptors form one of the largest families of receptor tyrosine kinases, which are involved in several important functions of the human body such as angiogenesis, retinogenic signaling, axon guidance and the migration of the epithelial cells of the intestines [1]. The possibility of the NTD acting as a receptor binding domain cannot be ruled out completely. There is also an uncanny correlation between the prevailing literature, in which scientists report that the virus affects different parts of the human body, and the similarity of the NTD with human enzymes [1, 35].
Our analysis of the RMSF plots in Figure 3 shows a high level of instability of the domain at temperatures ranging from ∼20-40°C. The region is relatively more exposed to solvent and more susceptible to external environmental conditions. However, the NTD does not have a defined open or closed conformation like the RBD. Thus, based on the Multiple Sequence Alignment calculations, we tried to identify the residues that are more exposed and susceptible to binding at different temperature conditions. The coronavirus NTD is composed of a three layered beta-sheet sandwich with 7, 3 and 6 antiparallel β strands in the respective layers, making a total of 16 β strands, with 5 prominent β hairpin loops (Figure S5). The crystal structure of the Mouse Hepatitis Coronavirus (MHC) spike protein with its receptor shows that β1 and β6 of the NTD are the binding motif for the CEACAM1a protein [31]. However, unlike in the MHC NTD, the strands in SARS-CoV-2 are arranged in the opposite direction. The upper layer of the beta sandwich is composed of beta strands β4, β6, β7, β8, β9, β10 and β14 (Figure S4). The three prominent regions which are exposed to the solvent and capable of interacting with potential receptors are the N-terminal β strand, the β8-β9 loop and the β9-β10 loop.
Comparison of the potential receptor binding regions on the NTD at different temperatures (Figure 5) shows a differential arrangement of the three exposed loops. The loops are formed by residues from the N-terminal β strand, β8-β9 and β9-β10. The loops overlap each other: the N-terminal β strand region is on top at 10°C and 20°C; the loops face each other at 30°C; at 40°C they move further apart; and at 50°C the β8-β9 loop region lies above the N-terminal β strand, opposite to the orientation observed at 10°C and 20°C. The β9-β10 region (shown in green) is a glycan binding region, as observed in the crystal structure (PDB ID: 6VXX). It was found that this region participates, along with the above loops, in forming a motif that exposes many charged residues at 30°C (Figure S6). In addition, the charged residues of the N-terminal β strand and the β8-β9 loop come into close proximity of each other, thereby increasing the stability of the motif. However, we did not observe this phenomenon at other temperatures. Since there was similarity with the EPHRIN A proteins that bind to the EPHRIN A receptors, we examined the residues involved in protein-protein interaction in the crystal structure of the human EphA4 ectodomain in complex with human ephrin A5 (PDB ID: 4BKA) for comparison. There are three salt bridges and seven hydrogen bonds between the EPHRIN protein and its receptor. Hence, a strong possibility exists for the NTD to act as a receptor binding domain, especially around 30°C.
D. The RBD behaves differently at higher temperatures
The Receptor Binding Domain (RBD) of the Spike glycoprotein is a potential target for vaccine and drug development [36, 37]. It is highly conserved among the human coronaviruses and binds to the ACE2 receptor present on the lung tissues [38]. Residues 458-506 of the RBD comprise the receptor binding motif (RBM). The RBM has 8 residues which are identical and 5 residues with similar biochemical properties between SARS, MERS and SARS-CoV-2. This conserved region primarily interacts with the ACE2 receptor, and hence scientists often target the RBD for developing therapeutic agents [36, 37, 39]. Earlier, in Figure 3, we saw that the RBD, spanning residues 334-680, shows higher stability than the NTD of the S1 domain across the range of temperatures.
We compared the time averaged conformation of the RBD generated from the last 10 ns of the simulation time at different temperatures (Figure 6). The core β pleated sheet was very stable, showing no loss of secondary structure at higher temperatures. However, the RBM (highlighted in magenta in Figure 6) shows a very dynamic conformation across the different temperatures. The dynamics were more pronounced at 10°C, 20°C and 30°C, whereas at 40°C and 50°C the RBM had a more confined conformation. The RBD flexibility was more apparent at 20°C and 30°C, where the three chains moved further away from each other. However, a tighter and well packed structure was found for the protein at 50°C. The figures suggest that although residue wise movements in the RBD were not visible in the RMSF (Figure 3), the RBDs and their motifs show intrinsic flexibility in particular temperature ranges. Previous studies have indicated that the RBD can adopt either an open or a closed conformation in the virus. We examined the conformation of the Spike protein-ACE2 crystal structure and found that the RBD adopts an open conformation, exposing its RBM residues Phe456, Ala475, Phe486, Asn487, Tyr489, Gln493, Gly496, Gln498, Thr500, Asn501, Gly502 and Tyr505 to facilitate the binding of the ACE2 receptors. It is fascinating to see that at 40°C, and more interestingly at 50°C, the RBM is in a closed and compact conformation which hinders its association with partner proteins.
To validate the findings, we ran another simulation of the Spike protein at a higher temperature of 70°C. After 100 ns of simulation, we found significant similarity between the closed conformation observed at 50°C and the conformation at 70°C. The RBM residues, specifically Phe456, Ala475, Phe486, Asn487, Tyr489, Gln493, Gly496, Gln498, Thr500, Asn501, Gly502 and Tyr505, were found to be clearly buried between the interchain subunits at 70°C (Figure 7). In contrast, at 30°C these residues are directed towards the solvent. Thus, the very stable RMSD observed in Figure 1 is largely due to the confined architecture of the receptor binding domain at 50°C and higher temperatures. The unavailability of the RBM residues for binding to the ACE2 receptor would in turn destabilize virus-protein interactions at higher temperatures.
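The time averaged conformation used in this section (an average over the last 10 ns of each trajectory) can be sketched as below. The sketch assumes that the frames are already fitted to a common reference and that each frame carries a time stamp in ns. The function name, the argument names and the toy values are hypothetical.

```python
def time_averaged_structure(frames, times_ns, window_ns=10.0):
    """Average the coordinates of all frames in the final window of a trajectory.

    frames: list of frames, each a list of (x, y, z) tuples (pre-fitted).
    times_ns: time stamp of each frame in ns, same length as frames.
    window_ns: length of the final window to average over.
    """
    if len(frames) != len(times_ns) or not frames:
        raise ValueError("frames and times_ns must be non-empty and of equal length")
    t_end = max(times_ns)
    selected = [f for f, t in zip(frames, times_ns) if t >= t_end - window_ns]
    n_selected = len(selected)
    n_atoms = len(selected[0])
    return [
        tuple(sum(frame[i][k] for frame in selected) / n_selected for k in range(3))
        for i in range(n_atoms)
    ]


# Hypothetical toy data: one atom sampled at 180, 195 and 200 ns.
toy_frames = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
toy_times = [180.0, 195.0, 200.0]
print(time_averaged_structure(toy_frames, toy_times))  # prints [(2.0, 0.0, 0.0)]
```

An averaged structure of this kind is a convenient summary for visual comparison across temperatures, but it is not itself a physical conformation, so bond geometry in flexible loops can look distorted.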
CONCLUSION
SARS-CoV-2 has affected the human population adversely. The propensity of the virus to survive in cold and dry climatic conditions has been speculated on by researchers and is supported by statistical evidence from the earlier SARS epidemic of 2002. However, we lacked a detailed molecular level understanding of the behavior of the virus in different environments. Here, we used the Spike protein to understand its response at temperatures ranging from 10°C to 50°C. The Spike protein helps in the attachment and entry of coronaviruses into the host cells. It exists as a homotrimer and is anchored in the lipid bilayer of the viral envelope. Our results show that the S2 domain, which is predominantly immersed inside the lipid bilayer, remains stable even without the bilayer membrane, whereas the S1 domain is quite flexible. Moreover, S1 comprises two subdomains, namely the N-terminal Domain and the Receptor Binding Domain. The simulation studies show that the Receptor Binding Domain is relatively less mobile and that its flexibility is mostly limited to a binding motif which interacts with the human ACE2 receptor. However, the N-Terminal Domain was found to be very flexible.
The flexible N-Terminal Domain hosts a large number of charged residues on the top layer of its tri-layered beta sandwich architecture. However, at 40-50°C, these residues were found to be less solvent exposed. The similarity of the N-Terminal Domain sequence with several human proteins indicates that the subdomain may be involved in binding to other human receptors. The Receptor Binding Motif present on the Receptor Binding Domain is crucial in the initial protein-protein interaction between the host and the virus. We found that this motif is largely in an open conformation, which enables receptor binding, at temperatures between 20°C and 40°C. Surprisingly, at higher temperatures, a closed conformation of the motif was observed. This clearly depicts the sensitivity of the virus protein to temperature. However, it is yet to be ascertained whether the conformational change is reversible at lower temperatures. Nevertheless, the work would prove beneficial in the development of vaccines as well as of therapeutic strategies that target not only the Receptor Binding Domain but also the N-Terminal Domain of the Spike protein.
MATERIALS AND METHODS
The complete model of the Spike glycoprotein of SARS-CoV-2 was obtained from the Zhang lab (GenBank: QHD43416.1). This model was chosen because it includes the 871 residues that are missing in the crystal structure 6VXX. It had a Template Modeling (TM) score of 0.6 [27]. The initial Root Mean Square Deviation (RMSD) between the model and the closed crystal structure of the Spike glycoprotein (PDB ID 6VXX) was found to be 1.54 Å. The model was devoid of N-acetyl glucosamine (NAG) glycan residues and consisted of the glycoprotein trimer, where each monomer had amino acids 1 to 1273. The structure was initially solvated in a cubic water box of size 17.9 × 17.9 × 17.9 nm, giving a system of 569293 atoms including water and ions. The minimum distance between the protein and the edge of the water box was fixed at 13 Å. The Particle-Mesh Ewald (PME) method was used for long-range electrostatic interactions, with a grid spacing of 0.16 nm and a 1.0 nm cutoff. After energy minimization and equilibration, with harmonic restraints maintained on the protein heavy atoms, the system was gradually heated to 300 K in a canonical ensemble. The harmonic restraints were gradually reduced to zero and the solvent density was adjusted under isobaric and isothermal conditions at 1 atm and 300 K. This was followed by 500 ps of NVT and 500 ps of NPT equilibration with harmonic restraints of 1000 kJ mol-1 nm-2 on the heavy atoms. After equilibration, the systems were heated or cooled to the different target temperatures (Table S1), and the production run for each system was carried out for 200 ns, by which time the RMSD was stable. All simulations were carried out in Gromacs 2020 with the AMBERff99SB-ILDN force field for proteins [40, 41]. SHAKE was used to constrain all bonds involving hydrogen atoms. All analyses were carried out using Gromacs analysis tools [40].
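Simulation temperatures are reported here in °C, while MD engines generally expect thermostat reference temperatures in kelvin. The conversion for the temperatures named in this study can be sketched as follows. The list is assembled from the values mentioned in the text (10-50°C in 10°C steps, plus 70°C), and the variable and function names are hypothetical; Table S1 remains the authoritative list of systems.

```python
def celsius_to_kelvin(t_celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return round(t_celsius + 273.15, 2)


# Target temperatures named in the text (assumed to match Table S1).
target_temperatures_c = [10, 20, 30, 40, 50, 70]

for t_c in target_temperatures_c:
    print(f"{t_c:>3} degC -> {celsius_to_kelvin(t_c):.2f} K")
```

On this scale, the 300 K used during equilibration corresponds to about 27°C, so the 10°C and 20°C systems are cooled and the remaining systems are heated from the equilibrated state.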
Protein Blast was used to search for similar sequences in the human proteome. The Blast Tree View widget was used to generate the phylogenetic tree, which is a simple distance based clustering of the sequences derived from the pairwise alignments of the Blast hits to the query sequence [42]. VMD was used for visualization of the results and generation of the figures [43].
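To make the idea of a pairwise distance with a maximum allowed difference concrete, the following sketch computes the fraction of mismatched positions between two aligned sequences and compares it with the 0.85 threshold quoted in the Results. This is a simplified stand-in and not the distance model implemented by the Blast Tree View tool; the function name and the toy sequences are hypothetical.

```python
def fraction_mismatch(aligned_a, aligned_b, gap="-"):
    """Fraction of mismatched residues between two aligned sequences.

    Columns in which either sequence has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatched = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatched += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatched / compared


# Hypothetical toy alignment: four ungapped columns, one mismatch (D vs E).
distance = fraction_mismatch("ACD-F", "ACEGF")
print(distance)          # prints 0.25
print(distance <= 0.85)  # prints True: the pair is within the allowed difference
```

Pairs whose difference exceeds the threshold would be left out of the tree, which is why a permissive value such as 0.85 retains distantly related human sequences.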
SUPPORTING INFORMATION
Supporting figures, Figs S1-S7 and Table S1 are provided online.
ACKNOWLEDGEMENTS
This research used resources of the National Energy Research Scientific Computing Center of the Ernest Orlando Lawrence Berkeley National Laboratory, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and used the Extreme Science and Engineering Discovery Environment (XSEDE). The authors are thankful to the Covid19 HPC Consortium for providing resources and helping researchers work for a noble cause. The authors are also thankful to Dr. Suchetana Gupta and Dr. Debakanta Tripathy for critically proofreading the manuscript. We are also grateful to the National Institute of Technology Warangal for providing facilities.
Footnotes
* Phone: +91 7978293479, Email: slrath{at}nitw.ac.in; kishant{at}nitw.ac.in
Abbreviations
- RBD: Receptor Binding Domain
- NTD: N-Terminal Domain
- RBM: Receptor Binding Motif