Abstract
Intraviral protein-protein interactions are crucial for replication, pathogenicity, and viral assembly. Among these, virus assembly is a critical step, as it regulates the arrangement of viral structural proteins and helps in the encapsulation of genomic material. SARS-CoV-2 structural proteins play an essential role in self-rearrangement, RNA encapsulation, and mature virus particle formation. In SARS-CoV, the membrane protein interacts with the envelope and spike proteins in the Endoplasmic Reticulum-Golgi Intermediate Compartment (ERGIC) to form an assembly in the lipid bilayer, followed by membrane-ribonucleoprotein (nucleocapsid) interaction. In this study, we examined the interactions of the membrane protein with the envelope, spike, and nucleocapsid proteins using protein-protein docking. Further, simulation studies were performed up to 100 ns to examine the stability of the Membrane-Envelope, Membrane-Spike, and Membrane-Nucleocapsid protein-protein complexes. Prime MM-GBSA calculations showed higher binding energies for the simulated complexes than for the docked complexes. The interactions identified in our study will be of great importance, as they provide valuable insight into the protein-protein complexes, which could be potential drug targets for future studies.
Introduction
Seven types of coronaviruses infect humans, among which severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and SARS-CoV-2 have received the most attention 1–3. The coronavirus's structural proteins make up the viral symmetry and enclose the positive-sense single-stranded RNA of ∼30 kb size 1. Briefly, the spike protein (S) has S1 and S2 subunits, which recognize the human receptor ACE-2 and mediate fusion of the viral membrane with the host plasma membrane 4,5. The nucleocapsid protein (N) is phosphorylated and highly basic in nature, and its primary function is associated with the packaging of viral genomic RNA 6,7. The CoV N protein contains two RNA-binding domains, the N-terminal domain and the C-terminal domain, linked by a serine/arginine-rich domain (SRD) 8–11. The SRD is vital for effective virus replication 12. In comparison, the membrane protein (M) is a transmembrane protein consisting of an N-terminal ectodomain and a C-terminal endodomain 13–15.
Viruses use protein-protein interactions (PPI) to reach out and hijack their host cellular network 16,17. The virus-host PPI map is invaluable, as it provides insight into how the virus captures the host protein network for its own purposes 18–20. Recently, a SARS-CoV-2-host PPI map revealed 66 druggable human proteins/host factors targeted by 69 compounds 16. Experimental techniques such as bimolecular fluorescence complementation, co-immunoprecipitation, and yeast two-hybrid are extensively used to detect virus-host PPI, and they also shed light on intraviral PPI 21–24. The M protein, which is expressed at high levels during infection, interacts with N protein and plays a vital role in assembling virus particles 25–27. The M-M interaction occurs through the transmembrane domain 28. Further, the N and S proteins interact with the C-terminal endodomain of M protein, which is the hotspot for protein-protein interaction 27,29–32. Besides the role of the M protein's C-terminal in M-N interactions, multiple regions of M protein are responsible for M-E and M-S interactions 26. In SARS-CoV, amino acids 168–208 of the N protein are essential for oligomerization and N-M interactions 25. PPI plays a critical role in stabilizing N protein-RNA interactions 33. Moreover, the N protein interaction with the C-terminal of M protein involves multiple M endodomain regions 28. However, it is not known whether these regions interact in SARS-CoV-2.
On the other hand, computational techniques such as phylogeny-based protein-protein interaction networks and structure-based protein-protein docking now offer an impactful and faster way to identify interaction sites in proteins 34,35. In this context, we propose to study the protein-protein interactions of M-E, M-S, and M-N of SARS-CoV-2 with protein-protein docking and molecular dynamics (MD) simulation methods. The primary goal of docking is to reveal interaction sites and to generate protein-protein complexes. Further, atomic-level MD simulations help to characterize the structure and dynamics of protein-protein complexes 36. In this study, MD allows us to understand the association-dissociation propensity of a protein complex during a single trajectory. Moreover, the study's outcome will highlight mechanistic details, i.e., intermediates and transition states, along with the association-dissociation of the protein complexes, which could be used as potential drug targets to counter the pathogenicity associated with SARS-CoV-2.
Material and Methods
Protein structure modeling and preparation
The structures of many SARS-CoV-2 proteins, e.g., spike, protease, and RdRp, have been reported by X-ray crystallography or cryo-EM techniques 37–39. However, several other proteins, such as the full-length nucleocapsid, envelope, and membrane proteins, do not have structures available yet. Therefore, we used structure models of the envelope and membrane proteins obtained with the RaptorX web server. Because the available 3D structures of S protein lack the transmembrane and cytosolic parts, we also built a model of the full-length 3D structure of S protein using the I-Tasser web server 40, with existing spike protein structures such as PDB ID: 6VXX as templates, and used it for protein-protein docking. Moreover, the available 3D structures of the spike (PDB ID: 6VXX) and envelope (PDB ID: 7K3G) proteins were retrieved from RCSB-PDB for truncated structure docking. The S protein structure was determined using electron microscopy in the closed state formed by three S protein monomers, whereas the E protein structure was determined in its pentameric form and comprises only its transmembrane region. Finally, all protein structures were prepared using the protein preparation wizard to optimize hydrogens and minimize potential energy, following our previously defined protocols 41,42.
Protein-protein docking
The PIPER program embedded in the BioLuminate module of Schrödinger was used to dock the M protein with the E, S, and N proteins 43,44. A detailed methodology has been given in our previous report 41. Briefly, PIPER performs a global search with a Fast Fourier Transform (FFT) approach and reduces false-positive results. Among 1000 conformations of input structures, the top 50 clusters were selected with a cluster radius of 9 Å. The docking outcomes were evaluated based on cluster size. A total of 70,000 rotations were allowed to generate five docked complexes for each setup. Of these five complexes, the one with the largest cluster size was selected for molecular dynamics simulation.
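As an illustration of the cluster-size ranking described above, the following minimal Python sketch groups docked poses with a greedy neighbor-count scheme on a pairwise RMSD matrix and ranks the resulting clusters by size. The greedy scheme, the function name `greedy_cluster`, and the toy RMSD values are assumptions made for illustration only; they are not the PIPER implementation or data from this study. Only the 9 Å radius and the limit of 50 clusters are taken from the text.

```python
import numpy as np


def greedy_cluster(rmsd, radius=9.0, max_clusters=50):
    """Greedily cluster poses from a symmetric pairwise RMSD matrix (in Å).

    Returns a list of (center_index, member_indices) sorted by cluster size,
    largest first. This is an illustrative scheme, not PIPER's own code.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    unassigned = np.ones(rmsd.shape[0], dtype=bool)
    clusters = []
    while unassigned.any() and len(clusters) < max_clusters:
        # Neighbor relation restricted to poses that are still unassigned.
        within = (rmsd <= radius) & unassigned[None, :] & unassigned[:, None]
        counts = within.sum(axis=1)
        center = int(np.argmax(counts))  # pose with the most neighbors
        members = np.flatnonzero(within[center])
        clusters.append((center, members.tolist()))
        unassigned[members] = False
    # Rank clusters by population (stable sort keeps greedy order on ties).
    return sorted(clusters, key=lambda c: len(c[1]), reverse=True)


# Hypothetical toy input: five poses placed on a line, distances used as RMSD.
positions = np.array([0.0, 1.0, 2.0, 20.0, 21.0])
toy_rmsd = np.abs(positions[:, None] - positions[None, :])
ranked = greedy_cluster(toy_rmsd)
best_center, best_members = ranked[0]  # most populated cluster
```

In this toy case the first three poses fall into one cluster and the last two into another, so the most populated cluster is the one that would be carried forward to simulation.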
MD Simulations of protein-protein complexes
For MD simulation of the docked protein-protein complexes, three setups were generated for the M-E, M-N, and M-S proteins. The binding and stability of the interactions were observed over a 100 ns timescale. Using our previously reported protocols, these complexes were simulated in the Desmond simulation package, which uses the OPLS 2005 forcefield to calculate bonded and non-bonded parameters and energy parameters 45,46. Previously, the C-terminal region of SARS-CoV M protein was found to interact with N protein in the presence of lipids 26. Therefore, in our study, the M-N protein complex was simulated with a lipid bilayer (POPE; 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine) environment around M's transmembrane regions (residues 20-40, 51-71, and 80-100). All systems were solvated with the TIP4P water model, supplied with 0.15 M NaCl salt and neutralizing counterions, and minimized for 5000 iterations using the steepest descent method. After equilibration with the NPT ensemble, the final production run was carried out at an average temperature of 310 K and 1 bar pressure, maintained using the Nosé-Hoover chain thermostat and Martyna-Tobias-Klein barostat methods.
Prime MM-GBSA: Binding energy calculation
The Prime module of the Schrödinger suite was used to calculate the binding energy of every protein-protein complex, keeping the membrane protein as the receptor and the other three proteins (envelope, spike, and nucleocapsid) as ligands, with the VSGB solvation model and the OPLS 2005 forcefield. Prime energy and MM-GBSA scores were calculated; these refer to the contribution of covalent interactions in the complex and the binding energy of the protein-protein complex, respectively.
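For orientation, the receptor/ligand bookkeeping above can be sketched in a few lines of Python. The sketch assumes the standard MM-GBSA definition, in which the binding energy is the energy of the complex minus the sum of the energies of the isolated receptor and ligand. The function name and the numerical values are hypothetical and are not taken from this study or from the Prime interface.

```python
def mmgbsa_binding_energy(e_complex, e_receptor, e_ligand):
    """Return dG_bind = E(complex) - [E(receptor) + E(ligand)] in kcal/mol.

    A negative value indicates that forming the complex is energetically
    favorable under this definition.
    """
    return e_complex - (e_receptor + e_ligand)


# Hypothetical energies (kcal/mol), for illustration only.
dg_bind = mmgbsa_binding_energy(e_complex=-1050.0,
                                e_receptor=-600.0,
                                e_ligand=-400.0)
is_favorable = dg_bind < 0
```

Under this sign convention, a score moving from positive to negative across trajectory frames corresponds to binding becoming favorable.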
Results
Membrane-Envelope interaction
As shown in Figure 1, the protein-protein complex of the M and E proteins is formed by multiple aromatic hydrogen bonds and a π-cation bond through residues Leu51, Thr55, Phe96, and Phe103 towards the N-terminal of the membrane protein (Figure 1 and Table 1). The binding energy calculated for the M-E docked complex with the Prime module was 38.96 kcal/mol. On the other hand, the prime energy calculation showed a high contribution of covalent interactions, with a score of −10906 kcal/mol. Further, the complex was subjected to MD simulation for 100 ns and analyzed for its stability (Supplementary movie 1). We also calculated the binding energy of simulated frames at every 25 ns of the trajectory (Figure 5 and Supplementary Table 1). Additionally, the frames at every 25 ns interval are shown in Figure 2A-2D, and a detailed interaction analysis of all four captured snapshots is tabulated in Supplementary Table 2. Multiple residues of M protein, such as Asp3, Phe45, Trp55, Phe96, and Tyr178, remain in contact with the E protein throughout the simulation through multiple interactions, which makes it a stable complex.
We observed higher binding energy for the M-E complex after simulation; i.e., the shift from positive to negative scores indicates a change in the interaction from non-spontaneous to spontaneous. A gradual decrease of ∼20 kcal/mol in prime energy was also observed across the frames taken at regular time intervals. As shown in Figure 2, the M-E complex showed heavy fluctuations in the initial frames up to 20 ns but was relatively stable, with an RMSD of ∼8 Å, throughout the rest of the simulation period (Figure 2E). The mean residue fluctuations of the M and E proteins within the interaction site were significantly larger than those in the non-interacting region (Figure 2F). Further, the number of hydrogen bonds between the two proteins increased over the simulation period, with an average of ∼5 (Figure 2G).
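The RMSD and RMSF quantities reported here follow standard definitions, which the minimal NumPy sketch below illustrates. It assumes the trajectory has already been superimposed on a reference and is stored as an array of shape (frames, atoms, 3) in Å. The function names and the toy coordinates are hypothetical; this is not the Desmond analysis code used in the study.

```python
import numpy as np


def rmsd_per_frame(traj, ref=None):
    """RMSD of each frame from a reference (default: first frame), in Å."""
    traj = np.asarray(traj, dtype=float)
    ref = traj[0] if ref is None else np.asarray(ref, dtype=float)
    sq_dist = ((traj - ref) ** 2).sum(axis=2)  # squared displacement per atom
    return np.sqrt(sq_dist.mean(axis=1))       # average over atoms


def rmsf_per_atom(traj):
    """RMSF of each atom around its time-averaged position, in Å."""
    traj = np.asarray(traj, dtype=float)
    mean_pos = traj.mean(axis=0)                    # average structure
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared deviation per frame
    return np.sqrt(sq_dev.mean(axis=0))             # average over frames


# Hypothetical toy trajectory: 2 frames, 2 atoms; atom 0 moves 2 Å along x.
toy_traj = np.array([
    [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]],
    [[2.0, 0.0, 0.0], [5.0, 0.0, 0.0]],
])
rmsd = rmsd_per_frame(toy_traj)
rmsf = rmsf_per_atom(toy_traj)
```

RMSD summarizes how far the whole complex drifts from the starting structure at each time point, whereas RMSF is a per-residue (here per-atom) measure of mobility over the whole trajectory, which is why the two are read from separate plots.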
Membrane-Spike interaction
The S protein interacts with M in the ERGIC; accordingly, the docked complex of these two proteins showed promising interactions, viz. multiple hydrogen bonds and one each of an aromatic hydrogen bond, a π-cation interaction, and a π-π stacking interaction. The interacting residues of S protein were found at the C-terminal (Figure 3, Table 1).
The binding energy of the M-S docked complex was calculated to be high, whereas the prime energy was calculated to be −49369.2 kcal/mol. We further investigated the binding stability of the M-S complex through MD simulation up to 100 ns (Supplementary movie 2). Snapshots from the trajectory at every 25 ns are shown with the interacting residues of both proteins, demonstrating that most interacting residues are retained during the simulation (Figure 4A-4D); their interactions are listed in Supplementary Table 3. As per the interaction analysis, residues of M protein such as Asn5, Phe96, Arg174, and Asn207 interact constantly with S protein at multiple regions through multiple strong non-covalent interactions throughout the simulation. The RMSD values from the MD simulation trajectory trended upward from 5 to 18 Å, with little stabilization over the entire simulation period (Figure 4E). The RMSF plot of the loosely packed S protein model with 1273 residues showed massive fluctuations, up to 18 Å, near residues 630-750 (Figure 4F). However, the fluctuations of the interacting site residues at the S protein's C-terminal were around 18 Å. The binding free energy from the simulation trajectory of the M-S complex is represented in Figure 5 (and tabulated in Supplementary Table 1), which shows a constant decrease in positive binding energy and a gradual increase in prime energy. Finally, the average number of hydrogen bonds in the M-S complex simulation setup was ∼12 throughout the MD period (Figure 4G).
Membrane-Nucleocapsid interaction
The protein-protein docking of the M-N complex showed that three residues of M protein, viz. Tyr199, Asp209, and His210, interact with residues Gly335, Phe314, and Phe286 of N protein via one hydrogen bond and two aromatic hydrogen bonds, respectively (Figure 6 and Table 1). Moreover, the docked M-N complex attained a high binding energy of −59.8 kcal/mol (Figure 5 and Supplementary Table 1).
Further, the slight changes in the interacting residues of the complex are shown in snapshots taken at every 25 ns from the 100 ns simulation trajectory (Figure 7A-7D; see Supplementary Table 4 for residue interactions). Residues of M protein such as Arg150 and Asn207 contribute to establishing contact with N protein residues during the simulation period. The M-N protein-protein complex had an average RMSD of approx. 11 Å based on the simulation analysis (Figure 7E). However, the RMSF values of the N protein residues fluctuated between 2 Å and 6 Å throughout the simulation. These fluctuations may be due to the high disorder propensity of N protein and can be seen in Supplementary movie 3. The RMSF values of the interacting residues were 1.7 Å (Trp58), 1.2 Å (Arg107), and 2.1 Å (Asp163) for M protein and 4.9 Å (Lys256), 2.2 Å (Ser184), and 2.9 Å (Tyr268) for N protein over the 100 ns simulation period (Figure 7F). The number of intermediate hydrogen bonds formed within the simulation setup was ∼7 up to the 100 ns timescale (Figure 7G). The binding free energy of the complexes from the simulation trajectory is higher than that of the complex obtained from protein-protein docking, except for the frame at 50 ns (Figure 5 and Supplementary Table 1).
Discussion
Intraviral protein-protein interactions play an essential role in the coronavirus life cycle, specifically during formation of the replication complex, as elucidated by several structural studies 47–49. The RNA-dependent RNA polymerase (nsp12) of SARS-CoV interacts with nsp7 and nsp8, which increases its RNA-synthesizing activity 47. The nsp12-nsp7-nsp8 complex also associates with nsp14 (the proofreading enzyme) 47. Cryo-EM studies showed that the nsp7 and nsp8 heterodimers stabilize the RNA-binding regions of nsp12, while the second subunit of nsp8 plays a vital role in polymerase activity 48. Further, structural studies showed that nsp10 interacts with the N-terminal domain of nsp14 to stabilize it and stimulate its activity 49.
Similarly, the SARS-CoV structural proteins have been reported to interact with each other and play an essential role in virus assembly 6,15,28. Therefore, in this study, we report the intraviral PPI among the structural proteins of SARS-CoV-2. We have computationally shown that the M protein interacts with the other structural proteins to form the M-E, M-S, and M-N complexes, which are responsible for proper virus assembly. We performed protein-protein docking to identify the regions and residues of the membrane protein that interact with the envelope, spike, and nucleocapsid proteins. Previously, in SARS-CoV, mutation-based studies showed that M protein is vital for virus assembly and interacts with other structural proteins 26. The entire C-terminal domain of M protein was found to interact with N protein 26,29,31. Similarly, two transmembrane domains and the cytoplasmic domain of M protein were reported to interact with E protein 26. Multiple regions of M protein interact with the spike glycoprotein 26.
Therefore, in this study, we considered the M protein as the receptor and the S, E, and N proteins as protein ligands. We also checked the interaction of M protein with a truncated structure (residues 8-38) of E protein in its pentameric form; the interacting residues are shown in Supplementary Figure 1. As revealed in this study, multiple regions of S interact with M protein, which has also been seen in other coronaviruses 50. Because the S protein exists in a trimeric form, we also docked the trimeric cryo-EM structure (PDB ID: 6VXX) with M protein, where a few interactions, including one hydrogen bond and four π-π stacking interactions, were observed (Supplementary Figure 2). The M protein is a triple-spanning membrane protein. Its cytosolic region is solely responsible for the M-N interaction; therefore, in the case of M-N docking, the cytosolic part of M was targeted for interaction with N protein.
To understand the stability of the docked complexes and the interactions formed, we performed 100 ns long MD simulations. The simulation studies showed resilience in the docked protein complexes of M-E, M-S, and M-N. The binding energies were in good agreement with these results and indicated good binding of the intraviral structural proteins. Our computational studies agree with previous reports, in which particle assembly occurs in the Endoplasmic Reticulum-Golgi intermediate compartment (ERGIC) and the particles are finally trafficked for release via exocytosis 51 (Figure 8).
Conclusion
Despite their small genomes, viruses are highly pathogenic/infectious, and their genome integrity allows them to hijack the host cellular machinery. For rapid infection and replication, viruses follow multiple pathways. While regulating the host cellular system, a virus must also coordinate its own proteins for proper assembly and genome encapsulation. Here, PPI plays an essential role in coronaviruses, where structural proteins interact with each other, encapsulate the genome, and form mature viruses. It could be of great interest to study these PPIs for drug targeting, as disruption of virus assembly will lead to immature virion formation. In this context, the present study may help in designing mutation-based studies to understand PPI in SARS-CoV-2 and in targeting several interacting residues for therapeutic purposes. It would also be interesting to investigate the interactions of these structural proteins with specific host proteins. Moreover, the driving forces that lead to protein assembly and virus particle formation could be examined. Additional studies on the binding mechanism and the energetically favorable interactions of structural proteins could help in developing new strategies against protein-protein interaction.
Author Contribution
RG, NG: study supervision and design of the experiment. AK and PK: acquisition and interpretation of computational data. AK, PK, and RG contributed to paper writing. PK and AK have contributed equally.
Declaration of competing interest
All authors affirm that there are no conflicts of interest.
Acknowledgments
All the authors would like to thank IIT Mandi for the infrastructure. RG is thankful for the IYBA award from DBT, Government of India (BT/11/IYBA/2018/06) and for support from SERB, Govt. of India (CRG/2019/005603).