Abstract
The clinical manifestation of the recent pandemic COVID-19, caused by the novel SARS-CoV-2, varies from mild to severe respiratory illness. Although environmental, demographic and co-morbidity factors have an impact on the severity of the disease, the contribution of mutations in each of the viral genes towards the degree of severity needs to be elucidated for designing better therapeutic approaches against COVID-19. Here, we studied the effect of two substitutions, D155Y and S171L, of the ORF3a protein, found in COVID-19 patients. Using computational simulations, we discovered that the substitutions at the 155th and 171st positions changed the amino acids involved in salt bridge formation, hydrogen-bond occupancy, interactome clusters, and the stability of the protein. Protein-protein docking using HADDOCK revealed that, of the two observed substitutions, only D155Y weakened the binding affinity of ORF3a with caveolin-1. The increased fluctuation in the simulated ORF3a-caveolin-1 complex suggested a change in the virulence property of SARS-CoV-2.
Importance The binding interaction of the viral ORF3a protein with host caveolin-1 is essential for entry and endomembrane trafficking of SARS-CoV-2. The D155Y substitution in SARS-CoV-2 ORF3a is located near its caveolin-binding Domain IV, and thus the substitution can interfere with the binding affinity of ORF3a to host caveolin-1. Our in silico study reports decreased molecular stability of the D155Y mutant of ORF3a and increased fluctuation of the simulated D155Y ORF3a-caveolin-1 complex. Thus, we hypothesize that the D155Y substitution could change the virulence property of SARS-CoV-2.
1. Introduction
The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is the causative agent of the novel Coronavirus Disease 2019 (COVID-19)[1]. Till February 2021, 114.1 million cases had been reported worldwide across 215 countries and territories, of which 11.1 million were infections with SARS-CoV-2 in India[2]. The mortality rate varies drastically across the world, from 9.1% (Mexico) to 0.9% (Turkey) [3]. Although age, ethnicity and sex contribute to the demographic variation in viral transmission and case fatality rate, how mutations in the viral genome contribute to such variation in pathological manifestation needs to be explored.
The SARS-CoV-2 genome consists of approximately 30 kilobases and shares about 82% sequence identity with both SARS-CoV and MERS-CoV. It also shares more than 90% sequence identity for essential enzymes and structural proteins [4]. Despite the similarity, only SARS-CoV-2 shows severe pathological manifestations in humans, suggesting the existence of differential molecular interactions between viral proteins and host cell machinery.
The SARS-CoV-2 genome broadly consists of 14 open reading frames (ORFs), which are generated from nested transcription of subgenomic RNAs. ORF1a and ORF1b encode 16 non-structural proteins (nsp) known as the replicase/transcriptase complex. The other ORFs code for 4 structural proteins and 8 accessory proteins [1,4]. ORF3 is wedged between the spike (S) and envelope (E) ORFs and encodes a membrane-spanning ion channel protein, ORF3a. At 275 amino acids, it is the single largest accessory protein[5,6]. Ribosomal profiling has identified two putative overlapping genes, namely ORF3b and ORF3c, at the 3’ end of ORF3 with an alternative reading frame to the canonical ORF3a[7–10]; their functional importance is not well understood. ORF3a can localise at the plasma membrane and Golgi complex, and can exist in both glycosylated and non-glycosylated forms[11]. This viral protein has been shown to be highly immunogenic, as antisera isolated from SARS-CoV-infected patients can detect ORF3a [10]. Yount et al. and others have shown that ORF3a has co-evolved with the Spike (S) protein, suggesting the possibility of direct or indirect interactions between ORF3a and the S protein[10,12,13]. Studies in SARS-CoV-infected Caco2 cells show that ORF3a can be efficiently released in detergent-resistant membrane structures and that the diacidic motif, ExD, located within domain VI, plays an important role in membrane co-localisation[14]. ORF3a has multi-functional roles including activating the NLRP3 inflammasome and NFkB pathway, upregulating fibrinogen secretion, downregulating IFN Type I and inducing ER stress and pro-apoptotic activity[5,15-17].
Therefore, mutations in this protein warrant further study to understand its role in the virulence and immune evasive potential of the recent SARS-CoV-2. Several mutations have been reported in the ORF3a gene and have been classified into clades and sub-clades. The mutation patterns of the ORF3a gene have been characterized as largely non-synonymous (Q57H, H93Y, R126T, L127I, W128L, L129F, W131C, D155Y, S171L, D173Y, G196V, and G251V). G251V and Q57H exhibit severe virulence properties[18–21]. Interestingly, the 57th position in ORF3a of pangolin SARS-CoV is H. The D155Y and S171L mutations were detected in Indian patients in May 2020[22]. To understand the functional importance of these mutants, their characterization is needed.
Our study aims to understand the effect of these two substitutions (D155Y and S171L) on the structural stability of the ORF3a protein and its ability to form a complex with caveolin-1. Using computational simulation and protein-protein docking, we find that the amino acids involved in the hydrophobic interactions, hydrogen bond formation, salt bridge formation and residue interaction patterns are different in the wild type (WT), i.e., the original Wuhan sequence, compared with the two mutants having the D155Y and S171L substitutions.
2. Methods
2.1 Bioinformatic Methods
A total of 26,656 sequences of ORF3a protein deposited in the NCBI database as on Nov 17, 2020 were considered for the bioinformatics analysis. The keywords used for the search were “SARS-CoV-2”, “ORF3a protein”, and “complete structure”. The retrieved sequences were aligned using the BLAST algorithm on the NCBI website. A small number of the post-BLAST sequences were longer than 275 residues due to erroneous performance of the code. These erroneous sequences were manually cleaned to obtain the final alignments of the complete protein sequences (275 amino acids). The number of samples whose locations were geo-tagged to India was 614. These sequences were then compared, position by position, to the Wuhan sequence (NCBI Accession No: YP_009724391.1[23]). The positions where mismatches were observed with respect to the reference Wuhan sequence (WT) were considered as locations of mutations. A smaller number of mutations denotes a sequence more similar to the WT, whereas a larger number of mutations denotes a more deviant mutant. Overall, the sequences found in the NCBI database were compared against the WT and the number of mutations at every position of ORF3a was stored. This provides the frequency distribution of the mutations found at each position of ORF3a. We used the PROVEAN score to assess whether the effect of a mutation is deleterious or neutral. The PROVEAN score of each mutation was determined using the PROVEAN web server [24].
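The position-wise comparison against the reference can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names and the 10-residue toy sequences are hypothetical, and the sketch assumes every sample has already been aligned and trimmed to the same length as the reference.

```python
from collections import Counter

def mutation_positions(reference, sample):
    """Return (1-based position, reference residue, sample residue) per mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref, alt)
        for position, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]

def mutation_frequency(reference, samples):
    """Count, for each position, how many samples differ from the reference."""
    counts = Counter()
    for sample in samples:
        for position, _, _ in mutation_positions(reference, sample):
            counts[position] += 1
    return counts

# Toy sequences (hypothetical, not ORF3a data).
reference = "ACDEFGHIKL"
samples = ["ACDEFGHIKL", "ACDYFGHIKL", "ACDYFGHIKV"]
frequency = mutation_frequency(reference, samples)
for position in sorted(frequency):
    print(position, frequency[position])
```

The length of the list returned by `mutation_positions` for a single sample is the per-sequence mutation count that the text uses as a measure of deviation from the WT.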
2.2 Preparation of structure of the ORF3a proteins
The cryo-EM structure of the WT ORF3a protein of SARS-CoV-2 was obtained from the PDB (PDB ID: 6XDC[25]). The symmetry information present in the PDB file was used to convert the structure into the functional dimeric form using the PDBe PISA server online [26]. The residues on the second monomer have been numbered using “ ’ ” throughout the manuscript. As no homologous structure of the protein was available, we considered the PDB structure of ORF3a, which covers residues 40 to 238. We introduced the necessary mutations (D155Y and S171L) by modelling the residues in Swiss PDB Viewer[27].
2.3 Molecular dynamics simulation
We performed classical Molecular Dynamics (MD) simulation in AMBER20[28] using the AMBER ff14SB force field[29]. The missing hydrogen atoms in the protein structure were added by the LEaP module of the AMBER20 package. The protein was then subjected to energy minimisation for 2000 steps using steepest descent and conjugate gradient algorithms. We then solvated the energy-minimised structures using rectangular water boxes comprising TIP3P water molecules [30]. Particle mesh Ewald was used to calculate the electrostatic interactions with a cut-off distance of 12 Å. We performed initial minimisation and equilibration in order to avoid bad contacts. This was followed by equilibration using the NVT ensemble at 300 K for about 500 ps. The systems were then equilibrated using the NPT ensemble at 1 atm pressure for 1 ns. We used a time step of 2 fs throughout the equilibration and production runs. After the energy and density values converged, the systems were subjected to 100 ns production runs using the NPT ensemble at 300 K and 1 atm pressure. The coordinates were saved at intervals of 2 ps. We performed the analyses using the CPPTRAJ module of AMBER[31], and visualizations were performed using VMD [32]. The binding energies for the complexes were calculated using the MMGBSA[33] suite of AMBER.
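As a quick consistency check on the protocol above, the trajectory size implied by the stated time step, run length and save interval can be worked out directly. The three input values are taken from the text; the variable names are illustrative.

```python
# Values stated in the protocol above.
TIME_STEP_FS = 2        # integration time step (fs)
PRODUCTION_NS = 100     # production run length (ns)
SAVE_INTERVAL_PS = 2    # interval between saved coordinate frames (ps)

FS_PER_PS = 1_000
PS_PER_NS = 1_000

production_ps = PRODUCTION_NS * PS_PER_NS
integration_steps = production_ps * FS_PER_PS // TIME_STEP_FS
saved_frames = production_ps // SAVE_INTERVAL_PS
steps_between_saves = SAVE_INTERVAL_PS * FS_PER_PS // TIME_STEP_FS

print(integration_steps)    # 50000000
print(saved_frames)         # 50000
print(steps_between_saves)  # 1000
```

Each 100 ns production run therefore corresponds to 50 million integration steps and 50,000 saved frames, which is the number of frames over which per-frame quantities such as RMSD are evaluated.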
2.4 Graph theory
We used graph theory to decipher the composition of interactomes in terms of participating amino acids involved in pairwise interactions. Briefly, the graph structure G(V, E, W) is denoted by three sets. The first is the set of residues, denoted as the vertex set (V) of the graph. Each individual residue is considered as an independent entity, formally termed a vertex or a node. We denote V as {v1, v2, v3, …, vn}, where vi is the ith residue. Therefore, |V| = n, where n is the number of nodes in the graph, also known as the order of the graph. The second set is the set of interactions between residues, denoted as the edge set (E). The interaction between the ith and the jth residue may be represented as an edge e(vi, vj), and the edge set E may be represented as {e1, e2, e3, …, em}. Therefore, |E| = m, where m is the number of edges in the graph, also known as the size of the graph. The edges are undirected because the two interacting residues play no distinguishable roles in these interactions. We initially calculate the average energy values (calculated per unit time) over the time of observation for all nC2 possible interactions. Some of them turn out to be high and are deemed insignificant. We use a threshold to remove those interactions. Here, m is the number of interactions with significant average energy values. These average energy values represent the importance of the interactions and form the set of edge weights (W). For every edge, there is a corresponding edge weight; therefore |W| = m.
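The construction of G(V, E, W) described above can be sketched in a few lines. This is an illustration under stated assumptions: the threshold value, the residue labels and the per-frame energies are hypothetical, and an interaction is kept when its time-averaged energy is at or below the threshold (i.e. sufficiently negative), which is one reading of "high values are deemed insignificant".

```python
from itertools import combinations
from statistics import mean

def build_interaction_graph(energies, threshold):
    """Build (V, E, W) from per-frame pairwise energies.

    energies: dict mapping an unordered residue pair (u, v) to a list of
              per-frame interaction energies.
    threshold: pairs whose average energy is above this value are dropped.
    """
    vertices = sorted({residue for pair in energies for residue in pair})
    edges, weights = [], []
    for u, v in combinations(vertices, 2):      # all nC2 candidate pairs
        series = energies.get((u, v), energies.get((v, u)))
        if not series:
            continue
        average = mean(series)
        if average <= threshold:                # keep significant interactions only
            edges.append((u, v))
            weights.append(average)
    return vertices, edges, weights

# Toy input (hypothetical labels and energies, in kcal/mol).
toy_energies = {
    ("A", "B"): [-4.0, -6.0],
    ("B", "C"): [-0.1, -0.3],
    ("A", "C"): [0.5, 0.7],
}
V, E, W = build_interaction_graph(toy_energies, threshold=-1.0)
print(len(V), E, W)
```

By construction `len(E) == len(W)`, matching |E| = |W| = m in the text, while |V| = n counts every residue that appears in the input.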
In this work, we are interested in studying the interaction dynamics of the residues. Due to the difference in interaction energies, from observation, we could intuitively understand that a group of residues are more prone to interact among themselves than the other residues. But, to discover underlying densely interacting residue groups or clusters, we apply algorithms that could reveal the clusters accurately. The equivalent problem in residue-residue interaction graphs or networks is known as graph clustering. We have used one of the most popular community detection techniques, Louvain method, to find out the clusters in this residue interaction network[34]. Please note that we have used the terms cluster and community interchangeably. It provides us with a cover C = {c1, c2, c3, …, ck}, where k is the number of communities and ci is the ith cluster/community. Each vertex in V belongs to exactly one of the clusters. Therefore, union of the vertex sets of all the clusters would lead back to V.
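The Louvain method searches for a partition that maximises modularity, so a small modularity calculation illustrates what "densely interacting groups" means quantitatively. The sketch below only evaluates modularity for a given partition; it is not an implementation of the Louvain search itself. The toy graph is hypothetical, and edge weights are assumed to be positive (for interaction energies this would mean using magnitudes, which is an assumption rather than something stated in the text).

```python
def modularity(edges, communities):
    """Weighted modularity Q = sum_c [ L_c / m - (d_c / (2 m))^2 ].

    edges: dict mapping undirected pairs (u, v) to positive weights.
    communities: list of disjoint sets of nodes covering every node.
    """
    total = sum(edges.values())                          # m
    community_of = {
        node: index
        for index, members in enumerate(communities)
        for node in members
    }
    internal = [0.0] * len(communities)                  # L_c
    degree = [0.0] * len(communities)                    # d_c
    for (u, v), weight in edges.items():
        cu, cv = community_of[u], community_of[v]
        degree[cu] += weight
        degree[cv] += weight
        if cu == cv:
            internal[cu] += weight
    return sum(
        internal[c] / total - (degree[c] / (2 * total)) ** 2
        for c in range(len(communities))
    )

# Toy graph: two triangles joined by a single edge, unit weights.
toy_edges = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 1.0,
}
split = [{"a", "b", "c"}, {"d", "e", "f"}]
merged = [{"a", "b", "c", "d", "e", "f"}]
q_split = modularity(toy_edges, split)
q_merged = modularity(toy_edges, merged)
print(q_split > q_merged)
```

Splitting the toy graph into its two triangles gives a higher modularity than keeping all nodes in one community. In both partitions each vertex belongs to exactly one cluster and the union of the clusters recovers V, as required of the cover C.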
2.5 Modelling the protein-protein interaction complex
We used a hierarchical approach to predict the structure of caveolin-1 in the absence of a suitable template structure. The I-TASSER server was used to generate five initial models[35–37]. One model was selected based on the C-score (confidence score). The model was then evaluated using the SAVES v5.0 server, where Ramachandran plot and ERRAT analyses were performed[38,39]. Model visualizations were done using Chimera[40]. This model was then simulated for 100 ns to generate a more stable structure. The average structure was then considered as the initial structure for docking after proper structural evaluation by Ramachandran plot and ERRAT analyses[38,39]. Two molecules of human caveolin-1 were docked to the WT ORF3a using HADDOCK[41]. HADDOCK not only considers traditional energetics and shape complementarity, but also incorporates experimental data in terms of restraints to guide the docking of two proteins. The residues of domain IV on ORF3a and the residues Asp82 to Arg101 on human caveolin-1 were defined as active residues in docking based on the cryo-EM structure information[42]. On the basis of the most negative binding energy, we selected a starting structure for the WT ORF3a-caveolin-1 complex. The necessary mutations were introduced in these structures by modelling with the Swiss PDB Viewer. These structures were subjected to all-atom MD simulations.
3 Results
3.1 Worldwide prevalence of D155Y and S171L substitution of ORF3a
The ORF3a protein is important for viral infection, spreading and modulating the host immune system. To understand the role of mutations in the function of this protein, we first checked the prevalence of each mutation found in ORF3a. Fig. 1a shows the number of mutant samples at each position of the ORF3a protein in the global and Indian populations from a total of 26,656 samples deposited in the NCBI dataset (dated November 17, 2020). From this figure, we observe that mutations occur at 258 positions of the protein in the global population, whereas in the Indian population mutations were found at only 13 positions. Mutations at the 57th position were the most frequent in both the global and the Indian population. In the Indian population, positions 155 and 171 each showed two instances of mutation, while the number of instances was 23 and 34, respectively, in the global population. Fig. 1b shows the distribution of these mutations worldwide. The mutations in the ORF3a protein for both the world and Indian populations are distributed over the entire protein. Table S1 lists the mutation counts we found at all positions of the ORF3a protein sequence.
The number of instances of the various mutations at different positions of ORF3a protein in the total number of samples considered in this study has been plotted in (a). The global distribution of the mutations at positions 155 and 171 have been shown in blue and red respectively in (b)
3.2 Description of the protein systems
The SARS-CoV-2 ORF3a protein can form a dimer[43]. The monomeric ORF3a has been divided into six domains, each having its own functional importance[20]. Fig. 2 shows the locations of D155Y and D155’Y (red spheres) and S171L and S171’L (cyan spheres). The locations of these substitutions between domains IV and V, and in domain VI suggest their possible role in caveolin binding, intracellular protein sorting and intracellular membrane trafficking of ORF3a[20].
The structure of WT ORF3a (PDB ID: 6XDC) marking the functional domains as known from literature has been shown. The positions of mutation at the 155th and 171st positions have been shown in orange and cyan spheres respectively.
3.3 Stability of the two ORF3a variants, D155Y and S171L
The cryo-EM structure of the WT ORF3a protein (residues 40 to 238) was downloaded from the Protein Data Bank (PDB ID: 6XDC) and processed as discussed in the methodology section. The substitutions D155Y and S171L on each monomer were modelled on the WT structure separately using SwissPDBViewer[27]. Each of these structures was simulated in triplicate till 200ns. Fig. 3 shows the time evolution of the root mean square deviation (RMSD) of the simulated structure with respect to the starting frame of simulation. WT and both mutants showed reasonable stability. WT and D155Y (black and red profiles in Fig. 3) showed lower RMSD (the final RMSD being 2.25 Å) and less fluctuation, whereas the S171L variant (green profile in Fig. 3) showed higher RMSD (the final RMSD being 2.75 Å). The overall fluctuation in RMSD was also greater in S171L compared to the WT and D155Y. This indicates that the S171L substitution causes more deviation and may interfere with membrane localisation. However, the final RMSD values attained by the WT and the two mutants were comparable, indicating a similar final simulated structure. We therefore conclude that the D155Y and S171L substitutions do not cause a major conformational change of ORF3a from the WT.
The time evolution of the RMSD of the ORF3a proteins with respect to the starting structure. (a) Black: WT, (b) Red: D155Y and (c) Green: S171L
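For reference, the RMSD plotted here is the root of the mean squared displacement of the atoms from their positions in the starting frame. The sketch below shows only that formula; it assumes the frame has already been superimposed on the reference (trajectory tools normally perform this fitting step), and the coordinates are hypothetical.

```python
import math

def rmsd(frame, reference):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    No superposition is performed: the frame is assumed to be pre-aligned.
    """
    if not frame or len(frame) != len(reference):
        raise ValueError("frame and reference must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for point, ref_point in zip(frame, reference)
        for a, b in zip(point, ref_point)
    )
    return math.sqrt(squared / len(frame))

# Toy coordinates in Å (hypothetical): every atom displaced by 2 Å along z.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
later = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(start, start), rmsd(later, start))
```

Evaluating this quantity for every saved frame against the first frame gives one RMSD time series per system, which is what each coloured profile represents.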
3.4 Differential behaviour of the constituent residues
While RMSDs are a measure of the overall stability of the biological systems under consideration, the B-factor values give an idea of the flexibility of the individual residues. The B-factor values were measured to determine the average flexibility of the protein residues around their mean position across all trajectories (Fig. 4). Fig. 4 shows that the residues in the WT ORF3a protein exhibit the least deviation from their mean position, whereas the residues in both mutants show more flexibility. Interestingly, in the D155Y variant, the 155th residue showed higher flexibility compared to the WT (14.52 Å² for WT and 61.7 Å² for D155Y at position 155, and 16.95 Å² for WT and 55.3 Å² for D155Y at position 155’). Similarly, for the S171L variant, the flexibility of the 171st residue in the mutant was higher than in the WT or D155Y (13.69 Å² for WT and 64.94 Å² for S171L at position 171, and 26.59 Å² for WT and 124.48 Å² for S171L at position 171’). The terminal residues are exposed to solvent and are more flexible, resulting in their high B-factor values as seen in Fig. 4 (as marked). Several other residues, located both near the positions of mutation and distally, also showed greater flexibility. We thus observe an effect of the mutation on the overall dynamics of the protein at distant locations. These mutations may therefore have allosteric effects on the domain-specific function of the ORF3a protein.
The B-Factor plot for ORF3a for the three systems. (a) Black: WT, (b) Red: D155Y and (c) Green: S171L
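Simulation-derived B-factors are conventionally related to the mean squared positional fluctuation by B = (8π²/3)·⟨Δr²⟩. The sketch below applies that standard relation; whether the values reported above were obtained in exactly this way is an assumption, and the toy positions are hypothetical.

```python
import math

B_SCALE = 8 * math.pi ** 2 / 3   # standard conversion factor, B = B_SCALE * <dr^2>

def mean_squared_fluctuation(positions):
    """Mean squared deviation of one atom from its average position (Å²)."""
    n = len(positions)
    centre = tuple(sum(p[k] for p in positions) / n for k in range(3))
    return sum(
        sum((p[k] - centre[k]) ** 2 for k in range(3)) for p in positions
    ) / n

def b_factor(positions):
    """B-factor (Å²) from a list of per-frame (x, y, z) positions of one atom."""
    return B_SCALE * mean_squared_fluctuation(positions)

def rmsf_from_b(b_value):
    """Root mean square fluctuation (Å) implied by a B-factor (Å²)."""
    return math.sqrt(b_value / B_SCALE)

# Toy atom oscillating by ±1 Å along x (hypothetical).
toy_positions = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
print(b_factor(toy_positions), rmsf_from_b(b_factor(toy_positions)))
```

Under this relation, `rmsf_from_b` converts any reported B-factor into the corresponding fluctuation amplitude in Å, which can make differences between WT and mutant residues easier to interpret.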
3.5 Differential contribution of stabilizing residues in the ORF3a proteins
The WT and the mutant ORF3a proteins were analysed to understand the role of individual amino acids, hydrogen bond occupancies and salt bridges in their structural stability. The free energies of the three variants (WT and the two mutants) were calculated using the MMGBSA module of Amber20 and are tabulated in Table 1, which shows the differences in stability among them. The S171L mutant, with a free energy of −5376.51 (± 19.34) kcal/mol, was the most stable, followed by WT (−5356.85 ± 12.95 kcal/mol) and the D155Y mutant (−5266.41 ± 12.56 kcal/mol). This indicates that the mutants D155Y and S171L can also exist independently, just like the WT ORF3a protein. To understand the contributing factors for these variations in stabilizing energy, we looked at the contribution of each amino acid to the overall free energy and tabulated the top contributors for each variant in Table 2. While the group of residues contributing to the overall stability remains almost unchanged among the variants, their ranking differs. For instance, in WT, Arg68 plays the most important role, whereas in the mutant systems it is Arg126’ that contributes the most. Arg68 is the second largest contributor in the D155Y mutant, whereas in the S171L mutant it ranks fourth. In the S171L mutant, Arg126 plays the second most important role, following its corresponding residue on the second monomer.
We also checked the hydrogen bond interactions in WT and the two mutant ORF3a proteins, and found that the total number of hydrogen bonds remains the same (average number is 95, Fig. S1) in all three variants. In contrast, the individual residues that have the highest hydrogen bond occupancy vary among the ORF3a proteins. In WT, the top three residue pairs forming hydrogen bonds with the maximum occupancy are Tyr156’-Lys192’, Arg134-Asp155 and Ser205’-Asn144’. In D155Y, the top three residue pairs are Tyr212’-Thr164, Ser205’-Asn144’ and Ser205-Asn144. In the S171L mutant, the top three residue pairs are Leu203-Asp210, Leu203’-Asp210’ and Thr89-Leu85’. A detailed list is given in Table S2. We also calculated the salt bridge interactions for the WT and the two mutant proteins and tabulated the list of salt bridges in Table S3, which shows that D155Y forms fewer salt bridges (24) than the WT and S171L (31 each). Interestingly, mutation at position 155, but not at 171, breaks the salt bridge between Asp155 and Arg134. This residue pair is located at the end of the alpha helix and the beginning of a beta sheet in the proximity of domains III and IV of ORF3a. Thus, this loss of salt bridge interaction in D155Y is significant and may play a role in the binding affinity of the interacting partner of the ORF3a protein at this region.
3.6 Changes in interactome interactions – A graph theoretic perspective
The variation in hydrogen bonding and salt bridges prompted us to check the interactome interactions in the ORF3a variants. We represented the interactions between the interactomes in terms of a network. The pairwise hydrophobic interaction energies of the residues in the WT and the two mutants of ORF3a were calculated using the MMGBSA suite of Amber20. These hydrophobic interaction energies were considered for building a residue-residue interaction network for the WT and the two mutant proteins. Here, we have used graph data structures and relevant algorithms to model the interactions among the residues.
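Hydrogen bond occupancy is the fraction of trajectory frames in which a given donor-acceptor pair is bonded, and the ranking above follows from sorting pairs by that fraction. A minimal sketch, assuming the per-frame bond lists have already been produced by a geometric criterion in a trajectory analysis tool; the pair labels and frames below are hypothetical.

```python
from collections import Counter

def hbond_occupancy(frames):
    """Map each residue pair to the fraction of frames in which it is bonded.

    frames: list with one collection of (donor, acceptor) pairs per frame.
    """
    counts = Counter()
    for bonds in frames:
        counts.update(set(bonds))      # count each pair at most once per frame
    return {pair: count / len(frames) for pair, count in counts.items()}

def top_pairs(occupancy, k=3):
    """Return the k pairs with the highest occupancy (ties broken by label)."""
    return sorted(occupancy, key=lambda pair: (-occupancy[pair], pair))[:k]

# Toy trajectory of four frames (hypothetical labels).
toy_frames = [
    {("D1", "A1"), ("D2", "A2")},
    {("D1", "A1")},
    {("D1", "A1"), ("D3", "A3")},
    {("D2", "A2")},
]
occupancy = hbond_occupancy(toy_frames)
print(top_pairs(occupancy, k=2))
```

This also shows why the total number of hydrogen bonds can stay constant while the top-ranked pairs change: the total is summed over pairs within each frame, whereas occupancy tracks each pair separately across frames.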
The spatial orientation of the protein, adjacency of the residues and interactions among them play a role in finding the clusters or communities of interacting residues. In the visualization of the clusters, as seen in Fig. 5a, we see the whole interaction network and an overview of the clusters. In Fig. 5b, we zoom in on one part of the graph and provide a closer view of the interactions. The colour of a node denotes its affiliation to a certain cluster. The colour of an edge is determined by the colours of the nodes it is incident upon. The edge thickness denotes the strength of the interaction between the residues, i.e., the weight of the edge. In Fig. 5c, one residue has been selected to show the nodes adjacent to it (also known as its neighbourhood). Fig. 6 shows these clusters as they appear in the actual protein. We observe that the membership of the residues in the clusters in each protein shows substantial variation. A list of the clusters and their constituent residues has been provided for the WT and the two mutants (D155Y and S171L) in Table S4. Fig. 6 and Table S5 indicate that the residues of the functional domains have rearranged into different interacting clusters in WT and the two mutants. Domain III, being the largest in size, has split into the largest number of clusters. However, the clusters differ in their constituent residues between WT and the two mutants. Thus, we may conclude that the mutations have changed the interaction patterns of the interactomes present in the protein. Due to changes in residue interactions, the clusters have changed from WT to the mutants. It should be noted, however, that the cluster membership of the nodes in the regions of mutation does not change. This indicates that the mutations may have distal effects too, which can be explored further for a better understanding of the protein function.
Visualization of residue interaction network in SARS-CoV-2 ORF3a protein using Gephi[60].
(a) The whole residue interaction network showing the complete cover C, with nodes coloured with the membership colour of a particular cluster, (b) A magnified view of the residue interaction network, and (c) Shows one particular residue (here, GLY209) and the residues it is directly interacting with.
The different interactomes are shown for (a) WT, (b) D155Y and (c) S171L
3.7 Formation of complex with partner protein caveolin-1
We next checked whether the substitutions can change the binding interaction of the ORF3a protein with host caveolin-1. Issa et al. [20] have suggested that domain IV of ORF3a binds to the caveolin-1 protein, which is required for viral uptake and regulation[44,45]. We modelled caveolin-1 using a hierarchical approach to predict the structure of the protein. Five initial models were generated using the I-TASSER server. Out of these, one model was selected based on the C-score (confidence score). This model was then evaluated using the SAVES v5.0 server, where Ramachandran plot analysis and ERRAT analysis were performed as shown in Fig. 7a-b. The Ramachandran plot showed that 96.9% of the residues of caveolin-1 were within the favoured and allowed regions, while 3.1% of the residues were in the disallowed regions. The ERRAT analysis gave an overall quality factor of 89.412.
(a) The ERRAT analysis of the modelled structure of caveolin-1. (b) The distribution of the residues of the modelled structure on the Ramachandran Plot. (c) The ERRAT analysis of the simulated structure of caveolin-1. (d) The distribution of the residues of the simulated structure on the Ramachandran Plot.
The modelled structure of human caveolin-1 was simulated for 100 ns to generate a well equilibrated and stable structure. The stability of the simulation, as evident from the time evolution of the RMSD of the protein from its starting structure, is shown in Fig. S2a. The average structure from this simulation (Fig. S2b) was considered as the starting structure of the ORF3a-caveolin-1 complexes after proper structural evaluation (Fig. 7c-d). The Ramachandran plot of the average simulated structure showed that 98.1% of the residues were within the favourable and allowed regions, and only 1.9% of the residues were in the disallowed regions. The ERRAT plot also showed an improvement, with the overall quality factor increased to 94.304. Thus, both the Ramachandran plot and the ERRAT analysis (Fig. 7c-d) indicated that the caveolin-1 model was of acceptable quality and could be used as the starting structure for docking.
We carried out our protein-protein docking using the HADDOCK webserver[41,46]. We considered the binding domains on ORF3a and caveolin-1 as the interacting residues [42]. Our analysis generated twelve probable structures from three clusters, as shown in Fig. S3. In each of these structures, two molecules of human caveolin-1 interact with the dimeric form of the ORF3a protein. The top structure from the topmost cluster (left-most structure in the first row of Fig. S3), having a HADDOCK score of −155.3 (±22.2), was considered as our starting structure. This structure showed a symmetrical nature. The buried surface area of this complex was found to be 3031.4 (± 181) Å², signifying a strong complex. The necessary mutations were introduced into the protein-protein complex using Swiss PDBViewer.
The starting structures for WT and the two mutants were simulated for 100 ns. The stability of these structures was assessed by plotting the time evolution of their RMSD values with respect to the starting frame of simulation (as shown in Fig. 8). We observe that the WT and S171L complexes are stable, with an average RMSD value of around 9 Å. Although the absolute value is high, the protein complexes reached stability and showed a plateau in the RMSD plot from ~40 ns, again indicating a stable complex. However, for the D155Y system, the protein complex showed much more fluctuation and deviation from the starting structure. This indicates a less stable complex structure, which is further supported by the lower PROVEAN score of D155Y (Table S6). Since the mutation is present in the vicinity of the caveolin-binding domain in ORF3a, the presence of the mutation appears to lead to an unstable protein-protein complex. Thus, the D155Y substitution interferes with the caveolin-binding activity of the ORF3a protein. We also calculated the free energy corresponding to the binding of caveolin-1 to the ORF3a protein in these three protein complex systems. The values for WT, D155Y and S171L were −37.6385 (±8.3248) kcal/mol, −11.5504 (±2.9333) kcal/mol and −31.9254 (± 5.0812) kcal/mol, respectively. From these values, it is evident that the binding affinity of caveolin-1 is considerably lower in the D155Y mutant compared to WT and S171L. This corroborates the unstable protein-protein complex in the D155Y system. The changes in hydrogen bonding, salt bridge pattern and hydrophobic interaction pattern associated with the D155Y substitution may have contributed to the weakened interaction between D155Y ORF3a and caveolin-1.
The time evolution of the RMSD of the ORF3a-caveolin-1 complex with respect to the starting structure. (a) Black: WT-caveolin-1, (b) Red: D155Y-caveolin-1 and (c) Green: S171L-caveolin-1
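The comparison of binding free energies can be made explicit as a difference relative to WT, ΔΔG = ΔG(mutant) − ΔG(WT), where a positive value means weaker binding than WT. The energies below are the values reported above; treating the ± values as independent standard deviations and combining them in quadrature is an assumption made only for this illustration.

```python
import math

# Reported binding free energies in kcal/mol: (mean, spread).
binding = {
    "WT": (-37.6385, 8.3248),
    "D155Y": (-11.5504, 2.9333),
    "S171L": (-31.9254, 5.0812),
}

def relative_binding(values, reference="WT"):
    """Return {variant: (ddG, combined spread)} relative to the reference."""
    ref_mean, ref_spread = values[reference]
    result = {}
    for name, (mean, spread) in values.items():
        if name == reference:
            continue
        ddg = mean - ref_mean
        # Assumption: spreads are independent and add in quadrature.
        combined = math.sqrt(spread ** 2 + ref_spread ** 2)
        result[name] = (ddg, combined)
    return result

ddg = relative_binding(binding)
for name, (value, spread) in ddg.items():
    print(f"{name}: ddG = {value:+.4f} kcal/mol (combined spread {spread:.2f})")
```

The difference for D155Y is about +26.09 kcal/mol and for S171L about +5.71 kcal/mol, so under these assumptions only the D155Y shift exceeds the combined spread of the two estimates.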
4 Discussion
In this study, we have established that the D155Y substitution changes the intramolecular hydrogen bonding and salt bridge formation, and disrupts the interaction between ORF3a and caveolin-1.
Several other mutations are present in ORF3a of SARS-CoV-2. Wu et al. have shown that the incidence of mutation at position 57 is high compared to the other positions[47]. In order to consider the effect of mutation at the 57th position on D155Y and S171L, we simulated the Q57H, Q57H-D155Y and Q57H-S171L variants of ORF3a for 200 ns. Their structural stabilities were calculated with MMGBSA and are tabulated in Table 3. We noted that the structural stabilities of the Q57H and Q57H-S171L variants were comparable to those of WT, D155Y and S171L (values listed in Table 1). The Q57H-D155Y variant was considerably less stable. Previously, Hassan et al. reported the presence of H at the 57th position in the ORF3a protein of pangolin CoV[22]. Thus, we may hypothesize that the presence of H may provide natural stability to ORF3a. We also checked the structural stability of W131C, W131R, G172C and G172V, which were found in Indian patients, by simulating these variants for 200 ns. These four variants were stable and showed an average RMSD value of 2.5 Å with respect to the starting structure, as shown in Fig. S4. The overall binding free energies for the four variants were also calculated, and we noted that the stabilities of W131C, G172C and G172V were similar to that of the WT ORF3a protein as listed in Table 1. The W131R variant was more stable than WT ORF3a. This indicates that these four mutants are very stable and can exist independently. Further study is needed to check the effects of these substitutions both in silico and in vitro. In contrast, mutation at the 155th position (D155Y) reduced the binding affinity of ORF3a to caveolin. The disrupted interaction can be indicative of improved viral fitness, wherein the virion particles can continue to build the host intracellular viral load without inducing host cell apoptosis or promoting their egress, thus lengthening the asymptomatic phase of the infection.
Contrariwise, the ORF3a-caveolin-1 affinity change can also affect the virion internalisation into host cells, endomembrane sorting and assembly of the viral components.
Direct Coupling Analysis revealed eight genes involved in epistatic interactions at several polymorphic loci. The locusof ORF3a is involved in three out of eight potentially significant epistatic links with, namely, nsp2, nsp6 and nsp12. These intragenetic interactions open up the possibility of potential evolutionary links of the above described substitutions at D155Y and S171L with other positively selected loci in viral genes, which is reportedly subject to demographic variations[48]. Moreover, the Neanderthal-derived COVID-19 risk haplotype is altogether positively selected in some populations and has 30% allele frequency thus introducing an evolutionary landscape to the current COVID-19 pandemic[49].
Our simulation studies and further analyses of ORF3a protein have shown that the presence of mutations affects the structural stability of the ORF3a protein. The residues involved in forming several stabilising interactions in these proteins also change in the presence of mutations. Although the overall stability of the protein structure does not differ much between the WT and the two variants, these mutations may affect the binding affinity of ORF3a for its partner proteins. For instance, in this study we have shown that the presence of the D155Y mutation drastically reduces the binding affinity of ORF3a for caveolin-1.
SARS-CoV-2 enters the host cell both by membrane fusion and by clathrin/caveolin-mediated endocytosis after binding to the ACE2 cell-surface receptors in the upper respiratory tract and alveolar epithelial cells[50–52]. Caveolin/cholesterol-mediated endocytosis has previously been implicated in SARS-CoV through an in silico study in which several caveolin-1 binding domains (CBD) were found in SARS-CoV proteins, and internalisation of the virion was proposed to be facilitated in a caveolin-1 and lipid raft-dependent manner. However, the role of caveolin-1 was not limited to viral entry. Rather, it was associated with all stages of the viral life cycle, from virus binding to surface receptors, fusion and endomembrane trafficking of the virus in caveosomes, and sorting of viral components to endomembrane surfaces, through to replication, assembly and subsequent egress. The host-derived lipid bilayer surrounding the enveloped viral nucleocapsid contains caveolin-1 incorporated during viral fission from the host membrane[53]. Thus, the binding interactions of SARS-CoV-2 ORF3a WT and its mutational variants with caveolin-1 provide a putative alternative route for viral pathogenesis in COVID-19. A change in the interaction of D155Y ORF3a with caveolin-1 may provide an alternative route for SARS-CoV-2 to exhibit its virulence properties in COVID-19 patients.
Cryo-EM structural analysis of SARS-CoV-2 proteins shows that ORF3a can exist in dimeric and tetrameric arrangements, with six functional domains (Domains I to VI) in each protomer. The protein comprises three helices spanning the transmembrane domain (Domains II and III) and a cytosolic domain with multiple beta-strands (Domains IV, V and VI)[20,54]. Each domain of SARS-CoV ORF3a interacts with different host proteins and modulates host signalling pathways. Domain II has binding sites for TRAF3 and ASC, and interacts with them to activate the NLRP3 inflammasome[5]. Domain III has a conserved cysteine residue at position 133 that is known to stabilize ORF3a homodimers and homotetramers for its ion channel activity[43,54]. Cytosolic Domain IV has a conserved motif, YDANYFVCW, spanning amino acids 141-149, that binds host caveolin-1[55]. Finally, Domains V and VI, comprising the YXXφ and diacidic ExD motifs respectively, are essential for intracellular viral protein sorting, trafficking and localization of ORF3a to the host membrane, followed by its release into the culture medium[14]. Other host proteins that act as binding partners of ORF3a include components of the anti-inflammatory pathway (HMOX1), the innate immune signalling pathway (TRIM59), the glycosylation pathway (ALG5) and nucleus-inner-membrane proteins (SUN2 and ARL6IP6)[56]. ORF3a also regulates the Caspase 8-mediated extrinsic apoptotic pathway for its pro-apoptotic activity in HEK293T cells[17]. Thus, mutations in or close to the binding regions may interfere with the interaction between host proteins and viral ORF3a, which needs to be further validated. SARS-CoV-2 ORF3a is among the candidate proteins found to elicit a significant CD4+ and CD8+ T-cell response, and it has been suggested that an optimal vaccine should include class I epitopes derived from M, nsp6 and ORF3a[57–59].
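The Domain IV motif YDANYFVCW (residues 141-149) can be checked against the aromatic-residue consensus patterns commonly used to describe caveolin-binding domains. The following is a minimal sketch, assuming the consensus forms φXφXXXXφ and φXXXXφXXφ (φ = W, F or Y), which are not given in this text; the function name `find_cbd_motifs` and its `first_residue` offset are hypothetical.

```python
import re

# Assumed consensus patterns for caveolin-binding motifs (phi = aromatic W/F/Y,
# X = any residue). Lookaheads let overlapping matches be reported as well.
CBD_PATTERNS = {
    "phi-X-phi-X4-phi": re.compile(r"(?=([FWY].[FWY].{4}[FWY]))"),
    "phi-X4-phi-X2-phi": re.compile(r"(?=([FWY].{4}[FWY].{2}[FWY]))"),
}

def find_cbd_motifs(sequence, first_residue=1):
    """Return (pattern name, start, end, matched text) for every motif hit.

    first_residue is the residue number of sequence[0], so that hits are
    reported in protein numbering rather than string offsets.
    """
    hits = []
    for name, pattern in CBD_PATTERNS.items():
        for match in pattern.finditer(sequence):
            start = match.start() + first_residue
            end = start + len(match.group(1)) - 1
            hits.append((name, start, end, match.group(1)))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))

# The Domain IV segment quoted in the text, numbered from residue 141.
hits = find_cbd_motifs("YDANYFVCW", first_residue=141)
print(hits)  # [('phi-X4-phi-X2-phi', 141, 149, 'YDANYFVCW')]
```

Under these assumed patterns the segment matches only the φXXXXφXXφ form (Y141, F146, W149). A scan of this kind only flags candidate motifs by sequence and says nothing about binding affinity, which is what the docking and simulation analyses address.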
Our in silico study supports carrying out in vivo and in vitro studies to evaluate viral pathogenesis with mutant SARS-CoV-2.
Author contributions
PB and SSJ conceptualized the study. SG and PB performed experimental work. SG, PB, DM, KB, SS and SSJ contributed to analysis of the results and preparation of the figures. SG, DM and KB were involved in the bioinformatics analysis of the data. SG, DM and KB wrote the initial draft of the manuscript. SSJ and PB edited the manuscript with input from all the authors. All authors agreed to the submission of this work to the Journal of General Virology.
Conflicts of interest
The authors declare that there are no conflicts of interest.
Funding Information
This study was funded by Technical Research Centre (TRC), Indian Association for the Cultivation of Science (IACS), Kolkata, India.
Acknowledgements
This work was supported by the Indian Association for the Cultivation of Science (IACS). We thank the Google Cloud Research Credits (GCRC) program and Fluid Numerics for the high-performance computing (HPC) support provided. We also thank IACS for fellowships to SG and DM, DST-INSPIRE for a fellowship to KB, and SERB-DIA for an award (DIA/2018/000005) to SSJ.
Abbreviations
- ARL6IP6: ADP Ribosylation Factor Like GTPase 6 interacting protein 6
- ASC: Apoptosis-associated speck-like protein containing a caspase recruitment domain
- BLAST: Basic Local Alignment Search Tool
- CD4+: Cluster of Differentiation 4+
- CD8+: Cluster of Differentiation 8+
- COVID-19: Coronavirus Disease 2019
- Cryo-EM: Cryo Electron Microscope
- HMOX1: Heme Oxygenase 1
- IFN: Interferon
- MERS-CoV: Middle East respiratory syndrome coronavirus
- MMGBSA: Molecular mechanics with generalized Born and surface area solvation
- NCBI: National Centre for Biotechnology Information
- NF-κB: Nuclear factor kappa light chain enhancer of activated B cells
- NLRP3: Nucleotide-binding oligomerization domain Leucine rich repeat and Pyrin domain containing
- ORF: Open Reading Frame
- PDB: Protein Data Bank
- PISA: Protein Interfaces Surfaces and Assemblies
- PROVEAN: Protein Variation Effect Analyzer
- RMSD: Root Mean Square Deviation
- SUN2: SUN domain-containing protein 2
- TRIM59: Tripartite motif-containing protein 59