Abstract
The spread of COVID-19, caused by SARS-CoV-2, has been growing since the virus was first identified in December 2019. The publication of the first SARS-CoV-2 genome provided a valuable source of data for studying its phylogeny, evolution, and interaction with the host. Protein-protein binding assays have confirmed that angiotensin-converting enzyme 2 (ACE2) is likely to be the cell receptor through which the virus invades the host cell. In the present work, we provide insight into the interaction of the viral spike Receptor Binding Domain (RBD) from different coronavirus isolates with the host ACE2 protein. We used homology-based protein-protein docking, binding energy estimation, and energy decomposition, aiming to predict both qualitative and quantitative aspects of the interaction. Using in silico structural modelling, we provide additional evidence that the interface segment of the spike protein RBD might have been acquired by SARS-CoV-2 via a complex evolutionary process rather than by mutation accumulation. We also highlight the relevance of the Q493 and P499 amino acid residues of SARS-CoV-2 RBD for binding to hACE2 and maintaining the stability of the interface. Finally, we studied the impact of eight different variants located at the interaction surface of ACE2 on complex formation with SARS-CoV-2 RBD. We found that none of them is likely to disrupt the interaction with the viral RBD of SARS-CoV-2.
Introduction
The coronavirus SARS-CoV-2 (previously known as nCoV-19) has been associated with the recent epidemic of acute respiratory distress syndrome [2, 9]. Recent studies have suggested that the virus binds to the ACE2 receptor on the surface of the host cell using the spike protein, and have explored the binary interaction of these two partners [4, 12, 28]. In this work, we focused our analysis on the interface residues to gain insight into four main subjects: (1) the architecture of the interface of the spike protein and whether its evolution in many isolates supports an increase in affinity toward the ACE2 receptor; (2) how the affinity of SARS-CoV-2 RBD and SARS-CoV RBD toward ACE2 homologous proteins from different species is dictated by divergent interface sequences; (3) a comparison of the interaction hotspots between SARS-CoV and SARS-CoV-2; and finally, (4) whether any of the studied ACE2 variants may show a different binding property compared to the reference allele. To tackle these questions, we used multiscale modelling approaches combined with sequence and phylogeny analysis.
Materials and Methods
Sequences and data retrieval
Full genome sequences of 10 coronavirus isolates were retrieved from NCBI GenBank, corresponding to the following accessions: AY485277.1 (SARS coronavirus Sino1-11), FJ882957.1 (SARS coronavirus MA15), MG772933.1 (Bat SARS-like coronavirus isolate bat-SL-CoVZC45), MG772934.1 (Bat SARS-like coronavirus isolate bat-SL-CoVZXC21), DQ412043.1 (Bat SARS coronavirus Rm1), AY304488.1 (SARS coronavirus SZ16), AY395003.1 (SARS coronavirus ZS-C), KT444582.1 (SARS-like coronavirus WIV16), MN996532.1 (Bat coronavirus RaTG13), in addition to Wuhan seafood market pneumonia virus, commonly known as SARS-CoV-2 (accession MN908947.3).
The sequences of the surface glycoprotein were extracted from the coding sequence (CDS) translation feature of each genome annotation or by locally aligning the protein from SARS-CoV-2 with all possible ORFs from the translated genomes. ACE2 orthologous sequences from human (UniProt sequence Q9BYF1), masked palm civet (NCBI protein AAX63775.1, Paguma larvata), Chinese rufous horseshoe bat (NCBI protein AGZ48803.1, Rhinolophus sinicus), king cobra (NCBI protein ETE61880.1, Ophiophagus hannah), chicken (NCBI protein XP_416822.2, Gallus gallus), domestic dog (NCBI protein XP_005641049.1, Canis lupus familiaris), wild boar (NCBI protein XP_020935033.1, Sus scrofa) and brown rat (NCBI protein NP_001012006.1, Rattus norvegicus) were also retrieved.
Human variants of the ACE2 gene were collected from the gnomAD database [7]. Only variants that map to the protein coding region and belong to the interface of interaction with the RBD of the spike protein were retained for further analyses.
Sequence analysis and phylogeny tree calculation
MAFFT was used to align the whole genome sequences and the protein sequences of the viral RBDs [8]. For the genome comparison, we selected the best site model based on the lowest Bayesian Information Criterion (BIC) score, calculated using the model selection tool implemented in MEGA 6 [21]. The General Time Reversible (GTR) model with a discrete Gamma distribution (+G) and 5 rate categories was chosen as the best-fitting model of nucleotide substitution. For the RBD sequences, the best substitution model for the ML calculation was selected in the same way, using the MEGA 6 model selection tool and the lowest BIC score. Accordingly, the WAG model [24] with a discrete Gamma distribution (+G) and 5 rate categories was selected.
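The selection rule described above can be illustrated with a short sketch. It assumes the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of sites and L the maximized likelihood. It is not the MEGA 6 implementation, and the model names, log-likelihoods and parameter counts below are placeholders chosen only for illustration.

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood

def best_model(candidates, n_sites):
    """Return the name of the candidate model with the lowest BIC.

    candidates maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))

# Placeholder values for illustration only; they are not results from the study.
example = {
    "model_A": (-1050.0, 10),
    "model_B": (-1040.0, 25),
}
print(best_model(example, n_sites=200))
```

In this toy case the richer model has the better likelihood, but the parameter penalty outweighs the gain, so the simpler model is retained.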
Phylogenetic trees were generated using a maximum likelihood (ML) method in MEGA 6. The consistency of the topology, for the RBD sequences, was assessed using a bootstrap method with 1000 replicates. The resulting phylogenetic tree was edited with iTOL [13].
Homology based protein-protein docking and binding energy estimation
The co-crystal structure of the spike protein of SARS coronavirus in complex with the human-civet chimeric receptor ACE2 was solved at 3 Å resolution (PDB code 3SCL). We used this structure as a template to build the complex of the spike protein from different virus isolates with the human ACE2 protein (UniProt sequence Q9BYF1). The template sequences of the ligand (spike protein) and the receptor (ACE2) were aligned locally with the target sequences using the program Water from the EMBOSS package [16]. Modeller version 9.22 [18] was then used to predict the complex model of each spike protein with ACE2 using a slow refinement protocol. For each model, we generated ten conformers, from which we selected the one with the best DOPE score [19].
To calculate the binding energy, we used the PRODIGY server [27]. The contribution of each amino acid of a protein partner was computed with the MM-GBSA method implemented in the HawkDock server [23]. Different 3D structures of hACE2, each comprising one of the identified variants, were modeled using the BuildModel module of FoldX5 [3]. Because it is better suited to predicting the effect of point variations of amino acids, we used DynaMut at this stage of the analysis [17].
Flexibility analysis
We ran a protocol to simulate the fluctuation of the spike RBD of SARS-CoV-2 and SARS-CoV using the standalone program CABS-flex (version 0.9.14) [11]. Three replicates of the simulation with different seeds were conducted using a temperature value of 1.4 (a dimensionless value related to the physical temperature). The protein backbone was kept fully flexible and the number of Monte Carlo cycles was set to 100.
Results
Sequence and phylogeny analysis
Phylogenetic analysis of the different RBD sequences revealed two well-supported clades. Clade 2 includes the SARS-CoV-2, RaTG13, SZ16, ZS-C, WIV16, MA15, and SARS-CoV-Sino1-11 isolates (Figure 1A). The SARS-CoV-2 and RaTG13 sequences are the closest to the common ancestor of this clade. Clade 1 includes the Rm1, Bat-SL-CoVZC45 and Bat-SL-CoVZXC21 isolates. These three isolates are closely related to SARS-CoV-2 in the phylogenetic tree constructed from the entire genome (Figure 1A). The same tree topology is reproduced when we use only the RBD segment corresponding to the interface residues with hACE2. This is a linear sequence spanning residues N481 to N501 in SARS-CoV-2.
Multiple sequence alignment showed that the interface segment of SARS-CoV-2 shares higher similarity with sequences from clade 2 (Figure 1B). However, we noticed that S494, Q498 and P499 are similar only to their equivalent amino acids in sequences from clade 1. The SARS-CoV-2 interface sequence is closely related to the Bat-CovRaTG13 sequence, isolated from a Rhinolophus affinis bat.
Prediction of the RBD/hACE2 complex structure
To investigate whether the interface of the spike protein evolves by increasing its affinity toward the ACE2 receptor in the final host, we predicted the interaction models of the envelope-anchored spike protein (SP) from several clinically relevant coronavirus isolates with the human receptor ACE2 (hACE2). The construction of the complex follows a comparative approach that uses a template structure in which both partners (ligand and receptor) are closely related to their respective counterparts in the target system. In our study, we only modeled the interaction of the RBD, which was shown to be implicated in the physical interaction with the ACE2 receptor (Figure 2A). The lowest sequence identity of the modeled spike proteins, as well as of any of the orthologous ACE2 sequences (human, civet, bat, pig, rat, chicken and snake), does not fall below 63% compared to their respective templates. At such levels of sequence identity between the equivalent partners of the receptor or the ligand, the template and the target complexes are expected to share the same binding mode [1, 10].
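For readers who want to reproduce an identity check of this kind, the sketch below computes percent identity from a pairwise alignment. It assumes one common convention, identical positions divided by the number of aligned columns without gaps. The text does not state which convention was used, so the denominator is an assumption, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Toy aligned fragments (hypothetical, not taken from the study).
print(percent_identity("ACDE-FGH", "ACDQYFGH"))
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, give different values for gapped alignments, so the convention should be reported alongside any threshold.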
Analysis of the interaction energy of hACE2 with other viruses
We calculated the binding energy of the RBD from different virus isolates interacting with hACE2 using the PRODIGY method (Figure 2B). The binding energies were converted to estimates of the dissociation constant (Kd). SP proteins from bat-SL-CoVZC45, bat-SL-CoVZXC21 and Rm1-CoV show the highest values (least favorable), which are all above 50 nM. All the other estimates do not exceed 18 nM. The interactions between hACE2 and the RBDs of the SARS-CoV-2 isolate (Wuhan-Hu-1) and SARS-CoV-Sino1-11 show Kd values of 5.1 and 18 nM, respectively.
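The conversion between binding free energy and Kd follows the standard thermodynamic relation ΔG = RT ln(Kd). The sketch below applies it in both directions. The temperature of 25 °C (298.15 K) is an assumption, since the text does not state the value used, and the function names are illustrative.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def dg_to_kd(delta_g, temp_k=298.15):
    """Dissociation constant (mol/L) from a binding free energy (kcal/mol)."""
    return math.exp(delta_g / (R_KCAL * temp_k))

def kd_to_dg(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant (mol/L)."""
    return R_KCAL * temp_k * math.log(kd_molar)

# Kd values quoted in the text, converted from nM to mol/L.
for label, kd_nm in [("SARS-CoV-2 Wuhan-Hu-1", 5.1), ("SARS-CoV Sino1-11", 18.0)]:
    print(label, round(kd_to_dg(kd_nm * 1e-9), 2), "kcal/mol")
```

Because the relation is logarithmic, the gap between 5.1 nM and 18 nM corresponds to a difference of less than 1 kcal/mol in binding free energy at this temperature.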
Energy analyses of human SARS-CoV-2 and SARS-CoV with different animal ACE2 receptors
We carried out this analysis to investigate the tendency of SARS-CoV-2 and SARS-CoV to interact with different orthologous forms of ACE2, which is dictated by the divergence in their interacting surfaces. Homology-based protein-protein docking was conducted to generate the interaction model of the SARS-CoV-2 RBD with the ACE2 receptor from different animal species (Figure 2C). We noticed that for the civet, dog, chicken and snake forms, the interaction energy is very low and very similar for both SARS-CoV-2 and SARS-CoV. Although the Kd values are relatively low for the rat and bat forms interacting with the SARS-CoV RBD, those of SARS-CoV-2 are high and exceed 50 nM. On the other hand, the interaction with pig ACE2 appears more favorable for the SARS-CoV-2 isolate, since its estimated Kd is about threefold lower than that of SARS-CoV.
Decomposition of the interaction energy
The MM-GBSA calculation allowed us to assign the contribution of each interface amino acid to the binding energy with hACE2. We carried out this analysis using the sequences of both the SARS-CoV-2 Wuhan-Hu-1 (Figure 3A) and the Sino1-11 SARS-CoV (Figure 3B) isolates. Residues F486, Y489, Q493, G496, T500 and N501 of the SARS-CoV-2 RBD form the hotspots of the interface with the hACE2 protein (we only consider values > 1 or < −1 kcal/mol, to ignore effects due to thermal fluctuation). These amino acids form three patches of interaction spread along the linear interface segment (Figure 3C): two at the N and C termini and one central. T500 establishes two hydrogen bonds, using its side and main chains, with Y41 and N330 of hACE2. N501 forms another hydrogen bond with ACE2 residue K353, buried within the interface. On the other hand, the SARS-CoV RBD interface contains five hotspot residues (Figure 3D), L473, Y476, Y485, T487 and T488, which correspond to the equivalent hotspot residues F486, Y489, G496, T500 and N501 of the SARS-CoV-2 RBD. Therefore, Q493 as a hotspot amino acid is specific to the SARS-CoV-2 interface. The equivalent residue N480 in SARS-CoV shows only a non-significant contribution of 0.18 kcal/mol.
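The hotspot criterion amounts to keeping residues whose per-residue contribution has an absolute value above 1 kcal/mol. A minimal sketch of that filter follows. The function name is illustrative, and the dictionary contains only the per-residue values quoted in the text; contributions of the other hotspot residues are not reported there and are therefore omitted.

```python
def hotspots(contributions, cutoff=1.0):
    """Keep residues whose |contribution| exceeds the cutoff (kcal/mol)."""
    return {res: e for res, e in contributions.items() if abs(e) > cutoff}

# Only per-residue values quoted in the text are used here.
reported = {
    "SARS-CoV-2 Q493": -2.55,
    "SARS-CoV-2 Q493N (in silico)": -0.01,
    "SARS-CoV N480": 0.18,
}
print(hotspots(reported))
```

With these values, only Q493 of SARS-CoV-2 passes the cutoff.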
Flexibility analysis
Sequence analysis and visual inspection of the RBD/hACE2 complex suggest that the presence of P499 in the SARS-CoV-2 RBD might reflect a form of adaptation toward a better affinity for the receptor. To further investigate its role, we performed a flexibility analysis using a reference structure (SARS-CoV-2 RBD containing P499) and an in silico mutated form, P499T, carrying the residue found in SARS-CoV and in most of clade 2. Our results show that the mutation caused a significant decrease in stability for nine residues of the interface corresponding to segment 482-491 (Figure 3E). Indeed, the per-residue RMSF for this segment increases compared to the reference structure.
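For reference, the root mean square fluctuation (RMSF) of a residue is the square root of the mean squared deviation of its position from its average position over the simulation frames. The sketch below computes it with the standard library only. It is a generic definition rather than the CABS-flex code, it assumes the frames are already superimposed on a common reference, and the toy coordinates are hypothetical.

```python
import math

def rmsf(frames):
    """Per-residue RMSF from a list of frames.

    Each frame is a list of (x, y, z) tuples, one per residue (for example
    C-alpha atoms). All frames are assumed to be superimposed beforehand.
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    result = []
    for i in range(n_res):
        # Average position of residue i over all frames.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that average position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy coordinates (hypothetical): residue 0 is fixed, residue 1 moves
# by +/- 1 angstrom along x around its mean position.
toy = [
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
print(rmsf(toy))
```

Comparing the per-residue profiles of the reference and the P499T models, averaged over replicates, is one way to express the increase in fluctuation described above.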
Analysis of ACE2 variability and affinity with the virus
A total of eight variants of hACE2 that map to the interaction surface are described in the gnomAD database. All these variants are rare (Table 1) and mostly found in European non-Finnish and African populations. Considering both the enthalpy (ddG) and the vibrational entropy (ddS) in our calculation, we found no significant change in the interaction energy (> 1 or < −1 kcal/mol) (Figure 4).
Discussion
Since the COVID-19 outbreak, several milestone papers have been published that examine the particularities of the SARS-CoV-2 spike protein and its putative interaction with ACE2 as a receptor [25, 26]. In the current study, we focused our analysis on the interface segments of the SARS-CoV-2 spike RBD interacting with ACE2 from different species by estimating interaction energy profiles.
We studied the effect of eight variants of ACE2 in order to detect polymorphisms that may increase or decrease virulence in the host. We concluded that, if ACE2 is the only route of infection in humans, variants interacting physically with the RBD are not likely to disrupt the formation of the complex and would have a marginal effect on the affinity. Therefore, it is unlikely that any form of resistance to the virus related to the ACE2 gene exists. However, this question deserves in-depth investigation in different ethnic groups for a better assessment of the contribution of genetic variability to host-pathogen interaction. The binding of the SARS-CoV-2 RBD to different forms of ACE2 shows values similar to those of SARS-CoV, which is in agreement with the ability of the virus to cross the species barrier. However, although we estimated a low Kd value between the SARS-CoV RBD and Gallus gallus ACE2, there have been no reported cases of SARS-CoV isolated from chicken. On the other hand, while the Kd is high for the SARS-CoV RBD interacting with porcine ACE2, there are only a few reported cases of this type [20]. Indeed, host-pathogen interaction is a complex process unlikely to be controlled only by the binding of the spike protein to ACE2 [5]. For model animals where Kd values are very low, ACE2 analysis may play a key role in identifying the main animal reservoir of SARS-CoV-2. For high Kd values, other factors might regulate the infection, including the involvement of different receptors or the response of the immune system, but this does not mean that infection is unlikely to occur. On the other hand, the whole-genome phylogenetic analysis of the different isolates included in this study is consistent with previous works that place the Wuhan-Hu-1 isolate close to the Bat-SL-CoVZC45 and Bat-SL-CoVZXC21 isolates [14, 15, 22] within the Betacoronavirus genus.
The use of RBD sequences, however, places the virus in a clade that comprises SARS-CoV related homologs, including isolates from bat and civet. The clade swapping seen in Figure 1A also seems to occur for RaTG13 and Rm1, both isolated from bat. This is expected, as the use of different phylogenetic markers may considerably affect the topology of the tree. However, given the functional implication of the spike RBD in host-pathogen infection, we raised the question of where the virus obtained its RBD binding interface. The binding of the spike glycoprotein to the ACE2 receptor requires a certain level of affinity. If the RBD evolved from an ancestral form closer to that of Bat-SL-CoVZC45 and Bat-SL-CoVZXC21, we would expect a decrease of the binding energy through the evolutionary process following incremental changes in the RBD. In such a scenario, we presume that there are other intermediary forms of coronavirus that trace this variation of the binding energy up to a level where the pathogen can cross the species barrier and infect humans with high affinity toward hACE2. On the other hand, our results show that the binding energy and the interface sequence of the SARS-CoV-2 RBD are closer to those of SARS-CoV related isolates (either from human or other species). Therefore, a recombination event involving the spike protein, occurring between SARS-CoV and an ancestral form of the current SARS-CoV-2 virus, might also be possible. This would allow the virus to acquire a minimum set of residues for the interaction with hACE2. Recombination in the spike protein gene has been previously suggested by Wei et al. in their phylogenetic analysis [6]. Thereafter, incremental changes in the binding interface segment would occur in order to reach a better affinity toward the receptor. One of these changes may involve the P499 residue, whose substitution to threonine seems to drastically destabilize the interface segment and has a distant effect.
Moreover, the decomposition of the interaction energy showed that 5 out of 6 hotspot amino acids in SARS-CoV-2 have their equivalent in SARS-CoV, including N501. Contrary to what Wan et al. [22] have stated, the single mutation N501T does not seem to enhance the affinity. Rather, the residue Q493 might be responsible for such higher affinity, owing to better satisfaction of the van der Waals interactions by the longer polar side chain of glutamine compared with asparagine. Indeed, when we repeated the same analysis after mutating Q493 to asparagine, the favorable contribution decreased from −2.55 kcal/mol to a non-significant value of −0.01 kcal/mol, thus supporting our claim.