Structure adaptation in Omicron SARS-CoV-2/hACE2: Biophysical origins of evolutionary driving forces

Since its emergence, the Covid19 pandemic has been sustained by a series of transmission waves initiated by new variants of the SARS-CoV-2 virus. Some of these arise with higher transmissivity and/or increased disease severity. Here we use molecular dynamics simulations to examine the modulation of the fundamental interactions between the receptor binding domain (RBD) of the spike glycoprotein and the host cell receptor (human angiotensin-converting enzyme 2: hACE2) arising from Omicron variant mutations (BA.1 and BA.2) relative to the original wild type strain. We find significant structural differences in the complexes which overall bring the spike protein and its receptor into closer proximity. These are consistent with and attributed to the higher positive charge on the RBD conferred by BA.1 and BA.2 mutations relative to the wild type. However, further differences between sub-variants BA.1 and BA.2 (which have equivalent RBD charges) are also evident: Mutations affect interdomain interactions between the up-chain and its clockwise neighbor chain, resulting in enhanced flexibility for BA.2. Consequently, additional close contacts arise in BA.2 which include binding to hACE2 by a second spike protein monomer, in addition to the up-chain - a motif not found in BA.1. Finally, the mechanism by which the glycans stabilize the up state of the Spike protein differs for the wild type and the Omicrons. We also found the glycan on N90 of hACE2 turns from inhibiting, to facilitating the binding to Omicron spike protein. These structural and electrostatic differences offer further insight into the mechanisms by which viral mutations modulate host cell binding and provide a biophysical basis for evolutionary driving forces.


I. INTRODUCTION
Coronavirus disease-2019 (COVID-19) has been responsible for more than 6 million deaths and over 500 million cases worldwide over the course of the ongoing pandemic and it remains a significant global health threat. It is caused by the severe acute respiratory syndromecoronavirus 2 (SARS-CoV-2) -a single-stranded RNA-enveloped virus and the seventh known coronavirus to infect humans. Since its wild type strain (WT) was first discovered in Wuhan, China, the virus has undergone continuous mutation leading to a succession of so-called variants of concern (VOC). Designated by the World Health Organization as Alpha to Delta, and Omicron, these VOCs occur with increasing transmission rate and variable pathogenicity.
In common with other coronaviruses, the virion surface consists of spike (S) glycoproteins comprising two (S1 and S2) subunits. The S2 subunit embeds the spike in the viral envelope and mediates viral fusion with the host cell membrane upon protease activation. The S1 subunit is involved in recognition and binding to the peptidase domain of the host receptor angiotensin-converting enzyme 2 (hACE2). It includes a trimer of N-terminal domains (NTD), and receptor binding domains (RBD) each of which exists in either of two discrete up or down conformations in the pre-fusion state. Only in the up-conformation is the receptor binding site exposed. Typically, in the active conformation one of the three RBD domains is in the up-state with the other two in the down position [1].
The, now dominant, Omicron strain is unique among known variants as it exhibits a much larger number of point mutations than did any of its predecessors -more than 30 on the S-protein, of which over half are found in the RBD. Since vaccine strategies rely heavily on the spike sequence, the emergence of such hypermutated strains threaten to weaken the neutralizing activity of vaccine-induced antibodies. Indeed, there is evidence that Omicron subvariants BA.1, BA.2 exhibit immune escape among double vaccinated hosts, and require a third dose to induce the neutralizing immune response [2,3].
The molecular mechanisms by which the SARS-CoV-2 RBD recognizes and associates with hACE2 are therefore fundamental to the infection process. They are also essential for elucidating the biophysical consequences of mutation and, thereby, the molecular basis for evolutionary driving forces. For example, relative to WT, the net positive charge on each RBD increases by two and three for Delta and Omicron variants, respectively. This increase in charge is accompanied both by higher infection rate and reduced potency of neutralizing antibodies [4][5][6][7][8][9][10][11][12]. Both effects are consistent with evolutionary pressure toward RBD mutations which increase positive charge: Firstly, as hACE2 is a charge negative entity [13] thus the electrostatic interaction is increased; and secondly, the electrostatic surfaces of neutralizing antibodies reveal that most antibodies have positively charged RBD-recognition domains. However, so far reports generally have focused on binding free energy difference.
Most of them were evaluated based on the binding structure of RBD+hACE2 for WT.
Structural changes in the RBD+hACE2 complex arising from Omicron-lineage mutations have been less explored in comparison.
The effects of the individual point mutations on RBD with respect to infectivity have also been another focus of recent studies. For example, N501Y, the RBD mutation in the Alpha variant enhanced the binding affinity to hACE2 [14,15]; E484K in Beta and Gamma variants can reduce the neutralization activity of human sera [16]; L452R in Delta and BA.2 variants have been shown to strengthen fusogenicity and infectivity by enhancing the cleavage of spike protein; and K417N removes the interfacial salt bridge between the RBD and neutralizing antibody CB6 [11]. Questions naturally arise concerning whether there are more ways to associate the high infectivity and the many point mutations on each RBD for BA.1/BA.2.
Fifteen and sixteen point mutations occur on each RBD of BA.1 and BA.2, respectively, and affect the interaction between RBD and hACE2. The mutations in the RBD of BA.1 and BA.2 contribute to an increase of the net positive charge by three compared to WT. A stronger electrostatic attraction of the RBD+hACE2 complex is thus expected for Omicron than WT. However, this leaves an obvious question as to why BA.2 is more infectious than BA.1 [17] as both have the same net charge at the RBD. Thus specifics of the localized electrostatic landscape must matter (i.e. interactions involving specific mutations) in binding the complex system.
Another key feature of the S-protein is its extensive glycosylation. Each monomer Sprotein displays 22 potential N-linked and four O-linked glycosylation sites [18,19]. Cryoelectron microscopy provides evidence for the existence of 17 N-glycans on 22 potential sites in the SARS-CoV-2 S-protein [1,20], while the remaining five sites are found unoccupied [19]. These glycans shield about 40% of the trimeric S-protein surface [21], as well as other epitopes from cells and antibody recognition and enable the coronavirus to evade both the innate and adaptive immune responses [1,22] Thus, glycosylation plays a vital role in the pathogenesis of virus such as coronavirus [18,23,24] Additionally, receptor hACE2 is also heavily glycosylated, and the glycan on N90 of hACE2 (N90 hACE2 ) was found to be able to interfere with the binding between hACE2 and S-protein [25,26], whereasN322 hACE2 interacts tightly with the RBD of the hACE2-bound S-protein and strengthens the complex [26]. Therefore, it is evident that glycans, either on the S-protein or on hACE2, are an important factor for the formation of S+hACE2 complex.
Herein, we use molecular dynamics (MD) simulations to investigate the structural and biophysical mechanisms by which mutations in SARS-CoV-2 modulate receptor recognition and binding. We focus on the Omicron variant BA.1, and its sublineage BA.2 in complex with hACE2 and compared these to that of the WT. In addition to elucidating the effects of mutations in the binding of this complex system, we also examine the role of the glycans at the interface in forming S+hACE2 complex. The structures for hACE2 and WT S-protein in the up state were modeled using the PDB structures 6VW1 [27] and 6VSB [20], respectively. The spatial arrangement of the three chains in the S-protein is shown in Figure 1. The color code of each S-protein chain is followed throughout this paper: chain A is the up-chain (cyan); chain B is the counterclockwise chain (yellow); chain C is the clockwise neighbor chain (lime). We obtained these from the CHARMM-GUI [28] Archive-COVID-19 Protein Library which arranged the position of hACE2 and S-protein, as given by Woo et al. [29]. Here, Woo et al. [29] modified the amino acid sequence of 6VSB to match the WT and had the remaining missing atoms of S-protein reconstructed.
Using the sequence of the WT S-protein as the template, we generated the Omicron subvariants BA.1 [30] and BA.2 [31] using VMD [32,33] by introducing the corresponding sequence changes (using the VMD plugin psfgen). The full glycosylation model (for hACE2 and S-protein proposed by Shajahan et al. [34] and Woo et al. [29], respectively) was introduced to the protein structures using the Glycan Modeler within CHARMM-GUI [35].

B. MD Simulation Protocal
All simulations were carried out using software NAMD 2.12 [37] and the CHARMM36 force field [38,39]. First the initial system was energy minimized using steepest descent to remove atom overlap and then the solvent relaxed by running a MD simulation where the backbone atoms were restrained with a force constant of 5 kcal mol −1Å−2 for a simulated 10 ns duration. Afterwards, an unrestrained MD production run was performed for just over 1 µs. For each MD run, a 2 fs timestep was used with periodic boundary conditions and NPT ensemble (at 310 K and 1.01325 bar conditions) applied using a Langevin thermostat (1 ps −1 damping) and Nosé-Hoover Langevin piston barostat (50 fs period and 25 fs decay). The non-bonded interaction cut-off was set at 12Å and switch distance at 10Å. Bonds involving Hydrogen were made rigid using the SHAKE algorithm [40]. Electrostatics were calculated using Particle Mesh Ewald approach [41] (1Å grid-spacing and 6 PME interpolation order).
To avoid translation, rotation and to limit swaying motions, part of the S2 subunit (over residues 701-729, 787-824, 866-939, and 1035-1146 on all three chains) was anchored via restraints on the backbone atoms using 20 kcal mol −1Å−2 force constant. Details in analyses the key measurements can be found in the supplemental information.

A. Two types of S+hACE2 conformations
We found a trend change in the complex structures around 600 ns (see the left panel of Figure S2), thus, we analyzed the data from the last 400 ns of the simulation. The representative structures after 600 ns are displayed in Figure Figure 2). We found that the time averaged value for θ RBD-A ranked WT > BA.2 > BA.1 ( Table I, left panel of Figure S2). We also measured the opening angle θ RBD-C to check whether chain C becomes open in order to bind hACE2 ( Table I, right panel of Figure   S2). We find that the Omicrons have larger θ RBD-C than WT (i.

The Omicron variants have more compact S-protein + hACE2 complexes
To understand why θ RBD-A varies we measured the radius of gyration of the trimeric Sprotein, R S gyr , and the combined S + hACE2 structure, R S+hACE2 gyr (Table I, Figure S3(a)).
For the Omicrons, R S gyr shows only a slight decrease, by 0.3-0.4Å, compared to WT, whereas for R S+hACE2 gyr BA.1 decreases by 0.8Å and BA.2 by 1.5Å compared to WT. The smaller R S+hACE2 gyr for BA.2 implies a more compact structure and suggests that BA.2 is the most tightly bound to hACE2 of the three variants.
We investigated the center of mass (COM) distances d RBD-α hACE2 to find out how each individual chain α contributes to the change in R S+hACE2 gyr (see Table II, Figure S3(b-d)). Clearly, BA.2 has the shortest distance between hACE2 and any of its three RBDs. For RBD-A, BA.1 is the longest, but not able to explain the slightly shorter R gyr for BA.1 compared to WT without taking into account the contribution of the other two RBDs.   position. The middle panel of Figure 2 shows that hACE2 changes its orientation from binding to WT to BA.2 (S2 of WT and BA.2 S-proteins aligned). This is the origin of the smaller θ RBD-A as well as the closeness of hACE2 to RBD-C. The right panel of Figure 2 includes also the RBD-A, RBD-C, and corresponding hACE2 of BA.1, allowing one to see the gradual change of the relative position of hACE2 from WT, BA.1, to BA.2.
To understand what drives the S+hACE2 complex into the two observed conformations described we performed a detailed study into the contact interactions. Contact comes in two forms: direct interaction between amino-acids and indirect interaction through the selected glycans attached to S-protein and hACE2. In the next few sections we will look at each in turn.
B. Direct interaction between S-protein and hACE2

Variation in key contacts hACE2···RBD-A involving mutated residues
From our simulation trajectories we identified the contacts between RBD-A and hACE2.
One might assume that the positively charged mutations on the RBD (WT → Omicron) enhance the RBD-A binding with hACE2. However, our results showed it more likely, that the mutations change the contact partners on hACE2, as summarized in Table III  can be more than one residue with high probability of making close contact. Experimental assays and protein structure analysis by Bhattacharjee et al. [44] showed that hACE2 interfacial residues can bind to multiple S-protein residues, for example, K353 hACE2 can interact with as many as six S-protein residues Y505, N501, G496, Q498, G502, and Y495. We see K353 hACE2 interact with G496/G496S, N501/N501Y, and Y505/Y505H, as well as Q493R (but not Q493 WT).
The mutations D405N and R408S are unique to BA.2 as they are not present in BA.1.
D405N eliminates one negative charge and R408S eliminates one positive charge, therefore The contact probabilities are plotted in Figure S4, and additional contact details are collected in Table S1 including contact pairs for residues that are not mutated.

The difference in contact probability of RBD-A···hACE2 between variants
With respect to the residues on RBD-A, we evaluated the difference in the probability of contact between RBD-A and hACE2 for the two Omicron subvariants against WT, as depicted in Figures 3a, 3b. We find that contacts involving the above mentioned RBD-A residues G496S A for BA.1 and D405N A , N440K A for BA.2 show a large increase in occurrence after mutation, (54%, 71%, 91% respectively). For both BA.1 and BA.2, the mutation at K417N A results in the loss of contact to hACE2 (a 98% reduction in contact with D30 hACE2 ). This observation is consistent with Luan and Huynh [11] who suggest that this may be for eluding antibodies.
For BA.1 (Figure 3a), A475 A and G476 A also lose contact with hACE2 substantially (89% and 54%, respectively), but is replaced by a new nearby contact at Q474 A (to Q24 hACE2 ).
In BA.2 E484A A , another common mutation for BA.1 and BA.2, takes over the contact to L79 hACE2 from G485 A of WT and BA.1, thus showing another example of changing contact partner.
To determine whether the S-protein mutations affect the accessibility of RBD-A to hACE2, we evaluated the probability difference of residues on hACE2 in close contact with RBD-A (Figures 3c,3d). Compared to WT, the contact probability for hACE2 to RBD-A is slightly lower for BA.1 (overall probability ratio 0.94) but slightly higher for BA.2 (overall probability ratio 1.05). The residues of contact on hACE2 differ between BA.1 and BA.2 mainly at E329 hACE2 and A386/A387 hACE2 where BA.2 establishes close contact using There is a probability decrease at E37 hACE2 and increase at D38 hACE2 which is caused by the contact between E37 hACE2 and Y505 A of WT seemingly being eliminated in BA.1/BA.2 by the mutation Y505H A , and by a salt bridge Q493R A ···D38 hACE2 forming in preference to E37 hACE2 . This change of binding pattern was also found by Lupala et al. [46], in the case of mutation Q493K (rather than Q493R seen here). Besides binding to E37 hACE2 , Y505 A of WT additionally binds to R393 hACE2 , whereas for BA.1/BA.2 the mutated Y505H A leaves R393 hACE2 largely solvated by water and rarely connected to hACE2. Furthermore, M82 hACE2 becomes water-exposed in BA.1/BA.2, and the contact L85 hACE2 ···F486 RBD-A is 42% and 63% more likely to occur for BA.1 and BA.2, respectively, than for WT.
The interactions between E329/A386/A387 hACE2 and RBD-A are unique for BA.2, and may be relevant to the difference between BA.1 and BA.2: the fact that more interfacial hACE2 residues are in close contact to RBD-A of BA.2 also suggests that BA.2 is more tightly bound than the others, consistent with its smaller R S+hACE2 gyr . We calculated the average distance between the lysine sidechain nitrogen atom of N440K A and the sidechain carboxylic carbon atom of E329 hACE2 as 5.8Å and 3.6Å for BA.1 and BA.2, respectively.
The longer distance for BA.1 is consistent with the hypothesis that RBD-A of BA.2 is more tightly bound to hACE2 than that of BA.1.

RBD-C of BA.2 forms contacts to hACE2
Based on the much shorter d RBD-C hACE2 for BA.2, we hypothesized that its RBD-C can interact with hACE2, and carried out a similar probability analysis (as done for RBD-A) to see whether any contacts between RBD-C and hACE2 exists. We found that there is practically no close contact from WT (probability of contact totalling < 0.05%), and slightly more for BA.1 (< 1.5%), while substantial amount of interaction is found for BA.2 ( > 99% for residues 445-449 and 498; 30-40% for residues 483-484). The results are shown in the upper row of Figure 4 where the major binding partners on hACE2 are residues 212-215, 564, 568,  (Table S1, Figures 3c, 3d), and therefore contribute additional binding between hACE2 and S-protein.
Multiple-RBD binding on one monomeric hACE2 for the Omicrons was qualitatively inferred in Yin et al. [47] by the observation that a monomeric hACE2 binds to a trimeric Omicron S-protein with 6-fold higher binding affinity than to WT S-protein, and only 2-fold higher affinity when binding to an Omicron monomer S-protein than to WT. As we did not model monomeric hACE2 interacting with a monomer S-protein we cannot make the same comparison here but nevertheless such scenario is consistent with our results.   From the RMSF, we hypothesized that RBD-C's ability to form contacts with h-ACE2 has to do with the increased freedom of RBD-C to restructure due to having less contact with the other RBDs. We hence investigated the degree of interaction between the RBDs and searched for key differences due to the mutations. Figure 6 shows the probability change in the inter-domain RBD-A···RBD-C contact from WT to BA.1/BA.2 (the full contact probabilities are shown in Figure S5). While the overall probability ratio of BA.1 over WT is around one, it is 0.44 for BA.2. Thus, there is a decoupling of the RBD-A and RBD-C in This is due to several differences in RBD-A···RBD-C contact between the varients, over the regions 400-420 C and 440-500 C . The latter is within the RBM C and is also exposed to hACE2. For region 400-420 C there is an enhancement of interactions for BA.1 but loss of interactions for BA.2. In none of our models are there any contacts with hACE2 within this region. Here, BA.1 exhibits an additional contact made with R408 C (and to a lesser extent N501Y C ) which is not present in either BA.2 or WT. Furthermore, for BA.1, R408 B forms contacts with residues 374 to 377 on RBD-C (a 98% presence). These contacts are missing in both BA.2, which has the mutation of R408S, and WT. We note that R408 C is located in the inner cavity formed by the three RBDs. In the case of BA.1, D428 A and R408 C form a salt bridge at the probability of 73%. This contact to D428 A is nearly absent for BA.2 because of the R408S mutation. For WT, due to the different relative position of RBD-C compared to the Omicrons, instead of R408 C , D428 A connects to Y505 C (85%).
For region 440-500 C , i.e., RBM C , there is an overall reduction in RBD-A···RBD-C interactions between Omicrons and WT which corresponds to the increase in interaction RBD-C···hACE2, as seen earlier, suggesting that RBD-C is pulled away from inter-chain S contact towards hACE2. By inspecting the solvent exposed side of RBD-A and RBD-C, we find that a salt bridge by K378 A ···E484 C is the dominant interaction (92%), and a hydrogen bonding pair S375 A ···E484 C (31%) forms at this interface for WT. Mutation E484A results in the loss of this salt bridge (shown in the probability difference in Figure 6) and The last part of the puzzle of how hACE2 interfaces with S-protein in our models lies with the role glycans play in the interaction. Of particular interest areN90 hACE2 andN322 hACE2 near the interface between hACE2 and S-protein. These two glycans of hACE2 are predicted to form interactions with S-protein [26,34]. However, althoughN322 hACE2 is in the vicinity of the RBD, its interaction with a monomeric S-protein was shown to be transient [48].
Moreover, Shajahan et al. [34] also pointed out that site N322 is aligned in human, bat, cat, pig and chicken, but pig and chicken are not susceptible to SARS-CoV-2 binding. In contrast, species that are susceptible have an N90 site while non-susceptible species do not [34], andN90 hACE2 was shown to contact RBD with high probability [48].
Our analysis on the contact frequency betweenN322 hACE2 and RBD-A/RBD-C shows dominant interaction with RBD-C across all variants: 97%, 99%, and 92% for WT, BA.1, and BA.2, respectively. Whereas for RBD-A, contacts are formed at 13%, 0.4%, and 94% of the total occurrences for WT, BA.1, and BA.2, respectively. The only conclusion we can draw here is thatN322 hACE2 has stronger interaction with BA.2. The not so decisive role ofN322 hACE2 is probably due to its location which is outside the RBDs. On the other hand,N90 hACE2 interfaces to RBDs directly. Here we will therefore observe the behaviors of N90 hACE2 , in our models of WT and the Omicrons and discuss its involvement in mediating the interactions between hACE2 and S-protein.  Figure S6). As can be seen from Figure 8N90 hACE2 is most parallel to the S+hACE2 interface when binding to BA.2 (average Θ N90 = 7.5 • ), less so for BA.1 (Θ N90 = 35.6 • ), and more orthogonal for WT (Θ N90 = 100.5 • ). Figure S7 shows the probabilities of RBD residues in contact withN90 ACE2 . RBD-A's of WT and BA.1 have little contact withN90 ACE2 , only N460 has a >50% contact presence for the former, and R408 the latter. N460 sits in the inner cavity, whereas R408 is half way to the solvent-exposed side, so that the main contact region on RBD-A differs between WT and BA. Owing to the few and inner-cavity-only contacts to both RBD-A and RBD-C,N90 hACE2 drops vertically (orthogonal to the S+hACE2 interface) for WT so that the space between hACE2 and S-protein is comparatively large, reflected in its large COM distance d RBD-C hACE2 (Table II), and consistent with experimental observation of Mehdipour and Humme [26].
Whereas for BA.1/BA.2, more so for BA.2, this space is reduced which is consistent with the smaller d RBD-C hACE2 . Here, the long axis ofN90 hACE2 poses parallel to the S-hACE2 interface due to the contacts along the interface to both RBD-A and RBD-C. Therefore, for BA.2, N90 hACE2 no longer acts as a spacer between RBD and hACE2. Compared to BA.2, BA.1 has fewer and less widespread contacts toN90 hACE2 , presumably due to the above-mentioned higher degree of RBD-A···RBD-C interaction resulting in lower RBD-A/RBD-C flexibility of BA.1.

IV. CONCLUSION
In this work, we used MD simulations to investigate the SARS-CoV2 S+hACE2 structures, with S-protein from WT, and the Omicron subvariants BA.1, and BA.2. Starting from the open conformation given by Woo et al. [29], we find two conformations for the interfacing of S-protein and hACE2. A more open contact involving only predominately chain A (seen for WT and being of the same form as the starting structure) and a more compact conformation involving both chains A and C (seen for the Omicrons). The key feature of this interface is the injection of the glycanN90 ACE2 between hACE2 and S-protein and highlights the important interplay between direct binding of S-protein and hACE2 residues and indirect binding via interactions throughN90 ACE2 (i.e., S···N90 hACE2 ···hACE2). The key factors we found that promote this binding mode are as follows: • Change in the S···hACE2 interface: mutations in RBD increase the total positive charge which increases the electrostatic attraction between S-protein and hACE2. Nevertheless, we observed shifts in the binding partners after mutation.
-BA.2 forms a broad contact withN90 hACE2 , spreading from inner cavity towards the solvent-exposed side (Figure 8c), and thereby reducing the interfacial gap and holding hACE2 close to the S-protein to bind.
• One neighbor to the up-RBD, RBD-C, is more flexible for the Omicrons by less coupling to other RBDs, hence free to approach hACE2. This is more obvious for BA.2 than BA.1: -E484A of BA.1 and BA.2: abolishes a salt bridge K378 A ···E484 C at the solvent facing side in WT -R408S of BA.2: abolishes a salt bridge D428 A ···R408 C toward the inner cavity in BA.1, as well as cuts the connection with RBD-B (e.g., R408 B ···F374 C in BA.1), and therefore further reduces the connectivity to RBD-C.
The mutations E484A, R408S, and D405N in our result affect the structure, and the latter two seemingly make BA.2 distinct from others in the study. Although there is a loss of positive charge in R408S which is regained via the mutation D405N and keep the level of electrostatic interaction. These mutations are maintained in the later, and more infectious, subvariants BA.4 and BA.5, suggesting that these are indeed favorable mutations for increasing fitness of the virus. Moreover, the high degree of interaction between glycans on hACE2 and S-protein indicates that it is advantageous to consider such glycan-related epitopes in the therapeutic design. It has been suggested that the increased electrostatic attraction between S-protein and hACE2 may simply enhance the binding. Here we find that the binding can be further strengthened by the binding mode involving multi-RBDs per hACE2 monomer the Omicrons tend to adopt.
The structural reports introduced and investigated here can be more generally used to monitor emerging variants and future structural evolution of the virus and thereby developing a better model of the biophysical basis for evolutionary advantage.