Abstract
In silico predictions combined with in vitro, in vivo and in situ observations collectively suggest that mouse adaptation of the SARS-CoV-2 virus requires an aromatic substitution in position 501 or position 498 (but not both) of the spike protein’s receptor binding domain. This effect could be enhanced by mutations in positions 417, 484 and 493 (especially K417N, E484K, Q493K and Q493R), and to a lesser extent by mutations in positions 486 and 499 (such as F486L and P499T). Such enhancements, which arise from more favourable binding interactions with residues on the complementary angiotensin-converting enzyme 2 (ACE2) interface, are however unlikely to sustain mouse infectivity on their own, based on theoretical and experimental evidence to date. Our current understanding thus points to the Alpha, Beta and Gamma variants of concern being able to infect mice, while Delta and ‘Delta Plus’ lack a similar biomolecular basis to do so. This paper identifies a list of countries where local field surveillance of mice is encouraged because mice there may have come into contact with humans carrying virus with the adaptive mutation(s). It also provides a systematic methodology to analyze the potential for other animal reservoirs and their likely locations.
Introduction
The ‘novel coronavirus disease’ (COVID-19) has resulted in significant global morbidity and mortality on a scale similar to the influenza pandemic of 1918.1 The ongoing pandemic has been sustained through the activities of human beings, who are the largest reservoir of the causative ‘severe acute respiratory syndrome coronavirus 2’ (SARS-CoV-2). In its new human host, this RNA virus is evolving rapidly, accumulating mutations, and existing as a cloud of variants with quasispecies diversity.2 Last year, the world witnessed the risk of this virus acquiring additional reservoirs (such as minks) and new mutations of consequence (such as ‘Cluster 5’), which could increase transmissibility and potentially weaken the antibody response.3
Neither the original form of the virus (known as ‘D614’) nor the subsequent, more transmissible ‘G614’ variant, which has almost entirely replaced it in circulation,4,5 infected mice, because the mouse ‘angiotensin-converting enzyme 2’ (ACE2) receptor did not bind their spike protein effectively enough to allow entry into cells. Since the mouse (Mus musculus) is a popular animal model of infection, the virus had to be adapted through techniques such as sequential passaging in mouse lung tissues and modifying the receptor binding domain (RBD).6–8 Other strategies for infecting mice with the original form of the virus or the G614 variant included using transgenic mice expressing human ACE2 (hACE2), and sensitizing the mouse respiratory tract through transduction with adenovirus or adeno-associated virus expressing hACE2.9–12 Recently, virus ‘variants of concern’ (VOC) originating from Brazil, South Africa and the UK (which share the mutation N501Y in the RBD) were shown to infect mice.13,14 This ability of SARS-CoV-2 variants of concern to infect mice is unsettling because of the potential to establish additional reservoirs in a species that is in close contact with people and companion animals, especially as mouse populations are hard to vaccinate or control.
Methodology
In this paper, we have combined our structural predictions from biomolecular modelling with available experimental evidence to understand how specific mutations in VOC and mouse-adapted strains, especially in the RBD, have enabled this virus to infect mice. To do this, we compared interactions of these key mutations with the corresponding regions in both mouse ACE2 (mACE2) and hACE2 using molecular models based on the experimentally determined structure of the ACE2/RBD interface (Protein Data Bank ‘PDB’ entry 6M17).15 As the improved protein prediction software AlphaFold has recently been made publicly available, we also used this artificial intelligence method to further validate our model structures.16 By introducing the ACE2 models and variants into molecular dynamics simulations, similar to our recent work,17,18 we qualitatively identified the key interactions of side chain residues at the ACE2/RBD interface. Further details are provided under Supplementary Methods for Molecular Modelling. We then compared our in silico findings with experimental results reported by different research groups for mouse-adapted strains/isolates6–8,13,21–29 and VOC14,30–34 (Table 1). Taken together, these comparisons gave valuable insights into the likely effects of different mutations of consequence.
We then queried the world’s largest public database, ‘GISAID’ (Global Initiative on Sharing All Influenza Data), for these mutations, and equivalent mutations of consequence, in the circa 2.4 million SARS-CoV-2 genome sequences available as of 21 June 2021.29 In brief, sequences were aligned back to the SARS-CoV-2 reference (EPI_ISL_402124, denoted as the WT allele) to generate a file containing all mutations in the variant call format (VCF), a concise format for storing sequence variations. We then calculated the ratio of the frequency at which the mutant allele versus the WT allele was observed across a rolling 14-day window. Given that the data are discrete, highly variable in size across countries, and contain background noise, our experience suggests that such a window is essential to reduce distortions and glean meaningful insights. For example, if 14 WT and 28 mutant sequences were observed within a specified 14-day window, the frequencies were 1/day and 2/day respectively. The mutant:WT frequency ratio is therefore 2, indicating that the mutant allele appeared twice as frequently as the WT allele within that period. This ratio was used to create a heatmap for the key mutations, both individually and in combination (as discussed below). For clarity, we only considered countries where the mutant:WT ratio exceeded 1.2 across the entire time the mutant was sampled or within any given 14-day window. To further reduce noise from low-sampled countries, we also set a minimum threshold of 10 WT and 10 mutant samples over at least 14 days as an indication of possible local spread. For this reason, a country where a mutant had only been recorded on a single day, or where only 9 mutant samples were recorded overall, was not included in our analysis. Our heatmap scale has also been truncated at 2.00 for ease of visual comparison.
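The windowed ratio and inclusion rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used for this paper: sequences are assumed to be already reduced to lists of collection dates per country for the WT and mutant alleles, windows are assumed to slide one day at a time, the "at least 14 days" rule is assumed to apply to the mutant's sampling span, and the function names `window_ratios`, `overall_ratio`, `include_country` and `heatmap_value` are hypothetical.

```python
from datetime import date, timedelta

# Thresholds stated in the text; the function names below are hypothetical.
WINDOW_DAYS = 14        # rolling window length
RATIO_THRESHOLD = 1.2   # mutant:WT ratio that must be exceeded
MIN_SAMPLES = 10        # minimum WT and mutant samples per country
HEATMAP_CAP = 2.0       # heatmap colour scale truncated at this value


def window_ratios(wt_dates, mut_dates, window_days=WINDOW_DAYS):
    """Mutant:WT frequency ratio for each full window, keyed by window start.

    Assumption: windows slide by one day; windows without WT samples are skipped.
    """
    all_dates = wt_dates + mut_dates
    if not all_dates:
        return {}
    start, last = min(all_dates), max(all_dates)
    span = timedelta(days=window_days - 1)
    ratios = {}
    while start + span <= last:
        end = start + span
        n_wt = sum(start <= d <= end for d in wt_dates)
        n_mut = sum(start <= d <= end for d in mut_dates)
        if n_wt:
            # Per-day frequencies; the window length cancels in the ratio.
            ratios[start] = (n_mut / window_days) / (n_wt / window_days)
        start += timedelta(days=1)
    return ratios


def overall_ratio(wt_dates, mut_dates):
    """Mutant:WT ratio over the whole period in which the mutant was sampled."""
    first, last = min(mut_dates), max(mut_dates)
    n_wt = sum(first <= d <= last for d in wt_dates)
    return float("inf") if n_wt == 0 else len(mut_dates) / n_wt


def include_country(wt_dates, mut_dates):
    """Apply the minimum-sample, minimum-span and ratio thresholds."""
    if len(wt_dates) < MIN_SAMPLES or len(mut_dates) < MIN_SAMPLES:
        return False
    if (max(mut_dates) - min(mut_dates)).days + 1 < WINDOW_DAYS:
        return False
    if overall_ratio(wt_dates, mut_dates) > RATIO_THRESHOLD:
        return True
    return any(r > RATIO_THRESHOLD for r in window_ratios(wt_dates, mut_dates).values())


def heatmap_value(ratio):
    """Truncate the colour scale for visual comparison."""
    return min(ratio, HEATMAP_CAP)


# Worked example from the text: 14 WT and 28 mutant sequences in one 14-day window.
wt = [date(2021, 6, 1) + timedelta(days=i) for i in range(14)]
mut = [d for d in wt for _ in range(2)]
print(window_ratios(wt, mut))  # {datetime.date(2021, 6, 1): 2.0}
```

Because the per-day frequencies share the same window length, the ratio reduces to a ratio of counts within each window; the per-day form is kept only to mirror the worked example in the text.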
Results and Discussion
In silico results
In silico comparison of the interface residues of the RBD/ACE2 complex in human and mouse models reveals 30 ACE2 residues in close contact, of which 19 are conserved between mACE2 and hACE2 (approximately 63% identity for the contacting ACE2 residues). The RBD mutations associated with the mouse adaptations listed in Table 1 can be grouped loosely into three regions by their positions on the ACE2/RBD interface: Region 1 (RBD positions 498, 499 and 501) is centered around the highly conserved ACE2 residue tyrosine 41 (Y41); Region 2 (RBD positions 417 and 493) is centered around ACE2 residue 34; and Region 3 (RBD positions 484 and 486) lies close to a cluster of ACE2 residues 78 to 82. ACE2 residues within 5 Å (a cutoff chosen to account for molecular fluctuations) of each RBD adaptation are listed in Table 2, with dissimilar human/mouse ACE2 residues highlighted in yellow. Figure 1 illustrates these residues and their relative positions in the three regions. Figure 2 aligns the human and mouse ACE2 sequences to highlight the key differences at the contact points with the RBD (shown in yellow).
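The 5 Å contact criterion and the identity figure above can be illustrated with a short sketch. It assumes atom coordinates have already been extracted from a model into plain Python tuples, and counts an ACE2 residue as a contact if any of its atoms lies within the cutoff of any atom of the RBD residue. The toy coordinates and the helper names `contact_residues` and `percent_identity` are hypothetical and are not taken from our 6M17-based models.

```python
import math

CUTOFF_A = 5.0  # contact cutoff in ångström, as stated in the text


def min_distance(atoms_a, atoms_b):
    """Smallest atom-atom distance between two residues (lists of xyz tuples)."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def contact_residues(rbd_residue_atoms, ace2_residues, cutoff=CUTOFF_A):
    """Sorted ACE2 residue numbers with any atom within `cutoff` of the RBD residue."""
    return sorted(
        number
        for number, atoms in ace2_residues.items()
        if min_distance(rbd_residue_atoms, atoms) <= cutoff
    )


def percent_identity(human, mouse, positions):
    """Percentage of contact positions where human and mouse residues match."""
    same = sum(human[p] == mouse[p] for p in positions)
    return 100.0 * same / len(positions)


# Hypothetical toy coordinates for illustration only.
rbd_501 = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]
ace2 = {41: [(4.5, 0.0, 0.0)], 353: [(0.0, 4.8, 0.0)], 90: [(12.0, 0.0, 0.0)]}
print(contact_residues(rbd_501, ace2))  # [41, 353]
```

With 19 of 30 contacting positions conserved, `percent_identity` returns 100 × 19/30 ≈ 63.3%, in line with the approximate 63% quoted above.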
Modelling the N501Y mutation at the ACE2/RBD interface reveals a close interaction with the highly conserved Y41 residue in ACE2, through attractive, non-covalent interactions between aromatic amino acids known as π-stacking. In mACE2, π-stacking can be enhanced by the proximal histidine (H) at position 353, in place of the lysine (K) present in hACE2. Our modelling also shows a similar π-stacking enhancement with the conserved Y41 through substitution of the RBD glutamine at position 498 (Q498) with either histidine (H) or tyrosine (Y). These aromatic π-stacking interactions appear to be reasonably strong, as N501Y and Q498H can each sustain mouse adaptation on their own in experiments.6,13,23,28 Counterintuitively, our modelling predicts that simultaneous aromatic mutations at RBD positions 498 and 501 are detrimental to mouse adaptation because they locally distort the π-stacking at the binding interface. This could explain why we have not observed simultaneous aromatic mutations at these positions in over 2.4 million entries on GISAID as of 21 July 2021. We did detect one adapted isolate on GISAID from mouse lung homogenate where N501Y occurs with a glutamine 498 to arginine substitution (Q498R);29 arginine is, strictly speaking, not aromatic, but it is frequently associated with π-stacking interactions.37 The Q498R mutation is thus likely to be better tolerated alongside N501Y because of arginine’s greater conformational flexibility compared with tyrosine or histidine.
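π-stacking between two aromatic side chains, such as RBD Y501 and ACE2 Y41, is often flagged geometrically from the distance between ring centroids and the angle between ring planes. The sketch below shows a simplified face-to-face check only. The 5.5 Å and 30° thresholds are illustrative assumptions, edge-to-face (T-shaped) stacking is not handled, the ring normal assumes a planar ring, and the function name `is_pi_stacked` is hypothetical; this is not the analysis used in our simulations.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centroid(ring):
    """Mean position of the ring atoms."""
    return tuple(sum(atom[i] for atom in ring) / len(ring) for i in range(3))


def ring_normal(ring):
    """Unit normal from two in-plane vectors (assumes a planar ring of >= 3 atoms)."""
    n = _cross(_sub(ring[1], ring[0]), _sub(ring[2], ring[0]))
    length = math.sqrt(_dot(n, n))
    return tuple(x / length for x in n)


def is_pi_stacked(ring_a, ring_b, max_dist=5.5, max_angle=30.0):
    """Face-to-face check: close centroids and nearly parallel ring planes."""
    distance = math.dist(centroid(ring_a), centroid(ring_b))
    cos_angle = abs(_dot(ring_normal(ring_a), ring_normal(ring_b)))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))  # guard rounding above 1
    return distance <= max_dist and angle <= max_angle


# Toy benzene-like rings: one in the z = 0 plane, one stacked 3.6 Å above it.
ring = [(1.4 * math.cos(k * math.pi / 3), 1.4 * math.sin(k * math.pi / 3), 0.0)
        for k in range(6)]
above = [(x, y, z + 3.6) for x, y, z in ring]
print(is_pi_stacked(ring, above))  # True
```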
In Region 1 (Table 2), we also note the proline 499 to threonine (P499T) substitution in two infectious clones, presumably engineered to enhance the effect of Y498. Modelling provides the following insight: the change from P to T relaxes the backbone constraints imposed by proline and allows conformational rearrangement of threonine to contact the conserved ACE2 residues Y41 and L45. However, we do not find any strong molecular modelling basis for this mutation to sustain mouse adaptation on its own, or to evolve naturally alongside adaptive mutations at positions 498 or 501; this is borne out by experimental evidence to date. In other words, in silico predictions combined with in vitro and in vivo evidence collectively suggest that mouse adaptation requires an aromatic substitution in either position 501 or position 498 (but not both), while additional mutations, especially in Regions 2 and 3 of the RBD as summarized below, enhance ACE2 binding interactions and specificity in mice. These predictions are further supported by AlphaFold, which assigns very high confidence scores (>93 on a scale of 0 to 100) to the structural predictions involving these key mutations, demonstrating the value of this improved protein prediction software during pandemics such as COVID-19 and a future ‘Disease-X’.
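The qualitative rule stated above can be written down as a small screening function. This minimal sketch simply encodes the rule from this paper: exactly one aromatic substitution at RBD position 498 or 501, with optional enhancing substitutions at positions 417, 484, 486, 493 or 499. It does not predict binding affinity or infectivity, histidine is counted as aromatic as in the text, and the function name `assess_mouse_adaptation` is hypothetical.

```python
AROMATIC = set("FWYH")                    # histidine counted as aromatic, as in the text
ESSENTIAL_POSITIONS = {498, 501}
ENHANCING_POSITIONS = {417, 484, 486, 493, 499}


def parse(mutation):
    """Split a substitution such as 'N501Y' into ('N', 501, 'Y')."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def assess_mouse_adaptation(mutations):
    """Return (meets_rule, enhancers) for a list of RBD substitutions.

    meets_rule is True when exactly one of positions 498 and 501 carries an
    aromatic residue; enhancers lists substitutions at enhancing positions.
    """
    parsed = [parse(m) for m in mutations]
    aromatic_positions = {
        pos for _, pos, alt in parsed if pos in ESSENTIAL_POSITIONS and alt in AROMATIC
    }
    enhancers = sorted(
        m for m, (_, pos, _) in zip(mutations, parsed) if pos in ENHANCING_POSITIONS
    )
    return len(aromatic_positions) == 1, enhancers


print(assess_mouse_adaptation(["K417N", "E484K", "N501Y"]))  # (True, ['E484K', 'K417N'])
print(assess_mouse_adaptation(["Q498Y", "N501Y"]))           # (False, [])
print(assess_mouse_adaptation(["K417N", "E484Q"]))           # (False, ['E484Q', 'K417N'])
```

Note that Q498R with N501Y passes this check, because arginine is not counted as aromatic, which matches the single adapted isolate discussed above.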
From Table 1, we see that mouse-adapted strains sometimes carry the Q493K/R mutation (polar glutamine to basic lysine or arginine). Modelling predicts that this is enabled by favourable salt-bridge interactions with glutamic acid 35 (E35) or aspartic acid 38 (D38), both of which are conserved in hACE2 and mACE2 (c.f. Table 2, Region 2). The K417N substitution (lysine to asparagine), another experimental observation from Table 1, is also predicted by modelling to be advantageous for mouse adaptation due to favourable amide hydrogen bonding with the interfacial mACE2 residues asparagine 31 (N31) and glutamine 34 (Q34); such amide hydrogen bonding is not possible in hACE2, which has non-amide lysine (K) and histidine (H) residues at these positions. For the Gamma variant of concern, it is unclear whether K417T enhances the role of N501Y in mouse adaptation in a similar manner. The ‘Delta Plus’ variant of concern has the K417N mutation, but there is no molecular modelling basis to believe that it can infect mice without an aromatic change in position 498 or 501, as described above. It would be worthwhile to further investigate any interfering role of glycosylation in this interface region, because hACE2 contains N-linked glycosylation at asparagine 90 (N90) whereas mACE2 does not (its analogous residue is threonine, T90, according to UniProt entries Q8R0I0 for mouse and Q9BYF1 for human ACE2).
In Region 3, the K484 residue is not positioned directly at the interface and is not observed to interact strongly with any ACE2 residues; however, our model shows that occasional salt bridges can form with the relatively close glutamic acid residues at positions 35 and 75, which are conserved in both hACE2 and mACE2. Our simulations show that the distance between K484 and these glutamic acid residues fluctuates dynamically from 3 to 20 Å, with salt bridges most likely when the distance is around 3 Å. Thus, E484K, which is present in the Beta and Gamma variants of concern and more recently in some Alpha isolates as well, is likely to have an enhancing role through transient salt bridges. The same cannot be said of E484Q seen in the Delta variant of concern, because salt bridge formation is unlikely with the polar glutamine (Q) residue. With no accompanying aromatic change in positions 498 or 501, we believe that E484Q in Delta, and additionally K417N in ‘Delta Plus’, cannot sustain mouse infectivity on their own based on current biomolecular understanding. Residues 75 to 82 in mACE2 differ markedly from those in hACE2 (Table 2), so any mutation in the corresponding RBD interface merits investigation. We found one in the experimental observations, the engineered substitution F486L,27 and consider it to have at best an enhancing role. As it was observed together with Q498Y (which is likely to sustain mouse infectivity on its own), the contribution of F486L to overall mouse adaptation remains to be ascertained.
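Transient salt bridges such as those between K484 and E35/E75 are commonly summarised from a trajectory as the fraction of frames in which the charged-group distance falls below a cutoff, together with the lifetimes of uninterrupted contacts. In the sketch below, the 4.0 Å cutoff is an assumed, commonly used heuristic (the text only notes that salt bridges are most likely near 3 Å), the 100 ps frame interval comes from the Supplementary Methods, and the input distances and function names are hypothetical rather than taken from our simulations.

```python
FRAME_INTERVAL_PS = 100     # frame spacing from the Supplementary Methods
SALT_BRIDGE_CUTOFF_A = 4.0  # assumed heuristic cutoff, not from the text


def occupancy(distances, cutoff=SALT_BRIDGE_CUTOFF_A):
    """Fraction of frames in which the distance is at or below the cutoff."""
    return sum(d <= cutoff for d in distances) / len(distances)


def bridge_lifetimes_ps(distances, cutoff=SALT_BRIDGE_CUTOFF_A):
    """Durations (ps) of consecutive runs of frames at or below the cutoff."""
    lifetimes, run = [], 0
    for d in distances:
        if d <= cutoff:
            run += 1
        elif run:
            lifetimes.append(run * FRAME_INTERVAL_PS)
            run = 0
    if run:  # close a run that lasts until the final frame
        lifetimes.append(run * FRAME_INTERVAL_PS)
    return lifetimes


# Hypothetical K484-E35 distances (Å), one value per saved frame.
distances = [3.0, 3.2, 8.0, 15.0, 20.0, 3.1, 12.0, 3.5, 3.4, 3.3]
print(occupancy(distances))            # 0.6
print(bridge_lifetimes_ps(distances))  # [200, 100, 300]
```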
Comparison with in vitro, in vivo and in situ observations
Early in silico predictions based on comparative structural analysis of ACE2 suggested that mice have a very low probability of being infected.35,36 Although correct about mice, those analyses also made inconsistent and erroneous predictions, for example that ferrets would not be susceptible and that pigs would be, thus exposing the need for experimental inputs into the model. This paper therefore takes into account a range of experimental observations to cross-check our in silico predictions from biomolecular modelling. Wan et al. reasoned that “mouse or rat ACE2 contains a histidine at the 353 position which does not fit into the virus-receptor interaction as well as a lysine does”.35 While this is true of the original Wuhan strain containing asparagine 501 (N501) in the RBD, our modelling indicates why the tyrosine 501 mutation enables mouse infectivity, even on its own. In hACE2, lysine 353 (K353) forms a salt bridge with the conserved aspartic acid 38 (D38). In mACE2, lysine 353 is replaced by the aromatic histidine (H353), which can complete the salt bridge as well as contribute to π-stacking with the Y501 variant. The N501Y mutation will also lead to favourable π-stacking with the highly conserved tyrosine 41 (Y41) residue in mammalian ACE2, as suggested by Starr et al.38 through deep mutational scanning of the RBD and hACE2 affinity assays. These authors reported the greatest affinity enhancement for F501 (which had the highest score), followed by Y501, V501, W501 and T501, in that order. However, Y501 and T501 require only a single nucleotide change and have been observed more frequently in situ (Figures 3a, 3b), whereas F501, V501 and W501 require 2, 2 and 3 nucleotide changes respectively. It is therefore unsurprising that the latter variants were rarely observed in situ, regardless of their high in vitro affinity scores from Starr et al.38
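The nucleotide-change counts quoted above can be checked against the standard genetic code by taking the minimum Hamming distance between any codon of the original amino acid and any codon of the substituted one. This is a sketch under one assumption: the actual number of changes depends on the specific codon in the reference genome, so the minimum over synonymous codons computed here is a lower bound. The helper names are hypothetical.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Standard genetic code (NCBI translation table 1) in TCAG codon order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(CODONS, AMINO_ACIDS))


def codons_for(aa):
    """All codons encoding a one-letter amino acid."""
    return [codon for codon, residue in CODE.items() if residue == aa]


def hamming(a, b):
    """Number of differing positions between two equal-length codons."""
    return sum(x != y for x, y in zip(a, b))


def min_nt_changes(wt_aa, mut_aa):
    """Fewest nucleotide changes between any codon of wt_aa and any codon of mut_aa."""
    return min(hamming(c1, c2) for c1 in codons_for(wt_aa) for c2 in codons_for(mut_aa))


for target in "YTFVW":
    print(f"N501{target}: {min_nt_changes('N', target)}")
# N501Y: 1, N501T: 1, N501F: 2, N501V: 2, N501W: 3 (printed one per line)
```

The same function reproduces the counts for position 498 discussed next: Q to H and Q to R need one change, Q to Y needs two, and H to Y needs one.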
From the above and Table 3, we see that in silico analysis can provide valuable insights to interpret and bridge in vitro, in vivo and in situ observations on the RBD position 501. A similar analysis is possible with the alternative essential mutation for mouse adaptation at RBD position 498, where the in vitro affinity enhancement order is H498, Y498, F498 and W498 according to Starr et al.38 Of these, H498 (on its own) and Y498 (with enhancing RBD mutations) have been shown to result in mouse adaptation in vivo (Table 1);7,8,23–28 Q498R was also reported once, unusually in combination with N501Y, isolated from mouse lung after 30 passages. However, in situ observations of these variants have been limited to R498 (15 occurrences) and H498 (3 occurrences) so far. It is clear from in vitro, in vivo and in situ analyses (Table 3) that H498, R498 and Y498 are possible but not yet common. This is consistent with our in silico predictions because H498 and Y498 are aromatic (enabling π-stacking with ACE2 Y41; similar to Y501), while R498 has conformational flexibility and can still be associated with π-stacking interactions.37 H498 and R498 observed in situ require a single nucleotide change from Q498, while Y498 requires two nucleotide changes (or one change from H498).
Similar insights are also possible for the enhancing RBD mutations (c.f. Table 1 and Table 3). From in vivo and in situ observations, we see that K417M is less common than K417N or K417T (Figures 4a, 4b), although in vitro studies did not predict any enhancement.38 In silico predictions show that all of these require a single nucleotide change, but that N417 (and Q417) would benefit from amide hydrogen bonding. E484K is present in Beta and Gamma, and increasingly in the Alpha VOC, while E484Q is present in the Kappa variant of interest, which is related to the Delta VOC. Compared with these two substitutions, and notwithstanding their higher in vitro affinity scores, R484 and T484 are infrequently observed in situ, which is consistent with our in silico predictions because each requires 2 nucleotide changes from E484, or one change from K484. F486L and P499T were originally engineered for in vivo studies, but have had a sporadic in situ presence in human populations. In silico predictions suggest that the F486L mutation (accessible by three possible single nucleotide changes) can aid mACE2 adaptation, due to the human-mouse differences in ACE2 at the 78-82 region; P499T is also a single nucleotide change (but accessible in only one way, from P to T) and is predicted to be rare in comparison. Finally, in silico predictions for Q493 substituted by K, L or R (each a single nucleotide change) are borne out in vivo and in situ, although their affinity scores from in vitro experiments are low. The affinity scores from Starr et al.38 were developed for hACE2 (not mACE2), so we expected a greater correlation with in situ observations in human populations than has been observed, but it may still be too early in the pandemic to assess this definitively.
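Counting the "ways" of reaching a substitution, as done above for F486L and P499T, amounts to enumerating every single-nucleotide neighbour of a codon and translating it. Because the reference codons at these positions are not given here, the sketch below checks every codon of the original amino acid; the genetic-code table is the standard one, and the helper names `single_nt_routes` and `routes_by_codon` are hypothetical.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Standard genetic code (NCBI translation table 1) in TCAG codon order; '*' = stop.
CODE = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def single_nt_routes(codon, target_aa):
    """Single-nucleotide mutants of `codon` that encode `target_aa`."""
    routes = []
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt != base:
                mutant = codon[:i] + alt + codon[i + 1:]
                if CODE[mutant] == target_aa:
                    routes.append(mutant)
    return routes


def routes_by_codon(wt_aa, target_aa):
    """Number of single-nucleotide routes from each codon of `wt_aa` to `target_aa`."""
    return {c: len(single_nt_routes(c, target_aa)) for c, aa in CODE.items() if aa == wt_aa}


print(routes_by_codon("F", "L"))  # {'TTT': 3, 'TTC': 3}
print(routes_by_codon("P", "T"))  # {'CCT': 1, 'CCC': 1, 'CCA': 1, 'CCG': 1}
```

Whichever phenylalanine or proline codon is present, this reproduces the three routes to F486L and the single route to P499T; likewise each glutamine codon has exactly one route to K, L and R at position 493.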
Some mutations in the essential as well as enhancing positions can lead to other mutations. For example, N501Y, the key mutation common to the Alpha, Beta and Gamma variants of concern, can lead to F501 with a further nucleotide change. Similarly, the enhancing E484K mutation can lead to R484 or T484 with a further nucleotide change. We examined whether in situ observations are consistent with or contrary to these in silico predictions. Indeed, F501 was observed once each in Germany (25 May 2021) and Mexico (22 June 2021), while Y501 had been observed in these countries since 22 October 2020 and 31 January 2021 respectively. The UK reported E484R in August 2020, followed by the USA, Angola, Brazil and South Korea in February, April, May and June 2021 respectively. In each case, E484K was detected before E484R, the former mutation circulating in the UK, USA, Angola, Brazil and South Korea from April, March, August, April and December 2020 respectively. Similarly, E484T was only recently detected in the USA, in June 2021, 15 months after the first report of E484K in that country. All eight instances are thus consistent with our prediction; whether this link is causal or coincidental is worthy of investigation with local epidemiological data. While bioinformatics tools can provide useful insights, on average only one in 78 COVID-positive cases is sent for virus genome sequencing, with huge variations across time and locations, and much missing metadata.39,40 This means we are more confident about ruling in the possibility of a mutation circulating (e.g. when a variant has been detected in a location) than ruling it out purely on the basis of in silico data, even when the dataset is large (2.4 million sequences as of 21 June 2021).
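The precedence check described above can be made explicit using the first-detection dates quoted in the text. The sketch below encodes those dates at month resolution (days are dropped for uniformity) and counts how many derived residues were first seen strictly after their single-step precursor; the record layout and function names are illustrative assumptions.

```python
# (country, precursor, first seen, derived, first seen) at (year, month) resolution,
# using the first-detection dates quoted in the text.
OBSERVATIONS = [
    ("Germany", "Y501", (2020, 10), "F501", (2021, 5)),
    ("Mexico", "Y501", (2021, 1), "F501", (2021, 6)),
    ("UK", "K484", (2020, 4), "R484", (2020, 8)),
    ("USA", "K484", (2020, 3), "R484", (2021, 2)),
    ("Angola", "K484", (2020, 8), "R484", (2021, 4)),
    ("Brazil", "K484", (2020, 4), "R484", (2021, 5)),
    ("South Korea", "K484", (2020, 12), "R484", (2021, 6)),
    ("USA", "K484", (2020, 3), "T484", (2021, 6)),
]


def months_between(earlier, later):
    """Whole months from one (year, month) to another."""
    return (later[0] - earlier[0]) * 12 + (later[1] - earlier[1])


def consistent_with_stepwise_origin(records):
    """Records where the derived residue was first seen strictly after its precursor."""
    return [r for r in records if r[2] < r[4]]


hits = consistent_with_stepwise_origin(OBSERVATIONS)
print(f"{len(hits)} of {len(OBSERVATIONS)} consistent")  # 8 of 8 consistent
print(months_between((2020, 3), (2021, 6)))              # 15
```

Temporal precedence of this kind is compatible with, but does not establish, stepwise evolution within each country.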
Our analysis is not just of theoretical interest; it has important practical applications because mice can be kept as pets, or come into contact with other pets such as cats, which are known to be susceptible. Also, mouse plagues can occur in areas of COVID-19 outbreaks or endemicity, as is currently the case in New South Wales and adjacent states of Australia.41 To help public health and animal health professionals, Figures 5a and 5b show the countries where key essential and enhancing mutations listed in Tables 1 and 3 (viz. N501Y, E484K and K417N/T) have co-occurred. Supplementary Table S1 gives the raw data, down to regional counts for these combinations. We believe that this information will help locate areas at risk for appropriate mitigation measures.
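Co-occurrence of the essential and enhancing mutations can be tallied per country from per-sequence mutation lists. The sketch below is one plausible way to do this, not the procedure behind the figures: it assumes each sequence has already been reduced to a (country, mutations) pair, treats K417N and K417T as alternatives at the same position, and uses hypothetical records and function names.

```python
from collections import Counter

# Each inner set lists alternatives at one position; a sequence must carry at
# least one mutation from every set (N501Y AND E484K AND K417N-or-K417T).
COMBINATION = [{"N501Y"}, {"E484K"}, {"K417N", "K417T"}]


def has_combination(mutations, combination=COMBINATION):
    """True if the sequence carries one mutation from every alternative set."""
    present = set(mutations)
    return all(present & alternatives for alternatives in combination)


def cooccurrence_by_country(records, combination=COMBINATION):
    """Count sequences per country that carry the full combination."""
    counts = Counter()
    for country, mutations in records:
        if has_combination(mutations, combination):
            counts[country] += 1
    return counts


# Hypothetical records for illustration only.
records = [
    ("Country A", ["K417N", "E484K", "N501Y"]),
    ("Country A", ["N501Y"]),
    ("Country B", ["K417T", "E484K", "N501Y", "L18F"]),
    ("Country C", ["E484K", "N501Y"]),
]
print(cooccurrence_by_country(records))  # Counter({'Country A': 1, 'Country B': 1})
```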
Conclusion and further analyses
Assessing the risk of viruses adapting to new hosts requires careful interpretation of all available data from in silico, in vivo, in vitro and in situ sources. Understanding host adaptation at a molecular level through modelling helps reconcile seemingly conflicting experimental and clinical observations while a pandemic is still in progress. We have demonstrated this with the SARS-CoV-2 virus adapting to mice. We offer our conclusions with humility, as they are based on the best evidence available to date, and we and others can refine them as more evidence becomes available. Armed with our collective understanding from different approaches, and bolstered by bioinformatics and emerging artificial intelligence technologies such as AlphaFold, we have shown how we can better position ourselves to predict and mitigate virus host adaptations, not just for this pandemic but also for a future Disease-X. Further analyses pertaining to COVID-19 should focus on the role of mutations beyond the spike and RBD, and explore other hosts such as rats and other potential reservoir species (even those that previously exhibited low receptor activities35,36,42), which will be hard to vaccinate or control.
Supplementary Methods for Molecular Modelling
Molecular simulations were performed using NAMD 2.14,43 with the CHARMM36m force field44 and the TIP3P water model. The SARS-CoV-2 spike/ACE2 model was a homology model based on the PDB structure 6M17.15 Models consisted of the SARS-CoV-2 spike RBD (residues 330 to 530), while the truncated mACE2 protein consisted of residues 19 to 600 built using SWISS-MODEL.45 Glycosylation of the spike and mACE2 proteins was constructed manually using Visual Molecular Dynamics (VMD). Simulations were run with periodic boundary conditions (PBCs) in the NPT (isothermal-isobaric) ensemble at 310 K and 1 bar pressure, employing Langevin dynamics. The PBCs were held constant in the XY dimensions. Long-range Coulomb forces were computed with the particle mesh Ewald method with a grid spacing of 1 Å. A 2 fs timestep was used, with non-bonded interactions calculated every 2 fs and full electrostatics every 4 fs, while bonds to hydrogen atoms were constrained with the SHAKE algorithm. The cut-off distance was 12 Å, with a switching distance of 10 Å and a pair-list distance of 14 Å. Pressure was controlled at 1 atmosphere using the Nosé-Hoover Langevin piston method, with a piston period of 100 fs and a piston decay of 50 fs. Trajectory frames were saved every 100 ps. Variants were built from the main RBD/mouse ACE2 model by mutating residues to cover the main mouse adaptation mutations listed in Tables 1 and 3. Models were simulated for 300 ns. Trajectories were visualized with VMD and Nanome.
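The timing parameters above translate into step counts in a NAMD configuration. The sketch below performs that arithmetic. The keywords it names (`timestep`, `nonbondedFreq`, `fullElectFrequency`, `dcdfreq`, `run`) are standard NAMD configuration keywords, but this mapping is our illustration rather than the configuration file actually used, and all other required settings (force field files, PME, thermostat and barostat) are omitted.

```python
TIMESTEP_FS = 2            # integration timestep
NONBONDED_EVERY_FS = 2     # short-range non-bonded evaluation interval
FULL_ELECT_EVERY_FS = 4    # full (PME) electrostatics interval
FRAME_EVERY_PS = 100       # trajectory output interval
SIMULATION_NS = 300        # total simulated time

FS_PER_PS = 1_000
FS_PER_NS = 1_000_000


def namd_step_settings():
    """Convert the stated time intervals into NAMD step-based keyword values."""
    return {
        "timestep": TIMESTEP_FS,
        "nonbondedFreq": NONBONDED_EVERY_FS // TIMESTEP_FS,
        "fullElectFrequency": FULL_ELECT_EVERY_FS // TIMESTEP_FS,
        "dcdfreq": FRAME_EVERY_PS * FS_PER_PS // TIMESTEP_FS,
        "run": SIMULATION_NS * FS_PER_NS // TIMESTEP_FS,
    }


settings = namd_step_settings()
for keyword, value in settings.items():
    print(f"{keyword:<20}{value}")
print("frames:", settings["run"] // settings["dcdfreq"])  # frames: 3000
```

With these values, 300 ns corresponds to 150,000,000 steps of 2 fs, and saving a frame every 50,000 steps yields 3,000 frames per trajectory.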
Author contributions
Conceptualization, methodology, and funding acquisition, S.S.V.; in silico analysis, M.J.K.; in vitro analysis, M.J.K. and S.S.V.; in vivo analysis, S.M. and S.S.V.; in situ analysis, L.O.W.W. and D.R.; writing – original draft preparation, S.S.V., M.J.K. and S.M.; writing – review and editing, all authors.
Acknowledgements
This work was supported by funding from the Australian government’s Department of Finance, National Health and Medical Research Council, and the CSIRO Future Science Platforms (Principal Investigator: S.S.V.). We are grateful for support from our colleagues at the Australian Centre for Disease Preparedness (https://www.grid.ac/institutes/grid.413322.5) (especially Simran Chahal, Trevor Drew and Alexander McAuley) and the Transformational Bioinformatics Group (especially Denis Bauer, Yatish Jain, Brendan Hosking and Aidan Tay). The title is from the poem ‘To a Mouse: On Turning her up in her Nest, with the Plough, November 1785’ by Scotland’s national poet Robert Burns, in which he says that the mouse is not alone in proving foresight may be vain as the best-laid schemes of mice and men go oft awry (But, Mousie, thou art no thy-lane, In proving foresight may be vain: The best-laid schemes o’ Mice an’ Men Gang aft agley).