Missense variants in ACE2 are predicted to encourage and inhibit interaction with SARS-CoV-2 Spike and contribute to genetic risk in COVID-19

SARS-CoV-2 invades host cells via an endocytic pathway that begins with the interaction of the SARS-CoV-2 Spike glycoprotein (S-protein) and human Angiotensin-converting enzyme 2 (ACE2). Genetic variability in ACE2 may be one factor that mediates the broad-spectrum severity of SARS-CoV-2 infection and COVID-19 outcomes. We investigated the capacity of ACE2 variation to influence SARS-CoV-2 infection with a focus on predicting the effect of missense variants on the ACE2 SARS-CoV-2 S-protein interaction. We validated the mCSM-PPI2 variant effect prediction algorithm with 26 published ACE2 mutant SARS-CoV S-protein binding assays and found it performed well in this closely related system (True Positive Rate = 0.7, True Negative Rate = 1). Application of mCSM-PPI2 to ACE2 missense variants from the Genome Aggregation Consortium Database (gnomAD) identified three that are predicted to strongly inhibit or abolish the S-protein ACE2 interaction altogether (p.Glu37Lys, p.Gly352Val and p.Asp355Asn) and one that is predicted to promote the interaction (p.Gly326Glu). The S-protein ACE2 inhibitory variants are expected to confer a high degree of resistance to SARS-CoV-2 infection whilst the S-protein ACE2 affinity enhancing variant may lead to additional susceptibility and severity. We also performed in silico saturation mutagenesis of the S-protein ACE2 interface and identified a further 38 potential missense mutations that could strongly inhibit binding and one more that is likely to enhance binding (Thr27Arg). A conservative estimate places the prevalence of the strongly protective variants between 12-70 per 100,000 population but there is the possibility of higher prevalence in local populations or those underrepresented in gnomAD. The probable interplay between these ACE2 affinity variants and ACE2 expression polymorphisms is highlighted as well as gender differences in penetrance arising from ACE2’s situation on the X-chromosome. It is also described how our data can help power future genetic association studies of COVID-19 phenotypes and how the saturation mutant predictions can help design a mutant ACE2 with tailored S-protein affinity, which may be an improvement over a current recombinant ACE2 that is undergoing clinical trial. Key results 1 ACE2 gnomAD missense variant (p.Gly326Glu) and one unobserved missense mutation (Thr27Arg) are predicted to enhance ACE2 binding with SARS-CoV-2 Spike protein, which could result in increased susceptibility and severity of COVID-19 3 ACE2 missense variants in gnomAD plus another 38 unobserved missense mutations are predicted to inhibit Spike binding, these are expected to confer a high degree of resistance to infection The prevalence of the strongly protective variants is estimated between 12-70 per 100,000 population but higher prevalence may exist in local populations or those underrepresented in gnomAD A strategy to design a recombinant ACE2 with tailored affinity towards Spike and its potential therapeutic value is presented The predictions were extensively validated against published ACE2 mutant binding assays for SARS-CoV Spike protein

high degree of resistance to SARS-CoV-2 infection whilst the S-protein ACE2 affinity enhancing variant may lead to additional susceptibility and severity. We also performed in silico saturation mutagenesis of the S-protein ACE2 interface and identified a further 38 potential missense mutations that could strongly inhibit binding and one more that is likely to enhance binding (Thr27Arg). A conservative estimate places the prevalence of the strongly protective variants between 12-70 per 100,000 population but there is the possibility of higher prevalence in local populations or those underrepresented in gnomAD. The probable interplay between these ACE2 affinity variants and ACE2 expression polymorphisms is highlighted as well as gender differences in penetrance arising from ACE2's situation on the X-chromosome. It is also described how our data can help power future genetic association studies of COVID-19 phenotypes and how the saturation mutant predictions can help design a mutant ACE2 with tailored S-protein affinity, which may be an improvement over a current recombinant ACE2 that is undergoing clinical trial.

Introduction
The COVID-19 pandemic is one of the greatest global health challenges of modern times. Although the disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is usually cleared following mild symptoms, it can progress to serious disease and death 1 . Besides the clear risks associated with age and comorbidities 2,3 , there could be a genetic component that predisposes some individuals to worse outcomes 4 . Identifying genetic factors of COVID-19 susceptibility has implications for clinical decision making and population specific epidemic dynamics. Genetic variation may constitute hidden risk factors and, in some cases, explain why otherwise healthy individuals in low-risk groups experience severe disease. The identification of specific genetic variants that influence the severity and progression of COVID-19 presents the opportunity for predictive diagnostics, early intervention and personalised treatments. The distribution of any genetic risk factors between populations could give rise to differences in population specific risk levels.
In the absence of GWAS hits for COVID-19 phenotypes, predictions of the effect that variation might have on the disease are essential. The emerging understanding of SARS-CoV-2 infection highlights proteins in which genetic variation could give rise to relevant phenotypic responses. Databases of human population variation provide a sample of variation in these proteins, to which we can apply state-of-the-art methods and workflows for predicting variant effects. A good candidate for variant analysis is a protein with a defined role in COVID-19 pathogenesis, with potentially functional variants reported in public databases and that has structural and functional data that can be used to interpret the potential effects of these variants.
Human angiotensin-converting enzyme 2 (ACE2) meets these criteria. ACE2 is the host cell receptor that SARS-CoV-2 exploits to infect human cells 1,5 . As this is the same receptor exploited by the SARS coronavirus (SARS-CoV) that caused the SARS outbreak in 2002, the detailed body of knowledge built around SARS-CoV infection is relevant to understanding SARS-CoV-2 1,5,6 . The spike glycoprotein (S) is the coronavirus entity that recognises and binds host ACE2. Both SARS coronavirus S-proteins are composed of an S1 domain that contains ACE2 recognition elements and an S2 domain that is responsible for membrane fusion 5 . The protease TMPRSS2 is the host factor that cleaves S into its S1 and S2 subdomains 5 , in SARS-CoV cleavage is thought to be promoted upon formation of the ACE2-S complex 7 . The S1 receptor binding domains (RBDs) from both SARS-CoV 8 and SARS-CoV-2 9 have been co-crystallised with human ACE2. The RBDs from both viruses are similar in overall architecture and both interface with roughly the same surface on ACE2. Differences are apparent in the so-called receptor binding motif (RBM) of S, which is the region of the RBD responsible for host range and specificity of coronaviruses [8][9][10] . The binding affinity of S and ACE2 is known explicitly to be correlated to the infectivity of SARS-CoV and is determined by the complementarity of the interfaces 8,10 .
In this paper we investigate the capacity of ACE2 variation to influence SARS-CoV-2 infection with a focus on predicting the effect of missense variants on the ACE2 SARS-CoV-2 S-protein interaction. The paper is organised as follows: In §2 we present an analysis of the effects of ACE2 missense variants on S-protein binding. We begin with an overview of missense variation at the ACE2-S interface from the Genome Aggregation Consortium Database (gnomAD 11 ) and describe how these variants could modify binding with reference to structural and mutagenesis studies ( §2.1). We then apply the mCSM-PPI2 12 mutation effect predictor, which we have extensively validated against published ACE2 mutant SARS-CoV Sprotein affinity assays 10 (see Methods), to predict the effect of ACE2 variants from gnomAD ( §2.2) and all possible missense mutations at ACE2-S binding residues ( §2.3) on SARS-CoV-2 S-protein binding, and rationalise key results by a structural evaluation of the ACE2 variant models. In §3 we describe how other coding sequence variants from gnomAD 11 might effect SARS-CoV-2 infection through mechanisms other than modulation of ACE2-S affinity and identify variants that may be of significance. In §4 we describe how this work could be applied to the development of an enhanced recombinant ACE2 therapy to treat COVID-19. Finally, in §5 we discuss the potential impact of ACE2 variants predicted to significantly affect SARS-CoV-2 infection in individuals and populations ( §5.1), outline an application of our work to genetic association analyses ( §5.2), address conflicting reports from similar studies and assess the confidence in our predictions ( §5. 3), and summarise our main results and conclusions ( §5.4).
2 The effect of ACE2 missense variation on SARS-CoV-2 S binding Figure 1 illustrates the distribution of gnomAD 11 missense and nonsense variants in the sequence and structure 9 of ACE2. In total, there are 338 ACE2 coding variants in gnomAD. The majority of these are missense variants (241, including 12 in splice regions) and synonymous variants (90, including 2 in splice regions). Synonymous variants do not alter the protein sequence and are therefore unlikely to have a functional impact. In general, the impact of missense variants is dependent on the structural and functional context of the mutated residue, as well as the physicochemical characteristics of the mutant residue. Missense variants are found in all the major functional domains of ACE2, including the peptidase and collectrin domains, the transmembrane region and other functionally annotated sites (Figure 1). Nonsense variants including stops and frameshifts are also reported in gnomAD for ACE2. These can have a drastic effect on protein structure and expression levels, but their effect is less dependent on the residue context than missense variants. However, there is linear positional dependence since truncations close to the Cterminus are more likely to be tolerated.

Missense variants and the ACE2-S interface
The simplest way that missense variation could impact SARS-CoV-2 infection would be by altering the ACE2-S interface. ACE2 missense variants located at residues that bind the Sprotein are most likely to have such effects. Figure 2 illustrates the ACE2-S interface and highlights residues with reported missense variants in gnomAD whilst Table 1 lists these variants and their population distribution alongside key structural features of the site. Seven missense variants occur at residues we classified as directly interacting with SARS-CoV-2 S (see Methods). All these variants would be considered rare or extremely rare and are mostly private to a single gnomAD population. There are no homozygous females, but a few males are hemizygous as indicated. p.Ser19Pro is the most frequently observed ACE2-S interface variant in gnomAD. Mutations to or from Pro can have severe consequences for protein structure but as Ser19 is the first observed ACE2 residue in the structure it is difficult to predict the overall conformational consequence. This variant may be slightly deleterious to normal ACE2 function given the apparent gender imbalance in its prevalence, but the significance of this observation with respect to ACE2 activity towards SARS-CoV-2 infection is unclear. In the context of SARS-CoV-2 infection, the loss of the H-bond donor Ser, which interacts with a main chain carbonyl on S, could weaken the ACE2-S complex. The next most common missense variant at the interface is p.Glu37Lys. This variant occurs in an interface core binding site (see Methods), which have been shown to be less tolerant of mutation with respect to human pathogenic variants due to disruption of protein-protein interactions 17 . Charge inversion mutations have the capacity to be highly disruptive. Glu37 does interact with positively charged residues, but these are intraprotein interactions with ACE2 Arg393 and Lys353. Nevertheless, the excess positive charge in the variant may disfavour binding and the Lys will be unable to act as donor in the H-bond with SARS-CoV-2 S Tyr505 suggesting that this variant would be destabilising. Similar considerations hold for p.Glu35Lys. Glu35 engages with S via an H-bond to Gln493, which will be disrupted by the variant, but the effect of the charge inversion will be reduced compared to the previous variant because the neighbourhood of Glu35 includes both positive (Lys31) and negative (Glu38) charges. The site of p.Glu329Gly is close to the engineered S Arg439. Although it is not formally engaged in H-bonding or an ionic interaction by ARPEGGIO's 18 definitions, the residues are likely to interact electrostatically and has been described as an N…O bridge 9 . We would therefore expect ACE2 p.Glu329Gly to have reduced affinity with the chimeric receptor in PDB 6vw1. The wild-type Asn439 will likely be too short to interact with Glu329 and the loss of charge complementarity would probably lead to the same mutation having no effect with the native SARS-CoV-2 S (see Methods for further details).  The effects of ACE2 residue substitutions on coronavirus S-protein affinity have been directly observed 10 . Extensive ACE2 mutagenesis provided detailed insight into the ACE2-SARS-CoV S complex and can give some insight into the effects of ACE2 population variation in SARS-CoV-2 S binding. Table 2 lists gnomAD missense variants seen at the site of residues that were previously mutagenized 10 . Half of the ACE2 mutants listed resulted in inhibition of S-protein binding but only two mutants correspond exactly to the gnomAD variant and of these only p.Arg559Ser showed an effect on SARS-CoV S-protein binding. The other mutant also seen in gnomAD, p.His239Gln, had no reported effect on the interaction. The variant p.Lys68Glu shares physicochemical properties with the mutant Lys68Asp, which slightly inhibited interaction with SARS-CoV, and the loss of Pro389 that results from the variant p.Pro389His also had a slight reduction in binding from a Pro389Ala mutant. The only variant that is situated at an ACE2-S interacting residue and occurs at a site with a reported single-point mutant is p.Asp355Asn. The mutant Asp355Ala was reported to reduce ACE2 binding to SARS-CoV S-protein to about 20 % of the wild-type 10 . This indicates that the site is important for maintaining the interaction between the proteins. There are a few caveats to the preceding analyses. The first is that our structural consideration of variants located at the interface is qualitative. For the variants that we think would be destabilising, we are unable to say how destabilising they might be and would have difficulty even ranking them. The second is that we have not ascertained how far the assay results for SARS-CoV S are applicable to SARS-CoV-2 S. Structural comparison of the two complexes could help to address this second caveat since ACE2 sites that interact with S-protein sites that are conserved between SARS-CoV and SARS-CoV-2 are more likely to respond in the same way to a mutation than where SARS-CoV-2 has a different residue. A third flaw is that we have not considered all the reported missense variants. Although variants located at the interface are most likely to have an effect, residues in proximity to the interface, but not directly a part of it, could also have an effect. Second and third degree contacts can have an effect too and even longer range substitution induced conformational effects cannot be discounted completely (although it has been shown that SARS-CoV S binding is invariant to ACE2 catalytic conformational changes 8,10 ). In §2.2 we address these caveats by employing mCSM-PPI2 12 to predict changes in the ACE2-S binding energy as a result of the gnomAD variants. We integrate the experimental mutagenesis data by validating mCSM-PPI2 in this system and calibrating its predictions to the experimental observations.

gnomAD variants in ACE2 can significantly modify its affinity with SARS-CoV-2 S
Research into the impact of missense variants at protein-protein interaction interfaces has led to methods that provide quantitative predictions of the effect of mutations on interface characteristics 12 . These predictions are an invaluable addition to insights gained from inspection of the site and its neighbourhood in the 3D structure. They can also be used to screen large numbers of variants that would be prohibitively time-consuming to process through modelling and manual inspection or experimentally. In this section, we predict the effects of ACE2 missense variation on S-protein binding by the application of the mCSM-PPI2 12 algorithm, which we extensively validated against experimental binding data for 26 ACE2 mutants in complex with the closely related SARS-CoV S (see Methods), and critically evaluate the key predictions by structural inspection. Table 3 contains mCSM-PPI2 predictions for nine ACE2 gnomAD missense variants located at or close to the S-protein interface. On the basis of our mCSM-PPI2 calibration, predicted ΔΔG > +1.0 kcal/mol is a confident indication that a variant will enhance ACE2-S binding whilst ΔΔG < −1.0 kcal/mol indicates a mutation that strongly or totally abolishes the ACE2-S interaction (see Methods). One variant is predicted to enhance binding (p.Gly326Glu) whilst three variants are predicted to strongly reduce S-protein binding. The remaining five interface variants are predicted to reduce ΔG over the range −0.2 kcal/mol to −0.6 kcal/mol, but as they do not exceed the calibrated threshold, we consider these variants as of uncertain significance with respect to ACE2-S binding. 2.2.1 p.Gly326Glu is predicted to enhance ACE2-S interaction p.Gly326Glu is predicted to increase the affinity of ACE2 for SARS-CoV-2 S by 1.0 kcal/mol. Figure 3 compares the ACE2-S interface in the vicinity of Gly326 with the mCSM-PPI2 12 modelled p.Gly326Glu. Wild type Gly326 does not form any interactions with SARS-CoV-2 S, the closest approach between the proteins is 5.5 Å between ACE2 Gly326 Ca and the mainchain carbonyl of SARS-CoV-2 Thr500. The mutant p.Gly326Glu fills the void between the proteins and forms multiple H-bonding, polar and ionic interactions with the SARS-CoV-2 S backbone and residues Asn506 and Arg439. These new interactions are strongly suggestive that the mutant would lead to enhanced binding. However, the relevance of the interaction with Arg439 is uncertain because this residue was the result of a mutation (Asn439Arg) from the SARS-CoV-2 S reference (GenBank MN908947.1), engineered to enhance binding through interaction with ACE2 Glu329 9 . Modelling p.Gly326Glu in a structure with Asn at position 439 (6vw1-Asn439, Figure 3C, see Methods) yielded an identical predicted ΔΔG (i.e., 1.0 kcal/mol). In 6vw1-Asn439, the lost ionic interaction with the mutant Arg is compensated by an H-bond to the main chain amine of S Val503, which is accessible to a different rotamer conformation of the mutant Glu residue, whilst retaining the H-bond with S Asn506. As well as explaining why p.Gly326Glu is still predicted to enhance binding in the SARS-CoV-2 S model with Asn439, this highlights the sort of interplay that could occur between ACE2 variants and different SARS-CoV-2 strains. In this example of an engineered S-protein, the predicted effect on ΔΔG is the same for the ACE2 variant with both S-proteins but other SARS-CoV-2 S variants could interact differently with ACE2 variation. 2.2.2 p.Asp355Asn, p.Glu37Lys and p.Gly352Val are predicted to inhibit ACE2-S binding gnomAD ACE2 variants p.Asp355Asn, p.Glu37Lys and p.Gly352Val have mCSM-PPI2 ΔΔG < −1 kcal/mol and we therefore predict that they will significantly inhibit SARS-CoV-2 S binding or abolish it altogether. Figure 4 illustrates the structural differences between the wild-type and mutant structures of these mutants. p.Asp355Asn has the largest predicted negative ΔΔG of these mutations. Wild-type Asp355 is held in place by an intraprotein ionic interaction with Arg357. The loss of this charge complementarity in the p.Asp355Asn results in a different orientation for mutant Asn that introduces a number of clashes between this residue and others nearby, including the carbonyl oxygen of Thr500 of S. These clashes might be better accommodated in free ACE2 resulting in reduced affinity for the S-protein.
In p.Glu37Lys, a favourable interprotein H-bond between wild-type Glu37 and RBD Tyr505 is lost alongside several intraprotein polar and ionic interactions. The new hydrophobic contact between the mutant Lys37 and the same Tyr residue is likely insufficient to compensate. p.Gly352Val is the farthest variant from the interface (5.4 Å) that is predicted to decrease S-protein binding and does not interact directly with any SARS-CoV-2 S residue. The effect on the interface is through a fourth-degree interaction that terminates with RBD Tyr505 (i.e., Gly352(C=O)-Tyr41-Lys353-RBD Tyr505). The introduction of Val into a position occupied by the buried Gly leads to several clashes. A few favourable internal hydrophobic contacts are also formed, including with Tyr41. The rearrangements necessary to accommodate Val might affect the bridged interaction between Tyr41 and RBD Tyr505 and the increased hydrophobic packing coupled with the loss of Gly could affect reduced flexibility. These considerations support the mCSM-PPI2 prediction. Mutation of residues on or near the interface are most likely to impact the interaction affinity but distant mutations can in principle affect interfaces through longer-range conformational effects. Figure 5 shows the distribution of predicted ΔΔG for 167 gnomAD ACE2 missense variants in residues resolved in PDB 6vw1. Whilst predicted ΔΔGs are biased toward a slight destabilising effect overall, the majority of variants have predicted ΔΔGs between −0.8 kcal/mol and 0.5 kcal/mol, suggesting that most have only a slight effect if any on ACE2-S binding. This is expected since most variants are away from the ACE2-S interface ( Figure 5C) and distant conformational effects would be relatively rare anyway. In this case, no additional variants are found with ΔΔG < −1.0 kcal/mol, which is consistent with experimental evidence that ACE2-S binding is agnostic towards ACE2 conformational states 10 . These data also show that variants with predicted ΔΔG in excess of ±1 kcal/mol outlie the bulk of predictions, providing further empirical support for these thresholds. There is a second outlier with relatively high positive ΔΔG that is highlighted by the probability plot ( Figure 5B). This is p.Val447Phe with predicted ΔΔG = +0.9 kcal/mol, below the strict 1 kcal/mol threshold but well in excess of the relaxed threshold that we considered may be useful for potential ACE2-S affinity enhancing variants (see Methods). If this variant did lead to enhanced ACE2-S interaction, it could contribute significantly to the risk burden of ACE2 owing to its prevalence. In gnomAD this variant is observed in 13 individuals across three populations. This result is also unusual because the variant site is nearly 40Å from the ACE2-S interface. Figure 6 shows the local environment of Val447Phe.
The introduction of the larger Phe residue leads to a few clashes with neighbouring Ile233, Ile236 and Leu584. However, we were unable to discern a mechanism through which this mutant could modify ACE2-S binding as the mCSM-PPI2 mutant model showed no changes at the interface compared to wild-type. However, mCSM-PPI2 may recognise such motion in its internal model's "atomic fluctuation" term 12 . However, in all the mutant models provided by mCSM-PPI2 we have evaluated, none display any relaxation of the structure, local or otherwise, in response to the mutant so we are unable to consider this factor in our critical evaluation. Due to the lack of structural corroboration of the mCSM-PPI2 predicted ΔΔG, we consider this variant to be of uncertain significance but worthy of follow up in cohort and experimental studies.

Exhaustive search for protective and risk factor ACE2 mutations
Existing human variation datasets are well-powered to detect common variation (1KG was estimated to detect >99% SNPs with MAF >1% 21 and gnomAD is substantially larger) but they are far from comprehensive with respect to rare variation 11 . Rare variants with a significant effect on ACE2-S binding could have significant implications for the epidemiology of COVID-19 in addition to the consequences for affected individuals. If there were 10 such variants with an allele frequency of 1 in 50,000, their collective occurrence might be as high as 1 in 5,000 (discounting linkage) and when this is considered alongside the possibility that a high proportion of the global population will be exposed to SARS-CoV-2 it becomes clear that such effects should be investigated. In other words, the genetic burden of rare variants cannot be ignored. Figure 7 illustrates the distribution of ΔΔG predictions by mCSM-PPI2 from in silico saturation mutagenesis of the ACE2-S interface. Most of the 475 possible ACE2 mutations in residues at the interface with S are predicted to lead to a slight reduction in binding but there is a secondary mode < −1 kcal/mol containing 125 mutations that are predicted to lead to strongly reduced ACE2-S binding ( Figure 7A). This constitutes a positive prediction rate of over 25% for inhibitory variants, a figure that is substantially higher than for the gnomAD variants (≈2%; §2.2) and the validation mutants (14%; Methods) due to the restriction of the analysis to sites at the ACE2-S interface where mutations are far more likely to effect the ACE2-S interaction. In contrast, only six mutations are predicted to result in enhanced binding at the strict ΔΔG threshold of > 1.0 kcal/mol for ACE2-S affinity enhancing variants and even at the relaxed threshold of > 0.5 kcal/mol only 14 mutations are highlighted. At face value, these results suggest that there is a good chance that individuals with ACE2 missense variants at these loci would have an ACE2 variant with reduced ACE2-S binding whilst ACE2 variants with enhanced affinity for S would be far less frequent. Of the 475 possible mutations, the subset of 151 that are accessible via a single base change from the ACE2 coding sequence are more likely to be observed in the human population. Figure 7B shows the distribution of predicted ΔΔG for this subset. In this subset, there are 38 substitutions distributed over 12 sites with ΔΔG < −1.0 kcal/mol and a single substitution with ΔΔG > +1.0 kcal/mol is observed (Thr27Arg). Other factors can be considered too. For instance, any methylated CpG dinucleotides in the coding sequence would be particularly variable. This effect is observed at the protein level in the high mutability of Arg residues 22 . In ACE2, the only interface residue with a CG dinucleotide is Asp38. The residue is coded by GAC and the 3 rd base is part of a CG dinucleotide. Here the CG to TG mutation leads to a synonymous change but if we consider that this mutation might recur it is worthwhile assessing accessible missense changes from this codon, in this case, GAT to GGT (Gly) or GCT (Ala) are substitutions with ΔΔG < −1 kcal/mol. This approach could be applied to include other common variant codons such as those from high frequency synonymous variants. Another factor that might be relevant is the number of degenerate variants that lead to a given substitution.

More mutations in ACE2 would reduce ACE2-S affinity than increase it
This interpretation hinges on mCSM-PPI2 providing predicted ΔΔG that is unbiased. It is conceivable that mCSM-PPI2 could be biased towards negative ΔΔG predictions because the training datasets are skewed towards these observations 12,23 . Fortunately, the mCSM-PPI2 training data were augmented to mitigate this possibility 12 and our own analyses of this bias in the ACE2 SARS-CoV S complex suggests that although the magnitudes of positive predictions can be depressed, overall the predictions are well-balanced (see Methods). In addition, some further evidence that mCSM-PPI2 ΔΔG is unbiased comes from the equivalent distribution of predicted ΔΔG for all possible mutations in SARS-CoV-2 S at sites that interact directly with ACE2 ( Figure 7C). The ΔΔG distribution for these SARS-CoV-2 S mutations are still skewed towards ΔΔG < 0 kcal/mol but it does not contain the secondary mode below −1 kcal/mol that we observe in the equivalent ACE2 data (compare Figure 7A and C). Compared to ACE2, SARS-CoV-2 S also has a few more mutations that are predicted to enhance ACE2-S binding, including 10 > 1.0 kcal/mol out of 22 > 0.5 kcal/mol.

A missense variant causing Thr27Arg would enhance ACE2-S affinity
We noted that one of the ACE2 mutations predicted to enhance ACE2-S binding was accessible from a single base change. This was Thr27Arg, which constitutes a second potential risk factor variant and deserves further evaluation. Thr27Arg is accessible from the C>G transversion in the wild-type codon ACA results in the AGA codon for Arg. Thr27Arg has the largest predicted ΔΔG of all the ACE2 binding-site mutants tested (1.4 kcal/mol) and is an apparent outlier amongst the full set of ACE2 saturation mutants ( Figure 7A). Figure 8 illustrates the structural differences between the mutant and reference residue. Thr27 makes no specific contacts with SARS-CoV-2 S, coming closest to S Tyr489 (4.3 Å). Thr27Arg is amply accommodated in the interface. The guanidium interacts with S Tyr421 and Tyr473 whilst the charge is neutralised by nearby ACE2 Asp30. The guanidium fills a space between S Tyr473 and ACE2 Glu23 at the rim of the interface (behind the plane of Figure 8). The additional positive charge is neutralised by interaction with ACE2 Asp30. These features of Thr27Arg are strongly supportive of its potential to enhance ACE2-S binding.   (Table 4 and Figure 1) may affect the ratio of these ACE2 cleavage processes and consequently the availability of the TMPRSS2 ACE2 cleavage S-driven augmented entry pathway 7 . If this pathway were to prove active in SARS-CoV-2, then we could expect variants that disrupt ADAM17 cleavage to enhance SARS-CoV-2 infection whilst variants that disrupt TMPRSS2 might be protective. Mutations of Arg and Lys residues in the TMPRSS2 (R697-R716) and ADAM17 (R652-K659) cleavage regions can abolish ACE2 processing by these enzymes 7 . There are five Arg substitutions in the TMPRSS2 cleavage region that could reduce TMPRSS2 processing of ACE2 and consequently confer some level of resistance to SARS-CoV-2 infection, whilst the gain of Arg mutation p.Ser709Arg could lead to increased susceptibility by increasing TMPRSS2 processing. p.Arg697Gly is particularly notable given its prevalence and apparent specificity to South Asian males (Table 4). ADAM17 contains one Arg missense in gnomAD but since the variant leads to a Lys its effect in this respect is most likely neutral.

Nonsense variants in ACE2
A few nonsense variants with potentially severe consequences for the canonical ACE2 transcript are present in gnomAD (Table 5). There are three stop-gained variants, two frameshift variants and a start-lost variant. Each of these variants would have a severe impact on the protein through the translation of a truncated and aberrant product or nonsense mediated decay. It is possible that stop-gained p.Leu656Ter and frameshifting p.Lys702ArgTer16 could lead to a semi functional product since they allow translation of a complete peptidase domain but the relevance to SARS-CoV-2 entry is unclear. An inframe deletion variant that results in the deletion of Lys313 occurs in a single male in the non-Finnish European cohort. It is difficult to interpret the effect of this variant as it could alter the pattern of exposed residues on the helix in which it occurs. However, we think it likely that it would be structurally accommodated, particularly since the closest loop begins only a few residues away at Gly319 (Figure 1). Crucially, Lys313 is about 20 Å from the SARS-CoV-2 RBD so we do not expect it to impact ACE2-S formation. 4 Therapeutic applications ACE2 has been identified as a potential therapeutic target for COVID-19 24 . One idea is to disrupt the ACE2-S interface with small-molecule or peptide inhibitors. The descriptions of the ACE2 variants provided here can help to anticipate possible pharmacogenomic effects of variants located at the ACE2-S interface. Another promising treatment direction is the administration of recombinant human ACE2 (rhACE2) 24,25 . This might compete with endogenous ACE2 for SARS-CoV-2 S. The approach has been proved-in-principle through the inhibition of SARS-CoV-2 infections in engineered human tissue models 25 and a clinical trial is now underway (EudraCT Number: 2020-001172-15). An earlier trial was registered (Clinicaltrials.gov #NCT04287686) 24 but has since been withdrawn before enrolling participants. The affinity of the administered rhACE2 might be a crucial component of the treatments efficacy, as indicated by the fact that mouse ACE2 did not have the same effect as human ACE2 25 . We will now describe how the ACE2-S interface saturation mutagenesis data can be used to design mutant ACE2 with tailored SARS-CoV-2 S affinity.

Design of therapeutic recombinant ACE2 with tailored affinity for SARS-CoV-2 S
In addition to predicting the effects of missense variants, the mCSM-PPI2 saturation mutagenesis data can be used to select substitutions in ACE2 that are likely to lead to reduced or enhanced affinity towards SARS-CoV-2 S. An ACE2 mutant with enhanced S affinity might be desirable for rhACE2 therapy. Even slightly enhanced affinity over endogenous ACE2 might shift the equilibrium towards rhACE2-S-protein complex formation and reduce the dose required for inhibition of viral infection. Table 6 presents the top mutations from the set of mutants predicted to lead to the largest increase in ΔΔG per site. These mutations are excellent candidates to improve the efficacy of rhACE2 therapy in COVID-19. We draw special attention to Gly326Glu, which corresponds to one of the gnomAD missense variants discussed earlier (see §2.2.1) and may be an excellent candidate since native ACE2 activity is likely to be maintained given that the variant is observed in a population dataset. The full dataset is provided in Supplementary Data File 1. There are other criteria that should be considered to minimise the chance of disrupting the native activity of ACE2, which is thought could mitigate COVID-19 lung pathologies in addition to viral inhibition 24 . Basic considerations such as avoiding mutations to or from proline or glycine and nuanced strategies such as restricting candidates to sites with known population variants are relevant. The former criterion might be ignored if a major physicochemical change is deemed necessary to enhance S affinity. Whether or not this interface has any endogenous role will underlie tolerance of native ACE2 activity to mutations.
The utility of an ACE2 variant with enhanced ACE2-S binding as a competitive inhibitor of endogenous ACE2 is clear, but since our data could be used to guide the design of a variant with depressed binding, we can consider if this might be of any use. Assuming that rhACE2 therapy works by competing with membrane bound native ACE2 for SARS-CoV-2 S (rhACE2 efficacy in acute respiratory distress syndrome is thought to be due to circulating ACE2 activity 26 , i.e. it does not enter the cell), an ACE2 mutant with reduced affinity for SARS-CoV-2 could be beneficial if it replaced native ACE2 on the membrane. This could be useful if gene therapy were practical or desirable for this disease.
As far as we know, the clinical trial assessing the efficacy of rhACE2 therapy in COVID-19 uses recombinantly expressed wild-type cleaved ACE2 25 . Clinical testing and approval can be expediated for repurposed existing therapies as is clearly the strategy here. Our proposal to for enhanced rhACE2 could in principle be achieved with a single mutation and might be suitable for delivery within a similar rapid development framework.

The impact and prevalence of ACE2 variant receptor activity genotypes
How far could ACE2-S affinity modulation variants effect SARS-CoV-2 infection? There is clear evidence that mutations in ACE2 that effect its affinity for coronavirus S-protein could provide complete protection from infection. This can be understood from the host range specificity of coronaviruses driven by ACE2-S complementarity alongside mutagenesis studies showing that controlled single point mutations can abolish interaction 10 . In SARS-CoV-2, it has been shown that the APN and DDP4 cell entry pathways are not active 1,5 , so there can be no recourse to these routes (unless mutations in these genes could establish this activity). In this work, some observed ACE2 variants were predicted to have a comparable effect on ACE2-S binding as mutants that were experimentally determined to abolish the interaction altogether. Therefore, it is possible that individuals carry ACE2 variants that confer resistance to SARS-CoV-2 infection. If infection efficiency operates on a continuous scale, perhaps to some extent proportional to the binding affinity, then various gradations of resistance could be conferred by different variants.
It is less clear what effect the ACE2-S binding enhancer variants might have. In principle, they could increase susceptibility to infection by promoting SARS-CoV-2 cellular attachment, potentially reducing the infective dose of SARS-CoV-2. ACE2 variants with enhanced receptor activity could also promote infection of cell types that express lower levels of ACE2, and so lead to increased risk of COVID-19 complications (e.g., acute renal failure) since viral spreading is associated with clinical deterioration 27 . Such effects might also lead to increased transmission from individuals with ACE2-S enhancing variants. It was suggested that enhanced ACE2-S affinity in SARS-CoV-2 versus SARS-CoV could lead to increased infection in the upper respiratory tract 5 , this hypothesis is beginning to bear out with evidence of upper respiratory tract infection 28 and higher ACE2 affinity of SARS-CoV-2 compared to SARS-CoV now established 9 . However, there is also the possibility that such variants are protective, if for instance the ACE2 variant led to a persistent stable ACE2-S complex that interfered with S proteolysis, although to our knowledge this phenomenon has not been observed. Experimental work-up of the Thr27Arg and Gly326Glu variants of ACE2 should provide answers to these questions.
Aside from ACE2-S affinity, host cell ACE2 expression levels are known to be important for the cellular specificity of SARS-CoV 29 and attention has been directed toward ACE2 expression levels as a potential factor in COVID-19 susceptibility and severity 30 . It is helpful to consider that with respect to SARS-CoV-2 cell entry, the physical mechanism underlying the effects of variation in ACE2 expression and ACE2 variant S-protein affinity is the formation of the ACE2-S complex. In a simplistic model, ACE2 expression modifies ACE2 membrane concentration whilst ACE2-S affinity modifies the equilibrium dissociation constant. It is conceivable then that affinity variants and expression variants compound the effect of one another, in a manner akin to that suggested for ACE2 variants and ACE2 stimulating drugs 30 , and so ACE2 haplotypes and linkage with other regulatory loci could be critical. This also has implications for more moderate ACE2-S affinity modifying variants, since their effect could be amplified in the presence of altered ACE2 expression.
Other epidemiological features can be reconciled with our observations. For instance, males are significantly more likely to die as a result of COVID-19 infection. Since ACE2 resides on the X chromosome, female heterozygotes will express a proportion of ACE2 alleles whilst males are hemizygotes and carry only a single ACE2 allele (this was also noted by Asselta et al. 31 who also noted that ACE2 escapes complete X-inactivation). A male carrying a rare ACE2-S affinity enhancing variant will be worse affected than a female heterozygote. Moreover, it is conceivable that in females the more resistant ACE2 allele could rapidly become predominant in infected tissues due to selection induced by host cell turnover over infection cycles, gradually increasing the expression of the S-protein resistant ACE2 allele if present. In addition to the benefits of this selection process, this may have implications for female survivors of COVID-19 depending on the persistence X-inactivation bias and the nature of the hitchhiking alleles.
Although, as described above, there is clear evidence that missense variants are capable of abolishing ACE2's receptor activity towards SARS-CoV-2 S, it is complicated to estimate the prevalence of such variants in the population. The gnomAD variants predicted to reduce ACE2-S binding are all very rare. p.Glu37Lys is the most prevalent with an allele frequency (AF) 0.0003 in the gnomAD Finnish samples and it is also observed in one African/African American sample. The two other gnomAD inhibitory variants p.Gly352Val and p.Asp355Asn are both doubletons observed only in non-Finnish Europeans, each corresponding to AF = 0.00003 in this population. Expanding on this, a simple estimation is that these two variants might confer SARS-CoV-2 resistance to just over 1 in 15,000 non-Finnish Europeans, comparable to many rare diseases. However, these are not the only potentially protective ACE2 variants. Our analysis of the effects of all possible missense mutations at the ACE2-S interface predicted that 38 substitutions accessible via single base changes at 12 sites would lead to severely reduced interaction. These substitutions account for 41 missense variants out of the possible 108 single nucleotide variants at the codons corresponding to the ACE2 residues that bind SARS-CoV-2 S. A crude upper bound under the highly conservative assumption that these variants occur at lower frequencies than the allele frequency of a singleton can be calculated. The minimum possible AF for a singleton in gnomAD non-Finnish Europeans is 8.8 × 10 -6 . Since we are calculating the effect of 41 unobserved variants, their joint frequency (assuming no linkage) is 3.6 × 10 -4 , which is equivalent to 72 variants per 200,000 alleles or 7.2 per 10,000 population. A more sophisticated estimate takes account of the empirical detection of rare variants in ACE2. For example, there are 1,181 variant alleles in 56,885 non-Finnish Europeans arising from variants with AF < 0.01 in the 2,415 nucleotide ACE2 coding sequence (n.b. at this AF threshold, 90 % are missense). This amounts to a variation rate of about 4.3 × 10 -6 per nucleotide per allele. So, at the 12 sites (36 nucleotides) with variants predicted to strongly reduce ACE2-S binding we could roughly expect 1.6 × 10 -4 variants per allele or 31 variants per 100,000 population. This is another conservative estimate for these sites since surface residues tend to be more variable than the core or other functional regions of the protein 32 . Finally, since 41 out of 108 possible SNPs at these 12 sites are predicted to lead to a strongly reduced ACE2-S interaction, we can multiply by this proportion to estimate the number that might strongly reduce ACE2-S binding. This yields an estimate that 11.8 individuals per 100,000 population possess an ACE2 allele that is predicted to be resistant to SARS-CoV-2 infection.
Although these rough estimates suggest that highly resistant ACE2 alleles are extremely rare it is important to note that allele frequencies show significant variation even between large variation datasets 33 . Moreover, it is known that local population allele frequencies also vary substantially from those reported in public datasets 34 and it is therefore possible that some populations have resistance mutations at higher frequencies. Also, we have only considered ACE2 variants that are predicted to result in a strong reduction or total abolition of ACE2-S binding. Our data also suggests that ACE2 interface mutants tend to at least slightly reduce ACE2-S binding and this means that there will be a higher prevalence of ACE2 alleles with this property. In concert with altered ACE2 expression levels, these moderate ACE2-S affinity variants could have a larger effect at the population scale.

Applications to population genetic analyses of ACE2 variation in COVID-19
Another approach to assess the role of ACE2 variation in COVID-19 has been the application of comparative population genetic analysis. Two reports have found polymorphisms in ACE2 expression quantitative trait loci (eQTL) vary between populations but both concluded there was no clear evidence regarding ACE2 missense variants 4,31 . Why did neither of these studies find evidence that ACE2 missense variants could be associated with variable population specific epidemiology? First, Rosanna et al. 31 acknowledge that there is no conclusive evidence that apparent cross-population differences in COVID-19 epidemiology are not accounted for by population demography 31 , which raises the possibility that population differences in allele frequencies and burden are meaningless in the context of COVID-19 genetics. Nevertheless, the detection of decreased TMPRSS2 burden in Italians and Europeans compared to East Asians 31 showed that the approach could turn up interesting results with rational mechanistic interpretations. A potential prejudice against the resolution of COVID-19 genetic association with ACE2, and one that highlights an application of our work, is the inclusion criteria for variants in burden tests. Standard filtering strategies based on predicted variant severity may have worked well for TMPRSS2 where SARS-CoV-2 exploits its native catalytic function 5 , but ACE2 mediated SARS-CoV-2 cell entry does not require ACE2 catalytic activity 10 and as a result these filters might be counterproductive. Inclusion criteria for missense variants might be better based on their predicted effect on S-protein affinity, differentiating variants predicted to reduce ACE2-S affinity from those that enhance it since they may have opposite effects (Table 3), or potential to mitigate some other SARS-CoV-2 infection pathway such as TMPRSS2 or ADAM17 cleavage region variants (Table 4). Moreover, it might be useful to test missense variants with specific effects on SARS-CoV-2 infection separately, to isolate these features from the other roles ACE2 may have in COVID-19 pathology. This application to burden tests would likely prove useful to attempts to determine host genetic factors that influence COVID-19 susceptibility and severity.

Reconciling conflicting results from closely related studies
Two studies applied computational methods to gnomAD variants at the ACE2-S interface and report conflicting conclusions to our own 35,36 . Hussain et al. 35 employed homology modelling, pathogenicity predictions and the PRODIGY 37 webserver to assess 17 gnomAD ACE2 variants located at the S interface or at reported mutagenesis sites, all of which were considered in our work, including variants p.Asp355Asn, p.Glu37Lys and p.Gly352Val that we find are likely to strongly inhibit ACE2-S binding (Table 3). They tentatively suggested that p.S19P and p.E329G might be resistant to S-protein attachment, which partially conflicts with our conclusion that these variants are unlikely to significantly effect ACE2-S binding (Table 3), and they did not predict a significant reduction in ACE2-S binding in the p.Asp355Asn, p.Glu37Lys and p.Gly352Val variants. In a different study, Othman et al. 36 assessed eight gnomAD ACE2 variants at the interface. Their variant models were derived from homology models built from SARS-CoV S bound to human-civet chimeric ACE2 (PDB ID: 3scl) and they also used PRODIGY 37 in addition to another predictor to calculate ΔΔG. After finding no variants exceeded a predefined 1 kcal/mol threshold they concluded that this meant there was little chance of ACE2 mediated resistance to SARS-CoV-2. Amongst the eight variants tested in Othman et al. 36 were the three variants we predict will strongly disrupt ACE2-S binding and p.Gly326Glu, which we find enhances ACE2-S binding.
Although these negative findings conflict with our work, we remain confident in our results for a few reasons. First, we point to our validation data on mCSM-PPI2, which demonstrates its effectiveness in the highly similar SARS-CoV S-protein ACE2 complex (see Methods), and we report that PRODIGY, which was employed by both other studies, did not perform as well on this validation set (Supplementary Figure 2). Another factor that could contribute to the different results is that the other studies derived variant models from different structures of ACE2-S than we have (i.e., PDBs 6lzg 38 , and 3scl). We found mCSM-PPI2 predictions from PDBs 6vw1 9 and 6lzg 38 to be highly concordant for the same interface mutations (ρ = 0.97; see also Supplementary Figure 3) confirming that structure selection was not an issue.
With respect to p.Ser19Pro and p.Glu329Gly, which Hussain et al. 35 predicted might be resistant to S-protein attachment on the basis of PRODIGY predicted ΔG and the presence of fewer charge interactions in the p.Glu329Gly model and other lost contacts. An interesting rationale given for these results was that the interaction between ACE2 Lys353 and the Sprotein, a site shown to be important for binding SARS-CoV S-protein 10 , was lost in their models of p.Ser19Pro and p.Glu329Gly but it is unclear how confident this movement is given that Lys353 is about 30 Å and 16 Å away from the variant sites, respectively. In our results, p.Ser19Pro had the smallest predicted ΔΔG (−0.2 kcal/mol) of any of the gnomAD interface variants whilst p.Glu329Gly had the third smallest (−0.4 kcal/mol; Table 3). These values are well below our calibrated 1.0 kcal/mol threshold, but we do recognise the possibility they could be false negatives or could weakly inhibit ACE2-S binding. However, we observed nothing in these sites' local environments that suggests the variant residues would strongly affect ACE2-S association ( §2 and Supplementary Table 1) and since Hussain et al. 35 refer to distant structural changes to explain the result we think that the mCSM-PPI2 result for these variants is more structurally realistic.
The difference between our conclusions and these reports highlights the uncertainty inherent in prediction methods and the variability in predictions from different algorithms and models, but we are confident that the results presented in our work are strongly justified. The mCSM-PPI2 prediction algorithm was designed specifically for the task in hand, to predict changes in protein binding affinities upon mutation, and displays best-in-class performance according to community best-practice benchmarks 12 . Our validation of mCSM-PPI2 with experimental data from the closely related SARS-CoV S-ACE2 complex provides direct evidence that this algorithm yields accurate predictions for this complex and calibrates the algorithm's quantitative predictions with observed physical behaviour (Methods). Additionally, we have followed up key results with a critical structural inspection of the local environments of the mutant models and in our judgement the predictions are in line with the principles established by the original structural and mutagenesis work in this area 8,10 .

Conclusion
In summary, we have assessed the potential of ACE2 genetic variation to impact SARS-CoV-2 infection, primarily through prediction of ACE2-S binding efficiency, but also with respect to the potential ACE2 TMPRSS2 cleavage driven augmented entry pathway. Our results are based largely on predictions from the state-of-the-art mCSM-PPI2 12 mutation effect predictor for protein-protein complex affinity, but we first validated this method against published experimental ACE2 mutant SARS-CoV S-protein affinities 10 and demonstrated its applicability to this system and calibrated the quantitative predictions to physical observations (see Methods). We found a single known variant in the gnomAD 11 , p.Gly326Glu, that we predict enhances ACE2 binding affinity for SARS-CoV-2 S that is therefore a potential risk factor for COVID-19. We found three variants in gnomAD that are predicted to reduce ACE2 affinity towards S, p.Glu37Lys, p.Gly352Val and p.Asp355Asn, that are therefore potentially protective against COVID-19. The models that underlie these predictions were structurally evaluated and we judged the predictions to be in line with the established principles concerning ACE2 receptor recognition by coronavirus S-proteins [8][9][10]39 . We extended our predictions to all possible ACE2 missense variants that interact with SARS-CoV-2 S. We found that mutations that lead to reduced ACE2-S interaction affinity are far more likely than mutations that lead to increased interaction. Simple estimates of the prevalence of ACE2 resistance conveying alleles suggest that the most protective alleles will be very rare (12-70 per 100,000 population) and potential risk factors will be rarer but the possibility of higher prevalence in local or underrepresented populations remains. Also, weaker effects, which are more common, could be amplified by ACE2 expression polymorphisms. This is consistent with the high rate of asymptomatic and mild infections and the rarity of severe disease in low-risk groups. A further candidate risk factor variant was identified in this analysis, Thr27Arg. Finally, we described how the mCSM-PPI2 predicted mutant affinities could be used to enhance the potency of a recombinant ACE2 therapy that is currently undergoing clinical trial 24,25 in COVID-19. This work has clear applications in helping to prioritise experimental work into SARS-CoV-2 S-protein human ACE2 recognition; developing genetic diagnostic risk profiling for COVID-19 susceptibility and severity; improving detection and interpretation in future COVID-19 genetic association studies and in developing therapeutic agents for COVID-19.

Integration of structure, variant and mutagenesis data
We conducted an initial scoping analysis of ACE2 in Jalview 2.11 14 . The ACE2 protein sequence was retrieved from UniProt 13 and the structure of SARS-CoV-2 S-protein bound ACE2 (PDB ID: 6vw1) was mapped to this sequence with Jalview's built-in structure mapping capability. The structure was visualised in a Jalview 14 linked UCSF Chimera 16 session. Contacts between ACE2 and SARS-CoV-2 S-protein were identified with the Chimera FindClash command with default settings for contacts, and Van der Waals overlaps were stored as attributes. The average overlap per residue was pulled into Jalview as features from Chimera with the Fetch Chimera attributes function. This allowed UniProt features (e.g., mutagenesis, SARS-CoV binding, TMPRSS2 cleavage site) to be interactively inspected alongside the residue S-protein contacts and gnomAD variants loaded from a Jalview feature file (Supplementary Figure 1; see below).
The pyDRSASP suite 15 was used to integrate 3D structure, population variant and mutagenesis assay data for analysis. Population variants from gnomAD were mapped to ACE2 with VarAlign 15 . Residue mappings were derived from the Ensembl VEP annotations present in the gnomAD VCF. In addition, we manually checked the gnomAD multi nucleotide polymorphisms (MNPs) data file and found no records for ACE2.  Figure 4). We transcribed data for additional mutants from Li et al. 10 that were not reported in UniProt.
Structural data were retrieved with ProIntVar 15 . The structure of chimeric SARS-CoV-2 Spike receptor binding domain in complex with human ACE2 (PDB ID: 6vw1) 9 was downloaded from PDBe. Residue-residue contacts were calculated with ARPEGGIO 18 . ACE2-S interface residues were defined as those with any interprotein interatomic contact (Supplementary Table 1). The asymmetric unit contains two ACE2-S-protein dimers. 68 residue contact pairs were observed in both complexes in the asymmetric unit, 6 were unique to complex A-E and 4 were unique to complex B-F. Although the complex between chain A ACE2 and chain E SARS-CoV-2 S is assigned as the most probable biological unit by RCSB and PDBe, for the purposes of completeness, we counted all contacts in both complexes as the ACE2-S interface. Interface contact numbers are defined as the number of residues on the partner protein a residue is in contact with (i.e., we do not count intraprotein contacts). For example, ACE2 S19 contacts SARS-CoV-2 S residues Ala475, Gly476 and Ser477 so its interface contact number is three. Solvent accessibility was calculated with DSSP 41 for the intact complex and each individual PDB chain (corresponding to monomeric ACE2 and SARS-CoV-2 S) and relative solvent accessibilities (RSA) were derived using the maximal allowed residue accessibility 42 . Interface residues were classed as Core or Rim as suggested by David and Sternberg 17 on the basis of differences between RSA of the complex and free monomers. UniProt-PDB mappings were retrieved from SIFTS 43 .

Prediction of missense variant effects on S-protein -ACE2 interaction
The mCSM-PPI2 12 web server was used to predict the effect of mutations on the SARS-CoV-2 Spike-ACE2 interface topology and binding affinity with the structure PDB ID: 6vw1 9 . ACE2 gnomAD variants were supplied to the web server via upload of a mutation list to the Predictions interface. Wild-type and mutant interaction graphs were inspected online with the Single-Point Mutation interface. The wild-type and mutant models were downloaded from the server as PyMol sessions and detailed structural comparisons were conducted with PyMol 20 . Our typical workflow was to group all the objects in each session (i.e., wild-type and mutant) and then load both models and contact descriptor objects into a single PyMol session. The models within the mCSM-PPI2 PyMol sessions did not retain their original numbering and it was necessary to reload the original PDB 6vw1 model to identify residues conveniently.
Most of the saturation mutagenesis data were obtained by selecting the Saturation Mutagenesis option on the Interface Analysis submission page. In this analysis, the mCSM-PPI2 web server defines the interaction interface automatically. No residue farther than 5 Å from the interface site, as defined by the web server, was included in the analysis. Saturation mutagenesis data for positions Gly326 and Glu352, which were beyond this limit but found in our study to harbour ACE2-S affinity altering variants, were obtained via the mutation list submission. Data were downloaded in CSV format. n.b. The mCSM-PPI2 model for p.Gly352Tyr displayed a chemically invalid conformation for the mutant Tyr and so results for this mutant were excluded.
mCSM-PPI2 predictions for mutations corresponding to the ACE2 single-point mutants with experimental data 10 on SARS-CoV Spike-ACE2 binding were obtained with the structure of SARS-CoV S receptor binding domain complexed with human ACE2 (PDB ID: 2ajf) 8 as starting model. These results were merged with the experimental binding assay data, collected as outlined above.
mCSM-PPI2 predictions were obtained for models with the engineered Arg439 substituted with the native Asn. Arg439Asn was modelled with mCSM-PPI2 on the structure 6vw1. The model was exported to PDB from the PyMol session downloaded from the mCSM-PPI2 results page. This model was provided as the starting structure for mCSM-PPI2 predictions. mCSM-PPI2 predictions for the validation reverse mutations were obtained in the same way.
Validating mCSM-PPI2 Spike-ACE2 ΔΔG predictions on SARS-CoV S-protein data mCSM-PPI2 12 is a random forest family predictor that models mutations with graph based signatures and other descriptors trained on protein-protein binding affinities from the SKEMPI 2.0 database 23 . mCSM-PPI2 predicts the change in the Gibbs free energy of the complex as a result of point mutations (ΔΔG). Negative ΔΔG means that the mutation is predicted to decrease the binding affinity whilst positive ΔΔG predicts an increase in affinity. mCSM-PPI2 achieved an average Pearson's correlation (ρ) of 0.82 and root mean square error (RMSE) of 1.18 kcal/mol during 10-fold cross-validation and ranked first out of 26 others on the CAPRI 44 community standard benchmark. A better representative of the predictor's generality to different complexes is the cross-validation results on a training set with low redundancy of complexes. This gave ρ = 0.75 and RMSE = 1.30 kcal/mol. This behaviour is typical for machine learning methods where performance is best in systems closer to the training data. The method was also evaluated on a set of 378 alanine scanning mutations across 19 protein complexes and this resulted in exceptional cross-validation metrics ρ = 0.95 and RMSE = 0.25 kcal/mol, which can be interpreted as an indication of the consistency of the method for different mutations within complexes Experimentally determined SARS-CoV S-protein affinities are available for several ACE2 mutants 10 . These mutants present the opportunity to validate mCSM-PPI2 predictions on a system closely related to SARS-CoV-2 S-protein interactions. If mCSM-PPI2 can be shown to perform well here, then we expect good performance for SARS-CoV-2 S-protein affinities, potentially approaching the accuracy displayed in the evaluation of the alanine scan dataset. Methods Figure 1A illustrates the performance of mCSM-PPI2 ΔΔG as a predictor for the experimentally determined S-protein interactions. The experimental affinities are positively correlated with predicted ΔΔG (ρ = 0.68) so that mutants in ACE2 that strongly reduced or abolished SARS-CoV S interaction (S-) tend to have lower predicted ΔΔG than mutants with little or no effect on the interaction (S+). In detail, five of the seven S-mutations have predicted ΔΔG between -2 and −1 kcal/mol whilst most S+ mutations have predicted ΔΔG > −0.5 kcal/mol. The remaining two S-mutants, Lys31Asp and Lys353His, received ΔΔG predictions > −0.5 kcal/mol, which is within the range displayed by the bulk of S+ mutants. Two S+ mutants received slightly lower ΔΔG predictions than most other S+ mutants, but still well above −1 kcal/mol. These results suggest that ΔΔG < −1.0 kcal/mol yields a confident prediction that the mutation strongly inhibits or abolishes ACE2-S interaction (rule-in threshold) whilst ΔΔG > −0.5 kcal/mol is indicative that the mutation has little to no effect on the interaction (rule-out threshold). ΔΔG predictions between these limits are relatively sparse but S+ mutations received predicted ΔΔG as low as −0.73 kcal/mol whilst the first inhibitory mutant had predicted ΔΔG −1.17 kcal/mol. Table 7 shows the concordance between the experimental result and the mCSM-PPI2 prediction setting a prediction threshold at ΔΔG < −1.0 kcal/mol. At this threshold, no S+ mutants are falsely predicted to abolish ACE2-S binding and only two S-mutants are falsely predicted to have little to no effect on the interaction.
We can evaluate the mutant models to try to explain why two of the validation mutants are misclassified. The discordance between the mCSM-PPI2 prediction and experiment for Lys353His is surprising because mCSM-PPI2 correctly identifies the other Lys353 mutants, Lys353Ala and Lys353Asp, as causing a severe reduction in the ACE2-S interaction. Methods Figure 2 compares the interactions of the wild-type and mutant complexes at these sites. The size of the residues is a clear difference between the mutants, resulting in fewer hydrophobic contacts in the Lys353Ala and Lys353Asp mutants. Additionally, Lys353His can form ring-ring interactions with Tyr759 on the RBD, which could compensate for the fewer hydrophobic contacts in this mutant. These considerations go some way to explain why Lys353His is predicted to have less impact on ACE2-S binding than Lys353Ala or Lys353Asp, but they do not explain the discrepancy with the experimental result. One factor that might be at play is that the protonation state of the mutant His could modulate its inhibition of ACE2 S binding and this might not be well-modelled by mCSM-PPI2. The second misclassified mutation is another Lys mutant. Methods Figure 3 shows the environment of ACE2 Lys31 and Lys31Asp. The sidechain of Lys31 is oriented away from the SARS-CoV S interface allowing the Lys β-and ɣ-carbons to form hydrophobic interactions with S Tyr442 and Tyr475. In free ACE2, Lys31 is oriented outward and is fully exposed and it has been suggested that the internal neutralisation of Glu35 by Lys31 in the ACE2-S complex is required for efficient binding 39 . In Lys31Asp, there are fewer hydrophobic interactions, but the carboxyl group forms an H-bond with Tyr442 that could compensate. These interactions might explain the relatively small predicted ΔΔG of −0.25 kcal/mol and it could be that mCSM-PPI2 does not recognise the importance of the internal neutralisation for binding.  Our validation set does not include any mutants predicted to enhance ACE2-S binding. The potential candidates we found were mutant ACE2 orthologs with humanising mutations 10 but we were unable to find a suitable experimentally determined protein structure to model these mutations. In the absence of these data we are unable to validate or calibrate mCSM-PPI2 explicitly for ACE2 variants that are predicted to enhance S binding but there are a few options to address this deficit. First, we recognise that the ability to predict mutations that increase ΔG as well as those that lower ΔG was part of the design specification for mCSM-PPI2 12 . This objective was advanced by extending the mutant affinity training data with hypothetical reverse mutations to mitigate the bias towards destabilising mutations 12 . With this in mind, we could simply accept this stated capability of the algorithm at face value and assume a symmetric threshold of +1.0 kcal/mol to identify mutations that will significantly enhance binding. This is technically equivalent to extrapolating the relationship between ΔΔG and experimental ACE2-S association (Methods Figure 1A) past 0 kcal/mol, which warrants some explanation. Forward extrapolation of the linear trend in Methods Figure 1A above 0 kcal/mol would yield predicted assay results > 100 %, reaching ≈ 200 % by +2.0 kcal/mol. Although values greater than 100% were observed in the experimental binding assay (and are valid) 10 , the assay is bounded above by an unknown quantity that in our estimation should correspond to the inverse of the ratio of immunoprecipitated ACE2 by the C-terminal tag antibody to the total amount of ACE2 in the sample. This ceiling represents the point at which all mutant ACE2 is immunoprecipitated by the S1-Ig fusion protein used in the original assay 10 .
An alternative approach is hinted at by the reverse mutations the mCSM-PPI2 developers used to augment their training dataset with complex stabilising mutations. Since ΔΔG is a thermodynamic state function, ΔΔG reverse = −ΔΔG forward , and any deviation in the predicted result is an error 12 . This means that the reverse mutations of the experimental mutants that inhibit binding lead to enhanced binding. These can be readily modelled, for example, to model the reverse mutation for Asp355Asn we submitted the mCSM-PPI2 model of Asp355Asn and requested a prediction for Asn355Asp. Methods Figure 1B illustrates the relationship between the experimental binding assays and predicted ΔΔG reverse . Generally, ΔΔG reverse follows our expectations (Methods Figure 1C), with most ΔΔG reverse > 0 kcal/mol and displaying a negative linear correlation with ΔG forward (ρ = −0.78), but there are a few moderate outliers amongst the S-mutations with apparently underestimated ΔΔG reverse . These outliers contribute to a reduction in correlation with the experimental assays compared to that achieved with the forward mutations (Methods Figure 1B, ρ = 0.41), which leads to poorer performance as a predictor of the assay results (Methods Figure 1B). An overall reduction in concordance was expected, since starting from predicted models runs the risk of compounding errors, but it is not clear if the reduced sensitivity is due to generally poorer sensitivity of mCSM-PPI2 towards mutations that enhance affinity or a consequence of these errors. However, we can still achieve good specificity with ΔΔG thresholds of > 1.0 kcal/mol (TPR = 0.29, TNR = 0.95) or slightly more relaxed > 0.5 kcal/mol (TPR = 0.57, TNR = 0.84).
These results indicate that mCSM-PPI2 is applicable to the ACE2 coronavirus S complex and that ΔΔG is a good predictor for S-protein affinity. A threshold of ΔΔG < −1.0 kcal/mol is calibrated to a strong reduction or total abolition of ACE2 S binding affinity. For potential affinity enhancing variants, ΔΔG > +1.0 kcal/mol is a sensible starting point that should yield high specificity but if increased sensitivity is desirable then a relaxed threshold as low as ΔΔG > + 0.5 kcal/mol is appropriate.
mCSM-PPI2 predictions are robust to the presence of engineered Arg439 in PDB 6vw1 The structure of the ACE2 SARS-CoV-2 S-protein receptor binding domain (RBD) complex contains a chimeric SARS-CoV-2 S-protein with the engineered substitution S Asn439Arg introduced to increase binding with ACE2. mCSM-PPI2 predicts that S Asn439Arg enhances S-protein since mutations to the wild-type is predicted to destabilise ACE2-S (ΔΔG = −0.25 kcal/mol for S Arg439Asn in PDB 6vw1; ΔΔG reverse = 0.11 kcal/mol), although notably this prediction does not meet our strict thresholds. Although the mutation itself does enhance ACE2-S binding 10 , its presence does not appear to significantly influence mCSM-PPI2 predicted ΔΔG of gnomAD variants. This is shown in Methods Figure 4, which compares predictions from the chimeric RBD (i.e., PDB 6vw1) to those from a model of the RBD with Asn439 modelled in 6vw1; these are practically identical (ρ = 1.00). Despite this, the variant residue conformations can be influenced by the presence of the engineered residue ( §2.2.1) but it appears that for all gnomAD variants, where this happens the alternative conformations yield the same or very similar ΔΔG.
Methods Figure 4. Comparison of mCSM-PPI2 12 predicted ΔΔG for all ACE2 gnomAD 11 variants that map to PDB 6vw1 8 calculated from PDB 6vw1 directly and from the mCSM-PPI2 model of R439N in PDB 6vw1.

Enumerating possible ACE2 missense SNPs
The ACE2 gene (ENSG00000130234) was retrieved from Ensembl in Jalview 14 . Two identical CDS transcripts (ENST00000427411 and ENST00000252519) were found with the Get Cross-References command corresponding to ACE2 full-length proteins (ENSP00000389326 and ENSP00000252519). These correspond to the UniProt ACE2 sequence Q9BYF1. The CDS was saved in Fasta format and this was parsed in Python with Biopython. The CDS was broken into codons and all possible single base changes were enumerated and translated using the standard genetic code. This provided the set of amino acids accessible to each residue via a single base change.
Software Jalview 2.11 14 was used for interactive sequence data retrieval, sequence analysis, structure data analysis and figure generation. UCSF Chimera 16 and PyMol 20 were used for structure analysis and figure generation.
The pyDRSASP 15 packages, comprising ProteoFAV, ProIntVar and VarAlign were used extensively for data retrieval and analysis as described in Methods. Biopython was used to process sequence data.
Data analyses were coded in Python in Jupyter Notebooks. Numpy, Pandas and Scipy were used for data analysis. Matplotlib and Seaborn were used to plot data.

Code availability
All code and analysis notebooks used in this study are available from the Barton Group public GitHub repository at https://github.com/bartongroup/covid19-ace2-variants.

Data availability
This study employed several public datasets, which are available from their original sources and all necessary identifiers and accessions are provided in the methods.