MMGB/SA Consensus Estimate of the Binding Free Energy Between the Novel Coronavirus Spike Protein to the Human ACE2 Receptor.

The ability to estimate protein-protein binding free energy in a computationally efficient via a physics-based approach is beneficial to research focused on the mechanism of viruses binding to their target proteins. Implicit solvation methodology may be particularly useful in the early stages of such research, as it can offer valuable insights into the binding process, quickly. Here we evaluate the potential of the related molecular mechanics generalized Born surface area (MMGB/SA) approach to estimate the binding free energy ΔGbind between the SARS-CoV-2 spike receptor-binding domain and the human ACE2 receptor. The calculations are based on a recent flavor of the generalized Born model, GBNSR6. Two estimates of ΔGbind are performed: one based on standard bondi radii, and the other based on a newly developed set of atomic radii (OPT1), optimized specifically for protein-ligand binding. We take the average of the resulting two ΔGbind values as the consensus estimate. For the well-studied Ras-Raf protein-protein complex, which has similar binding free energy to that of the SARS-CoV-2/ACE2 complex, the consensus ΔGbind = −11.8 ± 1 kcal/mol, vs. experimental −9.7 ± 0.2 kcal/mol. The consensus estimates for the SARS-CoV-2/ACE2 complex is ΔGbind = −9.4 ± 1.5 kcal/mol, which is in near quantitative agreement with experiment (−10.6 kcal/mol). The availability of a conceptually simple MMGB/SA-based protocol for analysis of the SARS-CoV-2 /ACE2 binding may be beneficial in light of the need to move forward fast.


Introduction
Emerged as a global threat to human health, the SARS-CoV-2 virus that causes COVID -19 disease has been studied widely since the start of 2020. 1 Despite sequence and structure similarities with other viruses, 2 no highly effective treatment option for the novel coronavirus is available. As of today, about 14 million people across the globe have tested positively for the virus, and around 600,000 have died of  This fast-growing pandemic highlights the role of computational structural biology and computer-aided drug design (CADD), which have the ability to accelerate the slow and expensive process of drug discovery. 4 In structure-based drug discovery, accuracy and speed of the binding free energy prediction of drug-like compounds (ligands) to target biomolecules plays a key role in virtual screening of drug candidates. [5][6][7] Despite decades of research, efficient and accurate computational prediction of binding free energies is still a challenge. [8][9][10][11][12] In theory, the binding free energy of a molecular system can be estimated directly from thermodynamic first principles. 13 However, for any realistic molecular systems, e.g., the complex of interest in this work that is made of more than 12,000 atoms, 14 approximations must be made to make the estimate computationally feasible. For example, alchemical methods, 15,16 simulate changes in the free energy along a pathway that sometimes reflects non-physical properties or literally "alchemy". The required sample points along the pathway are generated via Monte Carlo or Molecular Dynamics (MD) simulations. Some of the popular methods in this class are thermodynamic integration (TI) and free energy perturbations (FEP). 17 However, these simulations are still computationally expensive, specifically when it comes to high throughput virtual screening of thousands of potential drugs.
Remarkably more efficient, end-point free energy methods ignore details of the complex to unbound state pathway and estimate free energy on an ensemble of snapshots representing the initial (complex) and the final (unbound) states only. These While calculations based on practical implicit solvation models such as generalized Born (GB) are arguably not as accurate as corresponding estimates based on the best available explicit solvent models, the use of implicit solvent has an undeniable appeal. And not only of computational efficiency: reasoning about physical origins of observed effects is often much more transparent in an implicit than explicit solvent. [25][26][27] That last advantage may be particularly valuable now, when so much about COVID-19 structure-infectivity relationship remains unknown.
In this work, we employ MMGB/SA implemented in AmberTools18 28 for binding free energy calculation of the SARS-CoV-2 spike receptor-binding domain (SARS-CoV-2 S RBD) and the human ACE2 receptor (PDB ID:6m0j), see Fig. 1. Through the MMGB/SA approach, the absolute binding free energy of a complex is calculated as the sum of gas-phase energy, solvation free energy, and entropic contributions averaged over several snapshots extracted from the main MD trajectory. A grid-based surface GB model is used for estimating the polar component of solvation free energy, coupled with water and atomic radii introduced earlier. 29 Human H-Ras and the Ras-binding domain of C-Raf1, so-called Ras-Raf complex, 6,30 is chosen as the reference for the initial evaluation of the MMGB/SA model.
Final results are compared with those from a few available relevant studies. The main goal of the work is a quick assessment of the potential of the simple and efficient MMGB/SA method to future studies of SARS-CoV-2 to ACE2 binding.

Methods and Materials Binding Free Energy Decomposition
Binding free energy, ∆G bind , of a molecular system is calculated as follows: where ∆H is the enthalpy change of the system, T is the absolute temperature in K, and ∆S is the entropy change of the system. A high-level illustration of ∆G bind between bound and unbound states of a solvated complex is shown in Fig. 2. In theoretical/computational studies, a useful way of calculating ∆G bind is through a thermodynamic cycle shown in Fig. 3.
With this approach, ∆G bind,solv is calculated as follows: The solvation free energy, ∆G solv , is broken into the polar and non-polar components: 5 . CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . https://doi.org/10.1101/2020.08.25.267625 doi: bioRxiv preprint Note that the T ∆S above does not exactly correspond to T ∆S in Eq. 1; specifically, the entropy of solvent re-arrangement 25,27 is subsumed into ∆G solv , see below, which is then considered a part of ∆H. Combining the free energy components defined above, we obtain: The copyright holder for this preprint this version posted August 26, 2020. . https://doi.org/10.1101/2020.08.25.267625 doi: bioRxiv preprint virtual screening. As another important advantage, is that through MMPB/SA it is possible to decompose the total free energy into sub-components and measure their contributions separately. 6,30 This feature is certainly useful when it comes to comparing several different free energy methods. Finally, MMPB/SA is applicable to a wide range of structures, 32 from small host-guest systems to large protein-protein complexes with thousands of atoms. 6 Through the MMPB/SA approach, the average of ∆G solv is calculated on a collection of snapshots extracted from an MD simulation. Several decisions have to be made in applying the approach in practice. First, the computational protocol must be selected between the "single-trajectory" (one trajectory of the complex), or "separate-trajectory" (three separate trajectories of the complex, receptor and ligand). In this study, we choose the former protocol as it was shown 33 to be not only much faster than the alternative, but also less "noisy" due to the cancellation of inter-molecular energy contributions. This protocol applies to cases where significant structural changes upon binding are not expected. Shown in Fig. 4, the single-trajectory MMPB/SA starts with the initial structure of the complex in vacuum.
After solvating the structure in a solvent model, an MD simulation is performed to generate the snapshots for further analysis. Then, a relatively large number (typically N > 100) of uncorrelated snapshots are extracted to represent the structural ensemble. Next, binding free energies of these structures are calculated in the implicit solvent after removing the explicit solvent molecules. The average binding free energy over these snapshots is reported as the final ∆G bind .
With the single-trajectory protocol, the binding free energy of a protein-protein complex is formally calculated as follows: where < ... > i denotes an average over i snapshots extracted from the main MD trajectory.
The implementation of this protocol is available in AmberTools18 in Perl 33 and Python. 34 7 . CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . Figure 4: MMPB/SA flowchart. The initial structure of the complex is solvated using a water model. An MD simulation is run from which a relatively large number of snapshots are extracted. The average binding free energy of the snapshots is assigned as the binding free energy of the system.
In this work the former is used to maintain consistency with the reference study 30 opted for tuning the MMGB/SA model.

Solvation Free Energy
Polar Component. A computationally efficient alternative to the PB, the GB implicit solvent model 35,36 can be used for computing ∆G solv . Generally speaking, GB models have shown to be computationally less expensive than the PB models, although the deterioration of the accuracy has always been a concern. Here, we employed a grid-based surface GB model called GBNSR6, 37 which, in a recent study, 38 was shown to be the most accurate among several GB models in terms of the ability to approximate ∆G pol relative to the numerical PB. In this work, ∆G pol is calculated with the ALPB modification 39,40 of the generalized Born 41 model: 8 . CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . where in = 1 and out = 80 are the dielectric constants of the solute and the solvent, respectively, β = in / out , α = 0.571412, and A is the electrostatic size of the molecule, which is essentially the overall size of the structure that can be computed analytically. Here we employ the most widely used functional form f GB where r ij is the distance between atomic charges q i and q j , and R i , R j are the so-called effective Born radii of atoms i and j, which represent each atom's degree of burial within the solute. The effective Born radii, R, are calculated by the "R 6 " equation: 42,43 where ∂V represents the chosen representation of the dielectric boundary of the molecule, dS Non-polar Component. A common method to estimate the non-polar contribution to the solvation free energy in Eq. 3 is to assume that it is proportional to the solvent accessible surface area (SASA) of the molecule: While there are more accurate methods to estimate the non-polar 45 contribution, here we use the simple Eq. 8 for the sake of simplicity and consistency with ref. 30 Also for consistency with the same work, here we use γ = 0.0072 kcal/mol/A 2 . Atomic radii that form SASA not only play an important role in the non-polar component, but also enter the polar component through the dielectric boundary. Therefore, the right choice of atomic radii is crucial to the accuracy of binding free energy estimation. Three sets of atomic radii are used here: 9 . CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . https://doi.org/10.1101/2020.08.25.267625 doi: bioRxiv preprint OPT1, 29,46 bondi, and mbondi2. The first two are listed in Tab. 1. Mbonid2 is indeed bondi whose hydrogen atoms bound to a nitrogen are expanded from 1.2Å to 1.3Å, see . 47 Carbon (C), hydrogen (H), oxygen (O), nitrogen (N) and sulfur (S) are the main atomic types in this study. The water probe radius is fixed to 1.4Å.

Gas-Phase Energy
Gas-phase energy of the solute, ∆E M M , is the summation of internal energies, electrostatic energies, and van der Waals energies. In all of the MMGB/SA calculations reported here, ∆E M M is calculated using the ff99 AMBER force field. The choice of this old force field is deliberate, and was initially motivated to ensure maximum consistency with , 6 which provides a very detailed analysis of MMGB/SA performance on Ras-Raf. Good agreement with experiment, Table 2, motivated us to use the same ff99 force field for all the subsequent MMGB/SA calculations reported here. All of the enthalpy calculations in this study are averages over 500 snapshots extracted from the main MD trajectory.

Configurational Entropy
Normal-mode analysis (NMA) and quasi-harmonic analysis are of the two common methods for calculating configurational entropy of the solute. 33 Since the latter has shown poor convergence in several cases, NMA is selected for entropy calculations. 30 The main drawback of this method is the computational cost that becomes intractable for large systems, e.g., systems with more than 8,000 atoms in MMGB/SA (Perl version) of AMBER18 are not supported for NMA. To tackle this problem, one approach is to truncate the complex so that the binding interface is preserved in its original shape. 48 In this study, the SARS-CoV-2 S RBD 10 . CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a

Structure Preparation
Ras-Raf Complex. This well-studied complex was selected as the reference for testing the parameters of the MM/GBSA model. We used tleap module in AMBER18 to set up the input coordinate and topology files. The structure was solvated in a box of TIP3P 49 water model (10Å buffer). This choice of old water model and ff99 AMBER force field was deliberate, to ensure full consistency with . 6 The GTP molecule and the magnesium ion (M g 2+ ) were eliminated for the sake of simplification. Since the net charge was 0, no counterion was added.
SARS-CoV-2 S RBD and ACE2 Complex: Full Structure. H++ server 50 was employed to protonate the complex at pH=7.5. The server automatically generates the solvated structure in a box of OPC 51 explicit water model (10Å buffer), with AMBER ff14SB force field. This full structure is used only for enthalpy calculations that are compared with those of the truncated complex structure for justifying the truncation approach, see below.
SARS-CoV-2 S RBD and ACE2 Complex: Truncated Structure. To execute NMA entropy calculation the original structure (PDB ID:6m0j) was truncated from 12,515 atoms (791 residues) to 7,286 atoms (463 residues) by removing residues, one by one, starting from the N-terminus of the spike protein, and the C-terminus of the ACE2 protein. The goal was to have fewer than 8,000 atoms remaining, while preserving sequence continuity of the resulting structure to facilitate the set up of MD simulations, see Fig. 5. The remaining atoms are still within 8Å from the binding interface. The same protocol used for the full structure was employed for parameterization and solvation.

11
. CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. .

Trajectory Generation
All of the MMGB/SA estimates are based on snapshots extracted from MD trajectories generated as described below. The structures were prepared and equilibrated as follows.
The solvated complexes were first energy minimized (max. minimization cycle of 1000), followed by 50 ps of heating (from 1 K to 300 K) at constant volume, followed by 50 ps of density equilibration at 300 K at constant 1 bar pressure, followed by another 2 ns of constant (1 bar) pressure equilibration at 300K. In these stages, atomic coordinates were restrained to their initial positions with 2 kcal/mol/A 2 . All simulations, including the production runs described below, were executed with the GPU-enabled pmemd.cuda MD engine in AMBER18, using Langevin dynamics with a collision frequency of 2ps −1 and an integration time step of 2 fs while the bonds involving hydrogen atoms were constrained by the SHAKE algorithm. Electrostatic interactions were approximated via the Particle Mesh Ewald (PME) method, with a non-bond cutoff set to 9Å. Coordinates were recorded every 10 ps.
Ras-Raf Complex. A production of 10 ns was performed using the Ras-Raf structure prepared with the protocol described in Sec. .

SARS-CoV-2 S RBD and ACE2
Complex: Full Structure. A production of 50 ns was carried out using the full structure described in Sec. .

12
. CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . https://doi.org/10.1101/2020.08.25.267625 doi: bioRxiv preprint SARS-CoV-2 S RBD and ACE2 Complex: Truncated Structure. The same protocol used for the full structure was employed. A weak restraint of 0.01 kcal/mol/A 2 was applied to the atoms of the truncated complex, relative to the X-ray positions, during the 50 ns production to prevent the truncated complex from falling apart. This restraint diminishes the discrepancy between the force filed and water model in the structure used for MD simulation (OPC, ff14SB) and the one for ∆G bind calculations (TIP3P, ff99).

MM/GBSA on Ras-Raf
Here, we study the accuracy of ∆G bind calculation using a different GB model (GBNSR6) coupled with two sets of atomic radii. According to Tab. 2, it is observed that ∆G bind calculated by GBNSR6 with OPT1 radii underestimates the binding affinity whereas GBNSR6 coupled with bondi radii overestimates it. Yet, both of these results have better agreement with the experiment 52 compared to the reference MGB model in . 30 We have also noticed that a consensus estimate ∆G bind = (∆G bind (bondi) + ∆G bind (OP T 1))/2 is only within 2 kcal/mol off the experimental reference. Encouraged by this reasonable agreement with experiment, we have decided to utilize GBNSR6 with both bondi and OPT1 radii to produce a consensus estimate of ∆G bind for the SARS-CoV-2 S RBD and ACE2 complex.

MM/GBSA on the Truncated SARS-CoV-2 S RBD and ACE2
The RMSD of the truncated SARS-CoV-2 S RBD and ACE2 backbone compared to the crystal structure of the full complex is shown in Fig. 6. The trajectory is stable after 50 ns of production, with the RMSD from the X-ray reference of around 3.15Å. Comparing the estimated ∆H between the truncated and original structures (results not shown) demonstrates that the relatively small difference between the two (about 1 kcal/mol using mbondi2 radii and 8 kcal/mol using OPT1 radii) indirectly validates the use of the truncation procedure.

13
. CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . Table 2: MMGB/SA results on RAS-RAF. Means and the standard errors of the mean are listed. All the components are in kcal/mol. Consensus ∆G bind = (∆G bind (bondi) + ∆G bind (OP T 1))/2 For comparison, an earlier MMGB/SA estimate is also listed (it does not contribute to the consensus estimate). The experimental value is from isothermal titration calorimetry. 52   14 . CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. .

Conclusion
In this study, we have evaluated the performance of a relatively new GB model, GBNSR6, and the newly introduced intrinsic atomic radii, in calculations of binding free energy of a well-studied protein-protein complex, Ras-Raf and of SARS-CoV-2 S RBD and ACE2 complex, critical in the mechanism of the novel coronavirus infection. Unlike many previous efforts, the new radii were specifically optimized to best reproduce the explicit solvent results particularly in the implicit solvent-based binding MMGB/SA estimates. We also employed the common bondi radii for the same calculations.
A better agreement with experiment of the absolute binding free energy for Ras-Raf, compared to previous work, was achieved for both radii sets; however each individual estimate either under-or over-estimated the experimental binding energy. We have therefore proposed a consensus estimate, which is the average of the two: for Ras-Raf the consensus based on OPT1 and bondi is in near quantitative agreement with experiment.
We applied the same approach to estimate ∆G bind of the SARS-CoV-2 S RBD and ACE2 complex. As in the Ras-Raf case, the under-and over-estimation by each radii sets nearly cancelled, resulting in a consensus estimate essentially within 1 kcal/mol from the experimental reference. That essentially quantitative agreement paves the way for further exploration of SARS-CoV-2 S RBD and ACE2 complex with MMGB/SA, which has a number of advantages. Specifically, the MMGB/SA approach could be reasonably accurate for future analysis of relative binding free energies in this system, including the effects of mutations, relative contributions from various residues to ∆G bind , congeneric series of ligands, etc. Equally importantly, MMGB/SA is well-suited for reasoning for physical reasoning. Needless to say, additional thorough investigation is needed to see if the proposed approach can be extended to other complexes.
Both of the two separate estimates of ∆G bind based on either bondi or OPT1 radii result in the expected near cancellation of the relatively large ∆H and −T ∆S terms, suggesting that each of these estimates "makes sense" on its own. Moreover, note that for both of 16 . CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . https://doi.org/10.1101/2020.08.25.267625 doi: bioRxiv preprint the protein-protein complexes studied here, the two radii sets provide a fairly narrow range within which the experimental value lies. Specifically, ∆G bind calculated with bondi radii is underestimated (too negative), whereas ∆G bind calculated by OPT1 radii is over-estimated.
One rationale for this behavior, which is the basis of the proposed consensus approach, could be that bondi and OPT1 radii sets have very different physical foundations behind them (geometry for the former and global optimization of the electrostatics for the latter), so the resulting errors in the corresponding electrostatic estimates are not as strongly correlated as for radii derived on the same principle.
All of the MD trajectories generated in this work are available from the authors upon request.
. CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . https://doi.org/10.1101/2020.08.25.267625 doi: bioRxiv preprint . CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . https://doi.org/10.1101/2020.08.25.267625 doi: bioRxiv preprint . CC-BY 4.0 International license (which was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint this version posted August 26, 2020. . https://doi.org/10.1101/2020.08.25.267625 doi: bioRxiv preprint