Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning / molecular mechanics potentials

Alchemical free energy methods with molecular mechanics (MM) force fields are now widely used in the prioritization of small molecules for synthesis in structure-enabled drug discovery projects because of their ability to deliver 1–2 kcal mol−1 accuracy in well-behaved protein-ligand systems. Surpassing this accuracy limit would significantly reduce the number of compounds that must be synthesized to achieve desired potencies and selectivities in drug design campaigns. However, MM force fields pose a challenge to achieving higher accuracy due to their inability to capture the intricate atomic interactions of the physical systems they model. A major limitation is the accuracy with which ligand intramolecular energetics—especially torsions—can be modeled, as poor modeling of torsional profiles and coupling with other valence degrees of freedom can have a significant impact on binding free energies. Here, we demonstrate how a new generation of hybrid machine learning / molecular mechanics (ML/MM) potentials can deliver significant accuracy improvements in modeling protein-ligand binding affinities. Using a nonequilibrium perturbation approach, we can correct a standard, GPU-accelerated MM alchemical free energy calculation in a simple post-processing step to efficiently recover ML/MM free energies and deliver a significant accuracy improvement with small additional computational effort. To demonstrate the utility of ML/MM free energy calculations, we apply this approach to a benchmark system for predicting kinase:inhibitor binding affinities—a congeneric ligand series for non-receptor tyrosine kinase TYK2 (Tyk2)—wherein state-of-the-art MM free energy calculations (with OPLS2.1) achieve inaccuracies of 0.93±0.12 kcal mol−1 in predicting absolute binding free energies. Applying an ML/MM hybrid potential based on the ANI2x ML model and AMBER14SB/TIP3P with the OpenFF 1.0.0 (“Parsley”) small molecule force field as an MM model, we show that it is possible to significantly reduce the error in absolute binding free energies from 0.97 [95% CI: 0.68, 1.21] kcal mol−1 (MM) to 0.47 [95% CI: 0.31, 0.63] kcal mol−1 (ML/MM).


Introduction
, AMBER14SB [47], and TIP3P [48] while the ML model uses the ANI-2x [49] neural network potential parameterized using DFT B97X/6-31G* QM calculations. Bottom: The ANI-2x [49] ML potential first computes radial and angular features for each atom and then sums energetic contributions by atom using deep learning models specific to each element-element pair. 91 While the notion of simulating an entire solvated protein-ligand system with quantum chemical accuracy-92 in a manner that avoids molecular mechanics parameterization altogether-with orders of magnitude less July 30, 2020

Hybrid ML/MM potentials provide improved ligand energetics
Tyk2 is a challenging test set for predicting kinase:inhibitor binding free energies. The Tyk2 congeneric ligand benchmark series was taken from the Schrödinger JACS benchmark set [45], which is challenging for both commercial force fields (OPLS2.1 achieves a ΔG RMSE of 0.93±0.12 kcal mol −1 [45]) and public force fields (GAFF 1.8 achieves a ΔG RMSE of 1.13 kcal mol −1 , and ΔΔG RMSE of 1.27 kcal mol −1 [50]). Left: Illustration of the X-ray structure used for all calculations. Right: 2D structures of all ligands in the benchmark set, showing common scaffold and substituents. The Schrodinger Tyk2 benchmark set contains a congeneric series selected from [51,51] where experimental errors in are reported to have with X ∈ ℝ 3 as all non-ligand coordinates (receptor and/or solvent), X ∈ ℝ 3 as the ligand coordinates,   ) stored. The Bennett acceptance ratio (BAR) [54][55][56] is used to estimate the free energy difference, which corresponds to the crossing of the true ( ) work distributions [55,57].
( )], unless the MM models have been specifically parameterized to minimize the variance in [41], 132 small differences in the equilibrium valence degrees of freedom ensure that the variance of this weight is 133 so large as to make this approach impractical ( Figure 4).

134
As an alternative, we propose a convenient approach that uses short, nonequilibrium (NEQ) simulations 135 to reduce the variance sufficiently to enable practical free energy estimates. Importantly, we aim to avoid 136 two pitfalls: First, we aim to avoid the significant bias that arises in attempting to estimate free energy 137 differences from short, unidirectional MM→ML/MM nonequilibrium switching trajectories [66], as well as 138 the costly long nonequilibrium trajectories that would be required to minimize that bias; instead, we aim 139 to use bidirectional protocols (MM→ML/MM and ML/MM→MM) and the optimal Bennett acceptance ratio 140 estimator, which minimizes this bias [54,55,66]. Second, we aim to minimize the amount of simulation 141 data that must be generated from the ML/MM state since it is so slow to sample from. 142 We construct an alchemical protocol that connects the easily-sampled MM thermodynamic states to the Here, the alchemical parameter ∈ [0, 1] interpolates between the MM and hybrid ML/MM endstates, ∈ 146 ℝ 3 corresponds to the configuration of the receptor (and all other environment) atoms, and ∈ ℝ 3 147 corresponds to the configuration of the ligand atoms. The NEQ free energy correction can be computed 148 using four sequential steps, each performed independently for the solvated phase and the complex phase:  The Bennett acceptance ratio (BAR) [54][55][56] is then used to estimate the free energy correction Δ → ∕ 160 to the absolute free energy of the MM endstate for each phase (complex, solvent) of the alchemical free 161 energy calculation. 162 We stress that resampling after forward NEQ switching (step 2) is a critically important part of the proce-   182 We applied this nonequilibrium switching free energy correction scheme to a benchmark set of a well-183 studied congeneric series of inhibitors for non-receptor tyrosine-protein kinase (Tyk2) from the Schrödinger  [51]. Statistical analysis was performed using the Arsenic package [http://github.com/openforcefield/arsenic], with 95% confidence intervals calculated by bootstrap analysis. For all plots, an additive constant was added to all computed values, such that the mean computed value is equal to the mean experimental value, such as to minimise the RMSE as in [45].

Improved torsion energetics appear to drive accuracy improvements 211
It is well-appreciated that general small molecule MM force fields often fail to accurately describe torsion 212 energy profiles observed with higher-level quantum chemical calculations [73,74], a phenomenon driven 213 by the significant effect substituents can have on torsion profiles via electronic effects [26,75]. To overcome 214 this limitation, many MM force fields recommend refitting torsion potentials directly to quantum chemical 215 calculations for individual molecules in a bespoke manner [1,27]. 216 It is plausible that the improved accuracy demonstrated by ML/MM in Figure 6 arises primarily from 217 improved modeling of torsion energetics or torsion-torsion coupling. To investigate this, we examined the  In this work, we demonstrated that a hybrid ML/MM model of a challenging, pharmaceutically-relevant 226 benchmark receptor:ligand system dramatically outperforms the both commercially and publicly-available 227 MM force fields in its ability to recover experimental binding free energies of a congeneric series of inhibitors.

228
The ability to halve the current state-of-the-art free energy uncertainty to ∼0.5 kcal mol −1 using a simple post-229 processing procedure along with publicly-available software and ML models is promising. In particular, it 230 suggests that there is significant potential yet to predict ligand:target binding affinities for prospective drug 231 design campaigns.

232
The NEQ protocol suggests the potential for further efficiency improvements.

233
Among the insights made in the course of this study, we make special note of the NEQ procedure and  The fact that the most pronounced discrepancies in torsion profiles for both solvent and complex phases 248 were observed about amide functional groups is particularly notable. If it is indeed the case that current 249 MM force fields fail to recover appropriate conformational energetics of amides, then there is certainly a 250 potential for free energy accuracy improvement in a large subset of druglike molecules, especially among 251 protease inhibitors, which are characterized by several amide groups to mimic peptide backbones.

252
Binding free energy improvements afforded by ML/MM show promise for other systems.

253
In this study, we were fortuitous in that the OpenFF 1.0.0 provided heavy-tailed phase-space distributions, 254 particular in the solvent phase (see Figure 8), that overlapped sufficiently with the ML/MM model. Had this 255 not been the case, it is likely that the NEQ correction procedure would have failed to recover free energies 256 with sufficient precision at the annealing times employed in this study. Further study will indicate whether 257 other MM force fields-including the GAFF force field [13,14] and more recent iterations of the OpenFF 258 force fields-generally provide sufficient phase-space overlap with ML models for ML/MM corrections to 259 remain computationally convenient and accurate with respect to experiment.

260
While these results illustrate the notable improvement to relative free energy calculations for this Tyk2 261 protein:ligand system, more extensive studies will be needed to determine how robustly this accuracy im-262 provement manifests for a broad range of congeneric series. While the current implementation involves 263 post-processing of pre-generated MM data, the implementation of the method could be improved by inte-264 grating ML potential models such as ANI into extensible simulation packages such as OpenMM [76], perhaps 265 via a plugin architecture. Improved interoperability would increase ease of adoption for computational ef-266 forts in drug design projects. The definition of the hybrid ML/MM potential could be improved through 267 expanding the terms in the system that are computed with ML by using ML methods such as AIMNet [77], 268 SchNet [78], PhysNet [79], or AP-Net [80] that allow for decomposition of electrostatics and long-range dis-269 persion from short-term valence energies.

270
Machine learning will likely permeate all aspects of alchemical free energy calculations. July 30, 2020 AER is a current member of the Science Advisory Board of Schrödinger Co. and receives funding from 317 Genentech Co. as specified in [45]. Simulations were performed for both the bound-complex phase and solvent phase 323 to afford relative binding free energies using OpenMM 7.4.2 [76]. Protein and ligand files were adapted 324 from [50] and are available as part of the openmm-forcefields 0.7.1 package [https://github.com/openmm/ 325 openmmforcefields]. Simulations were performed with AMBER14SB [47] protein forcefield and the OpenFF 326 1.0.0 "Parsley" small molecule forcefield [46]. The system was solvated with a 9.0 Å padding using TIP3P 327 water [48] and 150 mM NaCl [84]. Simulations were performed without hydrogen bond constraints using 328 4 amu hydrogen massses following mass repartitioning. A timestep of 2 fs with a 1 ps −1 collision rate at 300 K 329 was performed using a BAOAB Langevin integration scheme [85,86], using an NPT ensemble at 1.0 atm sam-330 pled using a Monte Carlo barostat with molecular scaling. Nonbonded interactions were handled using a 9 Å 331 cutoff using Particle Mesh Ewald (PME) with a tolerance of 2.5 × 10 −4 and long-range dispersion corrections.

332
The alchemical perturbation was performed using a single topology protocol. The mapping protocol to The bidirectional nonequilibrium protocol was parameterized in accordance with Eq. 2 with 5000 annealing 350 steps. MM model and simulation parameters are consistent with those described above.

351
The forward NEQ procedure is as follows for each phase: 352 Algorithm 1: Nonequilibrium (NEQ) switching protocol Input : iid system configurations ( ∈ ℝ 3 ) from MM relative free energy calculation; the set thereof is denoted . distributions for each ligand and each phase is shown in Figure S.I.1.

369
The forward, resampling, ML/MM endstate simulation (for 10 ps), and backward annealing procedures, 370 described previously, were used to recover bidirectional work distributions along with the BAR-estimated 371 free energy correction.

372
The ligand configuration was extracted from the solvent and complex phases and modelled with the