Computing Absolute Binding Affinities by Streamlined Alchemical Free Energy Perturbation (SAFEP)

Free Energy Perturbation (FEP) is a powerful but challenging computational technique for estimating differences in free energy between two or more states. This document is intended both as a tutorial and as an adaptable protocol for computing free energies of binding using free energy perturbations in NAMD. We present the Streamlined Alchemical Free Energy Perturbation (SAFEP) framework. SAFEP shifts the computational frame of reference from the ligand to the binding site itself. This both simplifies the thermodynamic cycle and makes the approach more broadly applicable to superficial sites and other less common geometries. As a practical example, we give instructions for calculating the absolute binding free energy of phenol to lysozyme. We assume familiarity with standard procedures for setting up, running, and analyzing molecular dynamics simulations using NAMD and VMD. While simulation times will vary, the human tasks should take no more than 3 to 4 hours for a reader without previous training in free energy calculations or experience with the VMD Colvars Dashboard. Sample data are provided for all key calculations both for comparison and readers’ convenience.


Introduction
In this tutorial, we are principally concerned with computing the Absolute Binding Free Energy (ABFE) of a ligand to its receptor.While many methods of measuring free energies ex-ist, alchemical Free Energy Perturbation (FEP) methods make use of the fact that since the change in free energy is pathindependent it can be calculated via an unphysical path.In the case of FEP, that unphysical path is defined by scaling the non-bonded interactions of the ligand [1][2][3][4][5][6].In essence, the user can make a bound ligand "disappear" from the binding site, make it re-appear in the bulk solution, and calculate the corresponding free energy difference.
While elegant and exact in principle, FEP calculations are often unwieldy in practice.One of the most stubborn challenges that most FEP implementations face is that the ligand must maintain the original bound configuration during decoupling, even as the very interactions that stabilize the bound configuration are weakened.Consequently, absolute binding schemes introduce restraints on the ligand to mimic the interactions that stabilize the bound ensemble.While there are many such schemes to accelerate convergence, [7][8][9][10] most do so in ways that require error-prone manual parameterization and many hours of the user's time.Many of these techniques have already been reviewed by Mey et al [2].As discussed there, no restraint scheme is without cost; the restraints themselves will bias the estimated free energy and must be corrected in the coupled state, the decoupled (gas phase) state, or both.Conveniently, restraints that have simple geometries and do not affect the coupled state can be accounted for analytically.On the other hand, numerical approaches must be used to correct for restraints (such as an RMSD [10] or DBC collective variable [11]) that restrict the accessible configuration space to an irregular volume.However, if the restraint potential on the collective variable is chosen to have negligible impact on the coupled state (such as a flat-well potential [12]), these calculations can be done cheaply in vacuum as illustrated in this tutorial.When more intrusive restraints (such as harmonic potentials [9,10]) are applied in the coupled state, it is necessary to calculate numerical corrections in the coupled state regardless of the restraint geometry.
Streamlined Alchemical Free Energy Perturbation (SAFEP) is specifically designed to make FEP calculations faster and easier for the user without sacrificing the accuracy of the final free energy estimate.SAFEP reduces conceptual and computational complexity by replacing conventional rotational and translational restraints for stabilizing the ligand in the binding site with a single Distance-to-Bound Configuration (DBC) coordinate.This makes setup easier because instead of parameterizing six or more individual restraints, the user need only parameterize one.Furthermore SAFEP can handle superficial binding sites in phase-separated bulk with atomistic resolution [11].Statistically optimal FEP estimators require both decoupling and recoupling calculations; SAFEP uses Interleaved Double-Wide Sampling (IDWS) to extract both quantities from the same calculation, roughly halving the required simulation time (see the alchLambdaIDWS entry in the NAMD User Guide section on Free Energy Perturbation) [13].SAFEP makes extensive use of the Colvars Dashboard in VMD allowing the user to easily measure collective variables, impose biases, and generate restraint configuration files from one interface [14].Finally, analysis tools and data visualizations are included in one Jupyter notebook allowing for comprehensive quality assurance along with the ∆G calculation.
Figure 1 depicts the thermodynamic paths at the heart of SAFEP.The desired quantity ∆G • bind (red, left column) is equal to the sum of the steps in the SAFEP method (black, right column), This equation forms the basis for the steps that follow in this tutorial.

Scope
The following steps will walk the user through the calculation of an Absolute Binding Free Energy (ABFE) using a computationally affordable example (phenol bound to a mutant lysozyme), but we have written these steps to be straightforward to generalize to other systems.These exact steps have been tested thoroughly for this particular system.More detailed discussions of each step can be found in Appendices A-E.To facilitate the generalization of the method to other systems, we have provided additional troubleshooting advice in Appendix F, as well as example files for each step.These appendix entries are hyperlinked and referenced throughout the body of the tutorial.

Background knowledge
We assume intermediate experience with running classical MD (any software) and beginner experience with NAMD 2.14 or later.If this is not the case, please see the NAMD Tutorial [15].The latter portions that involve analysis are less important for this tutorial.Useful but not required material on alchemical free energy perturbations can be found in "Insilico alchemy: A tutorial for alchemical free-energy perturbation calculations with NAMD" [7].Finally, basic knowledge of VMD and Python will be required for data analysis and visualization.Computing the ABFE of a ligand bound to a protein (∆G • bind ) is the ultimate goal.This is found by computing the free energies of several, smaller perturbations: 1) decoupling the unbound ligand from the condensed phase to the gas phase under no restraints (∆G * bulk ); 2) enforcing a restraint scheme (-∆G • V and ∆G DBC ); and 3) coupling the ligand from the gas phase to its bound pose in the condensed phase (-∆G * site ) under the Distanceto-Bound Configuration (DBC) restraint.The free energy contribution of the volumetric restraint (∆G • V ) is calculated analytically, while the other three contributions are calculated via simulation.The free energy of the top horizontal leg vanishes in SAFEP due to the design of the DBC restraint.See Appendix C for more details.
README for more information on getting the latest version.3. Python 3.9.12 or later 4. Jupyter 5. safep Python package and its dependencies (see Procedure below for installation instructions) NAMD will be used to perform simulations.GPU acceleration of restrained free energy perturbations are available in NAMD3 (with CUDASOAintegrate off) [16,17].System setup, trajectory visualization, and restraint definition will be carried out in VMD [18].Data analysis and visualization will be handled by a Jupyter notebook with the above dependencies.
High-performance computing resources are recommended, but not required.Sample outputs are provided for each step for users with limited computing resources or time.
The Colvars Dashboard is the recommended tool for writing and editing Colvars config files.Advanced users can also edit the text files directly.Example files are provided for each step.

Process Overview
Within the scope of free energy perturbations, absolute free energies of binding are typically calculated by the doubledecoupling method (DDM) [1,3,4].In this method, pair interactions (non-bonded terms) between the ligand and the rest of the simulation box are gradually scaled to zero (decoupled) from both a bound state and an unbound, solvated state.
In order to maintain the ligand in its bound state, most current approaches introduce a series of rotational and translational restraints on the ligand, each of which requires parameterization and an additional ∆G correction.In contrast, SAFEP uses just one restraint: a flat-well on the "Distance-to-Bound Configuration" (DBC).This reduces both the number of parameters to be optimized and the number of simulations to be performed compared to layered restraints (See Figure 2 or Appendix C for details).
The thermodynamic cycle used for absolute binding free energies in SAFEP is shown in Figure 1 while the unknown values (black arrows) can be calculated by the simulations outlined in Figure 3.More precisely, the thermodynamic cycle (Fig 1) and the corresponding simulations (Fig 3) are broken into three main steps involving three simulation systems: 1) the ligand bound to the protein, 2) the ligand in the gas phase, and 3) the ligand in the bulk.The order of computations is unimportant so long as the endpoints are defined consistently (e.g. the same temperature is used throughout and restraints are used consistently).For the sake of clarity, we have arranged the process linearly: Steps A and B are concerned with calculating ∆G * site , step C addresses the free energy of the DBC (∆G DBC ), step D measures ∆G * bulk , and step E calculates an analytical correction (∆G • V ) and combines all the preceding terms into the overall ∆G • bind using Equation 1.

Reference Coordinates
Align to reference using protein fitting group  Step E).Note: some simulations can be run simultaneously.

Protocol
The following steps demonstrate the SAFEP protocol applied to a computationally affordable example: calculating the binding affinity of phenol to a lysozyme mutant.For more details on the rationale behind this choice, see Appendix A. Procedure: • All run.namdfiles contain a line that reads set useSampleFiles 0. To use the sample data provided, set the value to 1. Otherwise, NAMD will use your inputs (provided they are named exactly as described in this document).• If you run VMD and your simulations on different computers, then you will need to manually edit paths later when you are running simulations.• Some simulations will take several days on a single core.To use 4 cores in parallel we have included the +p4 argument in the commands for the longer NAMD runs.This number may need to be optimized for your particular computing resources.• If you are using namd3 with CUDA (GPU) acceleration, you should consult the NAMD User Guide [13] for optimal settings.As of NAMD3b3, CUDAOSOAIntegrate should be left OFF, and "+idlepoll" should be included in the run command.• Common settings used by multiple simulations are in common/common_config.namd, which is sourced by the individual configuration files.This simplifies the individual configuration files and ensures consistency between calculations, which is a critical part of any free energy method.

Move on to Step A
Step A: Sample the bound state and define the corresponding restraint Alchemical decoupling removes the interactions that stabilize the occupied ensemble.Consequently, during decoupling the ligand may spontaneously diffuse into the bulk.Therefore we need to impose an external restraint to force the ligand to occupy the bound state throughout decoupling.With SAFEP we apply a single restraint on the Distanceto-Bound Configuration (DBC) collective variable as illustrated in Fig. 2.This restraint is straightforward to define and relatively insensitive to small differences in parameters [11].Its sole correction factor is calculated via Thermodynamic Integration later in this tutorial.For more details about the DBC restraint see Appendix C and [11].For a discussion of the merits of accounting for symmetry when computing the DBC of a small, symmetric ligand see Appendix C.This simulation should be long enough (∼50 ns) for the ligand to explore the configuration space of the bound state.See Appendix A for more details.• Select the phenol_lysozyme.pdbfile you used as input for this section.This will insert a line in the dashboard text editor that indicates the file that will be used for the DBC reference coordinates.• Copy the line just inserted and replace the refpositionsfile line at the bottom of the atoms block (the second highlighted line in the figure below).This sets the same PDB file to be used for aligning to the protein frame-of-reference.• For NAMD builds older than October 31, 2022: change "centerToReference" and "rotateToReference" to "cen-terReference" and "rotateReference" respectively.• The colvar config editor should now look like the screenshot below with your file's path in place of the two highlighted lines.The force constant in this case is in units of kcal/(mol•Å 2 ).The strength of restraint should be neither so great that it causes instabilities nor so weak that it fails to cleanly separate the bound and unbound ensembles.Step B: Decouple phenol from the protein via FEP In this section we will calculate ∆G * site by decoupling the ligand from the protein binding site (and all other contents of the simulation box) using alchemical FEP.We will maintain the ligand in the bound configuration relative to the protein by restraining the DBC coordinate as defined in the previous step.1. Specify which atoms will be decoupled using the pdb beta field a. Open VMD and load the psf and pdb files specified in "Required Input."b.Set and write beta values:

Required
• Open the Tk Console from the Extensions menu.
• Ensure that your Tk Console is in the correct directory: cd stepB_alchemy_site/outputs • Set the beta value of all atoms to 0: [atomselect top all] set beta 0 • Set the beta values of the ligand atoms to -1 for decoupling: [atomselect top "resname PHEN"] set beta -1 • Save as a pdb file: [atomselect top all] writepdb alchemy_site.pdb

Perform the FEP simulation
We have provided a configuration file for this FEP run: stepB_alchemy_site/inputs/run.namd.

Analyze the trajectory a. Visually inspect the trajectory in VMD:
• Open VMD.
• Ensure that the ligand remains in a bound-like configuration for the duration of the simulation.
b. Measure the restraint energy: • Open the Colvars Dashboard.
• Open the biases tab, select the DBC restraint, and click Energy Timeline .
• The restraint energy should remain near zero for several nanoseconds, then increase and reach a maximum in the second half of the simulation (when the ligand is fully decoupled).If this is not the case, see Appendix F. • Follow the in-notebook prompts to parse your new fepout file (stepB_alchemy_site/output/AFEP2-02.fepout).By default, we use the sample output.Be sure to update the paths as indicated in the notebook: • Compare your outputs to the sample outputs found in Appendix B.3.
• Caution: the errors reported are based on the PyMBAR BAR estimator and are subject to the corresponding assumptions and caveats [20].
Step C: Compute the DBC restraint free energy correction We designed the DBC restraint so that it doesn't do any significant work in the fully coupled system.However it does reduce the entropy of the fully decoupled ligand, which would otherwise be exploring an "empty" simulation box.We need to calculate the corresponding free energy cost so we can correct for it.In this section we will use thermodynamic integration (TI) to calculate ∆G DBC : the free energy difference between a gas-phase ligand under DBC restraints vs a (spherical) volumetric restraint.For more details see Appendix D.

Required Input:
•  • Open the Jupyter Notebook as in subsection 3.c from step B. Update the paths in "User Settings" as needed.Remember, sample data will be used by default.• Run the first several cells at least until the first FEP analysis section.
• Run the first cell in the section titled "Process the DBC TI calculation" and make sure the DBC and COM walls are correct.• Run all the remaining cells in that section.The output will include ∆G DBC as well as an error estimate.Sample results and more explanation can be found in Appendix D.

Step D: Decouple phenol from bulk solvent
You have completed one alchemical FEP calculation already, but double-decoupling methods require two such calculations to close the thermodynamic cycle.We need to know the free energy of transferring the ligand from the binding site into vacuum, and from vacuum into the bulk.In this section we will calculate the latter term, ∆G * bulk , by decoupling the ligand from the bulk solution.

Required Input:
• 1. Note: We have completed this step for you in the specific case of this tutorial.Prepare the ligand as you would for a traditional simulation.Use a sufficiently large box size; very small boxes are more prone to instabilities and selfinteractions which will lead to artifacts in the final free energy estimate.2. Specify which atoms will be decoupled using the pdb beta field • Confirm that bulk_fep_path points to your files • Parse the fepout file by running all the cells in the Jupyter notebook section titled "Decoupling from Solvent." c. Caution: as noted above, the errors reported are based on the PyMBAR BAR estimator and are subject to the corresponding assumptions and caveats [20].

Step E: Calculate corrections and combine quantities
We will now calculate ∆G • V analytically.With this final piece of information, we can calculate the dissociation constant and estimate a titration curve based on the probability of occupancy assuming a two-state system: Pocc = [PHEN] where K d is the dissociation constant given by K d = e β∆G • bind .For additional information see Appendix E.

What's next?
Congratulations!You have completed all the essential steps of the SAFEP protocol.You can now apply the protocol above to your own receptor-ligand binding problem, applying minor changes as needed.Of course, the outcome of your SAFEP calculation will be only as realistic as your model and force field.As you proceed, it is critical to visualize the simulations, track the evolution of the DBC, and evaluate the convergence of your own calculations.For a more in-depth analysis of the alchemical calculations, see the general-purpose notebooks from the SAFEP repository at https://github.com/BranniganLab/safep.Additional information and a (non-exhaustive) troubleshooting guide can be found in the appendices below.If you have trouble with this tutorial or the SAFEP package, please contact us either directly or by opening an issue on the corresponding GitHub page.We anticipate that future updates to this tutorial will incorporate lessons from user feedback and our own experience, including new examples requiring changes in the protocol.The standard deviation of the total free energy, defined in Eq. 9 The standard deviation of the free energy for a particular state i, defined in 10

Appendix A System Selection and Setup
Lysozyme L99A/M102H (PDBid 4I7L) was chosen for several reasons.Lysozyme L99A/M102H is a small protein that binds a small, rigid molecule with reasonably high affinity which has already been measured experimentally.These properties make it well-suited as a model for prototyping and validating free energy calculation methods generally.
Because lysozyme is elongated, we save some computation time by using a narrower box.Rotations of the protein could cause self-interactions, which we avoid by imposing a soft harmonic restraint on its orientation, as defined in common/protein_tilt.colvars.The provided systems were prepared using CHARMM-GUI [22,23] using a truncated lysozyme (PDBid 4I7L, residues 3 to 157) and solvated using default parameters (TIP3P water, 0.15 M NaCl).The production run uses largely default parameters and settings.The only notable exception is that WrapAll should be set to off.This is because wrapping across the PBC can cause unexpected results during analysis which can compromise the FEP and TI calculations.

Appendix B Running FEP in NAMD Appendix B.1 Configuration Files
In addition to the configuration, forcefield, and structural files, running FEP in NAMD requires a particular pdb file (sometimes called a "fep file") that contains flags that indicate which atoms are being coupled or decoupled.This is usually indicated in the beta column as '-1' for decoupling or '1' for coupling.All other atoms should have beta set to 0.
The configuration file also contains some additional options that are detailed in the NAMD user guide [13] and described briefly in the provided configuration files.While most of the settings should remain unchanged in a wide range of systems, there are a few exceptions.alchOutFreq determines the number of steps between collecting FEP samples.It should be set to a multiple of fullElectFreq to ensure accurate energy estimates.Later versions of NAMD should resolve this issue automatically; see Bug advisory and Workaround.Additionally, the sampling frequency should be between 50 and 200 steps; sampling too frequently will result in bloated data sets of highly autocorrelated samples while sampling infrequently will result in too few samples to get a well-converged estimate of the change in free energy.alchVdwShiftCoeff controls the strength of the softcore potential which is essential to prevent "end-point catastrophes" in which one or more Lennard-Jones potentials diverge to infinity near λ=0 or λ=1.In example inputs for this tutorial, we set it to 6 Å 2 , which is a safe value for a range of molecular systems.See [24] and [19] for more details.
alchElecLambdaStart is the value of λ for which discharge of the ligand is complete, while Lennard-Jones interactions begin weakening when λ equals alchVdwLambdaEnd.In this tutorial, we have set both parameters to 0.5, which separates the discharge phase of the calculation from the Lennard-Jones decoupling, and works well for many cases.alchEquilSteps determines the time between starting a new λ value and beginning to sample the ensemble.Alchemlyb and PymBAR provide functions that will downsample the data set using automated equilibrium detection schemes.We have found that automated equilibrium detection performs about as well as manually setting alchEquilSteps and autocorrelation is the bigger problem when trying to assess convergence.As a result we have set this to 1. See [20] for a more detailed discussion of equilibrium detection, autocorrelation, and their effects on free energy estimation.See Appendix B.3 or the provided Jupyter notebook for more information on how these are used in our analysis.
deltaLambda is passed as a parameter to the runFEP function and determines the width of the λ windows.Narrower windows will converge faster but will increase the total number of windows required to span λ = 0 to λ = 1.As a result, we need to empirically optimize the number and length of windows.See Appendix B.3 and Appendix F for more details on assessing and optimizing these parameters.The number and length of windows used here (∼ 40 ns total simulation time) are a good starting point, but very flexible ligands or dynamic binding sites may require much more sampling.

IDWS (interleaved double-wide sampling) tells NAMD
to alternate between sampling the forward and reverse λ directions (via the runFEP function, which is passed the alchLambdaIDWS parameter).This should be set to "true" thus removing the need for independent forward and backward runs.on the ligand [1,4,6,8,9].The main draw-back of such approaches is that each restraint must be designed and parameterized which adds to 1) the time and effort required to setup the simulations and 2) the complexity of the simulations themselves.As a result, troubleshooting and interpretation are more difficult and time-consuming.SAFEP in contrast uses just one restraint, the distance-to-boundconfiguration (DBC), which is both robust and requires minimal parameterization [11].As in FEP, restraint free energy perturbation (RFEP) scales certain energy terms and the associated forces using a perturbation parameter, λ ∈ {0, 1}.FEP estimates finite free energy differences between λ values while TI estimates the free energy derivative.As derived and discussed in most statistical mechanics and molecular dynamics textbooks (e.g.Ref. 25,26), the Helmholtz free energy F approximates the Gibbs free energy G when P∆V can be neglected.Furthermore, the P∆V terms tend to cancel out when subtracting the two alchemical free energy differences in our thermodynamic cycle, making the following approximation quite accurate:

Appendix D.2 Thermodynamic Integration and Analysis
where ∆λ i is the difference between adjacent λ values.The core of this expression, ∂U ∂λ λ i , can be calculated analytically for a given value of ξ by applying the definition k λ ≡ λ α k 1k 0 to equation 4 and taking the partial derivative with respect to λ, giving where k 0 is the force constant (forceConstant) when λ = 0, k 1 is the force constant when λ = 1 (targetForceConstant), and α (targetForceExponent) is a tuning parameter that improves convergence of TI by making the energy a smoother function of λ near λ = 0.In Step C, ξ is replaced by d, the DBC, and ξmax is replaced by the upper wall of the DBC restraint, dmax.This is applied to the colvars trajectory data in the Jupyter notebook section associated with Step C (Figure 9).
Finally, the error is estimated by The error on each local average ∂U ∂λ λ i is estimated as its standard deviation, thus avoiding any assumption on the number of independent samples, leading to A finer estimate of the error can be obtained by running replicas of the TI calculation and computing their dispersion.Technical details of our implementation can be seen in common/TI.tclwhich is just syntactic sugar for the Colvars implementation (See the Colvars user guide).
We consider a "unitary" (single protein, single site [11]) system with m ligands in a volume V.The ligand concentration in this system is L = m/V.A DBC coordinate d i can be defined for each of the m ligands, indexed by i.
The threshold on the DBC coordinate meaningfully divides the ensemble into two possible macrostates: occupied (one ligand occupies the site and m -1 ligands are in solution) and unoccupied (no ligands occupy the site and m ligands are in solution).We formalize this here through the DBC "test function," which for an individual ligand i is a Heaviside step function of the form The instantaneous site occupancy Θocc is determined by whether any of the m ligands occupy the site, given by the sum of all the individual test functions, Since here we consider the case in which the site can bind at most one ligand, Θocc is either 0 or 1.
The partition functions for the occupied and unoccupied states are thus Zocc and Zunocc respectively, where Zocc = Θocce -βU d N ⃗ r; (13) and the potential energy U is a function of the positions ⃗ r of all N particles in the system.Θocc is a function of the DBC coordinates d (and thus the positions of ligand and site atoms only).The occupancy probability Pocc(L) is thus which yields the average occupancy ⟨Θocc⟩.Gocc and Gunocc are the free energies of the occupied and unoccupied macrostates respectively, where βGocc = -ln Zocc; ( so We turn now to connecting these quantities to a SAFEP calculation.In step D of the protocol, we decoupled one lig-and from a bulk that contained m = 0 fully coupled ligands, for an infinitely dilute concentration of L = 0/V.We then extrapolated to the standard concentration using an ideal gas correction that assumes ideality. For a ligand at finite concentration in a non-ideal bulk, it is neither useful or necessary to standardize the free energy.Instead, we would carry out Step D at the finite ligand concentrations of interest (L = m/V > 0), and adjust Step E to calculate the unstandardized free energy ∆G bind (L) as follows: where the volume per molecule in the bulk is V/m and thus since ∆G bind (L) = Gocc(L) -Gunocc(L).Equation 18 can be rewritten in terms of ∆G bind (L), Even for a non-ideal bulk, we may assume the excess chemical potential is unchanging for small changes in concentration.Thus we would perform ligand decoupling (step D) at finite concentration L, and use the ideal gas correction to predict occupancy for nearby concentrations L ′ , as long as |L-L ′ | is small, Substitution of Equation 22 in Equation 21 yields the occupancy probability for concentration L ′ , Incidentally, for dilute L, Le β∆G bind (L) = L • e β∆G verge or appear to converge to unreasonable values, it can be difficult to discern what has gone wrong without simply starting over.We provide here some of the most common issues and their respective fingerprints as cautionary tales and troubleshooting tools.If you encounter a problem with running the tutorial as written and do not see your issue listed below, please contact us.

Bound and Unbound modes
If all but one mode may be described as unbound, place the DBC restraint upper wall between the bound mode and the unbound mode with the lowest DBC value.Proceed with FEP.

Multiple, indistinguishable bound modes
Such modes are a result of symmetric ligands and are best addressed using a symmetric DBC.See Appendix C.2.2 for more details.

Multiple, distinguishable bound modes
If one or more bound mode(s) is meaningfully distinct from some other mode(s), select a representative frame for each class.These frames become reference poses for each binding "site" from which you must calculate ∆G site , ∆G DBC , and ∆G V separately.Free energy contributions from each mode can be combined by exponential summing as shown in equation 12 of [27].

Appendix F.2.3 Step B or D: FEP Calculations
Symptom: DBC restraint energy stays close to 0.

Causes:
The DBC restraint may be too wide (upperWalls) or too soft (forceConstant).The λ windows may be too short to properly sample the decoupled ensemble.

Solutions:
First, watch the trajectory for any abnormal behavior; wide DBCs will often be obvious from the last several nanoseconds of a FEP run because the ligand will explore a much larger conformation space than expected.See the DBC debugging checklist below.If the system passes all checks, try running λ = 1 for longer to make sure the DBC restraint is functioning.
Symptom: DBC energy is consistently greater than 0.

Causes:
This issue is most often due to too-narrow DBC restraints (upperWalls) or other mistakes in the colvar definition.

Solutions:
This problem is harder to diagnose from the trajectory alone unless there are obviously over-restrained degrees of freedom in the ligand.Consult the DBC debugging checklist below.

Cause:
The most likely cause of unbinding during FEP is a DBC restraint that is too wide (upperWalls), too soft (forceConstant), inactive, or improperly defined.Symptom: Hysteresis larger than 1 kcal/mol for any λ window.

Causes:
Large hysteresis values are most often caused by: 1) insufficient equilibration, 2) short windows (less than a few hundred ps), or 3) wide windows (large dLambda).

Solutions:
If the system is well-relaxed and equilibrated by the usual metrics (box size, pressure, temperature, etc.), then it is most likely that either the λ windows are too short or too wide.Try increasing the sampling time or increasing the total number of windows.We have had good results with 120 windows of 3 ns each, but longer may be necessary for particularly unwieldy systems.

Causes:
Large hysteresis near the end-points of the FEP calculation are most commonly caused by so-called "end-point catastrophes."See The NAMD UG for more details [13].

Solutions:
Ensure alchVdwShiftCoeff is between 5 and 8.If this is already the case, and no other part of the calculation is problematic, try doubling the number of windows between the window with large hysteresis and the nearest end-point.
Symptom: Hysteresis oscillates or is otherwise correlated with λ.Causes: 1. Some versions of NAMD have a bug that allows FEP data to be written on a step without energy calculations.This results in the use of stale energies (from a previous step) and inaccurate estimates for differences in energy.2. The λ windows may be too short to reach equilibrium

Solutions:
As noted above, δ λ,i should be independent of λ with a mean of 0.
1. Manually ensure that alchOutFreq is a multiple of both fullElectFrequency and nonbondedFreq.See Bug advisory and Workaround for more details.2. Extend the length of each window or decrease dLambda.

Appendix F.2.4 Step C: TI Calculation
The DBC restraint should do the most work early in the TI calculation then, as the force constant is scaled out, the COM restraint should take over and keep the center of mass in a well-defined volume.TI convergence issues are most easily diagnosed by watching the MD trajectory and examining the colvar trajectories.

Symptom:
The collective variable trajectory is abnormal.

Causes:
Strange behavior is often caused by a mismatch between the coordinates used to determine the center of the COM restraint and the reference coordinates used for the DBC restraint as is the case in the figure above.

Solutions:
Update the reference coordinates to match those used for determining the center of the COM restraint.That is, revisit Step C1.b paying special attention to the coordinates used.Make sure you save your refPositionsFile paths are correct.
Symptom: COM restraint energies during TI are very high or very low and don't change much.

Causes:
The harmonic walls forceConstant is likely too stiff (high) or too soft (low).

Solutions:
Adjust the force constants to be between 100 (minimum) and 200.

Symptom:
The ligand doesn't move during TI (or moves very little).Causes: 1.The COM restraint is too narrow (upperWalls is too small).2. The run is too short to visit all λ values.3. The DBC restraint isn't being released.

Solutions:
1. Double check the choice of wall position and ensure that it is correct in all Colvars configuration files.2. Make sure your run is long enough; if you increase the targetNumSteps parameter, you must also increase the run time proportionally.3.If you are using our TI.tclscript, double-check all settings and parameters.If you have written your own TI

Figure 1 .
Figure 1.The SAFEP thermodynamic cycle.Computing the ABFE of a ligand bound to a protein (∆G •bind ) is the ultimate goal.This is found by computing the free energies of several, smaller perturbations: 1) decoupling the unbound ligand from the condensed phase to the gas phase under no restraints (∆G * bulk ); 2) enforcing a restraint scheme (-∆G • V and ∆G DBC ); and 3) coupling the ligand from the gas phase to its bound pose in the condensed phase (-∆G * site ) under the Distanceto-Bound Configuration (DBC) restraint.The free energy contribution of the volumetric restraint (∆G • V ) is calculated analytically, while the other three contributions are calculated via simulation.The free energy of the top horizontal leg vanishes in SAFEP due to the design of the DBC restraint.See Appendix C for more details.

Figure 2 .
Figure 2. The Distance-to-Bound-Configuration (DBC) coordinate.The DBC coordinate is used as a bias to prevent ligand dissociation during uncoupling.The user specifies a subset of protein atoms as the fitting group (teal) and a subset of ligand atoms (red); also shown are the protein surface (gray) and remaining ligand atoms (black).A) User-specified reference coordinates for both protein and ligand.B) During simulation, both protein and ligand will drift from the reference coordinates (black dashed outline).C) In order to remove rotational and translational diffusion of the protein from the DBC calculation, Colvars aligns the system to the reference coordinates using only the protein fitting group atoms.D) The DBC is the RMSD of the user-specified ligand atoms (solid) with respect to the reference coordinates (dashed).

Figure 3 .
Figure 3.The overall SAFEP workflow.From top to bottom and left to right: 1) the ligand must be setup (as for classical MD) in each of the three states (bound, solvated, gas phase) and minimally relaxed (white boxes); 2) a longer, unbiased simulation of the ligand-protein complex is necessary to sample the bound state (green box) which is used to determine the distribution of the DBC (orange box, Step A); 3) two FEP calculations and a Thermodynamic Integration (TI) calculation are carried out (blue boxes, Step B, Step C, and Step D); and 4) the resulting values are combined to get the standard free energy of binding (gray box; Step E).Note: some simulations can be run simultaneously.

2 .
Define the Distance-to-Bound Configuration (DBC) Coordinate a. Open the Colvars Dashboard in VMD: • Open VMD.• Load the psf, pdb, and dcd files listed above under "Required Input."You may choose to load your own dcd if you completed Step 1. • From VMD's main window options select: Extensions→Analysis→Colvars Dashboard b.Create a DBC colvar: • Click New [Ctrl-n] to start editing a new collective variable.• Delete all sample text shown in the editor on the right-hand side.• Open the Templates→colvar templates drop-down list and select DBC (ligand RMSD) to populate the editor with a template that now needs to be edited.c.Define the atom selection for the ligand atoms: (Fig. 2, red) • Delete atomNumbers 1 2 3 4 from the atoms block and leave your cursor on the now-empty line.• Select the left panel text box Editing helpers→Atoms from selection text and enter resname PHEN and noh.• Press Enter or click Insert [Enter] to insert the new selection into the configuration text at your cursor.d.Identify equivalent, symmetric atoms: • In the rmsd block, add atomPermutation 1 5 3 9 7 11 12 to the line above the atoms keyword.This indicates equivalence between ligand atoms listed in atomNumbers.That is, (5 and 3) and (9 and 7) are interchangeable.See the Colvars User Guide and Appendix C.2.2 for more details on symmetric DBC and atomPermutation.e. Define the atom selection for the binding site atoms: (Fig. 2, teal) • Delete atomNumbers 6 7 8 9 within the fittingGroup block and leave your cursor on the now-empty line.• Select the left panel text box Editing helpers→Atoms from selection text and enter alpha and same residue as within 6 of resname PHEN.• Press Enter or click Insert [Enter] to insert the new selection into the configuration text at your cursor.• VMD versions 1.9.4a57 and older only Colvars has a functionality called "auto-update selections."Please turn it off by deleting the comment line that begins with "auto-updating" in all Colvars config files.It is off by default in newer CV Dashboard versions.f.Set the reference positions for the RMSD and alignment calculations: • In the initial RMSD block, before the atoms block, delete refpositionsfile reference.pdb# PDB or XYZ file (the first highlighted line in the figure below).Leave your cursor on that line.• In the left panel under Editing helpers, select the radio button • refPositionsFile and click Pick File .

g. 3 .
Save your edits: Click the Apply [Ctrl-s] button.Impose a restraint based on the DBC coordinate a. Determine the upper wall of the DBC restraint: • In the Plots and real-time visualizations panel of the dashboard, click Histogram .If you don't see such a button, you need to upgrade your VMD installation.See Software requirements for more details.• From the histogram, estimate the 95th percentile of the bound state's DBC coordinate.Use the cumulative distribution line graph as a guide.The value doesn't need to be precise.We selected 1.5 Å. See Appendix C.2.2 for more details.• Write this value down; you will need it in the next step.b.Impose a flat-bottom harmonic potential: • Open the Biases tab on the Colvars Dashboard and click New bias [Ctrl-n] to create a new biasing potential.• Delete the default text.• From the bias templates: drop-down menu select harmonicWalls and click Insert [Enter] .• Modify the bias to match the following parameters: colvars DBC upperWalls [the DBC's 95th percentile just identified] forceConstant 200

c. 4 .
Save your edits: Click the Apply [Ctrl-s] button.Save the Colvars configuration to a file a. Click Save at the top of the dashboard b.Save your file to stepA_create_DBC/outputs/DBC_restraint.colvarsNote that if you choose to use a different file name or path you will need to update the files in the next step with the new name.
c. Calculate ∆G * site in the Jupyter Notebook: • Navigate back to the tutorial root directory.• Begin a Jupyter session and open the notebook titled SAFEP_Tutorial_Notebook.ipynb.

1 . 2 . 3 . 4 .
Structure file: common/structures/phenol_gas_phase.psf• Coordinate file: common/structures/phenol_gas_phase.pdb• NAMD configuration file: stepC_restraint_perturbation/inputs/run.namdEssential Output: • Colvars configuration file: stepC_restraint_perturbation/outputs/DBC_Restraint_RFEP.colvars• FEP trajectory file: stepC_restraint_perturbation/outputs/RFEP.dcd• Colvars output file: stepC_restraint_perturbation/outputs/RFEP.colvars.trajProcedure: Define the collective variables a. Open VMD and load the input files: • Open VMD.• Open the Tk Console.• Open the Colvars Dashboard.• [Optional] Extract the phenol from the phenol-lysozyme complex by running the following in the tkConsole.Note: We have completed this step for you.The sample files can be found in common/structures.%cd common/structures %mol load psf phenol_lysozyme.psfpdb phenol_lysozyme.pdb%set ligand [atomselect top "resname PHEN"] %cd ../../stepC_restraint_perturbation/outputs %$ligand writepsf phenol_gas_phase.psf%$ligand writepdb phenol_gas_phase.pdb• Load phenol_gas_phase.psf and phenol_gas_phase.pdbb.Define the gas-phase spherical coordinate: • In the Colvars Dashboard, click New [Ctrl-n] .• In the second line of the editor, replace the default name myColvar with COM.• Delete atomNumbers 1 2 and leave your cursor on that line.• Using the Atoms from selection text: tool in the left panel, enter resname PHEN and noh and click Insert [Enter] .• Get the geometric center of the heavy atoms by the following in the Tk Console: measure center [atomselect top "resname PHEN and noh"] • Set the atoms of group 2 to dummyAtom (x0, y0, z0) where x0, y0, and z0 are the coordinates of the geometric center of the ligand you just retrieved in the previous step.Your editor should look similar to the figure below.Note the inclusion of commas in the dummyAtom statement.• Save and close the colvar editor by clicking Apply [Ctrl-s] .c. Define the gas-phase DBC coordinate: • Click New [Ctrl-n] again.• In the second line of the editor, replace the default name myColvar with DBC.• Delete the default distance component distance{...} and leave your cursor on that line.• From the component templates dropdown menu select rmsd and click Insert [Enter] .• As before, add atomPermutation 1 5 3 9 7 11 12 to the rmsd block to define the ligand symmetry.• Delete atomNumbers 1 2 3 and leave your cursor on that line.• In the field labeled Atoms from selection text: enter resname PHEN and noh and click Insert [Enter] .• Add rotateReference off and centerReference off to the atoms block.• Replace the default refPositionsFile @ line using the • refPositionsFile radio button and the Pick file button to select phenol_gas_phase.pdb.• Save and close the colvar editor by clicking Apply [Ctrl-s] .Define the restraints a. Create the spherical restraint: • In the biases tab of the Colvars Dashboard, click New bias (Ctrl-n) , and delete the default text.• From the bias templates dropdown menu, select harmonic walls, and press Insert [Enter] .• Recall the upperWalls value you used for the DBC restraint in subsection 3.b from Step A. You will need this value in this and the next step.• Modify the bias to match the following parameters (see Appendix D): name distance_restraint colvars COM outputEnergy on upperWalls [DBC upperWalls plus 1] forceConstant 200 • Save and close the bias editor by clicking Apply [Ctrl-s] .b. Save the config file: • Click the Save button on the Colvars Dashboard.• Save the file as stepC_restraint_perturbation/outputs/DBC_restraint_RFEP.colvarsc.Create a DBC restraint that gradually releases: • We will use the provided setTI Tcl procedure.• Open stepC_restraint_perturbation/inputs/run.namd in a text editor • Find the block labeled "COLVARS" • Edit the input variables to match the following Run the TI simulation a. Enter the following in your terminal: cd stepC_restraint_perturbation/inputs namd2 +p1 run.namd> ../outputs/DBC_FreeEnergy.log 2> ../outputs/DBC_FreeEnergy.err b. [Optional] Start Step D: If you have access to more computing resources, you can continue on to Step D while the TI calculation is running.Don't forget to return to analyze these data once the simulation is complete.Analyze the output If any of these checks fails, check the Troubleshooting section of the Appendices (Appendix F). a. Visually inspect the trajectory in VMD: • Open VMD.• Load the .psf,.pdb,and .dcdfiles associated with this tutorial step.• The ligand should initially fluctuate roughly in place at the start and gradually explore the COM restraint space as the DBC restraint is released.b.Check the collective variable trajectories: • Open the Colvars Dashboard • Click Load and open the Colvars configuration file DBC_restraint_RFEP.colvars• Select both the COM and DBC restraints • Click Timeline plot • Both coordinates should start low and gradually increase.The COM restraint should plateau near the value of the restraint's upperWalls as shown in the figure below:

Figure 4 .
Figure 4. Screenshot of Colvars Dashboard output showing COM (red) and DBC (black) collective variables as a function of frame number in the decoupling trajectory from subsection 4.b in step C. c. Calculate ∆G DBC in the Jupyter Notebook:

3 .
a. Open VMD and load the psf and pdb files specified in "Required Input."b.Set and write beta values: • Open the Tk Console • Ensure that your Tk Console is in the correct directory: cd stepD_alchemy_bulk/outputs • Set the beta value of all atoms to 0: [atomselect top all] set beta 0 • Set the beta values of the ligand atoms to -1 for decoupling: [atomselect top "resname PHEN"] set beta -1 • Save as a pdb file: [atomselect top all] writepdb alchemy_bulk.pdbRun the ligand decoupling simulation in bulk solvent cd stepD_alchemy_bulk/inputs namd2 +p4 run.namd> ../outputs/alchemy_bulk.log 2> ../outputs/alchemy_bulk.err 4. Analyze the output a. Visually inspect the trajectory in VMD: • Open VMD.• Load the .psf,.pdb,and .dcdfiles associated with your simulation.• The ligand should diffuse normally at the start of the simulation but behave more and more like a gas-phase molecule.b.Calculate ∆G * bulk in the Jupyter Notebook • Open the Jupyter Notebook as in subsection 3.c from Step B.

Figure 6 .
Figure 6.Example results from protein-phenol decoupling calculation.A) Cumulative change in free energy with accumulated error.B) Per-window difference in free energy (∆G λ ) calculated by the BAR estimator (blue), and exponential estimators for forward (orange) and backward (green) samples.C) Free energy hysteresis (δ λ,i ).D) Probability density function of δ λ,i estimated by kernel density estimation (KDE).Error bars indicate uncertainty reported by alchemlyb[20].

Figure 9 .
Figure 9. Free energy change over the course of a TI calculation.A) Restraint free energy (∆G λ ).B) Derivative of the free energy with respect to the coupling parameter, λ.Error bars indicate standard deviation.

Figure 10 .
Figure 10.Screenshot from the Colvars Dashboard showing a bimodal distribution that resulted from using an asymmetric DBC for phenol.The second peak corresponds to a 180 degree rotation about the C-O axis.

□
Only the DBC restraint should be active during FEP (Step B) □ DBC restraint upper walls have the intended value.(3.b) □ DBC restraint force constant is appropriate (100 or 200).(3.b) □ NO lower walls (3.b) □ If the Colvars configuration file contains a "width" keyword, it should be 1.See [28] and the Colvars user guide for more details.(3.b)

Navigate to the cloned repo
This is the starting path for all command line prompts in this tutorial.

forget to return to analyze these data once the simulation is complete.
See the in-line comments in that file and Appendix B for a detailed description of the settings relevant to running FEP in namd.a. Run the decoupling FEP: Enter the following in your terminal window: cd stepB_alchemy_site/inputs/ namd2 +p4 run.namd> ../outputs/alchemy_site.log 2> ../outputs/alchemy_site.err b. [Optional] Start Step C: If you have access to more compute resources, you can continue on to Step C while this FEP calculation is running.Don't Structure file: common/structures/phenol_water.psf• Coordinate file: common/structures/phenol_water.pdb• NAMD configuration file: stepD_alchemy_bulk/inputs/run.namd [21]lete any unfinished analyses in previous steps (i.e.steps B.3, C.4, and D.4).Some issues are only obvious when visualizing the trajectories.Don't skip those steps!2.Open the Jupyter notebook and navigate to the section labeled "Volumetric Restraint Contribution."3.Run the section to calculate the volumetric free energy contribution.See Appendix C.3 for a more detailed explanation.Note: At this point you will either need to have completed all simulations or use the sample data provided.To use the sample data, change the path variables (bound_fep_path, restraint_perturbation_path, and bulk_fep_path) to use the files in their respective ./sample_outputsdirectories.4.Calculate the overall ∆G • bind and compute a titration curve by running the cells in the section "Binding Free Energy." 5. Compare your final ∆G • bind to the literature value, reported as -5.44 kcal/mol[21].6.Compare your titration curve to Figure5below.

Table 1 .
Symbols used in this tutorial.
Equation 22only holds if the change in excess chemical potential is negligible between L ′ and the simulation concentration L. This assumption can be tested by running bulk decoupling (Step D) at both L ′ and L and checking that the resulting change in ∆G bulk is much smaller than the overall error.If we wish to calculate Pocc over a wider concentration range in which this assumption does not hold, we would need to explicitly calculate ∆G * bulk (L) for multiple simulation concentrations L and extrapolate to the intermediate concentrations, as in Ref. 11.