Abstract
The refinement of biomolecular crystallographic models relies on geometric restraints to help address the paucity of experimental data typical of these experiments. Limitations in these restraints can degrade the quality of the resulting atomic models. Here we present an integration of the full all-atom Amber molecular dynamics force field into Phenix crystallographic refinement, which enables a more complete modeling of biomolecular chemistry. The advantages of the force field include a carefully derived set of torsion angle potentials, an extensive and flexible set of atom types, Lennard-Jones treatment of non-bonded interactions and a full treatment of crystalline electrostatics. The new combined method was tested against conventional geometry restraints for over twenty-two thousand protein structures. Structures refined with the new method show substantially improved model quality. On average, Ramachandran and rotamer scores are somewhat better; clash scores and MolProbity scores are significantly improved; and the modelling of electrostatics leads to structures that exhibit more, and more correct, hydrogen bonds than those refined with traditional geometry restraints. We find in general that model improvements are greatest at lower resolutions, prompting plans to add the Amber target function to real-space refinement for use in electron cryo-microscopy. This work opens the door to the future development of more advanced applications such as Amber-based ensemble refinement, quantum mechanical representation of active sites and improved geometric restraints for simulated annealing.
Synopsis
The full Amber force field has been integrated into Phenix as an alternative refinement target. With a slight loss in speed, it achieves improved stereochemistry, fewer steric clashes and better hydrogen bonds.
1. Introduction
Accurate structural knowledge lies at the heart of our understanding of the biomolecular function and interactions of proteins and nucleic acids. With close to 90% of structures in the Protein Data Bank (Berman et al., 2000) solved via X-ray diffraction methods, crystallography is currently the pre-eminent method for determining biomolecular structure. Crystal structure refinement is a computational technique that plays a key role in post-experiment data interpretation. Refinement of atomic coordinates entails solving an optimization problem to minimize the residual difference between the experimental and model structure factor amplitudes (Jack & Levitt, 1978; Agarwal, 1978; Murshudov et al., 1997). However, because of inherent experimental limitations and a typically low data-to-parameter ratio, additional restraints, commonly referred to as geometry or steric restraints, are key to successful structural refinement (Waser, 1963). These restraints, which can be thought of as a prior in the Bayesian sense, provide additional observations in the optimization target and reduce the danger of overfitting. Their use leads to higher-quality, more chemically accurate models.
Most current refinement programs (Afonine et al., 2012; Murshudov et al., 2011; Sheldrick, 2008; Bricogne et al., 2011) employ a set of covalent-geometry restraints first proposed by Engh & Huber in 1991 and later augmented and improved in 2001 (Engh & Huber, 1991, 2001). This set of restraints is based on a survey of accurate high-resolution small molecule crystal structures from the Cambridge Structural Database (Groom et al., 2016) and includes restraints on interatomic bond lengths, bond angles and ω torsion angles. In addition, parameters are added to enforce proper chirality and planarity; multiple-minimum targets for backbone and side chain torsion angles; and repulsive terms to prevent steric overlap between atoms. Those terms are defined from small-molecule and high-resolution macromolecular crystal structure data and from interaction-specified van der Waals radii. They are very similar but not identical between refinement programs.
The Engh & Huber restraints function reasonably well, and the additional terms have been gradually improved, but a number of limitations have been identified over the years. These include: a lack of adjustability to differences in local conformation, protonation and hydrogen bonding, and to their changes during refinement; incomplete or inaccurate atom types and parameters for ligands, carbohydrates and covalent modifications; use of only repulsive, not attractive, steric terms; omission of explicit hydrogen atoms and their interactions; misleading targets resulting from experimental averaging artifacts; inaccurate dihedral restraints; and no awareness of electrostatic and quantum dispersive interactions, with a consequent failure to account for hydrogen-bonding cooperativity (Priestle, 2003; Touw & Vriend, 2010; Davis et al., 2003; Moriarty et al., 2014; Tronrud et al., 2010).
Phenix (Adams et al., 2010) includes a built-in system for defining ligand parameters (Moriarty et al., 2009) that by default restrains the explicit hydrogen atoms at electron-cloud-center positions for X-ray and optionally at nuclear positions for neutron crystallography (Williams, Headd et al., 2018). Addition of the Conformation Dependent Library (CDL) (Moriarty et al., 2014), which makes backbone bond lengths and angles dependent on ϕ,ψ values, has improved the models obtained from refinement at all resolutions, and thus is the default in Phenix refinement (Moriarty et al., 2016). Similarly, Phenix uses ribose-pucker and base-type dependent torsional restraints for RNA (Jain et al., 2015). For bond lengths and angles, protein side chains continue to use standard Engh & Huber restraints while RNA/DNA use early values (Gelbin et al., 1996; Parkinson et al., 1996) with a few modifications. This use of combined restraints is here designated CDL/E&H.
An alternative approach is the use of geometry restraints based on all-atom force fields used for molecular dynamics studies. This is not a novel idea. In fact, some of the earliest implementations of refinement programs employed molecular mechanics force fields (Jack & Levitt, 1978; Brünger et al., 1987, 1989). However, at the time, restraints derived from coordinates of ideal fragments (Tronrud et al., 1987; Hendrickson & Konnert, 1980) were found to provide better refinement results. The insufficiency of molecular mechanics-based restraints was mainly attributed to two factors: inaccurate representation of chemical space because of too few atom types, and biases in conformational sampling resulting from unshielded electrostatic interactions. Subsequently, however, the methods of molecular dynamics and corresponding force fields have seen significant development and improvement. Current force fields contain more atom types and are easily adjustable as needed. They are typically parameterized against accurate quantum mechanical calculations, not feasible just a few years ago, as well as using more representative experimental results. Significant methodological advances, such as the development of Particle Mesh Ewald (York et al., 1993; Darden et al., 1993) for accurate calculation of crystalline electrostatics and improved temperature and pressure control algorithms, have greatly increased accuracy. Modern force fields have been shown to agree well with experimental data (Zagrovic et al., 2008; van Gunsteren et al., 2008; Showalter & Brüschweiler, 2007; Grindon et al., 2004; Bowman et al., 2011), including crystal diffraction data (Cerutti et al., 2009; Janowski et al., 2013; Cerutti et al., 2008; Liu et al., 2015; Janowski et al., 2015).
We have made it possible to use the Amber molecular mechanics force field as an alternative source of geometry restraints to those of CDL/E&H. Here we present an integration of phenix.refine (Afonine et al., 2012), the crystallographic refinement program of the Phenix software package, with the Amber software package (Case et al., 2018) for molecular dynamics. We present results of paired refinements for 22,544 structures and compare Amber with traditional refinement in terms of model quality, chemical accuracy and agreement with experimental data, both in overall statistics and for representative individual examples. We also describe the implementation and discuss future directions.
2. Methods
2.1. Code preparation
The integration of the Amber code into phenix.refine uses a thin client. Amber provides a Python API to its sander module, so that a simple “import sander” Python command allows Phenix to obtain Amber energies and forces through a method call. At each step of coordinate refinement, Phenix expands the asymmetric-unit coordinates to a full unit cell (as required by sander), combines the energy gradients returned from Amber (in place of those from its internal geometric restraint routines) with gradients from the X-ray target function, and uses these forces to update the coordinates, either by minimization or by simulated-annealing molecular dynamics. Alternate conformers take advantage of the “locally enhanced sampling” (LES) facility in sander: atoms in single-conformer regions interact with multiple-copy regions via the average energy of interaction, while different copies of the same group do not interact among themselves (Roitberg & Elber, 1991; Simmerling et al., 1998).
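The LES averaging rule can be illustrated with a toy pairwise energy. The sketch below is hypothetical and is not sander's implementation: each atom carries an optional (group, copy) label, environment-to-copy pairs are weighted by 1/n for a group with n copies, and pairs between different copies of the same group are skipped. As labelled assumptions, pairs within one copy are also weighted by 1/n and pairs between copies of two different groups by the product of the two averages, so that for a single multi-copy group the total equals the mean energy over the individual conformers.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float, float]
# (position, group, copy); group and copy are None for single-conformer atoms
Atom = Tuple[Point, Optional[str], Optional[int]]


def les_energy(atoms: List[Atom], n_copies: Dict[str, int],
               pair_energy: Callable[[Point, Point], float]) -> float:
    """Toy LES-style total energy summed over all atom pairs."""
    total = 0.0
    for (xi, gi, ci), (xj, gj, cj) in combinations(atoms, 2):
        if gi is None and gj is None:
            weight = 1.0  # ordinary single-conformer pair
        elif gi is None or gj is None:
            group = gi if gi is not None else gj
            weight = 1.0 / n_copies[group]  # environment sees the copy average
        elif gi == gj:
            if ci != cj:
                continue  # different copies of one group never interact
            weight = 1.0 / n_copies[gi]  # assumption: intra-copy terms averaged too
        else:
            # assumption: copies of two different groups use the product of averages
            weight = 1.0 / (n_copies[gi] * n_copies[gj])
        total += weight * pair_energy(xi, xj)
    return total


def inverse_distance(a: Point, b: Point) -> float:
    """Illustrative Coulomb-like pair term with unit charges."""
    return 1.0 / math.dist(a, b)


# Hypothetical example: two environment atoms plus a group "A" with two copies.
env = [((0.0, 0.0, 0.0), None, None), ((3.0, 0.0, 0.0), None, None)]
copy_1 = [((1.0, 1.0, 0.0), "A", 1), ((1.0, 2.0, 0.0), "A", 1)]
copy_2 = [((1.0, -2.0, 0.0), "A", 2), ((2.0, -2.0, 0.0), "A", 2)]
e_les = les_energy(env + copy_1 + copy_2, {"A": 2}, inverse_distance)
```

In this toy, `e_les` equals the average of the energies of the two single-conformer systems (environment plus copy 1, environment plus copy 2), which is the sense in which the environment "sees" the mean of the alternates.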
The Amber files required are created by a preliminary AmberPrep program that takes a PDB file as input. It creates both a parameter-topology (prmtop) file used by Amber and a new PDB file containing a complete set of atoms (including hydrogens and any missing atoms) needed to do force field calculations. Alternate conformers, if present in the input PDB file, are translated into sander LES format. For most situations, AmberPrep does not require the user to have any experience with Amber or with molecular mechanics; less-common situations (described below) require some familiarity with Amber. All the code required for both the AmberPrep and phenix.refine steps is included in the current major release, 1.16-3549 and subsequent nightly builds of Phenix.
2.2. Structure selection and overall refinement protocol
To compare refinements using Amber against traditional refinements with CDL/E&H restraints, structures were selected from the Protein Data Bank (Burley et al., 2019) using the following criteria. Entries must have untwinned experimental data available that are at least 90% complete. Each entry’s Rfree was limited to a maximum of 35%, Rwork to 30% and the ΔR (Rfree-Rwork) to a minimum of 1.5%. The lowest resolution was set at 3.5Å. Entries containing nucleic acids were excluded.
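The selection criteria amount to a simple predicate over per-entry metadata. A minimal sketch follows, assuming a hypothetical `Entry` record whose field names are illustrative (they do not correspond to any PDB or Phenix API), with R-factors and completeness stored as fractions and placeholder identifiers rather than real PDB codes.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical per-entry metadata; field names are illustrative."""
    entry_id: str
    twinned: bool
    completeness: float  # fraction of reflections measured, 0..1
    r_work: float        # fraction, e.g. 0.21 for 21%
    r_free: float
    d_min: float         # high-resolution limit in Å
    has_nucleic_acid: bool


def passes_selection(e: Entry) -> bool:
    """Apply the paired-refinement selection criteria described in the text."""
    return (
        not e.twinned
        and e.completeness >= 0.90
        and e.r_free <= 0.35
        and e.r_work <= 0.30
        and e.r_free - e.r_work >= 0.015  # ΔR of at least 1.5 percentage points
        and e.d_min <= 3.5                # 3.5 Å is the lowest resolution allowed
        and not e.has_nucleic_acid
    )


entries = [
    Entry("ex01", False, 0.98, 0.18, 0.22, 1.8, False),
    Entry("ex02", True, 0.99, 0.17, 0.21, 1.5, False),   # twinned
    Entry("ex03", False, 0.95, 0.25, 0.26, 2.4, False),  # ΔR too small
]
selected = [e.entry_id for e in entries if passes_selection(e)]
```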
Coordinate and experimental data files were obtained directly from the Protein Data Bank (PDB) and inputs prepared via the automated AmberPrep program (see section 2.1 above). Entries containing complex ligands were included if the file preparation program AmberPrep was able to automatically generate and include the ligand geometry data. Details of the internals of AmberPrep will be described elsewhere. Resolution bins (set at 0.1Å) with fewer than 10 refinement pairs were eliminated to reduce noise caused by limited statistics. Complete graphs are included in the supplemental material. The resulting 22,000+ structures had experimental data resolutions between 0.5Å and 3.2Å, with most of the structures in the 1.0-3.0 Å range (see figure 1).
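The 0.1Å binning with a minimum population of 10 refinement pairs can be sketched as follows. This is an illustrative reconstruction, assuming half-open bins [k·0.1, (k+1)·0.1) keyed by their lower edge; the text specifies only the 0.1Å width, not the exact bin boundaries used for the figures.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def bin_by_resolution(d_min: Sequence[float], width: float = 0.1,
                      min_pairs: int = 10) -> Dict[float, List[int]]:
    """Group refinement-pair indices into resolution bins, dropping sparse bins."""
    bins: Dict[int, List[int]] = defaultdict(list)
    for index, d in enumerate(d_min):
        # a small epsilon guards against results such as 0.3 / 0.1 == 2.9999...
        key = math.floor(d / width + 1e-9)
        bins[key].append(index)
    return {
        round(key * width, 3): members
        for key, members in sorted(bins.items())
        if len(members) >= min_pairs
    }


# 12 pairs near 1.8Å survive; 5 pairs near 2.4Å fall below the minimum.
resolutions = [1.81] * 12 + [2.45] * 5
kept = bin_by_resolution(resolutions)
```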
Each model was then subjected to 10 macrocycles of refinement using the default strategy in phenix.refine for reciprocal space coordinate refinement, with the exception that real space refinement was turned off. By default, the first macrocycle uses a least-squares target function and the rest use maximum likelihood. Other options included optimization of the weight between the experimental data and the geometry restraints. This protocol was performed in parallel, once using CDL/E&H and once using Amber geometry restraints. In addition, Cβ pseudo-torsion restraints were not included in the restraints model. Only one copy of each alternate conformation was considered initially (i.e. alternative location A). The quality of the resulting models was assessed numerically using MolProbity (Williams, Headd et al., 2018) available in Phenix (Adams et al., 2010), by cpptraj (Roe & Cheatham, 2013) available in AmberTools (Case et al., 2018) and by visual inspection with electron density and validation markup in KiNG (Chen et al., 2009). All-atom dots for figure 10 were counted in Mage (Richardson & Richardson, 2001) and figures 5–9 were made in KiNG. To avoid typographical ambiguity, PDB codes are given here with lower case for all letters except L (e.g., 1nLs).
2.3. Weight factor details
The target function optimized in phenix.refine reciprocal space atomic coordinate refinement is of the general form:

$$T_{\mathrm{xyz}} = T_{\mathrm{exp}} + w\,T_{\mathrm{xyz\_restraints}}$$

where all the terms are functions of the atomic coordinates, T_xyz is the target residual to be minimized, T_exp is a residual between the observed and model structure factors that quantifies agreement with the experimental data, T_xyz_restraints is the residual of agreement with the geometry restraints and w is a scale factor that modulates the relative weight between the experimental and the geometry restraint terms. In traditional refinement T_xyz_restraints is calculated using the set of CDL/E&H restraints:

$$T_{\mathrm{xyz}} = T_{\mathrm{exp}} + w\,T_{\mathrm{CDL/E\&H}}$$

To implement Phenix-Amber we substitute this term with the potential energy calculated using the Amber force field:

$$T_{\mathrm{xyz}} = T_{\mathrm{exp}} + w\,E_{\mathrm{Amber}}$$

where the Amber term is intentionally denoted by E to emphasize that we directly incorporate the full potential energy function calculated in Amber using the ff14SB (Maier et al., 2015) force field.
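Whichever restraint term is used, the optimizer sees a single weighted sum and its gradient. A minimal sketch on a one-dimensional toy problem shows how w sets the compromise between data and restraints. The quadratic `toy_exp` and `toy_energy` terms are invented stand-ins for the X-ray residual and the Amber energy, and the fixed-step gradient descent is chosen only for brevity. For T_exp = (x − 2)² and E = x², the minimum lies at x = 2/(1 + w).

```python
from typing import Callable, List, Tuple

Vector = List[float]
Term = Callable[[Vector], Tuple[float, Vector]]  # returns (value, gradient)


def combined(x: Vector, t_exp: Term, e_restraint: Term, w: float) -> Tuple[float, Vector]:
    """T_xyz = T_exp + w * E and its gradient with respect to the coordinates."""
    f_exp, g_exp = t_exp(x)
    f_r, g_r = e_restraint(x)
    return f_exp + w * f_r, [ge + w * gr for ge, gr in zip(g_exp, g_r)]


def minimize(x: Vector, t_exp: Term, e_restraint: Term, w: float,
             step: float = 0.1, n_steps: int = 500) -> Vector:
    """Plain fixed-step gradient descent on the combined target."""
    for _ in range(n_steps):
        _, grad = combined(x, t_exp, e_restraint, w)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


def toy_exp(x: Vector) -> Tuple[float, Vector]:
    return (x[0] - 2.0) ** 2, [2.0 * (x[0] - 2.0)]  # "data" prefers x = 2


def toy_energy(x: Vector) -> Tuple[float, Vector]:
    return x[0] ** 2, [2.0 * x[0]]  # "restraint" prefers x = 0


x_w1 = minimize([5.0], toy_exp, toy_energy, w=1.0)
x_w3 = minimize([5.0], toy_exp, toy_energy, w=3.0)
```

Increasing w pulls the minimum from the data optimum toward the restraint optimum, which is the trade-off that the weight-optimization procedure explores.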
In a standard default Phenix refinement, the weight, w, is a combination of a value based on the ratio of gradient norms (Brünger et al., 1989; Adams et al., 1997) and a scaling factor that defaults to ½. This initial weight can be optimized using a procedure described previously (Afonine et al., 2011). This procedure uses the results of ten refinements with a selection of weights, considering the bond and angle rmsd, the R-factors and validation statistics to determine the best weight for the specific refinement at each of the ten macrocycles. The same procedure was used to estimate an optimal weight for the Phenix-Amber refinements. (If faster fixed-weight refinements are desired, we have found that a scaling factor of 0.2, rather than 0.5, scales the Amber gradients to be close to those from the CDL/E&H restraints, allowing the simpler, default, weighting scheme in phenix.refine to be used.)
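The gradient-norm-ratio idea behind the initial weight can be written in a few lines. This is a generic illustration under an assumed convention, in which the scale s is applied so that the weighted restraint gradient norm equals s times the experimental gradient norm; it does not reproduce phenix.refine's internal bookkeeping (for example, which term carries the scale), and the function names are hypothetical.

```python
import math
from typing import Sequence


def gradient_norm(g: Sequence[float]) -> float:
    """Euclidean norm of a flattened gradient vector."""
    return math.sqrt(sum(c * c for c in g))


def gradient_ratio_weight(grad_exp: Sequence[float], grad_restraints: Sequence[float],
                          scale: float = 0.5) -> float:
    """Weight w such that ||w * grad_restraints|| == scale * ||grad_exp||."""
    norm_r = gradient_norm(grad_restraints)
    if norm_r == 0.0:
        raise ValueError("restraint gradient is zero; ratio weight is undefined")
    return scale * gradient_norm(grad_exp) / norm_r


w_default = gradient_ratio_weight([3.0, 4.0], [0.0, 2.0])            # scale 0.5
w_reduced = gradient_ratio_weight([3.0, 4.0], [0.0, 2.0], scale=0.2)
```

Under this convention, changing s from 0.5 to 0.2 reduces the restraint contribution relative to the data by a factor of 2.5.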
3. Results
3.1. Full-dataset score comparisons
On average, the Phenix-Amber combination produced slightly higher Rwork and Rfree (figure 2) but higher-quality models (figure 3). The increase in R-factors is most pronounced in the 1.5–2.5Å range, which is a result of the weight optimisation procedure having different limits for the optimal weight in this resolution range. The increase was smaller for Rfree than for Rwork, so that ΔR (Rfree−Rwork) is smaller for refinements using Amber gradients. The Phenix-Amber refinements exhibited improved (lower) MolProbity scores and contained fewer clashes between atoms. Plots show the mean value in each 0.1Å resolution bin together with its 95% confidence interval, computed from the standard error of the mean (SEM). MolProbity clashscores are particularly striking: for refinement using CDL/E&H restraints, clashscores steadily increase as resolution worsens, often reaching very high numbers of steric clashes. In contrast, the mean clashscore with Amber restraints appears nearly independent of resolution, remaining at about 2.5 clashes per 1000 atoms across all resolution bins. The confidence intervals do not overlap at resolutions worse than 1Å, indicating that the Amber force field produces better geometries at mid to low resolution. There are more favored Ramachandran points (backbone ϕ,ψ) and fewer Ramachandran outliers for the Phenix-Amber refinements; this difference is most marked at resolutions worse than 2Å. Phenix-Amber refinement also lowers the number of rotamer outliers, although the confidence intervals overlap, and increases the proportion of hydrogen bonds. Whereas the rotamer-outlier results remain similar for the two methods, the hydrogen-bonding results differ markedly at resolutions worse than 1.5Å, with Amber producing nearly twice as many hydrogen bonds near 3Å.
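The per-bin statistic can be sketched as follows, assuming a normal approximation (mean ± 1.96 × SEM) for the 95% interval; the text does not state whether a t-based multiplier was used, so the 1.96 factor is an assumption. Two methods are treated as distinguished in a bin when their intervals do not overlap. The clashscore values below are invented for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

Interval = Tuple[float, float, float]  # (mean, lower, upper)


def mean_ci95(values: Sequence[float]) -> Interval:
    """Mean and normal-approximation 95% confidence interval from the SEM."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate the SEM")
    m = mean(values)
    sem = stdev(values) / math.sqrt(len(values))
    half_width = 1.96 * sem
    return m, m - half_width, m + half_width


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if the two confidence intervals share any point."""
    return a[1] <= b[2] and b[1] <= a[2]


# Invented clashscores for one resolution bin.
cdl_clash = [8.0, 9.5, 11.0, 10.2, 9.1]
amber_clash = [2.4, 2.6, 2.2, 2.9, 2.5]
separated = not intervals_overlap(mean_ci95(cdl_clash), mean_ci95(amber_clash))
```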
Common to all the plots is a change near 1.5Å, where the weight optimisation procedure shared by CDL/E&H and Amber refinement loosens the weight on the geometry restraints somewhat, to allow more deviations at resolutions where the data can unambiguously show them. Comparisons of bond and angle rmsd are less pertinent: the Amber force field is not parameterized on the same ideal values as CDL/E&H, so measuring Phenix-Amber bonds and angles against CDL/E&H targets is not a neutral metric. These plots are given in figure S1. Overall, the improvement with Amber is substantial for lower-resolution refinements.
Models refined with Phenix-Amber are more likely to exhibit electrostatic interactions such as H-bonds and salt links, as well as better van der Waals contacts. Though the resulting atom movements are generally small, these changes can be meaningful, especially when interpreting H-bonding networks or interaction distances at active sites.
One validation metric that is worse for Phenix-Amber refinements is the number of Cβ deviation (Cβd) outliers; both the means and their confidence intervals are clearly separated. The Cβ deviation gives a combined measure of distortion in the tetrahedron around the Cα atom and, with traditional E&H restraints, it is robustly sensitive to incompatibility between how the backbone and side chain conformations have been modelled (Lovell et al., 2003). For CDL/E&H refinements, however, the percentage of Cβd outliers (>0.25Å) is negligible at low and mid resolutions, increasing only to 0.2% at higher resolutions (see figure 4). This is consistent with CDL/E&H providing tight geometric restraints out to Cβ at most resolutions, loosened somewhat at better than 1.5Å resolution where there is enough experimental information to move an angle away from ideal. Note that explicit Cβ restraints were turned off for all Phenix refinements and that the Amber force field has no explicit Cβ term; however, if all angles around the Cα are kept ideal then the Cβ position will also be ideal, even if it is incorrectly positioned in the structure. The following section analyses specific local examples where output structures show differences for either the positive or the negative trends seen in the overall comparisons, in order to understand their nature, causes and meaning across resolution ranges.
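A Cβ deviation is the distance between the modeled Cβ and an ideal Cβ position built from the backbone N, Cα and C atoms. MolProbity's own construction is not reproduced here; as a swapped-in stand-in, the sketch uses the "virtual Cβ" linear combination of backbone vectors found in several protein-design code bases (for example ProteinMPNN). It gives a comparable ideal position but may differ slightly from MolProbity values. The 0.25Å threshold is the one quoted in the text, and the backbone coordinates in the usage lines are invented.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ideal_cbeta(n: Vec3, ca: Vec3, c: Vec3) -> Vec3:
    """Virtual Cβ from backbone atoms (stand-in for MolProbity's construction)."""
    b = sub(ca, n)
    cc = sub(c, ca)
    a = cross(b, cc)
    return tuple(-0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * cc[i] + ca[i]
                 for i in range(3))


def cbeta_deviation(n: Vec3, ca: Vec3, c: Vec3, cb_model: Vec3) -> float:
    """Distance (Å) between the modeled Cβ and the ideal Cβ."""
    return math.dist(cb_model, ideal_cbeta(n, ca, c))


def is_cbeta_outlier(deviation: float, threshold: float = 0.25) -> bool:
    return deviation >= threshold


n, ca, c = (-0.53, 1.37, 0.0), (0.0, 0.0, 0.0), (1.52, 0.0, 0.0)
cb_ideal = ideal_cbeta(n, ca, c)
cb_shifted = (cb_ideal[0] + 0.3, cb_ideal[1], cb_ideal[2])  # a misplaced Cβ
dev = cbeta_deviation(n, ca, c, cb_shifted)
```

As the text notes, a Cβ kept at its ideal position gives zero deviation by construction, so a Cβd of zero does not prove the side chain is correctly fitted.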
3.2. Examination of individual examples
As noted above, in comparison with the CDL/E&H restraint refinements, the Phenix-Amber refinements have much higher percentages of Cβ deviation outliers, increasing at the low-resolution end to more than 1% of Cβ atoms. Amber refinement also has more bond-length and bond-angle outliers. The following subsections examine a sample of cases at high, mid and lower resolutions to understand the starting-model characteristics and refinement behavior that produce these differences.
3.2.1. High resolution: waters, alternates, Cβd outliers and atoms in the wrong peak
In the high-resolution range (better than 1.7Å), the commonest problems not easily correctable by refinement appear to be caused either by modeling the wrong atom into a density peak or by incorrect modeling, labeling or truncation of alternate conformations. Such problems are usually flagged in validation by all-atom clashes or Cβ deviations, and sometimes by bad bond lengths and angles.
Figure 5a shows a case where a water molecule had been modeled in an electron density peak that should really be a nitrogen atom of the Arg guanidinium. CDL/E&H refinement (figure 5b) corrected the bad geometry at the cost of moving the guanidinium even further out of density; Amber refinement changed the guanidinium orientation but made no overall improvement (figure 5c); all three versions have a bad clash. If the water were deleted, then either refinement method would undoubtedly do an excellent job (figure 5d). This type of problem is absent at low resolution where waters are not modeled but occurs quite often at both high and mid resolution, for other branched side chains, for Ile Cδ (for example, 3js8 195) and even occasionally for Trp (e.g. 1qw9 B170).
Cβ deviation outliers (≥0.25Å) are often produced by side chain alternates with quite different Cβ positions but no associated alternates defined along the backbone. Since the tetrahedron around Cα should be nearly ideal, that treatment almost guarantees bad geometry. The rather simple solution, implemented in Phenix, is to define alternates for all atoms as far as the i−1 and i+1 Cα atoms, as in the “backrub” motion (Davis et al., 2006). PDB codes 1dy5, 1gwe and 1nLs each have a number of such cases. Figure 6a,b shows 1nLs Ser 215, initially with an outlier Cβd: the two Cβ atoms are 0.49Å apart but share a single Cα. CDL/E&H refinement pulls the Cβ atoms to only 0.23Å apart, avoiding a Cβd outlier at the cost of a slightly worse fit to the density; Amber reduces the Cβd only slightly, but thereby keeps this flag of an underlying problem. When alternates are defined for the backbone peptides, both systems improve.
Worse cases occur where one or both alternates have been fit incorrectly as well as not being expanded along the backbone appropriately. Figure 6c shows Thr 196, with a huge Cβd of 0.88Å (not shown) and very poor geometry, because altB was fit incorrectly (just as a shift of altA rather than as a new rotamer). This time even CDL/E&H refinement produces a Cβd outlier, but smaller than for Amber. Figure 6d shows the excellent Amber result after the misfit of altB was approximately corrected.
3.2.2. Mid resolution: backward side chains and rare conformations
An even commoner case at both high and mid resolutions where the wrong atom is fit into a density peak is a backward-fit Cβ-branched residue, well illustrated by a very clear Thr example in 1bkr at 1.1Å (figure 7a). Thr 101 is a rotamer outlier (gold) on a regular α-helix with a Cβd of 0.63Å. The deposited Thr 101 also has a bond-angle deviation of 13.5σ; clashes at the Cγ methyl; its Cβ is out of density; Oγ is in the lower peak; and Cγ is in the higher peak. It is shown in figure 7 with 1.6σ and 4σ 2mFo-DFc contours (but without Cβ deviation and angle markups for clarity). This mistake was not obvious because anisotropic B’s were used too early in the modeling, so that the Thr Cβ was refined to a 6:1 aniso-axis ratio that covered both the modeled atom and the real position. The figures show the density as calculated with isotropic B factors.
Faced with this difficult problem for automated refinement, the two target functions react very differently. Both refinements still have the Cγ methyl clashing with a helix backbone CO in good density, which is very diagnostic of a problem with the Cγ. It is indeed the wrong atom to have in that peak, as the relative peak heights also show. The CDL/E&H refinement (figure 7b) achieves tight geometry and a good rotamer, moving the Cβ into its correct density peak, but pays the price for not correcting the underlying problem by swinging the Oγ out of density. The Amber refinement (figure 7c) places an atom in each of the three side chain density peaks, but pays the price for not correcting the underlying problem by having the wrong chirality at the Cβ atom. It also still has bond-angle outliers, which may be a sign of unconverged refinement.
The original PDB entry, the CDL/E&H refinement and the Amber refinement structures for Thr 101 are all very badly wrong, but each in an entirely different way. The deposited model, 1bkr, looks very poor by traditional model validation, but has a misleadingly good density correlation, given the extremely anisotropic Cβ B-factor. The CDL/E&H output looks extremely good on traditional validation except for the clashes and would show a lowered but still reasonable density correlation; however, it is the most obviously wrong upon manual inspection. The Amber output has clashes and currently has modest bond-angle outliers, but it fits the density very closely, which makes it difficult to identify as incorrect by visual inspection. The problem could, however, be recognized automatically by a simple chirality check. As shown in figure 7d, Thr 101 was rebuilt quickly in KiNG, with the p rotamer and a small backrub motion. Either Phenix-CDL/E&H or Phenix-Amber refinement would do a very good job from such a rough refit with the correct atoms near the right places.
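A chirality check of the kind mentioned above can be based on the sign of the signed (chiral) volume formed by a center atom and three of its substituents. In the minimal sketch below, the reference sign comes from an ideal template with the same atom ordering (for Thr Cβ, for example, substituents Cα, Oγ1 and Cγ2 taken from a standard residue library); the template source is an assumption of the sketch, not something specified in the text. A sign flip indicates inverted handedness, as in the Amber model of Thr 101. The coordinates in the usage lines are a toy tetrahedron, not real Thr geometry.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Chiral = Tuple[Vec3, Vec3, Vec3, Vec3]  # (center, a, b, c)


def chiral_volume(center: Vec3, a: Vec3, b: Vec3, c: Vec3) -> float:
    """Signed volume (a - center) . ((b - center) x (c - center))."""
    va = [a[i] - center[i] for i in range(3)]
    vb = [b[i] - center[i] for i in range(3)]
    vc = [c[i] - center[i] for i in range(3)]
    cross_bc = (vb[1] * vc[2] - vb[2] * vc[1],
                vb[2] * vc[0] - vb[0] * vc[2],
                vb[0] * vc[1] - vb[1] * vc[0])
    return sum(va[i] * cross_bc[i] for i in range(3))


def chirality_inverted(model: Chiral, template: Chiral) -> bool:
    """True if model and template (same atom order) have opposite handedness."""
    return chiral_volume(*model) * chiral_volume(*template) < 0.0


template = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
mirrored = tuple((-x, y, z) for x, y, z in template)  # mirror image of the template
```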
At mid resolution, there are also other rotamers and backbone conformations fit into the wrong local minimum and thus difficult to correct by minimization refinement methods, but not always flagged by Cβ deviations or other outliers. Some of these, such as cis-nonPro peptides (Williams, Videau et al., 2018) or very rare rotamers (Hintze et al., 2016) can be avoided by considering their highly unfavorable prior probabilities. Others would require explicit sampling of the multiple minima.
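Flagging cis-nonPro peptides starts from the ω torsion, the dihedral defined by CA(i), C(i), N(i+1) and CA(i+1). The sketch below computes a standard signed dihedral and classifies ω as cis (|ω| < 30°), trans (|ω| > 150°) or twisted otherwise. These cutoffs follow common validation practice but are an assumption here, and checking that residue i+1 is not proline is left to the caller. The coordinates in the usage lines are idealized toy positions.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    b1_len = math.sqrt(_dot(b1, b1))
    b1_unit = (b1[0] / b1_len, b1[1] / b1_len, b1[2] / b1_len)
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b1_unit)
    return math.degrees(math.atan2(y, x))


def classify_omega(omega: float) -> str:
    if abs(omega) < 30.0:
        return "cis"
    if abs(omega) > 150.0:
        return "trans"
    return "twisted"


ca_i, c_i, n_next = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
omega_cis = dihedral(ca_i, c_i, n_next, (1.0, 1.0, 0.0))     # CA(i+1) on the same side
omega_trans = dihedral(ca_i, c_i, n_next, (1.0, -1.0, 0.0))  # CA(i+1) on the opposite side
```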
3.2.3. Lower resolution: peptide orientations with CaBLAM and Cβd outliers
At low resolution (2.5–4Å), no waters or alternates are modeled. All other problems continue, but an additional set of common local misfittings occur because the broad electron density is compatible with significantly different models. 1xgo at 3.5Å is an excellent case for testing in this range, because it was solved independently from the 1.75Å 1xgs structure – the same molecule in a different space group. CDL/E&H refinement shows no Cβd outliers, but Amber refinement has six. Comparison with 1xgs shows that each of the Cβd residues has either the side chain or the backbone or both in an incorrect local-minimum conformation uncorrectable by minimization refinement methods (Richardson & Richardson, 2018). For example, figure 8 shows Leu 253 on a helix, with a Cβd from Amber (panel c) and the different, correct 1xgs Leu rotamer in panel d. Those Cβd outliers are thus a feature, not a bug, in Amber: they serve their designed validation function of flagging genuine fitting problems. However, the lack of Cβd outliers in the CDL/E&H refinement is also not a defect, because the tight CDL/E&H geometry is on average quite useful at low resolution.
The 1xgo-vs-1xgs comparison also illustrates many of the ways in which Amber refinement is superior at low resolution. In figure 8, Amber corrects a Ramachandran outlier in the helix and shows a helix backbone shape much closer to the ideal geometry of 1xgs than either the deposited or the CDL/E&H versions.
Since the backbone CO direction cannot be seen at low resolution, the commonest local misfitting is a misoriented peptide (Richardson et al., 2018). These can be flagged by the new MolProbity validation called CaBLAM, which tests whether adjacent CO directions are compatible with the local Cα backbone conformation (Williams, Headd et al., 2018). Ten such cases were identified in 1xgo, for isolated single or double CaBLAM outliers surrounded by correct structure as judged in 1xgs. For six of those 10 cases, neither CDL/E&H nor Amber refinement corrected the problem: His62, Thr70, Gly163, Gly193, Ala217, Glu286 (see stereo figure S2). In two cases CDL/E&H had fewer other outliers than Amber, but did not actually reorient the CO: for Gly193 and for the Gly163 case shown in figure S3. In three of the 10 cases Amber did a complete fix, while CDL/E&H did not improve (Asp88, Gly125, Pro266). For example, in figure 9, 1xgo residues 86-91 (panel a) have a CaBLAM outlier (magenta lines), uncorrected by CDL/E&H refinement (panel b). But Amber refinement (panel c) manages to shift several CO orientations by modest amounts (red balls), enough to fix the CaBLAM outliers and match extremely closely the better backbone conformation of 1xgs (panel d). The Gly 125 example is shown in figure S4. Finally, in one especially interesting case (Lys22) Amber turned the CO about halfway up to where it should be, while CDL/E&H made no improvement. The Amber model still has geometry outliers; further runs moved the CO most of the way up and removed those outliers, showing that Amber refinement had not yet fully converged in 10 macrocycles (see Supplement text and figure S5).
Amber refinement is especially good at optimizing hydrogen-aware all-atom sterics, as calculated by the Probe program (Word, Lovell, LaBean et al., 1999) with H atoms added and optimized by Reduce (Word, Lovell, Richardson et al., 1999). This is illustrated in figure 10 for 3g8L at 2.5Å resolution. The deposited structure of the Asn 182 helix N-cap region, which has many outliers of all kinds (panel a), is improved a great deal by CDL/E&H refinement (panel b). However, the Amber refinement (panel c) is noticeably better, with more H-bonds and better van der Waals contacts as well as fewer clashes. These improvements are plotted quantitatively in figure 11 as a dramatic drop in unfavorable clash spikes (red) and small overlaps (yellow), together with a marked increase in favorable H-bonds (green) and van der Waals contacts (blue).
4. Discussion
The idea of including molecular mechanics force fields in crystallographic refinement is not a new one, with precedents dating back to early work by Jack & Levitt (1978) and the XPLOR program (Brünger & Karplus, 1991) developed in the 1980s. The notion that a force field could (at least in principle) encode “prior knowledge” about protein structure continues to have a strong appeal, and efforts to replace conventional “geometric restraints”, which are very local and uncorrelated, with a more global assessment of structural quality have been explored repeatedly (e.g., Moulinier et al., 2003; Schnieders et al., 2009). Distinguishing features of the current implementation include automatic preparation of force fields for many types of biomolecules, ligands and solvent components as well as close integration with Phenix, a mature and widely used platform for refinement. This has enabled parallel refinements on more than 22,000 protein entries in the PDB and allows crystallographers to test these ideas on their own systems by simply adding flags to an existing phenix.refine command line or adding the same information via the Phenix GUI. Indeed, we expect most users to “turn on” Amber restraints after having carried out a more conventional refinement to judge for themselves the significance and correctness of structural differences that arise. As noted in Section 3.2, an Amber refinement will often flag residues that need manual refitting in ways complementary to the cues provided by more conventional refinement.
The results presented here show that structures with improved local quality (as monitored by MolProbity criteria and hydrogen bond analysis) can be obtained by simple energy minimization, with minimal degradation in agreement with experimental structure factors and with no changes to a current-generation protein force field. Nevertheless, one should keep in mind that the Amber-refined structures obtained here are not very different from those found with more conventional refinement. Both methods require most local misfittings to be corrected in advance. The hope is that either sampling of explicit alternatives or optimization using a more aggressive conformational search, such as simulated annealing or torsion-angle dynamics, may find the correct low-energy structures with good agreement with experimental data.
It is likely that further exploration of relative weights between “X-ray” and “energy” terms (beyond the existing and heuristic weight-optimization procedure employed here), and even within the energy terms, will become important. In principle, maximizing the joint probability arising from “prior knowledge” (using a Boltzmann distribution, exp(−E_AmberFF/k_BT), for some effective temperature T) and a maximum-likelihood target function (based on a given model and the observed data) is an attractive approach that effectively establishes an appropriate relative weighting. More study will be needed to see how well this works in practice, especially in light of the inevitable limitations of current force fields.
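The relative weighting implied by this Bayesian reading can be made explicit in a short derivation. It assumes that the experimental residual is the negative log-likelihood, T_exp = −ln P(data | x), and that the prior is a Boltzmann distribution at an effective temperature T_eff, an illustrative parameter rather than a value used in this work:

$$
\begin{aligned}
P(\mathbf{x}\mid\text{data}) &\propto P(\text{data}\mid\mathbf{x})\,\exp\!\left(-\frac{E_{\mathrm{AmberFF}}(\mathbf{x})}{k_B T_{\mathrm{eff}}}\right)\\
-\ln P(\mathbf{x}\mid\text{data}) &= T_{\mathrm{exp}}(\mathbf{x}) + \frac{1}{k_B T_{\mathrm{eff}}}\,E_{\mathrm{AmberFF}}(\mathbf{x}) + \text{const}
\end{aligned}
$$

Maximizing the joint probability is therefore equivalent to minimizing T_xyz = T_exp + w E_AmberFF with w = 1/(k_B T_eff). For example, at T_eff = 300 K, k_B T_eff ≈ 0.596 kcal/mol, which would give w ≈ 1.68 mol/kcal. The 300 K figure is only an illustration; the effective temperature is precisely the quantity that would need to be tuned.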
The integration of Amber’s force field into the Phenix software for crystallography also paves the way for the development of more sophisticated applications. The force field can accommodate alternate conformers by using the locally enhanced sampling (LES) approach (Roitberg & Elber, 1991; Simmerling et al., 1998); a few examples are discussed here, while details will be presented elsewhere. Ensemble refinement (Burnley et al., 2012) could now be performed using a full molecular dynamics force field, thus avoiding poor-quality individual models in the ensemble. Similarly, simulated annealing could now be performed with an improved physics-based potential. Extension of these ideas to real-space refinement within Phenix is underway, opening a path to new applications in cryo-EM and low-resolution X-ray structures. These developments would all contribute significantly to the future of macromolecular crystallography. They would reinforce the transition from a view of crystals dominated by a single static structure to one where dynamics and structural ensembles play a central role in describing molecular function (Furnham et al., 2006; van den Bedem & Fraser, 2015; Wall et al., 2014).
5. Conclusions
We have presented refinement results obtained by integrating Phenix with the Amber software package for molecular dynamics. Our refinements of over 22,000 crystal structures show that refinement using Amber’s all-atom molecular mechanics force field outperforms CDL/E&H restraint refinement in many respects. An overwhelming majority of Amber-refined models display notably improved model quality. The improvement is seen across most indicators of model quality, including clashes between atoms, side-chain rotamers and peptide backbone torsion angles. In particular, Phenix-Amber consistently outperforms standard Phenix refinement in clashscore, number of hydrogen bonds and MolProbity score. It also consistently outperforms standard refinement for Ramachandran and rotamer statistics at low resolution, and obtains approximately equal results at high resolution (better than 2.0 Å). Amber does run somewhat more slowly (generally 20-40% longer). It may also take more cycles to converge completely if it is making any large local changes (see the discussion of supplementary Figure S5). It should be noted that standard refinement consistently outperforms Phenix-Amber in eliminating Cβ deviations and other covalent-geometry outliers at all resolutions. In many cases, however, the Amber outliers flag a real problem in the model.
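The MolProbity score mentioned above is a single composite number that combines clashscore, rotamer outliers and Ramachandran statistics on a resolution-like scale. The sketch below uses the log-weighted formula as we recall it from the MolProbity documentation. Treat the coefficients as assumptions to be checked against the MolProbity source, and the function name `molprobity_score` as hypothetical.

```python
import math


def molprobity_score(clashscore: float,
                     pct_rotamer_outliers: float,
                     pct_rama_favored: float) -> float:
    """Composite MolProbity-style score (lower is better).

    Coefficients are an assumption recalled from the published formula;
    verify them against the MolProbity source before relying on them.
    """
    # Percent of residues outside the Ramachandran "favored" region.
    pct_rama_not_favored = 100.0 - pct_rama_favored
    return (0.42574 * math.log(1.0 + clashscore)
            # Small tolerances: up to 1% rotamer outliers and 2% non-favored
            # Ramachandran residues add nothing to the score.
            + 0.32996 * math.log(1.0 + max(0.0, pct_rotamer_outliers - 1.0))
            + 0.24979 * math.log(1.0 + max(0.0, pct_rama_not_favored - 2.0))
            + 0.5)


# Toy inputs, not values from this study.
print(round(molprobity_score(20.0, 3.0, 94.0), 2))
```

Because each term is logarithmic, lowering clashscore from 20 to 2 reduces the score much more than lowering it from 2 to 0.2.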
As the quality of experimental data decreases with resolution, the improvement in model quality obtained by using Amber, as opposed to CDL/E&H restraints, increases. This improvement is especially striking in the case of clashscores, which appear to be nearly independent of experimental data resolution for Amber refinements. Additional improvement is seen in the modelling of electrostatic interactions, H-bonds and van der Waals contacts, which are currently ignored by conventional restraints. Improving lower-resolution structures is very important, since they include a large fraction of the most exciting and biologically important current structures such as the protein/nucleic acid complexes of big, dynamic molecular machines.
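Clashscore, the metric discussed above, is the number of serious all-atom overlaps (0.4 Å or more) per 1000 atoms. MolProbity computes it with Probe's contact-dot analysis, which includes hydrogens and allows for H-bonds. The sketch below replaces that analysis with a plain pairwise van der Waals overlap test and a caller-supplied set of excluded (bonded) pairs. The function `clashscore_sketch` and its inputs are therefore hypothetical simplifications, useful only to show how the score is normalized.

```python
import numpy as np


def clashscore_sketch(coords, radii, excluded_pairs=(), cutoff=0.4):
    """Serious overlaps (>= cutoff Å) per 1000 atoms; a toy stand-in for Probe.

    coords: (N, 3) positions in Å; radii: N van der Waals radii in Å;
    excluded_pairs: index pairs to skip (e.g. covalently bonded atoms).
    """
    xyz = np.asarray(coords, dtype=float)
    r = np.asarray(radii, dtype=float)
    skip = {tuple(sorted(p)) for p in excluded_pairs}
    n_atoms = len(xyz)
    n_clashes = 0
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            if (i, j) in skip:
                continue
            overlap = r[i] + r[j] - np.linalg.norm(xyz[i] - xyz[j])
            if overlap >= cutoff:
                n_clashes += 1
    return 1000.0 * n_clashes / n_atoms


# Toy system: atoms 0 and 1 overlap by 0.5 Å; atoms 2 and 3 are far away.
coords = [[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [10.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
radii = [1.5, 1.5, 1.5, 1.5]
print(clashscore_sketch(coords, radii))
```

Normalizing per 1000 atoms is what allows clashscores to be compared across structures of very different size, as in the resolution trends described above.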
No minimization refinement method, including CDL/E&H and Amber, can in general correct local misfittings that were modeled in an incorrect local-minimum conformation, especially at relatively high resolution. At lower resolution, where the barriers are softer, Amber can sometimes manage such a change, while CDL/E&H still does not. It is therefore important, and highly recommended, that validation flags be consulted for the initial model. As many of the worst cases as feasible should be fixed before starting the cycles of automated refinement with either target.
Software distribution
Amber was implemented in phenix.refine and is available in Phenix version 1.16-3549 and later. Instructions for using the Amber implementation in phenix.refine are provided in the version-specific documentation included with the distribution.
Supporting information
S1. Full-dataset comparisons
Bond and angle rmsd comparisons (Figure S1) show that the bond rmsd values differ numerically between the two methods. Both, however, are smaller than the average sigma of 0.02 Å (2 pm) applied to protein bond restraints. Furthermore, the Amber angle rmsd values are approximately 2° across all resolutions, also lower than the average sigma of ~3° applied to protein angle restraints. The increased CDL/E&H rmsd values at high resolution may be a result of the looser rmsd limit used beyond 1.5 Å in the weight-optimisation process. A direct comparison of the mean CDL/E&H and Amber rmsd values is not strictly valid, because a force field uses more complex energetics rather than harmonic restraints toward ideal values.
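The bond rmsd used in these comparisons is the root-mean-square difference between model bond lengths and their ideal restraint targets. The sketch below shows that calculation and compares the result with the 0.02 Å sigma quoted above. The function name `bond_rmsd` and the toy coordinates are hypothetical; it does not reproduce the Phenix implementation.

```python
import numpy as np


def bond_rmsd(coords, bonds):
    """RMS deviation (Å) of bond lengths from their ideal values.

    coords: (N, 3) array of atomic positions in Å.
    bonds:  iterable of (i, j, ideal_length) tuples.
    """
    xyz = np.asarray(coords, dtype=float)
    deviations = [np.linalg.norm(xyz[i] - xyz[j]) - ideal
                  for i, j, ideal in bonds]
    return float(np.sqrt(np.mean(np.square(deviations))))


# Toy three-atom chain: one bond 0.01 Å too long, one 0.01 Å too short.
coords = [[0.0, 0.0, 0.0], [1.54, 0.0, 0.0], [1.54, 1.52, 0.0]]
bonds = [(0, 1, 1.53), (1, 2, 1.53)]
rmsd = bond_rmsd(coords, bonds)
sigma = 0.02  # average protein bond-restraint sigma quoted in the text
print(f"bond rmsd = {rmsd:.3f} Å; below sigma: {rmsd < sigma}")
```

An rmsd below the restraint sigma means that, on average, bonds sit within one standard deviation of their targets. The same number can arise from very different energy functions, which is why the means are not directly comparable.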
S2. Response to Bad Peptide Orientations
S2.1. Background
The low-resolution analysis of Cβ deviations in the main text compared the 1xgo structure at 3.5 Å (Tahirov et al., 1998) with 1xgs at 1.75 Å from the same paper. All six Cβ deviations in the Amber results (versus none from CDL/E&H) were examined. In each case the Cβ deviation was flagging an underlying problem: either a misfit side chain or an incompatibility between backbone and side chain.
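A Cβ deviation measures how far the modeled Cβ lies from the position implied by the backbone N, Cα and C atoms; MolProbity flags deviations of 0.25 Å or more. The sketch below uses a widely used virtual-Cβ construction, fixed coefficients applied to backbone vectors, in place of MolProbity's own ideal-geometry construction. Its numbers will therefore differ slightly from MolProbity's. The function names are hypothetical.

```python
import numpy as np

CB_DEV_CUTOFF = 0.25  # Å; MolProbity's Cβ-deviation outlier threshold


def virtual_cb(n, ca, c):
    """Approximate ideal Cβ position from backbone N, Cα and C (virtual-Cβ formula)."""
    n, ca, c = (np.asarray(p, dtype=float) for p in (n, ca, c))
    b = ca - n
    c_vec = c - ca
    a = np.cross(b, c_vec)
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * c_vec + ca


def cb_deviation(n, ca, c, cb):
    """Distance (Å) between the modeled Cβ and the backbone-implied virtual Cβ."""
    return float(np.linalg.norm(np.asarray(cb, dtype=float) - virtual_cb(n, ca, c)))


# Toy backbone geometry (not taken from 1xgo or 1xgs).
N, CA, C = [1.458, 0.0, 0.0], [0.0, 0.0, 0.0], [-0.551, 1.420, 0.0]
cb_ideal = virtual_cb(N, CA, C)
cb_shifted = cb_ideal + np.array([0.0, 0.0, 0.3])  # displaced 0.3 Å from ideal
for label, cb in [("ideal", cb_ideal), ("shifted", cb_shifted)]:
    dev = cb_deviation(N, CA, C, cb)
    print(label, round(dev, 3), "outlier" if dev >= CB_DEV_CUTOFF else "ok")
```

Because the reference Cβ is derived from the backbone, a large deviation signals strain between backbone and side chain. That is consistent with the observation above that each Amber Cβ outlier pointed to a misfit side chain or a backbone/side-chain incompatibility.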
For the issue of bad peptide orientations, however, only one example was illustrated (Figure 9). These problems are common at resolutions worse than 2.5 Å, because the direction of the backbone CO can no longer be seen in the density (Richardson et al., 2018). Misoriented peptides are best diagnosed by CaBLAM (Williams et al., 2018). CaBLAM uses virtual dihedral angles of successive Cαs and of successive COs to test whether the orientations of successive CO groups are compatible with the surrounding Cα trace. It flags outliers graphically in magenta on the CO-CO virtual dihedral. Since there is typically an energy barrier between widely different peptide orientations, the presumption is that refinement cannot easily correct these cases. However, that presumption needs to be tested.
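CaBLAM's diagnostics are built on virtual dihedral angles, such as the dihedral defined by four successive Cα atoms. The sketch below shows only the generic four-point dihedral calculation; the function name is hypothetical. It does not reproduce CaBLAM's exact CO-dihedral atom definitions or the contour statistics it uses to score outliers.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer vectors onto the plane perpendicular to the central axis.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# A Cα virtual dihedral uses four consecutive Cα positions (toy coordinates,
# 3.8 Å apart but with unrealistic right angles).
ca_trace = [[0.0, 3.8, 0.0], [0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 0.0, 3.8]]
print(round(dihedral(*ca_trace), 1))
```

Using `arctan2` rather than `arccos` keeps the sign of the angle and stays numerically stable near 0° and 180°. The sign matters when telling apart peptide orientations that are mirror images about the Cα trace.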
S2.2. Most are not correctable by refinement
Ten cases of isolated single or double CaBLAM outliers (usually accompanied by other outliers) were identified in 1xgo. Each was surrounded by structure judged to be correct by comparison with the same molecule at 1.75 Å resolution (1xgs). In 6 of those 10 cases, neither CDL/E&H nor Amber refinement corrected the problem (His62, Thr70, Gly163, Gly193, Ala217, Glu286).
For example, Figure S2 shows stereo images of the Glu286-Lys287 hairpin-loop case, where the CaBLAM outlier in 1xgo is accompanied by clashes and by Ramachandran and rotamer outliers. Both the CDL/E&H and Amber conformations are essentially identical to the original 1xgo, with no improvement of the peptide. Both refinements remove all the clashes (clusters of hot-pink spikes) and fix one of the six side-chain outliers (gold), though not into the correct rotamer. In contrast, the high-resolution 1xgs, with very clear electron density (bottom panel), shows the Lys Cα and the two peptide carbonyl oxygens (red balls) placed very differently, by large distances and dihedral angles. In 1xgs they form a well H-bonded β-hairpin with no outliers of any kind.
S2.3. Other Outliers Often Better
In two cases (Gly163, Gly193), the CDL/E&H results had fewer other outliers than Amber, although CDL/E&H did not actually reorient the peptide CO. Figure S3 shows the Gly163 case in stereo: an S-shaped loop between non-adjacent β-strands, with two CaBLAM flags (magenta) and many other outliers. Both refinements remove the clashes, one of the rotamer outliers and one of the Ramachandran outliers (green). The CDL/E&H refinement additionally removed one of the CaBLAM outliers and the Cα-geometry outlier (red). However, neither refinement could manage the large rotation needed to correct the 163-164 peptide orientation, as judged by the more convincing conformation of the high-resolution 1xgs (bottom panel).
S2.4. Amber Sometimes Corrects Well
In three cases Amber managed a complete fix, while in contrast CDL/E&H did not improve (Asp88, Gly125, Pro266). The Asp88-Gly89 tight turn example is shown in Figure 9 of the main text.
Figure S4 shows the Gly125 loop example, in a helix-helix connection, in stereo to allow clear visualization of the CO orientation changes. 1xgo residues 121-126 (panel a) have two CaBLAM outliers (magenta dihedral lines), unchanged by CDL/E&H refinement (panel b). However, Amber refinement (panel c) manages to shift several CO orientations by up to 80° (red balls). This is enough to fix the CaBLAM outliers and to match extremely closely the better backbone conformation of 1xgs (panel d).
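CO rotations such as the "up to 80°" quoted above can be measured as the angle between corresponding C→O bond vectors in two models. The sketch below assumes both models are already in a common frame, which holds for refinements of the same entry against the same data. Models from different entries would first need to be superposed. The function name is hypothetical.

```python
import numpy as np


def co_rotation_deg(c_a, o_a, c_b, o_b):
    """Angle (degrees) between the C->O vectors of one peptide in models A and B."""
    u = np.asarray(o_a, dtype=float) - np.asarray(c_a, dtype=float)
    v = np.asarray(o_b, dtype=float) - np.asarray(c_b, dtype=float)
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clipping guards against rounding just outside [-1, 1] before arccos.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


# Toy example: the carbonyl O swings from +x to +y around a fixed C.
print(round(co_rotation_deg([0, 0, 0], [1.23, 0, 0], [0, 0, 0], [0, 1.23, 0]), 1))
```

Using bond vectors rather than O positions makes the measure insensitive to small translations of the whole peptide, so it isolates the reorientation itself.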
S2.5. A Partial Correction, Unconverged
Finally, in one especially interesting case (Lys22, shown in Figure S5a for 1xgo), Amber turned the CO (red circles) about halfway to its correct position (panels b vs c), while CDL/E&H made no improvement to the peptide. The Amber model eliminated the Ramachandran outlier and one of the CaBLAM outliers, but still had geometry outliers (a bond angle and a Cβ deviation). It seemed likely that Amber refinement had not fully converged and might move the CO all the way if run longer.
A 30-cycle Amber run had been done earlier for 1xgo, without any major changes noticed beyond those seen after 10 cycles. Starting from that endpoint, two further runs were done: first another 30 cycles (“Amber60”), then a further 10 cycles (“Amber70”).
Figure S5d shows the fan of CO positions for all seven deposited and refined models, rotating progressively counterclockwise from 1xgo to 1xgs. Indeed, both Amber60 and Amber70 successfully rotated the Lys22 peptide almost all the way to the good helical position seen in the high-resolution 1xgs (panel e). This eliminated both the CaBLAM outlier and the intermediate-stage bond-angle outliers, presumably by crossing an energy barrier in the process.
One other CaBLAM-outlier peptide (Thr71) was also corrected in Amber70. For the Ala217 outlier, however, the wrong peptide was rotated, drawn by H-bonding to an Arg side chain in the wrong position.
In these long refinements, both the R-factors and the fit to electron density suffer somewhat. In the cases examined, this often seems to be due to incorrect side-chain rotamers (almost never correctable by refinement), which push an otherwise good backbone conformation slightly out of density (translated upward, for 1xgo Lys22). Future work will aim to guide early correction of as many problems as feasible. That would enable the faster and more successful subsequent refinement that we now know is possible.
S2.6. Discussion
In summary, it is indeed true that refinement usually cannot correct a peptide orientation that is off by a large amount. The very tight geometry restraints in the CDL/E&H system presumably raise the barriers to peptide rotation. Amber does rather better, managing a good correction in about one-third of the cases examined here, although convergence can be very slow for such large changes. We consider it crucial to try correcting problems such as flipped peptides in the initial model before refining it, although crosstalk between backbone and side chains further complicates that process. Nevertheless, we are enthusiastic about using the Amber target to realistically improve conformation, and especially sterics, once the model is mostly in the right local minima.
Acknowledgements
JSR thanks David Richardson for help with some aspects of the individual-example analyses. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, NIGMS, or DOE.
Footnotes
1 Currently at Microsoft
Funding information National Institutes of Health (grant No. GM122086 to David A. Case; grant No. P01GM063210 to Paul D. Adams, Jane S. Richardson); Department of Energy (grant No. DE-AC02-05CH11231 to Lawrence Berkeley National Laboratory).