Abstract
Conformational change mediates the biological functions of proteins. Crystallographic measurements can map these changes with extraordinary sensitivity as a function of mutations, ligands, and time. The isomorphous difference map remains the gold standard for detecting structural differences between datasets. Isomorphous difference maps combine the phases of a chosen reference state with the observed changes in structure factor amplitudes to yield a map of changes in electron density. Such maps are much more sensitive to conformational change than structure refinement is, and are unbiased in the sense that observed differences do not depend on refinement of the perturbed state. However, even minute changes in unit cell dimensions can render isomorphous difference maps useless. This need not be the case. Here we describe a generalized procedure for calculating observed difference maps that retains the high sensitivity to conformational change and avoids structure refinement of the perturbed state. We have implemented this procedure in an open-source Python package, MatchMaps, that can be run in any software environment supporting PHENIX and CCP4. Through examples, we show that MatchMaps “rescues” observed difference electron density maps for near-isomorphous crystals, corrects artifacts in nominally isomorphous difference maps, and extends to detecting differences across copies within the asymmetric unit, or across altogether different crystal forms.
1 Introduction
X-ray crystallography provides a powerful method for characterizing the changes in protein structure caused by a perturbation [13, 15, 3, 6]. For large structural changes and large changes in occupancy of states, it is usually sufficient to refine separate structural models for each dataset and draw comparisons between the refined structures. However, for many conformational changes, coordinate-based comparisons are inaccurate and insensitive.
In crystallography, electron density is not observed directly. Instead, one observes a diffraction pattern consisting of reflections whose intensities are proportional to the squared amplitudes of the Fourier components of the electron density (the structure factors). Unfortunately, the phases of these structure factors are not observable. In real space, these phases correspond to shifts of the sinusoidal waves that add up to the electron density. Accordingly, phases are usually calculated from a refined model. Since phases have a strong effect on map appearance [19], naive electron density maps calculated using observed amplitudes and model-based phases tend to resemble the model, a phenomenon known as model bias.
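A toy illustration of why phases dominate map appearance follows. It uses plain NumPy FFTs on two random 1D arrays as stand-ins for electron densities (an assumption made purely for illustration; no crystallographic software is involved). Combining the amplitudes of one "density" with the phases of the other yields a map that resembles the phase source rather than the amplitude source:

```python
import numpy as np

# Toy stand-ins for two unrelated 1D "electron densities" (random values,
# an assumption for illustration only).
rng = np.random.default_rng(0)
rho_a = rng.standard_normal(256)
rho_b = rng.standard_normal(256)

# Their Fourier coefficients play the role of structure factors.
F_a = np.fft.fft(rho_a)
F_b = np.fft.fft(rho_b)

# Hybrid map: amplitudes from A, phases from B. Both inputs are real, so the
# hybrid coefficients are Hermitian-symmetric and the inverse FFT is real.
rho_hybrid = np.fft.ifft(np.abs(F_a) * np.exp(1j * np.angle(F_b))).real

corr_with_amp_source = np.corrcoef(rho_hybrid, rho_a)[0, 1]
corr_with_phase_source = np.corrcoef(rho_hybrid, rho_b)[0, 1]
print(f"correlation with amplitude source: {corr_with_amp_source:+.2f}")
print(f"correlation with phase source:     {corr_with_phase_source:+.2f}")
```

In runs like this, the hybrid map correlates strongly with the phase source and barely at all with the amplitude source, which is the essence of model bias when phases come from a model.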
The gold standard for detecting conformational change in crystallographic data is the isomorphous difference map [20]. An isomorphous difference map is computed by combining differences in observed structure factor amplitudes with a single set of phases. The phases are usually derived from a model for one of the two states, chosen as a reference. That is, the “observed” difference density ∆ρ(x) is approximated as:

$$\Delta\rho(\mathbf{x}) \approx \frac{1}{V}\sum_{\mathbf{h}} \left( |F^{\mathrm{ON}}_{\mathrm{obs}}(\mathbf{h})| - |F^{\mathrm{OFF}}_{\mathrm{obs}}(\mathbf{h})| \right) e^{i\phi^{\mathrm{OFF}}_{\mathrm{calc}}(\mathbf{h})}\, e^{-2\pi i\, \mathbf{h}\cdot\mathbf{x}}$$

where $|F^{\mathrm{ON}}_{\mathrm{obs}}|$ and $|F^{\mathrm{OFF}}_{\mathrm{obs}}|$ are sets of observed structure factor amplitudes from the ON (perturbed) and OFF (reference) datasets respectively, V is the unit cell volume, h is shorthand for the triplet of Miller indices (h, k, l), x is shorthand for the point (x, y, z) in fractional coordinates, and $\phi^{\mathrm{OFF}}_{\mathrm{calc}}$ is a set of structure factor phases derived from refining a structural model against the OFF data. Crucially, therefore, isomorphous difference maps are computed without information regarding the ON structure. Any difference density reflecting the ON data relative to the OFF data (e.g., positive difference density for a bound ligand) is thus guaranteed not to be biased by previous modeling of the ON state.
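The isomorphous difference map described above can be sketched numerically in one dimension. This is a toy under stated assumptions, not MatchMaps code: a periodic 1D grid of Gaussian "atoms", exact OFF phases standing in for model phases, and NumPy's FFT sign convention (the opposite sign pairing to the crystallographic convention, which does not change the resulting map).

```python
import numpy as np

n = 256                       # voxels along a toy periodic 1D "unit cell"
x = np.arange(n)


def atom(center, height=1.0, sigma=2.0):
    """Gaussian stand-in for an atom (centers kept away from the cell edges)."""
    return height * np.exp(-0.5 * ((x - center) / sigma) ** 2)


rho_off = sum(atom(c) for c in (40, 90, 150, 200))
rho_on = rho_off + atom(120, height=0.5)      # ON state: an extra "ligand" peak

F_off = np.fft.fft(rho_off)
F_on = np.fft.fft(rho_on)

amp_off_obs = np.abs(F_off)      # stands in for |F_obs^OFF(h)|
amp_on_obs = np.abs(F_on)        # stands in for |F_obs^ON(h)|
phi_off_calc = np.angle(F_off)   # stands in for model phases phi_calc^OFF(h)

# Isomorphous difference map: amplitude differences combined with OFF phases.
delta_rho = np.fft.ifft((amp_on_obs - amp_off_obs) * np.exp(1j * phi_off_calc)).real

# Compare with the true density change, which this toy setup lets us know exactly.
print(np.corrcoef(delta_rho, rho_on - rho_off)[0, 1])
```

No ON model enters the calculation: only the ON amplitudes are used, which is why the resulting difference density cannot be biased toward a model of the ON state.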
Many pairs of crystallographic datasets are only approximately isomorphous, or not isomorphous at all. Here we first discuss isomorphous difference maps and then introduce an algorithm, MatchMaps, that calculates difference maps from observed structure factor amplitudes without relying on isomorphism. We illustrate the approach through examples covering different cases: the near-isomorphous case; protein molecules within the same asymmetric unit; and entirely different crystal forms. The software implementation of MatchMaps is open-source and readily extensible.
1.1 Limitations of isomorphous difference maps
The isomorphous difference map relies on the assumption that a set of “reference” structure factor phases can also serve as the phases of a difference map. Specifically, structure factors with large differences in amplitude must have similar phases, so that the isomorphous difference structure factor is a good approximation of the true difference structure factor. Conversely, when the phase differences are large, the amplitudes must be similar, so that these terms contribute little to the isomorphous difference map [14, 20].
Crystallographic phases are, however, highly sensitive to even small rotations and translations of protein molecules within the unit cell, as well as to changes in solvent volume. These effects commonly lead to changes in the crystal’s unit cell. Indeed, in crystallographic lore, a 1% variation in each cell axis is the cutoff for proper isomorphism. In Figure 1, we present an example of datasets for E. coli dihydrofolate reductase. In the isomorphous case, the phases are highly similar, and few of the structure factors that differ significantly in phase show large differences in amplitude. This pair of datasets is therefore suitable for calculating an isomorphous difference map. In contrast, even a 2% change along a single unit cell dimension (“Nonisomorphous” in Figure 1) is sufficient to obliterate the correlation among phases. Unfortunately, unit cell changes are often caused by the very changes one hopes to study (in the example of Figure 2b, a conformational change in the active site).
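The sensitivity of phases to molecular displacement follows from the Fourier shift theorem: translating a density by t multiplies each structure factor F(h) by a phase factor, so the phase change grows linearly with the Fourier index (that is, with resolution). A minimal 1D NumPy sketch, assuming a random toy density on a periodic grid of n = 256 voxels and a 2-voxel translation (roughly 0.8% of the cell):

```python
import numpy as np

n = 256
rng = np.random.default_rng(2)
rho = rng.standard_normal(n)          # toy 1D density on a periodic grid
shift = 2                             # translate by 2 voxels (~0.8% of the cell)
rho_shifted = np.roll(rho, shift)     # rho_shifted[j] = rho[j - shift]

F = np.fft.fft(rho)
F_shifted = np.fft.fft(rho_shifted)
h = np.fft.fftfreq(n, d=1.0 / n)      # integer Fourier indices

# Shift theorem (NumPy sign convention): F_shifted(h) = F(h) * exp(-2*pi*i*h*shift/n)
dphi = np.angle(F_shifted * np.conj(F))
expected = -2 * np.pi * h * shift / n
print(np.allclose(np.exp(1j * dphi), np.exp(1j * expected)))  # True

# The phase change is negligible at low |h| but reaches a full reversal (pi)
# at |h| = n / (2 * shift) = 64 for this grid.
for hh in (1, 8, 32, 64):
    print(hh, np.cos(2 * np.pi * hh * shift / n))
```

A displacement well under 1% of the cell thus flips the phases of reflections at half the maximum index of this grid, which is why reference phases stop being valid for a displaced molecule.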
Moreover, an isomorphous difference map often cannot be computed at all: for example, when the datasets come from different crystal forms, or when the molecules being compared are copies within one asymmetric unit and are therefore not related by crystallographic symmetry.
1.2 Rethinking isomorphous difference maps via the linearity of the Fourier transform
An isomorphous difference map is typically computed by first subtracting the structure factors (in reciprocal space) and then applying the Fourier transform to convert the structure factor differences into a real-space difference map. However, because the Fourier transform and subtraction are both linear operations, their order can be switched without changing the result: one might just as well calculate two electron density maps first and then subtract those maps voxel-by-voxel to yield an isomorphous difference map.
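The equivalence of the two orders of operations can be checked numerically. This sketch uses NumPy's real-input inverse FFT on arbitrary toy amplitudes and phases; all values are random placeholders, not crystallographic data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128                      # grid points along a 1D "unit cell"
m = n // 2 + 1               # number of non-redundant Fourier terms (rfft layout)

amp_on = rng.random(m)                     # toy |F_obs^ON|
amp_off = rng.random(m)                    # toy |F_obs^OFF|
phi_off = rng.uniform(-np.pi, np.pi, m)    # toy OFF phases

# Route 1: subtract structure factors, then Fourier-transform.
delta_rho_1 = np.fft.irfft((amp_on - amp_off) * np.exp(1j * phi_off), n=n)

# Route 2: Fourier-transform each set of structure factors, then subtract maps.
rho_on_hybrid = np.fft.irfft(amp_on * np.exp(1j * phi_off), n=n)
rho_off = np.fft.irfft(amp_off * np.exp(1j * phi_off), n=n)
delta_rho_2 = rho_on_hybrid - rho_off

print(np.allclose(delta_rho_1, delta_rho_2))  # True
```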
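This reordering suggests how difference map computation can be generalized beyond the isomorphous case. Specifically, we see that the step in the algorithm most specific to the assumption of isomorphism is the creation of “hybrid” structure factors of the form:

$$F^{\mathrm{ON}}_{\mathrm{hybrid}}(\mathbf{h}) = |F^{\mathrm{ON}}_{\mathrm{obs}}(\mathbf{h})|\, e^{i\phi^{\mathrm{OFF}}_{\mathrm{calc}}(\mathbf{h})}$$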
The OFF phases used in this calculation were computed with respect to a unit cell and molecular orientation that may be incompatible with the ON structure factor amplitudes. The method presented here improves these “hybrid” structure factors by computing phases that account for the (generally uninteresting) shifts in molecular position and orientation without removing any signal associated with “interesting” changes.
2 The MatchMaps algorithm
The goal of MatchMaps is to achieve the best possible real-space difference density map without utilizing a prior model of the structural changes of interest. To compute a real-space difference density map, one first needs to approximate structure factor phases for each dataset. As discussed above, the isomorphous difference map makes the simplifying assumption that the same set of structure factor phases can be used for both structures.
The key to MatchMaps is to improve phases for the ON data via rigid-body refinement of the OFF starting model against the ON structure factor amplitudes. This rigid-body refinement step improves phases by optimally placing the protein model in space. However, the restriction of this refinement to only whole-model rigid-body motion protects these new phases from bias towards modeled structural changes. The result is two sets of complex structure factors which make use of the information encoded in the structure factor amplitudes without relying on a second input model.
Next, each set of complex structure factors is Fourier-transformed into a real-space electron density map. These two real-space maps will not necessarily overlay in space. However, the rotation and translation necessary to overlay the maps can be obtained from the results of rigid-body refinement. Following real-space alignment, the maps can be subtracted voxel-wise to compute a difference map.
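The rotation and translation that overlay two rigid-body-refined models can be obtained by least-squares superposition. The implementation described here uses gemmi for this step; the sketch below instead uses the classic Kabsch (SVD) algorithm in NumPy as a self-contained stand-in, assuming the two models contain matched atoms in the same order. The function name `kabsch` is ours, not part of MatchMaps.

```python
import numpy as np


def kabsch(mobile, target):
    """Least-squares rotation R and translation t with mobile @ R.T + t ~ target.

    Both inputs are (N, 3) arrays of matched atom coordinates (an assumption:
    same atoms, same order).
    """
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    P, Q = mobile - mu_m, target - mu_t
    H = P.T @ Q                                   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # avoid returning a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_m
    return R, t


# Demo with synthetic coordinates: the "ON" model is the "OFF" model rotated by
# 1.37 degrees about z and translated (values chosen only for illustration).
rng = np.random.default_rng(3)
off_xyz = rng.normal(scale=10.0, size=(50, 3))
theta = np.radians(1.37)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
on_xyz = off_xyz @ R_true.T + np.array([0.5, -0.3, 1.2])

R, t = kabsch(on_xyz, off_xyz)                   # maps ON frame -> OFF frame
rmsd = np.sqrt(np.mean(np.sum((on_xyz @ R.T + t - off_xyz) ** 2, axis=1)))
print(f"RMSD after superposition: {rmsd:.2e} Å")
```

The same (R, t) that superposes the models is what would be applied to the ON map to bring it into the OFF frame before voxel-wise subtraction.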
In the idealized case—similar structures, oriented identically in space, with identical unit cells—MatchMaps will perform essentially identically to an isomorphous difference map. However, as we show in the examples below, MatchMaps is more capable than a traditional isomorphous difference map of handling datasets that diverge from this ideal. Furthermore, even in seemingly simple cases where isomorphous difference maps perform well, the real-space MatchMaps approach can show distinct improvements.
2.1 Details of algorithmic implementation
The full MatchMaps algorithm is as follows. As inputs, the algorithm requires two sets of structure factor amplitudes (referred to as ON and OFF datasets, for simplicity) and a single starting model (corresponding to the OFF data).
1. Place both sets of structure factor amplitudes on a common scale using the SCALEIT [14] utility from CCP4 [1], and truncate the data to the same resolution range.
2. Generate phases for each dataset using phenix.refine [16]. For each dataset, the OFF starting model is used, and only rigid-body refinement is permitted, to prevent the introduction of model bias.
3. Fourier-transform each set of complex structure factors into a real-space electron density map using the Python packages reciprocalspaceship [11] and gemmi [24].
4. Compute the translation and rotation necessary to overlay the two rigid-body-refined models. Apply this rotation and translation to the ON real-space map so that it overlays with the OFF map. These computations are carried out using gemmi.
5. Place both real-space maps on a common scale.
6. Subtract the real-space maps voxel-wise.
7. Apply a solvent mask to the final difference map.
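The real-space steps above (overlaying the ON map on the OFF map, scaling, subtracting, and masking) can be sketched on a toy voxel grid. This is an illustration under simplifying assumptions, not the MatchMaps implementation: both maps share one periodic grid with cubic voxels, the transformation (R, t) is already expressed in voxel units, "common scale" is taken to mean z-scoring each map, and the solvent mask is a precomputed boolean array. The helper `realspace_difference` is hypothetical.

```python
import numpy as np
from scipy import ndimage


def realspace_difference(rho_off, rho_on, R, t, mask):
    """Hypothetical helper: toy version of the real-space steps on a shared grid.

    R, t map ON voxel coordinates into the OFF frame (x_off = R @ x_on + t).
    """
    # Overlay: resample the ON map onto the OFF grid. affine_transform reads
    # input[matrix @ o + offset] for each output voxel o, so we pass the
    # inverse transform x_on = R.T @ (o - t). "grid-wrap" treats the map as periodic.
    rho_on_aligned = ndimage.affine_transform(
        rho_on, R.T, offset=-R.T @ t, order=1, mode="grid-wrap"
    )

    # Common scale (assumption: z-score each map).
    def zscore(m):
        return (m - m.mean()) / m.std()

    # Voxel-wise subtraction, ON minus OFF.
    diff = zscore(rho_on_aligned) - zscore(rho_off)

    # Masking: keep only voxels inside the (assumed precomputed) mask.
    return np.where(mask, diff, 0.0)


# Toy demo: the ON map is the OFF map translated by 2 voxels along axis 0,
# plus a small extra feature (a stand-in for a bound ligand).
grid = np.indices((32, 32, 32))


def blob(center, height=1.0, sigma=2.0):
    r2 = sum((g - c) ** 2 for g, c in zip(grid, center))
    return height * np.exp(-r2 / (2 * sigma**2))


rho_off = blob((16, 16, 16))
rho_on = np.roll(rho_off + blob((6, 6, 6), height=0.5), 2, axis=0)
R, t = np.eye(3), np.array([-2.0, 0.0, 0.0])   # x_off = x_on - 2 along axis 0
mask = np.ones(rho_off.shape, dtype=bool)

diff = realspace_difference(rho_off, rho_on, R, t, mask)
print(np.unravel_index(np.argmax(diff), diff.shape))  # strongest positive difference
```

Because the displacement is removed before subtraction, the translated copy of the main feature cancels, and the largest positive difference sits on the extra feature.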
We note that MatchMaps is structured such that step 2 can be generalized to refinement of any uninteresting features, not just rigid-body motion, if the user provides a custom PHENIX parameter file, as specified in the online documentation. For example, if the starting model contains multiple protein chains, each chain could be rigid-body-refined separately.
2.2 Installation
MatchMaps can be installed using the pip Python package manager (pip install matchmaps). The pure-Python dependencies of MatchMaps are handled by pip. MatchMaps additionally requires the widely used CCP4 and Phenix crystallography suites. Once these are installed, the above protocol can be run in a single step from the command line.
In addition to the base MatchMaps command-line utility, the utilities matchmaps.ncs and matchmaps.mr provide additional functionalities explored in the examples below and the online documentation. MatchMaps is fully open-source and readily extensible for novel use cases.
For more information, read the MatchMaps documentation at https://rs-station.github.io/matchmaps.
3 Examples
The following examples explore the benefits and functionalities offered by MatchMaps. All examples make use of published crystallographic data available from the Protein Data Bank. Scripts and data files for reproducing the figures can be found on Zenodo.
3.1 MatchMaps map where isomorphous difference map falters
The enzyme Dihydrofolate Reductase (DHFR) is a central model system for understanding the role of conformational change in productive catalytic turnover [21, 5, 4]. Specifically, the active-site Met20 loop of E. coli DHFR can adopt several different conformations, each stabilized by specific bound ligands and crystal contacts [21]. DHFR bound to NADP+ and the substrate analog folate adopts a “closed” Met20 loop (PDB ID 1RX2), whereas DHFR bound to NADP+ and a product analog (dideazatetrahydrofolate) adopts an “occluded” Met20 loop (PDB ID 1RX4). These structures are highly similar, other than the relevant changes at the active site (Figure 2a, structural changes shown in boxes; RMSD 0.37 Å for protein C-alpha atoms excluding the Met20 loop).
Importantly, the presence of the occluded loop conformation leads to altered crystal packing wherein the crystallographic b axis increases by 2%, from 98.91 Å to 100.88 Å (Figure 2b). This means that structures 1RX2 and 1RX4, though extremely similar, cannot be effectively compared by an isomorphous difference map (Figure 2c,g). We illustrated the striking change in phases between these structures in Figure 1. MatchMaps is able to correct for this non-isomorphism and recover the expected difference signal.
First, we focused on ligand rearrangement in the active site. In the occluded-loop structure, the cofactor (Figure 2c-f, left) leaves the active site while the substrate (Figure 2c-f, right) slides laterally within the active site. MatchMaps shows this expected signal, with negative (red) difference density for the cofactor and positive (blue) difference density for the substrate (Figure 2e-f). By contrast, an isomorphous difference map (Figure 2c) is unable to recover this signal. The structural model for the occluded-loop structure is shown for clarity in Figure 2f as blue sticks and clearly matches with the positive difference density. Importantly, this ON model is never used in the computation of the MatchMaps map.
We find a similar result around residues 21-25 of the Met20 loop (Figure 2g-j). Again, MatchMaps shows readily interpretable difference signal for the change in loop conformation between the closed-loop (red) and occluded-loop (blue) structures (Figure 2i-j). The isomorphous difference map, on the other hand, contains no interpretable signal in this region of strong structural change (Figure 2g). The occluded-loop model is shown for visual comparison in Figure 2j but was not used for computation of the MatchMaps map.
Finally, we visualize the “expected” structural changes by first aligning the refined models and placing them in the same unit cell, then computing Fmodel structure factors for each model and subtracting the resulting maps. The MatchMaps difference maps correspond well with these model-based difference maps (Figure 2d,h).
3.2 Unbiased modeling of ligand binding
The enzyme Protein Tyrosine Phosphatase 1B (PTP1B) plays a key role in insulin signaling [8], making it a long-standing target of orthosteric and allosteric drug design for the treatment of diabetes [23, 15, 7]. For illustration, we compare recent high-quality room-temperature structures of the apo protein (PDB ID 7RIN) with the protein bound to the competitive inhibitor TCS401 (PDB ID 7MM1) [12]. In addition to the presence or absence of signal for the ligand itself, the apo structure exhibits an equilibrium between “open” and “closed” active-site loops [22], whereas the bound structure shows only the closed loop.
The datasets 7RIN and 7MM1 are sufficiently isomorphous that an isomorphous difference map reveals the main structural changes. MatchMaps performs similarly. Strong positive difference density (blue mesh) is seen for the TCS401 ligand (grey sticks) in both the isomorphous difference map (Figure 3a) and the MatchMaps difference map (Figure 3b). Around residues 180-182 of the active-site loop (known as the WPD loop), both the isomorphous difference map (Figure 3c) and MatchMaps difference map (Figure 3d) show strong signal for the decrease in occupancy (red mesh) of the open loop conformation (red sticks) and an increase in occupancy (blue mesh) of the closed loop conformation (blue sticks).
However, even in this seemingly straightforward case, we find that the isomorphous difference map is susceptible to an artifact resulting from a slight (1.37°) rotation of the protein. The displacement between the two refined models, in their original coordinates, is especially large around residues 22-25 (Figure 3e, apo model in gray, bound model in blue). In this region, an isomorphous difference map picks up this artifactual difference between the datasets and displays strong difference signal (blue and red mesh). Remarkably, this signal is similar in magnitude to the “true” signal described above. In contrast, MatchMaps internally aligns the models, and with them the maps, after the computation of phases and before subtraction. Figure 3f shows residues 22-25 following whole-molecule alignment of the protein models (apo model in red, bound model in blue). After global alignment of the refined models, it is clear that this region does not contain any “interesting” signal. Consistent with this, the MatchMaps difference map contains no strong signal in this region.
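Why a rotation of only 1.37° can produce a visible artifact is a matter of geometry: the angle of a rigid-body rotation can be read from the superposition rotation matrix R via cos θ = (tr R − 1)/2, and a point at distance r from the rotation axis moves by 2r sin(θ/2). A short NumPy sketch follows; the radii are hypothetical values chosen for illustration, not measurements from these structures.

```python
import numpy as np


def rotation_angle_deg(R):
    """Rotation angle of a proper 3x3 rotation matrix: cos(theta) = (tr R - 1) / 2."""
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)   # clip guards rounding
    return np.degrees(np.arccos(c))


def chord_displacement(radius, theta_deg):
    """Distance moved by a point at `radius` from the rotation axis."""
    return 2.0 * radius * np.sin(np.radians(theta_deg) / 2.0)


theta = np.radians(1.37)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
print(f"recovered angle: {rotation_angle_deg(R):.2f} deg")   # 1.37 deg

for r in (10.0, 20.0, 30.0):   # hypothetical distances from the axis, in Å
    print(f"r = {r:4.1f} Å -> displacement {chord_displacement(r, 1.37):.2f} Å")
```

At 20 Å from the axis, for example, the displacement is roughly 0.5 Å, and it grows linearly with distance, so peripheral regions of a molecule are the most likely to show rotation artifacts.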
3.3 matchmaps.mr: comparing data from different spacegroups
For many protein systems, careful analysis of electron density changes is stymied because related structures crystallize in different crystal forms. The MatchMaps algorithm can be further generalized to allow comparison of datasets with entirely different crystal packings or spacegroups. Specifically, the OFF model can serve as a search model for molecular replacement against the ON data. Following this extra step, the algorithm proceeds identically. We implement this modified algorithm in the command-line utility matchmaps.mr.
One such example is the enzyme DHFR, which has been crystallized in many spacegroups [21]. Here, we examine two structures of the enzyme bound to NADP+, in spacegroups P2₁2₁2₁ (PDB ID 1RX1) and C2 (PDB ID 1RA1), visualized in Figure 4a. These structures are similar overall but differ in the active site (Figure 4b-d). Here, we visualize these structural changes directly in electron density without introducing model bias.
Specifically, in the P2₁2₁2₁ structure, the active site Met20 loop adopts a closed conformation. In the C2 structure, the Met20 loop adopts an “open” conformation, which is stabilized by a crystal contact in this crystal form [21]. The difference between the open and closed loops is exemplified by residues 17-24 (Figure 4c). The closed loop is stabilized by the formation of a key hydrogen bond between the Asn23 backbone and the Ser148 sidechain. In the open conformation, Asn23 is too far from Ser148 to form a hydrogen bond (Figure 4d).
Remarkably, the positive difference density (red) for the open loop is strong and readily interpretable in Figures 4c-d, even though the MatchMaps map was computed using only the P2₁2₁2₁ (blue) closed-loop model. This means that the signal for the open-loop conformation is derived solely from the observed structure factor amplitudes for the open-loop state, measured in an unrelated crystal form!
3.4 matchmaps.ncs: comparing NCS-related molecules
The real-space portion of the MatchMaps algorithm can be repurposed to create “internal” difference maps across non-crystallographic symmetry (NCS) operations. As an example, we examined the crystal structure of the fifth PDZ domain (PDZ5) of the Drosophila protein Inactivation, no after-potential D (INAD). This domain plays an essential role in terminating the response of photoreceptors to absorbed photons by modulating its ability to bind ligands [17]. In particular, the binding cleft of PDZ5 can be locked by formation of a disulfide bond between residues Cys606 and Cys645. PDZ5 was found to crystallize in a form with three molecules in the asymmetric unit (Figure 5a). Remarkably, each of these molecules adopts a different state: chain C contains a disulfide bond between Cys606 and Cys645, whereas chain B does not, and chain A adopts a bound state by binding the C terminus of chain C. MatchMaps enables calculation of an internal difference map, yielding a clearly interpretable difference map of the formation of the disulfide bond (Figure 5c).
4 Discussion
The isomorphous difference map has been the gold standard for detecting conformational change for many years [14, 20]. However, we show above that the same inputs—one structural model and two sets of structure factor amplitudes—can be combined to compute a difference map that shares the strength of an isomorphous difference map while ameliorating a key weakness. Specifically, structure factor phases are highly sensitive not only to structural changes (“interesting” signal) but also to changes in unit cell dimensions and model pose (“uninteresting” signal). The introduction of rigid-body refinement minimizes the contribution of this uninteresting signal to the final difference map. In Figure 2, we illustrate a case where a loss of isomorphism significantly degrades the signal of an isomorphous difference map. In this case, MatchMaps is still able to recover the expected difference signal. Figure 3 shows a situation where the isomorphous difference map performs well for the strongest difference signal. However, even in this seemingly straightforward use case, the isomorphous difference map is still susceptible to “uninteresting signal”. MatchMaps is able to successfully remove this artifact.
In our experience, the results of crystallographic perturbation experiments are often shelved because of changes in unit cell constants. Interesting conformational changes often slightly alter the unit cell. Unit cell constants are also sensitive to temperature [10], radiation damage [18], pressure [2], and humidity [9], meaning that even data collected on the same crystal may not be isomorphous. MatchMaps removes, in principle, the requirement for isomorphism and thus allows analysis of many more crystallographic differences.
Furthermore, an isomorphous difference map cannot be computed at all for data from different crystal forms. The matchmaps.mr extension of MatchMaps allows model-bias-free comparisons of electron density regardless of crystal form, opening up a wide range of new structural comparisons. For instance, an isomorphous difference map cannot characterize the impact of crystal packing. As shown above, MatchMaps can create such a map and thus allows enhanced understanding of the often subtle role of crystal packing in shaping protein structure.
MatchMaps depends only on the common CCP4 and Phenix crystallographic suites along with various automatically installed pure-python dependencies. MatchMaps runs in minutes on a modern laptop computer. The only required input files are a PDB or mmCIF file containing the protein model, two MTZ files containing structure factor amplitudes and uncertainties, and any CIF ligand restraint files necessary for refinement. These are the same inputs required for many common purposes (such as running phenix.refine) and would likely already be on hand. As outputs, MatchMaps produces real-space maps in the common MAP/CCP4/MRC format which can be readily opened in molecular visualization software such as PyMOL or Coot. For these reasons, MatchMaps should slot naturally into the crystallographer’s workflow for analysis of related datasets. Additionally, MatchMaps is open source and can be easily modified for a new use case by an interested developer. The authors welcome issues and pull requests on GitHub for the continued improvement of the software.
5 Acknowledgements
We thank Harrison Wang (Harvard University) for testing of the MatchMaps code. We thank Marcin Wojdyr (Global Phasing Ltd.) for assistance with the gemmi library. This work was supported by the NIH Director’s New Innovator Award (DP2-GM141000, to D.R.H.).