Modeling the Correlation between Z and B in an X-ray Crystal Structure Refinement

We have examined, by refinement, how the refined B-factor at the sulfur site of the [4Fe:4S] cluster of the nitrogenase iron protein changes as a function of Z (the atomic number of the scatterer). A simple model is developed that quantitatively captures the observed relationship between Z and B, based on a Gaussian electron density distribution with a constant electron density at the position of the scatterer. From this analysis, the fractional changes in B and Z are found to be similar. The utility of B-factor refinement to potentially distinguish atom types reflects the Z dependence of X-ray atomic scattering factors; the weaker dependence of electron atomic scattering factors on Z implies that refined values of B in an electron scattering structure will be less sensitive to the atomic identity of a scatterer than in the case of X-ray diffraction. This behavior provides an example of the complementary information that can be extracted from different types of scattering studies.


Introduction
We recently reported a series of selenium-incorporated nitrogenase iron (Fe) protein crystal structures in which mixed occupancies of sulfur and selenium were observed at the chalcogenide sites of the [4Fe:4X] cluster (X = S, Se) (Buscagan et al., 2022). Occupancy refinements of these sites were correlated with shifts in the B-factor, reflecting the well-recognized correlation between occupancy and B-factor parameters that is most frequently encountered with solvents during protein structure refinement (Watenpaugh et al., 1978, Kundrot & Richards, 1987, Bhat, 1989, Jensen, 1990). By fixing the B-factors of the X atoms to match the B-factors of the Fe atoms, consistent with observations on other metalloprotein systems (Wittenborn et al., 2018, Jeoung et al., 2022), we refined occupancies at the mixed chalcogenide sites with minimal residual density in the Fobs − Fcalc difference maps. From that study, we became interested in characterizing the correlation between the atomic number, Z (as a proxy for occupancy), and the refined B-factor, to address two questions: were modeled with different elements in place of sulfur, and the B-factors refined. In these 3 in Appendix A. Over a Z range from 7 (N) to 34 (Se), the expected positive correlations between Z and refined B values are evident for both structures (Figure 1). A linear fit to these data for 8 < Z < 25 indicates that over this range, B varies approximately linearly with changes in Z; the fractional change in B relative to the fractional change in Z was found to be approximately 0.9 and 0.8 for the 7TPW and 7TPY data sets, respectively (Appendix A).
Thus, the answer to the first question is that for this system, a 10% change in Z results in an approximately 8-9% change in B.

Modeling the relationship between Z and B
To model the relationship between Z and B, we represent a scatterer by a single Gaussian with atomic number Z and overall temperature factor B (Ten Eyck, 1977). The electron density is then described by Eq. 1 with Eq. 2. It is important to recognize that the B in Eq. 1 and Eq. 2 includes contributions from both the atomic scattering factor, B0, and the isotropic temperature factor, Biso, with B = B0 + Biso.
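Eq. 1 and Eq. 2 are referenced here only by number. A plausible reconstruction, assuming a normalized isotropic Gaussian whose Fourier transform is Z exp(−B sin²θ/λ²), is sketched below; the symbol ρ(0) for the density at the atomic position and the exact layout are assumptions of this sketch rather than the published typesetting.

```latex
\begin{align}
\rho(r) &= \rho(0)\,\exp\!\left(-\frac{4\pi^{2} r^{2}}{B}\right) \tag{1}\\
\rho(0) &= Z\left(\frac{4\pi}{B}\right)^{3/2} \tag{2}
\end{align}
```

With this normalization the density integrates to Z electrons over all space, and B = B0 + Biso enters only through the width and the peak height.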
From Eq. 2 and ρ(0) calculated with the Cromer and Mann atomic scattering factors (Cromer & Mann, 1968) and Biso = 16 Å², B0 is found to be approximately 8 Å² and 6 Å² for N and S, respectively. If the true Z and B for a given atom are Z1 and B1, but the refinement is conducted with Z2, the corresponding B2 will be shifted from the true value to compensate for the incorrect occupancy. We developed two simple models to capture the possible relationship between Z2 and B2:

bioRxiv preprint doi: https://doi.org/10.1101/2023.07.04.547724; this version posted July 4, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Model 1:
B2 is calculated for a given Z2 such that the density at the atomic position, ρ(0), has the same value as for Z1, B1. For a single Gaussian, this is equivalent to equating ρ(0) in Eq. 2 calculated for either Z1, B1 or Z2, B2, which gives Eq. 3 and Eq. 4. The ratio Z2/Z1 corresponds to the occupancy of the Z1 scatterer at the site (to within the approximation that the shape of the atomic scattering factor is independent of Z).
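The equal-peak-density condition of Model 1 can be sketched numerically. The snippet assumes the single-Gaussian peak density ρ(0) = Z(4π/B)^(3/2), so that equating ρ(0) for (Z1, B1) and (Z2, B2) gives B2 = B1(Z2/Z1)^(2/3), and then B2,iso = (B0 + B1,iso)(Z2/Z1)^(2/3) − B0. These closed forms are reconstructions from the description above and may differ in layout from Eq. 3 and Eq. 4; the function names are hypothetical, and the defaults are the 7TPW values quoted in the text.

```python
import math


def peak_density(z, b_total):
    """Peak density rho(0) = Z * (4*pi/B)^(3/2) of a normalized Gaussian atom.

    Assumed single-Gaussian form; b_total is B = B0 + Biso in A^2.
    """
    return z * (4.0 * math.pi / b_total) ** 1.5


def b2_iso_equal_peak(z2, z1=16.0, b1_iso=12.0, b0=6.0):
    """Biso (A^2) that keeps rho(0) unchanged when a Z1 site is refined as Z2.

    Defaults are the values quoted for sulfur in the 7TPW structure.
    B0 is held fixed for every Z2, which is a simplification.
    """
    if z1 <= 0 or z2 <= 0:
        raise ValueError("atomic numbers must be positive")
    b1 = b0 + b1_iso                      # total B of the true scatterer
    b2 = b1 * (z2 / z1) ** (2.0 / 3.0)    # from Z1 / B1^1.5 == Z2 / B2^1.5
    return b2 - b0                        # remove the scattering-factor part


if __name__ == "__main__":
    for symbol, z2 in [("N", 7), ("O", 8), ("S", 16), ("Se", 34)]:
        print(f"{symbol:>2}  Z2/Z1 = {z2 / 16:.3f}  "
              f"B2,iso = {b2_iso_equal_peak(z2):.2f} A^2")
```

Under these assumptions, the fractional sensitivity at Z2 = Z1 is (2/3)(B0 + B1,iso)/B1,iso, which is 1.0 for the 7TPW parameters and about 0.87 for the 7TPY parameters, in the same range as the fitted values of approximately 0.9 and 0.8.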

Model 2:
In this case, B2 is calculated for a given Z2 to minimize the square of the difference density, Δρ², integrated over the atomic volume (Eq. 5). From the condition that the derivative of Δρ² with respect to B2 vanishes at the minimum, one can derive Eq. 6 (Appendix B). The variations in B2,iso as a function of Z2/Z1 were evaluated from Eq. 4 and Eq. 6 (Figure 1).
For these calculations, Z1 = 16 e⁻ and B0 = 6 Å², with B1,iso = 12.0 Å² and 19.8 Å² for the 7TPW and 7TPY structures, respectively. (These B1,iso values correspond to the average B-factor for the two Fe sites in each structure; Appendix A.) As illustrated in Figure 1, while both Eq. 4 and Eq. 6 fit the refined B values reasonably well for Z2/Z1 < 1, the fit of Eq. 4 is superior over the entire range tested. This result surprised us, as we anticipated that the Δρ² model would better capture the structure refinement process; instead, the isolated-atom approximation (reflected in the upper limit of r = ∞ in Eq. 5) is evidently less accurate for a macromolecular structure refinement than the localized treatment implicit in the derivation of Eq. 4.
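The least-squares condition of Model 2 can be checked numerically. The sketch below assumes two normalized single-Gaussian atoms and minimizes the integral of [ρ1(r) − ρ2(r)]² 4πr² dr over B2. The closed form B2 = B1 q/(2 − q), with q = (Z2/Z1)^(2/5), follows from that assumption and may differ in form from the published Eq. 6. All function names, the integration cutoff, and the search bracket are choices made for this illustration.

```python
import math


def gaussian_density(r, z, b):
    """Normalized single-Gaussian atom: Z (4 pi / B)^1.5 exp(-4 pi^2 r^2 / B)."""
    return z * (4.0 * math.pi / b) ** 1.5 * math.exp(-4.0 * math.pi ** 2 * r * r / b)


def squared_difference(z1, b1, z2, b2, r_max=8.0, n=4000):
    """Midpoint-rule estimate of the integral of (rho1 - rho2)^2 4 pi r^2 dr.

    r_max (in Angstrom) stands in for the infinite upper limit.
    """
    h = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        d = gaussian_density(r, z1, b1) - gaussian_density(r, z2, b2)
        total += d * d * 4.0 * math.pi * r * r
    return total * h


def b2_closed_form(z2, z1, b1):
    """Total B2 from the stationarity condition Z2/Z1 = (2 B2 / (B1 + B2))^(5/2)."""
    q = (z2 / z1) ** 0.4
    if q >= 2.0:
        raise ValueError("no finite minimum for this Z2/Z1 ratio")
    return b1 * q / (2.0 - q)


def b2_numeric(z2, z1, b1, tol=1e-4):
    """Golden-section search for the B2 that minimizes squared_difference."""
    lo, hi = 0.2 * b1, 5.0 * b1           # bracket chosen for this sketch
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda b2: squared_difference(z1, b1, z2, b2)
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:                        # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                              # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    b0, b1_iso = 6.0, 12.0                 # 7TPW values quoted in the text
    for z2 in (7, 8, 16, 26, 34):
        b2 = b2_closed_form(z2, 16.0, b0 + b1_iso)
        print(f"Z2 = {z2:2d}  B2,iso (closed form) = {b2 - b0:.2f} A^2")
```

In this form B2 grows faster than the two-thirds power law of Model 1 once Z2/Z1 exceeds 1, which is consistent with the two curves agreeing for Z2/Z1 < 1 and separating at higher Z2.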

Discussion
We recognize that the approximations used to derive the relationship between Z and B in Eq. 4 are inferior to the results of a full structure refinement. Nevertheless, Eq. 4 can provide a useful starting point to evaluate the relationship between Z and B for scatterers involving species of unknown atomic identity (such as the original analysis of the interstitial ligand in the nitrogenase FeMo-cofactor (Einsle et al., 2002), which prompted the initial development of Model 2); partial occupancy (such as solvents (Watenpaugh et al., 1978)); mixed atomic composition (exchange reactions or disorder, as in Spatzal et al., 2015, Buscagan et al., 2022); or incorrect modeling of residues adopting distinct flipped orientations, such as the side chains of asparagine and glutamine residues where flipping interchanges N and O atoms (Word et al., 1999, Weichenberger & Sippl, 2006). The utility of B-factor refinement to potentially distinguish atom types reflects the Z dependence of X-ray atomic scattering factors. An instructive comparison may be drawn to electron atomic scattering factors, which depend on the Coulomb potential of the scatterer and have a weaker dependence on Z. In particular, the values of ρ(0) for fully occupied C, N, and O atoms calculated using electron scattering factors (parameterized in Saha et al., 2021) with B = 16 Å² vary by less than 2%, while the corresponding variation between the C and O atoms using X-ray scattering factors is approximately 50%. Consequently, the distinctions between the C, N, and O atoms in an electron scattering map are less evident than in X-ray scattering maps.
This property of electron scattering was reflected in our recent structure determination of an antibiotic peptide by micro-electron diffraction, where the orientation of a histidine side chain in a novel cross-link could not be established from an analysis of the B-factors for the two distinct rotamers (Miller et al., 2022). A "multi-messenger" approach using combinations of X-ray (with anomalous scattering, if applicable), electron, and neutron diffraction can provide additional experimental restraints to help resolve ambiguities arising in structure refinements from the correlation between Z and/or occupancy with the B-factor of scatterers.

Figure 1. Refined B2,iso values as a function of Z2/Z1, the ratio of the atomic number of the scatterer refined in the chalcogenide site (Z2) relative to Z1 = 16 (the true scatterer, sulfur), in the [4Fe:4S] cluster of the nitrogenase Fe protein (PDB data sets 7TPW (red circles) and 7TPY (blue circles)). The solid and dashed lines represent the fits to Eq. 4 and Eq. 6, respectively, with Z1 = 16 e⁻ and B0 = 6 Å² for both structures, and B1,iso = 12.0 Å² and 19.8 Å² for the 7TPW and 7TPY structures, respectively.

Acknowledgments

Appendix A
The B-factors were refined as a function of the element in the chalcogenide site of the [4Fe:4S] cluster for two nitrogenase Fe protein structures (PDB IDs 7TPW and 7TPY, with resolutions of 1.18 Å and 1.48 Å, respectively) detailed in Buscagan et al., 2022. The cluster sits on a crystallographic twofold axis, so that the crystallographically unique sites are designated Fe1, Fe2, X3 and X4. Least-squares lines fit to the Bavg values of the scatterers at the X position over the range 8 < Z < 25 yielded slopes of 0.692 and 0.987 for the 7TPW and 7TPY data sets, respectively; when normalized to the appropriate Z and B values for S, the fractional changes in B with a fractional change in Z are found to be 0.896 and 0.786, respectively.
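The normalization of the fitted slopes can be illustrated with a short calculation. It assumes that the fractional change is (dB/dZ)(Z_S/B_S), with Z_S = 16. The reference B values for sulfur are not listed in this text, so the helper below back-calculates the B_S implied by the reported slope and fractional change; both function names are hypothetical.

```python
def fractional_sensitivity(slope, z_ref, b_ref):
    """Fractional change in B per fractional change in Z: (dB/dZ) * Z_ref / B_ref."""
    return slope * z_ref / b_ref


def implied_reference_b(slope, z_ref, sensitivity):
    """Reference B (A^2) that reproduces a reported fractional sensitivity."""
    return slope * z_ref / sensitivity


if __name__ == "__main__":
    reported = {"7TPW": (0.692, 0.896), "7TPY": (0.987, 0.786)}
    for name, (slope, sens) in reported.items():
        b_ref = implied_reference_b(slope, 16.0, sens)
        # b_ref is inferred from the reported numbers, not read from a table
        print(f"{name}: implied B for S = {b_ref:.2f} A^2")
```

The implied reference values lie close to the B1,iso values of 12.0 Å² and 19.8 Å² used for the model curves, which is the expected outcome if the refined sulfur B-factors are similar to those of the neighboring Fe atoms.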

Appendix B: Derivation of Equation 6
Eq. 6 can be derived from Eq. 5 as follows. The integral and derivative were evaluated with Mathematica®.
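The individual steps of the derivation are not reproduced in this text. One route that is consistent with the description, assuming two normalized single-Gaussian atoms and an upper integration limit of r = ∞, is sketched below. The shorthand a_i = 4π²/B_i is introduced only for this sketch, and the final expression may differ in form from Eq. 6 as published.

```latex
\begin{align*}
\rho_i(r) &= Z_i\left(\frac{a_i}{\pi}\right)^{3/2} e^{-a_i r^{2}},
  \qquad a_i = \frac{4\pi^{2}}{B_i} \\
\Delta\rho^{2} &= \int_{0}^{\infty}\left[\rho_1(r)-\rho_2(r)\right]^{2} 4\pi r^{2}\,dr \\
 &= Z_1^{2}\left(\frac{a_1}{2\pi}\right)^{3/2}
  - 2 Z_1 Z_2\left(\frac{a_1 a_2}{\pi\,(a_1+a_2)}\right)^{3/2}
  + Z_2^{2}\left(\frac{a_2}{2\pi}\right)^{3/2} \\
\frac{\partial\,\Delta\rho^{2}}{\partial a_2} = 0
 \;&\Rightarrow\;
 \frac{Z_2}{Z_1} = \left(\frac{2a_1}{a_1+a_2}\right)^{5/2}
 = \left(\frac{2B_2}{B_1+B_2}\right)^{5/2} \\
B_2 &= B_1\,\frac{(Z_2/Z_1)^{2/5}}{2-(Z_2/Z_1)^{2/5}},
 \qquad B_{2,\mathrm{iso}} = B_2 - B_0
\end{align*}
```

Because a_2 is a monotonic function of B_2, setting the derivative with respect to a_2 to zero is equivalent to the condition on B_2 used in the main text. Setting Z_2 = Z_1 recovers B_2 = B_1, as required.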