research papers

BIOLOGICAL CRYSTALLOGRAPHY
ISSN: 1399-0047
Volume 69| Part 4| April 2013| Pages 625-634
ADDENDA AND ERRATA

A correction has been published for this article.

Bulk-solvent and overall scaling revisited: faster calculations, improved results


aLawrence Berkeley National Laboratory, One Cyclotron Road, MS64R0121, Berkeley, CA 94720, USA, bDepartment of Bioengineering, University of California, Berkeley, Berkeley, CA 94720, USA, cIGBMC, CNRS–INSERM–UdS, 1 Rue Laurent Fries, BP 10142, 67404 Illkirch, France, and dUniversité de Lorraine: Département de Physique – Nancy 1, BP 239, Faculté des Sciences et des Technologies, 54506 Vandoeuvre-lès-Nancy, France
*Correspondence e-mail: pafonine@lbl.gov

(Received 13 December 2012; accepted 5 January 2013; online 14 March 2013)

A fast and robust method for determining the parameters for a flat (mask-based) bulk-solvent model and overall scaling in macromolecular crystallographic structure refinement and other related calculations is described. This method uses analytical expressions for the determination of optimal values for various scale factors. The new approach was tested using nearly all entries in the PDB for which experimental structure factors are available. In general, the resulting R factors are improved compared with previously implemented approaches. In addition, the new procedure is two orders of magnitude faster, which has a significant impact on the overall runtime of refinement and other applications. An alternative function is also proposed for scaling the bulk-solvent model and it is shown that it outperforms the conventional exponential function. Similarly, alternative methods are presented for anisotropic scaling and their performance is analyzed. All methods are implemented in the Computational Crystallography Toolbox (cctbx) and are used in PHENIX programs.

1. Introduction

Macromolecular crystals in the Protein Data Bank (PDB; Bernstein et al., 1977[Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535-542.]; Berman et al., 2000[Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235-242.]) typically contain a substantial amount of disordered solvent, ranging from approximately 20 to 90% of the crystal volume with a mean of 55%. Anisotropy in the diffracted intensities is another common feature of macromolecular crystals that arises from various sources including crystal lattice vibrations (Shakked, 1983[Shakked, Z. (1983). Acta Cryst. A39, 278-279.]; Sheriff & Hendrickson, 1987[Sheriff, S. & Hendrickson, W. A. (1987). Acta Cryst. A43, 118-121.]). When modelling diffracted intensities, for example in structure refinement or automated model building, it is therefore critical to account for these two phenomena (see, for example, Jiang & Brünger, 1994[Jiang, J. S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100-115.]; Urzhumtsev & Podjarny, 1995[Urzhumtsev, A. G. & Podjarny, A. D. (1995). Jnt CCP4/ESF-EACMB Newsl. Protein Crystallogr. 31, 12-16.]; Kostrewa, 1997[Kostrewa, D. (1997). CCP4 Newsl. Protein Crystallogr. 34, 9-22.]; Badger, 1997[Badger, J. (1997). Methods Enzymol. 277, 344-352.]; Urzhumtsev, 2000[Urzhumtsev, A. G. (2000). CCP4 Newsl. Protein Crystallogr. 38, 38-49.]; Fokine & Urzhumtsev, 2002a[Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. A58, 72-74.]; Fenn et al., 2010[Fenn, T. D., Schnieders, M. J. & Brunger, A. T. (2010). Acta Cryst. D66, 1024-1031.]). The flat bulk-solvent model (Phillips, 1980[Phillips, S. E. (1980). J. Mol. Biol. 142, 531-554.]; Jiang & Brünger, 1994[Jiang, J. S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100-115.]) combined with overall anisotropic scaling in either exponential (Sheriff & Hendrickson, 1987[Sheriff, S. & Hendrickson, W. A. (1987). Acta Cryst. A43, 118-121.]) or polynomial (Usón et al., 1999[Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H. J. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1158-1167.]) forms is a well established and computationally efficient approach. Alternatives have been proposed (Tronrud, 1997[Tronrud, D. E. (1997). Methods Enzymol. 277, 306-319.]; Vassylyev et al., 2007[Vassylyev, D. G., Vassylyeva, M. N., Perederina, A., Tahirov, T. H. & Artsimovitch, I. (2007). Nature (London), 448, 157-162.]), but are not currently in wide use.

In the commonly used approach, the total structure factor is defined as

[{\bf F}_{\rm model} = k_{\rm total}({\bf F}_{\rm calc}+ k_{\rm mask}{\bf F}_{\rm mask}), \eqno(1)]

where ktotal is the overall Miller-index-dependent scale factor, Fcalc and Fmask are the structure factors computed from the atomic model and the bulk-solvent mask, respectively, and kmask is a bulk-solvent scale factor. The mask can be computed efficiently using exact asymmetric units as described in Grosse-Kunstleve et al. (2011[Grosse-Kunstleve, R. W., Wong, B., Mustyakimov, M. & Adams, P. D. (2011). Acta Cryst. A67, 269-275.]).

The overall scale factor ktotal can be thought of as the product

[k_{\rm total} = k_{\rm overall}\,k_{\rm isotropic} \,k_{\rm anisotropic}, \eqno(2)]

where koverall is the overall scale factor and kisotropic and kanisotropic are the isotropic and anisotropic scale factors, respectively.

koverall is a scalar number that can be obtained by minimizing the least-squares residual

[{\rm LS} = \textstyle \sum (F_{\rm obs}- k_{\rm overall}| {\bf F}_{\rm model}'|)^2, \eqno(3)]

where Fobs are the observed structure factors and

[{\bf F}_{\rm model}' = k_{\rm isotropic}\,k_{\rm anisotropic}({\bf F}_{\rm calc}+k_{\rm mask}{\bf F}_{\rm mask}). \eqno(4)]

The sum is over all reflections. Solving ∂LS/∂koverall = 0 leads to

[k_{\rm overall} =\textstyle \sum F_{\rm obs}|{\bf F}_{\rm model}'|/\textstyle \sum |{\bf F}_{\rm model}'|^2. \eqno(5)]
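
As a minimal sketch of equation (5), the closed-form scale can be computed directly from two arrays of amplitudes. The function and variable names below (`k_overall`, `f_obs`, `f_model_abs`) are illustrative and are not taken from cctbx; `f_model_abs` is assumed to hold the values of |F'model| from equation (4).

```python
def k_overall(f_obs, f_model_abs):
    """Least-squares scale of equation (5): sum(Fobs*|F'|) / sum(|F'|^2)."""
    numerator = sum(fo * fm for fo, fm in zip(f_obs, f_model_abs))
    denominator = sum(fm * fm for fm in f_model_abs)
    return numerator / denominator


# Synthetic check: observed amplitudes are exactly 2.5 times the model amplitudes.
f_model_abs = [10.0, 20.0, 5.0, 40.0]
f_obs = [2.5 * fm for fm in f_model_abs]
scale = k_overall(f_obs, f_model_abs)
```

Because the residual (3) is quadratic in koverall, no iteration is needed; the same expression is reused later to update scales after kmask changes.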

In the exponential model the anisotropic scale factor is defined as

[k_{\rm anisotropic} = \exp(- 2\pi^2{\bf s}^t{\bf U}_{\rm cryst}{\bf s}), \eqno(6)]

where Ucryst is the overall anisotropic scale matrix equivalent to U* defined in Grosse-Kunstleve & Adams (2002[Grosse-Kunstleve, R. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 477-480.]) and [{\bf s}^t = (h, k, l)] is the transpose of the Miller-index column vector s.

Usón et al. (1999[Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H. J. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1158-1167.]) define a polynomial anisotropic scaling function that can be rewritten in matrix notation as follows:

[k_{\rm anisotropic} = {\bf s}^t{\bf V}_0{\bf s} + ({\bf s}^t{\bf V}_1{\bf s})s^2, \eqno(7)]

where V0 and V1 are symmetric 3 × 3 matrices, [s^2 = {\bf s}^t{\bf G}^*{\bf s}] and G* is the reciprocal-space metric tensor. Expression (7) is equivalent to the first terms in the Taylor series expansion of the exponential function (6),

[\exp(-2\pi^2{\bf s}^t{\bf U}_{\rm cryst}{\bf s}) \simeq 1 - 2{\pi^2}{\bf s}^t{\bf U}_{\rm cryst}{\bf s} + 2\pi^4({\bf s}^t{\bf U}_{\rm cryst}{\bf s})({\bf s}^t{\bf U}_{\rm cryst}{\bf s}) ,\eqno(8)]

with the constant term omitted. The omission of the constant 1 means that kanisotropic is equal to zero for the reflection F000, as follows from (7). Therefore, in this work we modify (7) by adding the constant term:

[k_{\rm anisotropic} = 1 + {\bf s}^t{\bf V}_0{\bf s} + ({\bf s}^t{\bf V}_1{\bf s})s^2. \eqno(9)]

The bulk-solvent scale factor is traditionally defined as

[k_{\rm mask} = k_{\rm sol}\exp(-B_{\rm sol}s^2/4), \eqno(10)]

where ksol and Bsol are the flat bulk-solvent model parameters (Phillips, 1980[Phillips, S. E. (1980). J. Mol. Biol. 142, 531-554.]; Jiang & Brünger, 1994[Jiang, J. S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100-115.]; Fokine & Urzhumtsev, 2002b[Fokine, A. & Urzhumtsev, A. (2002b). Acta Cryst. D58, 1387-1392.]).

Depending on the calculation protocol, kisotropic may be assumed to be a part of kanisotropic or it can be assumed to be exponential, [k_{\rm isotropic} = \exp(-Bs^2/4)], where B is a scalar parameter. Alternatively, it may be determined as described in §2.3 below.

The determination of the anisotropic scaling parameters (Ucryst or V0 and V1) and the bulk-solvent parameters ksol and Bsol requires the minimization of the target function (3)[link] with respect to these parameters. Despite the apparent simplicity, this task is quite involved owing to a number of numerical issues (Fokine & Urzhumtsev, 2002b[Fokine, A. & Urzhumtsev, A. (2002b). Acta Cryst. D58, 1387-1392.]; Afonine et al., 2005a[Afonine, P. V., Grosse-Kunstleve, R. W. & Adams, P. D. (2005a). Acta Cryst. D61, 850-855.]). Previously, we have developed a robust and thorough procedure (Afonine et al., 2005a[Afonine, P. V., Grosse-Kunstleve, R. W. & Adams, P. D. (2005a). Acta Cryst. D61, 850-855.]) to address these issues. This procedure is used routinely in PHENIX (Adams et al., 2010[Adams, P. D. et al. (2010). Acta Cryst. D66, 213-221.]). However, owing to its thoroughness the procedure is relatively slow and may account for a significant fraction of the execution time of certain PHENIX applications (for example, phenix.refine).

In this paper, we describe a new procedure which is approximately two orders of magnitude faster than the approach described in Afonine et al. (2005a[Afonine, P. V., Grosse-Kunstleve, R. W. & Adams, P. D. (2005a). Acta Cryst. D61, 850-855.]) and often leads to a better fit of the experimental data. The speed gain is the result of an analytical determination of the optimal bulk-solvent and scaling parameters. The better fit to the experimental data is partially the result of employing a more detailed model for kmask compared with the exponential model in equation (10)[link] and is partially a consequence of the new analytical optimization method. Analytical optimization eliminates the possibility of becoming trapped in local minima, which exists in all iterative local optimization methods, including the procedure used previously.

2. Methods

2.1. Anisotropic scaling: exponential model

To obtain the elements of the anisotropic scaling matrix in (6), the minimization of (3) is replaced by the minimization of

[{\rm LSL} = \textstyle \sum \limits_{\bf s} [\ln(F_{\rm obs}) - \ln(|{\bf F}_{\rm model}|)]^2. \eqno(11)]

For this, we assume that Fobs and |Fmodel| are positive. We also assume that the minima of (3) and (11) are at similar locations. This assumption is not obvious and, as discussed below, may not always hold (see §3.3 and Table 2). Expression (11) can be rewritten as

[{\rm LSL} = (2\pi^2)^2\textstyle \sum \limits_{\bf s} (Z + {\bf s}^t{\bf U}_{\rm cryst}{\bf s})^2. \eqno(12)]

Here, [Z = (2\pi^2)^{-1}\ln\{F_{\rm obs}/(k_{\rm overall}k_{\rm isotropic}|{\bf F}_{\rm calc} + k_{\rm mask}{\bf F}_{\rm mask}|)\}]. Defining

[\widetilde {\rm LSL} = {\rm LSL}/(2\pi^2)^2 \eqno(13)]

and using

[{\bf U}_{\rm cryst} = \left({\matrix{ U_{11} & U_{12} & U_{13} \cr U_{12} & U_{22} & U_{23} \cr U_{13} & U_{23} & U_{33}}} \right), \eqno(14)]

the target function determining the optimal Ucryst is

[\eqalignno {\widetilde {\rm LSL} &= \textstyle \sum \limits_{\bf s} (Z + U_{11}h^2 + U_{22}k^2+ U_{33}l^2 \cr &\ \quad +\ 2U_{12}hk + 2U_{13}hl + 2U_{23}kl )^2. &(15)}]

The Ucryst values that minimize (15) are determined from the condition [{\nabla _{\bf{U}}}\widetilde {\rm LSL} = 0], which gives a system of six linear equations

[{\bf M}\,{\bf U}_{\rm cryst} = {\bf b}, \eqno(16)]

where M = [\textstyle \sum_{\bf s} {\bf V} \otimes {\bf V}], [{\bf V} = (h^2, k^2, l^2, 2hk, 2hl, 2kl)^t], ⊗ denotes the outer product, b = [-\textstyle \sum_{\bf s} Z{\bf V}] and Ucryst is taken here as the column vector of its six unique elements (U11, U22, U33, U12, U13, U23).

The desired Ucryst matrix is determined by solving the system (16):

[{\bf U}_{\rm cryst} = {\bf M}^{-1}{\bf b}. \eqno (17)]
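
The construction of M and b and the solution (17) can be sketched as follows. This is an illustration under stated assumptions rather than the cctbx implementation: `fit_u_cryst`, `solve_linear` and `f_scaled_abs` are hypothetical names, `f_scaled_abs` is assumed to hold koverall kisotropic |Fcalc + kmask Fmask| for each reflection, and the data are synthetic with no symmetry constraints applied.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(x) for x in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_u_cryst(miller, f_obs, f_scaled_abs):
    """Return (U11, U22, U33, U12, U13, U23) from equations (16) and (17)."""
    M = [[0.0] * 6 for _ in range(6)]
    b = [0.0] * 6
    for (h, k, l), fo, fs in zip(miller, f_obs, f_scaled_abs):
        z = math.log(fo / fs) / (2.0 * math.pi ** 2)  # Z as defined after (12)
        v = [h * h, k * k, l * l, 2 * h * k, 2 * h * l, 2 * k * l]
        for i in range(6):
            b[i] -= z * v[i]
            for j in range(6):
                M[i][j] += v[i] * v[j]
    return solve_linear(M, b)


# Synthetic data generated from a known U_cryst, then recovered.
u_true = [0.0010, 0.0020, 0.0015, 0.0002, -0.0001, 0.0003]
miller = [(h, k, l) for h in range(-3, 4) for k in range(-3, 4)
          for l in range(-3, 4) if (h, k, l) != (0, 0, 0)]
f_scaled_abs = [100.0] * len(miller)
f_obs = []
for (h, k, l), fs in zip(miller, f_scaled_abs):
    v = [h * h, k * k, l * l, 2 * h * k, 2 * h * l, 2 * k * l]
    sus = sum(vi * ui for vi, ui in zip(v, u_true))
    f_obs.append(fs * math.exp(-2.0 * math.pi ** 2 * sus))
u_fit = fit_u_cryst(miller, f_obs, f_scaled_abs)
```

With noise-free data the logarithmic target (11) is exactly zero at the true parameters, so the recovered values should agree with `u_true` to within rounding error.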

Crystal-system-specific symmetry constraints can be incorporated via a constraint matrix (C), which we derive from first principles by solving the system of linear equations [{\bf R}^t{\bf U}{\bf R} = {\bf U}] for all rotation matrices R of the crystal-system point group. Alternatively, symmetry constraints are often derived manually and tabulated (Nye, 1957[Nye, J. F. (1957). Physical Properties of Crystals. Oxford: Clarendon Press.]; Giacovazzo, 1992[Giacovazzo, C. (1992). Fundamentals of Crystallography. Oxford University Press.]). For example, the constraint matrix for the tetragonal crystal system is

[{\bf C} = \left({\matrix{ 1 & 1 & 0 & 0 & 0 & 0 \cr 0 & 0 & 1 & 0 & 0 & 0}} \right). \eqno(18)]

The number of rows in C determines the number of independent coefficients of Ucryst. Let Uind be the column vector of independent coefficients; the (redundant) set of six coefficients Ucryst is then obtained via

[{\bf U}_{\rm cryst} = \left({\matrix{U_{11}& U_{22}& U_{33}& U_{12}& U_{13}& U_{23}}}\right) = {{\bf C}}^{t}\,{\bf U}_{\rm ind}. \eqno(19)]

The constraint matrix C is introduced into equations (16) and (17) above as follows:

[{\bf M}_{\rm C}{\bf U}_{\rm ind} = {\bf b}_{\rm C} \eqno(20)]

with MC = [\textstyle\sum_{\bf s}{\bf V}_{\rm C} \otimes {\bf V}_{\rm C}], VC = CV, bC = [- \textstyle\sum_{\bf s} Z{\bf V}_{\rm C}] and

[{\bf U}_{\rm ind} = {\bf M}_{\rm C}^{-1}{\bf b}_{\rm C}. \eqno (21)]

The full Ucryst is then determined via equation (19).
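
For the tetragonal case of equation (18), the constrained system (20) has only two unknowns and can be solved explicitly. The sketch below is illustrative only: the function names are hypothetical, the Z values are assumed to be precomputed as defined after equation (12), and the 2 × 2 system is solved with Cramer's rule instead of a general solver.

```python
C_TETRAGONAL = [[1, 1, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 0]]


def mat_vec(C, v):
    """V_C = C V."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]


def transpose_mat_vec(C, u_ind):
    """U_cryst = C^t U_ind, equation (19)."""
    return [sum(C[r][i] * u_ind[r] for r in range(len(C)))
            for i in range(len(C[0]))]


def fit_u_tetragonal(miller, z_values):
    """Solve M_C U_ind = b_C (equation 20) and expand to six coefficients."""
    m11 = m12 = m22 = b1 = b2 = 0.0
    for (h, k, l), z in zip(miller, z_values):
        v = [h * h, k * k, l * l, 2 * h * k, 2 * h * l, 2 * k * l]
        vc = mat_vec(C_TETRAGONAL, v)  # (h^2 + k^2, l^2)
        m11 += vc[0] * vc[0]
        m12 += vc[0] * vc[1]
        m22 += vc[1] * vc[1]
        b1 -= z * vc[0]
        b2 -= z * vc[1]
    det = m11 * m22 - m12 * m12
    u_ind = [(b1 * m22 - m12 * b2) / det, (m11 * b2 - m12 * b1) / det]
    return transpose_mat_vec(C_TETRAGONAL, u_ind)


# Synthetic Z values consistent with U11 = U22 = 0.002 and U33 = 0.001.
miller = [(h, k, l) for h in range(0, 4) for k in range(0, 4)
          for l in range(0, 5) if (h, k, l) != (0, 0, 0)]
z_values = [-(0.002 * (h * h + k * k) + 0.001 * l * l) for h, k, l in miller]
u_cryst = fit_u_tetragonal(miller, z_values)
```

The returned list follows the ordering of equation (19), so the off-diagonal entries come out as zero by construction of C.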

2.2. Anisotropic scaling: polynomial model

The polynomial model (Usón et al., 1999[Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H. J. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1158-1167.]) for anisotropic scaling allows the direct use of the residual (3)[link] to find the optimal coefficients for V0 and V1 in equation (9)[link]. An advantage of this model is that no assumptions about the similarity of the location of the minima of targets (3)[link] and (11)[link] are required. Conceptually, a disadvantage of equation (9)[link] is that it is only an approximation of equation (6)[link], as was shown above. However, the number of parameters is doubled in equation (9)[link] compared with equation (6)[link], since V0 and V1 are treated independently. The increased number of degrees of freedom may therefore compensate for approximation in­accuracies.

Similarly to §[link]2.1, the optimal coefficients for V0 and V1 are determined by the condition ∇VLS = 0 and can be obtained by solving a system of 12 linear equations. We follow the arguments of Usón et al. (1999[Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H. J. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1158-1167.]) for not using symmetry constraints in this case.

2.3. Bulk-solvent parameters and overall isotropic scaling

Defining [K = k_{\rm total}^{-2} = (k_{\rm overall}k_{\rm isotropic}k_{\rm anisotropic})^{-2}] and [I = F_{\rm obs}^2], the determination of the desired scaling parameters kisotropic and kmask is reduced to minimizing

[{\rm LS}_s(K,k_{\rm mask}) = \textstyle \sum \limits_s (| {\bf F}_{\rm calc} + k_ {\rm mask}{\bf F}_{\rm mask}|^2 - KI)^2 \eqno(22)]

in resolution bins, where koverall and kanisotropic are fixed. This minimization problem is generally highly overdetermined because the number of reflections per bin is usually much larger than two.

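Introducing [w = |{\bf F}_{\rm mask}|^2], [v = ({\bf F}_{\rm calc}, {\bf F}_{\rm mask})] (the scalar product of the two structure factors, i.e. the real part of the product of Fcalc with the complex conjugate of Fmask) and [u = |{\bf F}_{\rm calc}|^2] and substituting into (22) leads to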

[{\rm LS}_s(K,k_{\rm mask}) = \textstyle \sum \limits_s [(k_{\rm mask}^2w + 2k_{\rm mask}v + u) - KI]^2. \eqno(23)]

Minimizing (23) with respect to K and kmask leads to a system of two equations:

[\cases { {\displaystyle {{\partial} \over {\partial K}}} {\rm LS}_s(K,k_{\rm mask}) = - 2\textstyle \sum \limits_s [(k_{\rm mask}^2w_s + 2k_{\rm mask}v_s + u_s) - KI_s]I_s \cr \quad\quad\quad\quad\quad\quad\quad= 0 \cr {\displaystyle {{\partial} \over {\partial k_{\rm mask} }}} {\rm LS}_s(K,k_{\rm mask}) = 4\textstyle \sum \limits_s [(k_{\rm mask}^2w_s + 2k_{\rm mask}v_s + u_s) - KI_s] \cr \quad\quad\quad\quad\quad\quad\quad\quad\quad \times\, (k_{\rm mask}w_s + v_s) = 0. } \eqno(24)]

Developing these equations with respect to kmask,

[\cases {k_{\rm mask}^2\textstyle \sum \limits_s w_sI_s + 2k_{\rm mask}\textstyle \sum \limits_s v_sI_s + \textstyle \sum \limits_s u_sI_s - K\textstyle \sum \limits_s I_s^2 = 0 \cr k_{ \rm mask}^3\textstyle \sum \limits_s w_s^2 + 3k_{\rm mask}^2\textstyle \sum \limits_s w_sv_s + k_{\rm mask}\textstyle \sum \limits_s (2v_s^2 + u_sw_s - KI_sw_s) \cr \quad\quad\quad\quad\quad\quad\quad + \textstyle \sum \limits_s u_sv_s - K\textstyle \sum \limits_s I_sv_s = 0,} \eqno(25)]

and introducing new notations for the coefficients, we obtain

[\cases {k_{\rm mask}^2 C_2 + k_{\rm mask}B_2 + A_2 - KY_2 = 0 \cr k_{\rm mask}^3D_3 + k_{\rm mask}^2C_3 + k_{\rm mask}(B_3 - KC_2) + A_3 - KY_3 = 0.} \eqno(26)]

Multiplying the second equation by Y2 and substituting KY2 from the first equation into the new second equation, we obtain a cubic equation

[\eqalignno {k_{\rm mask}^3&(D_3Y_2 - C_2^2) + k_{\rm mask}^2(C_3Y_2 - C_2B_2 - C_2Y_3) &(27)\cr &+ k_{\rm mask}(B_3Y_2 - C_2A_2 - Y_3B_2) + (A_3Y_2 - Y_3A_2) = 0. }]

The leading coefficient in (27) is positive by the Cauchy–Schwarz inequality:

[D_3Y_2 - C_2^2 = \textstyle \sum \limits_s w_s^2\textstyle\sum \limits_s I_s^2 - \textstyle \sum \limits_s w_sI_s\textstyle \sum \limits_s w_sI_s \,\gt \,0. \eqno(28)]

Therefore, equation (27) can be divided by its leading coefficient and rewritten as

[k_{\rm mask}^3 + ak_{\rm mask}^2 + bk_{\rm mask} + c = 0 \eqno(29)]

and solved using a standard procedure.

The corresponding values of K are obtained by substituting the roots of equation (29) into the first equation in (26):

[K = (k_{\rm mask}^2C_2 + k_{\rm mask}B_2 + A_2)/Y_2. \eqno(30)]

If no positive root exists kmask is assigned a zero value, which implies the absence of a bulk-solvent contribution. If several roots with kmask ≥ 0 exist then the one that gives the smallest value of LSs(K, kmask) is selected.
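
The per-bin procedure of equations (25) to (30) can be sketched in a few lines. The code below is an illustration under assumptions, not the cctbx routine: all names are hypothetical, the coefficient assignments (C2, B2, A2, Y2, D3, C3, B3, A3, Y3) are read off by comparing (25) with (26), and the 'standard procedure' for the cubic is taken here to be Cardano's formula with the trigonometric form for three real roots.

```python
import math


def real_cubic_roots(a, b, c):
    """Real roots of x^3 + a x^2 + b x + c = 0."""
    p = b - a * a / 3.0
    q = 2.0 * a ** 3 / 27.0 - a * b / 3.0 + c
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc > 0.0:  # one real root (Cardano)
        sq = math.sqrt(disc)
        cbrt = lambda x: math.copysign(abs(x) ** (1.0 / 3.0), x)
        return [cbrt(-q / 2.0 + sq) + cbrt(-q / 2.0 - sq) - a / 3.0]
    if p == 0.0:  # triple root
        return [-a / 3.0]
    r = 2.0 * math.sqrt(-p / 3.0)  # three real roots (trigonometric form)
    arg = max(-1.0, min(1.0, 3.0 * q / (2.0 * p) * math.sqrt(-3.0 / p)))
    phi = math.acos(arg) / 3.0
    return [r * math.cos(phi - 2.0 * math.pi * k / 3.0) - a / 3.0
            for k in range(3)]


def solve_bin(f_calc, f_mask, i_obs):
    """Return (k_mask, K) for one resolution bin."""
    w = [abs(fm) ** 2 for fm in f_mask]
    v = [(fc * fm.conjugate()).real for fc, fm in zip(f_calc, f_mask)]
    u = [abs(fc) ** 2 for fc in f_calc]
    dot = lambda x, y: sum(xi * yi for xi, yi in zip(x, y))
    C2, B2, A2, Y2 = dot(w, i_obs), 2 * dot(v, i_obs), dot(u, i_obs), dot(i_obs, i_obs)
    D3, C3, A3, Y3 = dot(w, w), 3 * dot(w, v), dot(u, v), dot(i_obs, v)
    B3 = 2 * dot(v, v) + dot(u, w)
    lead = D3 * Y2 - C2 * C2  # positive by (28)
    a = (C3 * Y2 - C2 * B2 - C2 * Y3) / lead
    b = (B3 * Y2 - C2 * A2 - Y3 * B2) / lead
    c = (A3 * Y2 - Y3 * A2) / lead

    def k_total_inv_sq(km):  # equation (30)
        return (km * km * C2 + km * B2 + A2) / Y2

    def ls(km):  # residual (23) with K from (30)
        K = k_total_inv_sq(km)
        return sum((km * km * wi + 2 * km * vi + ui - K * ii) ** 2
                   for wi, vi, ui, ii in zip(w, v, u, i_obs))

    roots = [x for x in real_cubic_roots(a, b, c) if x >= 0.0]
    k_mask = min(roots, key=ls) if roots else 0.0
    return k_mask, k_total_inv_sq(k_mask)


# Synthetic bin generated with k_mask = 0.35 and K = 1.7.
f_calc = [10 + 2j, -4 + 7j, 3 - 8j, 6 + 6j, -9 + 1j, 2 - 3j, 7 - 1j, -2 - 6j]
f_mask = [-5 + 1j, 2 - 3j, -1 + 4j, -3 - 2j, 4 + 0j, -1 + 1j, -2 + 2j, 1 + 3j]
i_obs = [abs(fc + 0.35 * fm) ** 2 / 1.7 for fc, fm in zip(f_calc, f_mask)]
k_mask_fit, K_fit = solve_bin(f_calc, f_mask, i_obs)
```

When no non-negative root exists, the sketch falls back to kmask = 0 and takes K from equation (30), which then reduces to A2/Y2.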

If desired, one can fit the right-hand side of expression (10) to the array of kmask values by minimizing the residual

[{\rm LS} = \textstyle \sum \limits_{\bf s}[k_{\rm mask}-k_{\rm sol} \exp(-B_{\rm sol} s^{2}/4)]^{2} \eqno(31)]

for all kmask > 0. This can be achieved analytically as described in Appendix A. Similarly, one can fit [k_{\rm overall}\exp(-B_{\rm overall}s^2/4)] to the array of K values.
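
The analytical procedure of Appendix A is not reproduced in this section. As a stand-in, and explicitly not the authors' method, the sketch below fits ksol and Bsol by ordinary linear regression of ln(kmask) against s2, which minimizes a logarithmic residual instead of (31). The function name and the test values are hypothetical.

```python
import math


def fit_ksol_bsol_loglinear(s_sq, k_mask):
    """Fit ln(k_mask) = ln(k_sol) - B_sol * s^2 / 4 by linear regression.

    Only bins with k_mask > 0 are used. This is a log-linear substitute
    for the residual (31), not the Appendix A procedure.
    """
    pts = [(x, math.log(k)) for x, k in zip(s_sq, k_mask) if k > 0.0]
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return math.exp(intercept), -4.0 * slope


# Synthetic bin-wise values generated with k_sol = 0.35 and B_sol = 46.
s_sq = [0.001, 0.005, 0.010, 0.020, 0.040]
k_mask_bins = [0.35 * math.exp(-46.0 * x / 4.0) for x in s_sq]
k_sol_fit, b_sol_fit = fit_ksol_bsol_loglinear(s_sq, k_mask_bins)
```

For noisy data the logarithmic and direct residuals generally give different optima, so the result of this sketch should be treated only as a starting estimate.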

2.4. Presence of twinning

In the case of twinning with N twin-related domains, the total model intensity is

[I_{\rm model}({\bf s}) = \textstyle \sum \limits_{j = 1}^N {\alpha_j}I_{j}({\bf T}_j{\bf s}), \eqno(32)]

where αj is the twin fraction of the jth domain, Tj is the corresponding twin operator (a 3 × 3 rotation matrix) and

[I_j({\bf T}_{j}{\bf s}) = k_{\rm total}({\bf T}_{j}{\bf s})|{\bf F}_{\rm calc}({\bf T}_{j}{\bf s})+{k}_{\rm mask}({\bf T}_{j}{\bf s}) {\bf F}_{\rm mask}({\bf T}_{j}{\bf s})|^{2}. \eqno(33)]

ktotal includes all scale factors (overall, isotropic and anisotropic). We make the reasonable assumption that ktotal and kmask are identical for all twin domains.

Finding the twin fractions αj can be achieved by solving the minimization problem

[{\rm LS}({\alpha}_{1}, \ldots, {\alpha}_{N}) = \textstyle \sum \limits_{\bf s}\left[\textstyle \sum \limits_{j = 1}^{N}{\alpha}_{j}I_{j}({\bf s}_{j})-I({\bf s})\right]^{2}, \eqno(34)]

with the constraint condition

[C({\alpha}_{1}, \ldots, {\alpha}_{N}) = \textstyle \sum \limits_{j = 1}^{N}{\alpha}_{j}-1 = 0, \eqno(35)]

where [I({\bf s}) = F_{\rm obs}^2({\bf s})] and [{\bf s}_j = {\bf T}_j{\bf s}]. This constrained minimization problem can be reformulated as an unconstrained minimization problem by the standard technique of introducing a Lagrange multiplier:

[{\rm LS}({\alpha}_{1}, \ldots, {\alpha}_{N},\lambda) = {\rm LS}({\alpha}_{1}, \ldots, {\alpha}_{N})+ \lambda C({\alpha}_{1}, \ldots, {\alpha}_{N}). \eqno(36)]

The values {α1, …, αN, λ} that minimize (36) are the solution of the system of N + 1 linear equations with N + 1 variables:

[\cases {\partial {\rm LS}(\alpha_{1}, \ldots, \alpha _{N},\lambda)/\partial \alpha_{1} = 0 \cr \quad\quad\quad\quad\quad\ldots \cr \partial {\rm LS} (\alpha_{1}, \ldots, \alpha_{N},\lambda)/\partial \alpha_{N} = 0\cr \partial {\rm LS}(\alpha_{1}, \ldots, \alpha_{N},\lambda)/\partial \lambda = 0} \eqno(37)]

or

[\cases {\textstyle \sum \limits_{\bf s}\left[\textstyle \sum \limits_{j = 1}^{N}\alpha_{j}I_{j}({\bf s}_{j})-I({\bf s})\right]I_{1}({\bf s}_{1})+\lambda = 0\cr \quad\quad\quad\quad\quad\quad \quad\quad\ldots \cr \textstyle \sum \limits_{\bf s}\left[\textstyle \sum \limits_{j = 1}^{N}\alpha_{j}I_{j}({\bf s}_{j})-I({\bf s})\right]I_{N}({\bf s}_{N})+\lambda = 0\cr \textstyle \sum \limits_{j = 1}^{N}\alpha _{j}-1 = 0.} \eqno(38)]

The solution of this system is

[({\tilde{\alpha}}_{1}, \ldots, {\tilde{\alpha}}_{N},{\tilde{\lambda}})^{t} = {{\bf M}}^{-1}{\bf b} \eqno(39)]

with the (N + 1) × (N + 1) matrix

[{\bf M} = \left (\matrix{ {\textstyle \sum \limits_{\bf s}} {{\bf V} \otimes {\bf V}} & {\bf 1} \cr {\bf 1} & 0 } \right), \eqno(40)]

and

[{\bf V} = [I_{1}({\bf s}_{1}), \ldots, {I}_N({\bf s}_{N})]. \eqno(41)]

Here, 1 is a row or column containing N unit elements to complete the matrix M and

[{\bf b} = \left[\textstyle \sum \limits_{\bf s} I({\bf s})I_1({\bf s}_1), \ldots, \textstyle \sum \limits_{\bf s} I({\bf s})I_N({\bf s}_N),1\right]^t. \eqno(42)]

The values of αj are expected to be between 0 and 1, and λ is proportional to the sum of squared intensities. Therefore, it is numerically beneficial to multiply the λC(α1, …, αN) term in (36) by a constant [\textstyle \sum_{\bf s} I^2({\bf s})] in order to make the value for λ numerically similar to the values for the twin fractions α.
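
The bordered linear system of equations (39) to (42) can be sketched as follows. The names are hypothetical, the per-domain intensities Ij(sj) are assumed to be precomputed, and the rescaling of the Lagrange term by the sum of squared intensities is omitted for brevity.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(x) for x in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def twin_fractions(i_domains, i_obs):
    """Return (alphas, lambda) from the (N+1) x (N+1) system (39)-(42).

    i_domains[j][r] is I_j(T_j s_r); i_obs[r] is I(s_r).
    """
    n = len(i_domains)
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for j in range(n):
        for k in range(n):
            M[j][k] = sum(x * y for x, y in zip(i_domains[j], i_domains[k]))
        M[j][n] = M[n][j] = 1.0  # border of ones, corner stays zero
        b[j] = sum(x * y for x, y in zip(i_obs, i_domains[j]))
    b[n] = 1.0  # constraint (35): fractions sum to one
    solution = solve_linear(M, b)
    return solution[:n], solution[n]


# Two synthetic twin domains mixed with fractions 0.7 and 0.3.
i1 = [100.0, 50.0, 20.0, 80.0, 10.0]
i2 = [30.0, 90.0, 60.0, 10.0, 40.0]
i_obs = [0.7 * x + 0.3 * y for x, y in zip(i1, i2)]
alphas, lam = twin_fractions([i1, i2], i_obs)
```

The solution is not restricted to the range 0–1, so any caller would still need to handle fractions that fall outside it.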

Once the twin fractions have been found, the procedure described in §2.3 can be used to obtain the overall and bulk-solvent scale factors. Similarly to (22), we can write

[{\rm LS}_s(K,k_{\rm mask}) = \textstyle \sum \limits_s \left[\textstyle \sum \limits_{j = 1}^N \alpha_j| {\bf F}_{\rm calc}({\bf s}_j) + k_{\rm mask}{\bf F}_{\rm mask}({\bf s}_j)|^2 - KI \right]^2, \eqno(43)]

where αj are known twin fractions and K and kmask are the scale factors to be determined. Similarly to §2.3, we obtain

[\eqalignno {&\textstyle \sum \limits_{j = 1}^N \alpha _j| {\bf F}_{\rm calc}({\bf s}_j) + k_{\rm mask}{\bf F}_{\rm mask}({\bf s}_j)|^2 = \textstyle \sum \limits_{j = 1}^N \{ \alpha_j| {\bf F}_{\rm calc}({\bf s}_j)|^2 \cr &+ 2k_{\rm mask}\alpha _j[{\bf F}_{\rm calc}({\bf s}_j){\bf F}_{\rm mask}({\bf s}_j)] + k_{\rm mask}^2\alpha_j|{\bf F}_{\rm mask}({\bf s}_j)|^2 \}.&(44)}]

Introducing new variables as before for equation (23) leads to

[{\rm LS}_s(K,k_{\rm mask}) = \textstyle\sum \limits_s [(k_{\rm mask}^2w + 2k_{\rm mask}v + u) - KI]^2. \eqno(45)]

The determination of the twin fractions α and of the scales ktotal and kmask is iterated several times until convergence. The determination of α does not guarantee that the individual twin fractions αj are in the range 0–1. For any αj outside this range the corresponding twin operation is ignored for the current iteration and the new smaller set of twin fractions and scales is redetermined. However, in the next iteration the full set of α is tried again.

3. Results

3.1. Implementation of the new protocol

The scale factors involved in the calculation of Fmodel according to equation (1) are highly correlated. Therefore, the order of their determination is important. Empirically, we found that the determination of kisotropic and kmask followed by the determination of kanisotropic works optimally in most cases. The determination of (kmask, kisotropic) and kanisotropic is repeated several times until the R factor decreases by less than 0.01% between cycles. The number of cycles required to reach convergence is typically between 1 and 5.

To determine kanisotropic, our protocol can make use of three available scaling methods: polynomial (poly; §[link]2.2), exponential with analytical calculation of the optimal parameters (expanal; §[link]2.1) and exponential with the optimal parameters obtained via L-BFGS (Liu & Nocedal, 1989[Liu, D. C. & Nocedal, J. (1989). Math. Program. 45, 503-528.]) minimization (expmin; Afonine et al., 2005a[Afonine, P. V., Grosse-Kunstleve, R. W. & Adams, P. D. (2005a). Acta Cryst. D61, 850-855.]). The three methods can be tested independently, in which case the result with the lowest R factor is accepted. However, because expmin is up to an order of magnitude slower than the other two methods it is not expected to be used routinely.

The calculation of kisotropic and kmask requires dividing the data into resolution bins (§3.2). If kmask oscillates between bins, smoothing (Savitzky & Golay, 1964[Savitzky, A. & Golay, M. J. E. (1964). Anal. Chem. 36, 1627-1639.]) is applied to the bin-wise values of kmask so as to reduce the oscillations without altering the monotonic behavior of kmask as a function of resolution (see Fig. 1). Finally, the smoothed values are assigned to individual reflections using linear interpolation. The kisotropic scales are updated using equation (5) in order to account for the changed kmask.

[Figure 1]
Figure 1
Examples of smoothing of kmask. The original kmask (blue; obtained as the solution of equation 29) and that after smoothing (red) are shown for three PDB entries with the PDB codes shown on the plots.

As illustrated in §3.3, the minimum of the R-factor function

[R = \textstyle \sum \big|F_{\rm obs} - | {\bf F}_{\rm model}| \big|/\textstyle \sum |F_{\rm obs}| \eqno(46)]

and the minimum of the least-squares function (22) can be at significantly different locations in the (kmask, kisotropic) parameter space. To ensure that the final (kmask, kisotropic) values correspond to the lowest R factor, a fast grid search is performed around the optimal values of the least-squares function.
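
A grid search of this kind can be sketched as below. The extent and spacing of the grid are not specified here, so `rel_range` and `n_steps` are assumed values, all names are hypothetical, and kisotropic is treated as a single scalar for simplicity.

```python
def r_factor(f_obs, f_model_abs):
    """Crystallographic R factor of equation (46)."""
    numerator = sum(abs(fo - fm) for fo, fm in zip(f_obs, f_model_abs))
    return numerator / sum(abs(fo) for fo in f_obs)


def model_abs(f_calc, f_mask, k_mask, k_iso):
    """|F_model| for scalar k_mask and k_iso (other scales set to 1)."""
    return [k_iso * abs(fc + k_mask * fm) for fc, fm in zip(f_calc, f_mask)]


def grid_search(f_obs, f_calc, f_mask, k_mask0, k_iso0,
                rel_range=0.2, n_steps=10):
    """Search a (2*n_steps+1)^2 grid within +/- rel_range of the LS optimum."""
    start_r = r_factor(f_obs, model_abs(f_calc, f_mask, k_mask0, k_iso0))
    best = (start_r, k_mask0, k_iso0)
    for i in range(-n_steps, n_steps + 1):
        k_mask = k_mask0 * (1.0 + rel_range * i / n_steps)
        for j in range(-n_steps, n_steps + 1):
            k_iso = k_iso0 * (1.0 + rel_range * j / n_steps)
            r = r_factor(f_obs, model_abs(f_calc, f_mask, k_mask, k_iso))
            if r < best[0]:
                best = (r, k_mask, k_iso)
    return best


# Synthetic data; the search starts from a deliberately offset point.
f_calc = [10 + 2j, -4 + 7j, 3 - 8j, 6 + 6j, -9 + 1j, 2 - 3j]
f_mask = [-5 + 1j, 2 - 3j, -1 + 4j, -3 - 2j, 4 + 0j, -1 + 1j]
f_obs = model_abs(f_calc, f_mask, 0.30, 1.00)
r_start = r_factor(f_obs, model_abs(f_calc, f_mask, 0.33, 0.95))
r_best, k_mask_best, k_iso_best = grid_search(f_obs, f_calc, f_mask, 0.33, 0.95)
```

Because the starting point is always included as a candidate, the search can only lower the R factor or leave it unchanged.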

3.2. Binning

The goal of binning is to group data by common features to characterize each group by a set of common parameters. Here, the key parameter is the resolution d of reflections. Binning schemes with bins containing an approximately equal number of reflections (i.e. the resolution range is uniformly sampled in d⁻³) or a predefined number of bins are typically used. Since the low-resolution region of the data is sparse, such binning schemes tend to produce only one or very few low-resolution bins, which is insufficient to best model the bulk-solvent contribution. Unfortunately, decreasing the number of reflections per bin will disproportionally increase the number of bins (Nbins) at higher resolution and may still provide insufficient detail for the low-resolution data (Table 1).

Table 1
Comparison of binning schemes performed with d⁻³ and ln(d) spacing for three selected PDB data sets: 1kwn, 3hay and 3gk8

All three data sets have very low completeness in the lowest resolution bin, which d⁻³ binning obscures while ln(d) binning makes clear even when using approximately half the number of bins. Completeness in the high-resolution region is similar in the two binning schemes. For each binning method three columns of data are presented: resolution range (Å), completeness and number of reflections.

PDB entry 1kwn
Bin No.  d⁻³: range (Å)  Compl.  No. refl.   ln(d): range (Å)  Compl.  No. refl.
1        19.96–3.25      0.967   4363        19.96–7.87        0.860   301
2        3.25–2.58       0.997   4280        7.87–6.33         0.971   300
3        2.58–2.26       0.999   4214        6.33–5.10         0.966   564
4        2.26–2.05       1.000   4218        5.10–4.10         0.961   1037
5        2.05–1.90       0.990   4135        4.10–3.30         0.986   1987
6        1.90–1.79       0.993   4133        3.30–2.66         0.997   3772
7        1.79–1.70       0.992   4119        2.66–2.14         0.999   7177
8        1.70–1.63       0.989   4070        2.14–1.72         0.993   13453
9        1.63–1.57       0.988   4094        1.72–1.38         0.990   25516
10       1.57–1.51       0.990   4093        1.38–1.20         0.989   28106
11       1.51–1.46       0.987   4036
12       1.46–1.42       0.990   4073
13       1.42–1.39       0.993   4088
14       1.39–1.35       0.992   4057
15       1.35–1.32       0.992   4077
16       1.32–1.29       0.995   4052
17       1.29–1.27       0.991   4047
18       1.27–1.24       0.991   4045
19       1.24–1.22       0.988   4026
20       1.22–1.20       0.972   3993

PDB entry 3hay
Bin No.  d⁻³: range (Å)  Compl.  No. refl.   ln(d): range (Å)  Compl.  No. refl.
1        44.86–13.44     0.932   715         44.86–17.61       0.852   300
2        13.43–10.71     1.000   716         17.58–14.23       1.000   301
3        10.71–9.37      1.000   688         14.22–11.51       1.000   556
4        9.37–8.52       1.000   693         11.51–9.31        1.000   1011
5        8.52–7.91       1.000   679         9.31–7.53         1.000   1853
6        7.91–7.45       1.000   673         7.53–6.10         1.000   3448
7        7.45–7.08       1.000   675         6.10–4.99         0.997   5905
8        7.08–6.77       1.000   657
9        6.77–6.51       1.000   672
10       6.51–6.29       1.000   671
11       6.28–6.09       1.000   657
12       6.09–5.92       1.000   655
13       5.91–5.76       1.000   666
14       5.76–5.62       1.000   656
15       5.62–5.49       1.000   667
16       5.49–5.38       1.000   653
17       5.38–5.27       1.000   635
18       5.27–5.17       1.000   663
19       5.17–5.08       1.000   660
20       5.08–4.99       0.973   623

PDB entry 3gk8
Bin No.  d⁻³: range (Å)  Compl.  No. refl.   ln(d): range (Å)  Compl.  No. refl.
1        22.18–5.00      0.906   1938        22.18–8.16        0.610   300
2        5.00–3.98       0.994   2052        8.15–7.00         0.993   300
3        3.98–3.48       0.997   2060        7.00–6.01         0.996   452
4        3.48–3.16       0.995   2051        6.01–5.16         0.994   700
5        3.16–2.93       0.976   1988        5.16–4.43         0.993   1087
6        2.93–2.76       0.968   1973        4.43–3.81         0.996   1735
7        2.76–2.62       0.958   1902        3.81–3.27         0.996   2716
8        2.62–2.51       0.952   1961        3.27–2.81         0.979   4149
9        2.51–2.41       0.954   1941        2.81–2.41         0.955   6410
10       2.41–2.33       0.941   1876        2.41–2.07         0.931   9748
11       2.33–2.26       0.933   1897        2.07–1.85         0.827   9681
12       2.26–2.19       0.940   1881
13       2.19–2.13       0.931   1876
14       2.13–2.08       0.914   1838
15       2.08–2.03       0.897   1834
16       2.03–1.99       0.891   1766
17       1.99–1.95       0.865   1765
18       1.95–1.92       0.825   1645
19       1.91–1.88       0.767   1537
20       1.88–1.85       0.732   1497

An alternative approach which divides the resolution range uniformly on a logarithmic scale ln(d) (Urzhumtsev et al., 2009[Urzhumtsev, A., Afonine, P. V. & Adams, P. D. (2009). Acta Cryst. D65, 1283-1291.]) efficiently solves this problem. The flowchart of the algorithm is shown in Fig. 2. This scheme allows the higher resolution bins to contain more reflections than the lower resolution bins and more detailed binning at low resolution without increasing the total number of bins. An additional reason for using logarithmic binning is that the dependence of the scales on resolution is approximately exponential (see previous sections), which makes the variation of scale factors more uniform between bins when a logarithmic binning algorithm is used. Table 1 compares binning performed uniformly in d⁻³ and in ln(d) spacing for three data sets (PDB entries 1kwn, 3hay and 3gk8). Note the data completeness of the low-resolution bins.

[Figure 2]
Figure 2
Flowchart of the logarithmic resolution-binning algorithm.
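
The flowchart itself is not reproduced in this text, so the following is only a minimal sketch of the core idea: bin edges spaced uniformly in ln(d) between the lowest and highest resolution. The function names are hypothetical, and any rule for a minimum number of reflections per bin is omitted.

```python
import math


def log_bin_edges(d_max, d_min, n_bins):
    """Bin edges from d_max down to d_min, uniformly spaced in ln(d)."""
    step = (math.log(d_max) - math.log(d_min)) / n_bins
    return [math.exp(math.log(d_max) - i * step) for i in range(n_bins + 1)]


def assign_bins(d_values, d_max, d_min, n_bins):
    """Bin index (0 = lowest resolution) for each reflection resolution d."""
    step = (math.log(d_max) - math.log(d_min)) / n_bins
    indices = []
    for d in d_values:
        i = int((math.log(d_max) - math.log(d)) / step)
        indices.append(min(max(i, 0), n_bins - 1))  # clamp the end points
    return indices


# Four bins between 20 and 1.25 Å: each edge is half the previous one.
edges = log_bin_edges(20.0, 1.25, 4)
bin_ids = assign_bins([15.0, 7.0, 3.0, 1.3], 20.0, 1.25, 4)
```

Since the number of reflections grows roughly as d⁻³, bins of constant width in ln(d) naturally hold more reflections at high resolution, which is the behaviour described above.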

3.3. Systematic tests

We evaluated the performance of the new scaling protocol by applying it to approximately 40 000 data sets selected from the PDB. The structures were selected by evaluating all PDB entries using phenix.model_vs_data (Afonine et al., 2010[Afonine, P. V., Grosse-Kunstleve, R. W., Chen, V. B., Headd, J. J., Moriarty, N. W., Richardson, J. S., Richardson, D. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2010). J. Appl. Cryst. 43, 669-676.]) and excluding all entries for which the recalculated Rwork was greater than the published value by more than five percentage points.

To score the test results three crystallographic R factors (46)[link] were computed using all reflections, using only low-resolution reflections and using only high-resolution reflections. Low-resolution reflections were selected using the condition dmin > 8 Å but selecting at least the 500 lowest resolution reflections. High-resolution reflections were taken from the highest resolution bin. Each of the three anisotropic scaling methods (poly, expanal and expmin) was tested independently within each run. Additionally, two other tests were performed: one combining poly and expanal as described in §[link]3.1 (referred to as poly+expanal) and the other using the protocol of Afonine et al. (2005a[Afonine, P. V., Grosse-Kunstleve, R. W. & Adams, P. D. (2005a). Acta Cryst. D61, 850-855.]) (referred to as old).

Fig. 3[link] shows a comparison of the alternative methods for determining kanisotropic (see §[link]3.1). Comparing the polynomial model (poly) versus the analytical exponential model (expanal), with a few minor exceptions poly results in slightly lower R factors overall and for the low-resolution reflections, while expanal results in lower R factors for the high-resolution reflections. Comparing poly versus the original exponential model using minimization (expmin), the R factors are very similar overall and for the high-resolution reflections, while poly often results in lower R factors for the low-resolution reflections. Comparing the two different exponential models, expmin results in lower R factors overall and nearly identical results for low-resolution reflections, but expanal results in lower R factors for the high-resolution reflections. Fig. 4[link] compares the new protocol combining poly and expanal with the old protocol. With very few exceptions, the new protocol performs better for all three resolution groups.

[Figure 3]
Figure 3
A comparison of the new scaling protocol using different models for the anisotropic scale factor. R factor versus R factor scatter plots for (a) poly versus expanal, (b) poly versus expmin and (c) expanal versus expmin. R factors were computed using all reflections (left), low-resolution reflections only (middle) and high-resolution reflections only (right). See §3.3 for details.
[Figure 4]
Figure 4
R factor versus R factor scatter plots comparing the new scaling protocol using poly+expanal for the anisotropic scale factor with the old protocol. For each structure the full set of structure factors available from the PDB was used to calculate scale factors and to calculate R factors (left). Using the same scale-factor values the R factors were calculated separately for the low-resolution reflections (middle) and high-resolution reflections (right). A large spread of points in the vertical direction above the diagonal (red line) in these latter plots indicates that in many cases the scale factors produced by the old protocol resulted in a poorer fit to the data at low and high resolutions, while the new protocol generates scale factors with a good fit across all resolution ranges. See §3.3 for details.

As described above, the minima of the R-factor function (46) and the LS function (22) are occasionally at significantly different locations in the (kmask, kisotropic) parameter space (see Fig. 5). For example, considering kisotropic to be a single-value scalar, the pair (kmask, kisotropic) that minimizes the R factor in the low-resolution range of PDB data set 1kwn is (0.2913, 0.0961), while the pair (0.3218, 0.0863) minimizes the LS function. The corresponding R factors are 0.3073 and 0.3372, respectively. The data for PDB entry 1hqw lead to an even more dramatic difference, in which the pairs (kmask, kisotropic) that minimize the R factor and the LS function are (0.25, 0.0131) and (0.6166, 0.0151), respectively, and the corresponding R factors are 0.2924 and 0.5046. We made a similar observation for the overall anisotropic scale kanisotropic, as illustrated in Table 2. For this, the best values for Ucryst were determined via a systematic search for the minima of the functions (3), (11) and (46) for three combinations of structures and high-resolution cutoffs. Note the difference in the optimal Ucryst values and the corresponding R factors.

Table 2
Comparison of Ucryst corresponding to the minima of the functions LS (3), LSL (11) and R factor (46)

To improve readability, the Ucryst are shown as B values with respect to a Cartesian basis (Grosse-Kunstleve & Adams, 2002). To reduce the runtimes for the systematic parameter searches (see text), we have selected examples with symmetry constraints leading to all-zero off-diagonal elements.

| PDB code | Optimization target | B11, B22, B33 (Å²) | R factor |
| --- | --- | --- | --- |
| 2fih | R factor | −2.15, −1.85, −1.60 | 0.1935 |
| 2fih | LS | −4.20, −3.90, −3.35 | 0.2179 |
| 2fih | [\widetilde {\rm LSL}] | −2.65, −1.95, −1.60 | 0.1939 |
| 2fih (data cut at 2.5 Å) | R factor | −9.30, −10.20, −10.35 | 0.2417 |
| 2fih (data cut at 2.5 Å) | LS | −18.35, −19.65, −20.75 | 0.2599 |
| 2fih (data cut at 2.5 Å) | [\widetilde {\rm LSL}] | −38.25, −42.15, −46.20 | 0.3769 |
| 1ous (data cut at 6.5 Å) | R factor | 9.25, −2.20, 4.35 | 0.2082 |
| 1ous (data cut at 6.5 Å) | LS | 2.90, −2.45, 8.60 | 0.2086 |
| 1ous (data cut at 6.5 Å) | [\widetilde {\rm LSL}] | 19.55, 6.55, 12.85 | 0.2088 |

[Figure 5]
Figure 5
Plots of R factors (with kisotropic = 0.0961) and the LS function (with kisotropic = 0.0863) for PDB entry 1kwn (left) and R factors (with kisotropic = 0.0131) and the LS function (with kisotropic = 0.0151) for PDB entry 1hqw (right), illustrating that the minima of the R-factor function (46) and the LS function (22) can be at significantly different locations in parameter space. In such cases, a line search around the value of kmask obtained by minimization of the LS function is necessary in order to obtain a value that minimizes the R factor. For plotting purposes, the values of the LS function were scaled to be similar to the R factors.
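The line search mentioned in the caption of Fig. 5 might look like the following minimal sketch. It assumes that the total model has the form Fmodel = ktotal(Fcalc + kmask Fmask), which is consistent with relations (47) and (48), that ktotal is a single scalar and that R is the conventional R factor. The function names, the grid half-width and the step size are hypothetical and chosen only for illustration.

```python
def r_factor(f_obs, f_model):
    """Conventional R factor: sum(| |Fobs| - |Fmodel| |) / sum(|Fobs|)."""
    num = sum(abs(abs(fo) - abs(fm)) for fo, fm in zip(f_obs, f_model))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def line_search_k_mask(f_obs, f_calc, f_mask, k_total, k_mask_ls,
                       half_width=0.1, step=0.01):
    """Scan k_mask on a regular grid centred on the LS estimate k_mask_ls
    and return the (k_mask, R) pair with the lowest R factor."""
    n_steps = int(round(half_width / step))
    best_k, best_r = k_mask_ls, None
    for i in range(-n_steps, n_steps + 1):
        k_mask = max(0.0, k_mask_ls + i * step)  # keep the scale non-negative
        f_model = [k_total * (fc + k_mask * fm)
                   for fc, fm in zip(f_calc, f_mask)]
        r = r_factor(f_obs, f_model)
        if best_r is None or r < best_r:
            best_k, best_r = k_mask, r
    return best_k, best_r
```

A one-dimensional scan of this kind is cheap because Fcalc and Fmask are fixed during the search, so each trial value costs only one pass over the selected reflections.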

The parameterization of the total model structure factor (1) does not make any assumption about the shape of kmask; for example, it does not assume it to be exponential (10). This provides an opportunity to explore the behavior of kmask as a function of resolution and compare it with kmask obtained via (10). Fig. 6 illustrates the differences between the two methods of determining kmask for six representative PDB entries selected from approximately 40 000 entries after inspection of the kmask values. We observe that the plots of the values obtained using our new approach are in general significantly different from the exponential function. This observation is in line with Fig. 1 of Urzhumtsev & Podjarny (1995).

[Figure 6]
Figure 6
Plots of kmask as a function of resolution (s²) for six selected PDB entries. The blue lines show kmask as determined using the new method. The red lines show kmask based on the exponential function (10) using optimized ksol and Bsol parameters.

At very low resolution the structure factors computed from the atomic model are approximately anticorrelated to the structure factors computed from the bulk-solvent mask:

[{\bf F}_{\rm mask} \simeq - p{\bf F}_{\rm calc}. \eqno(47)]

Here, p is a scale factor (Urzhumtsev & Podjarny, 1995). Relation (47) is the basis for alternative bulk-solvent scaling methods that employ the Babinet principle (Moews & Kretsinger, 1975; Tronrud, 1997). Substitution of relation (47) into equation (1) yields

[{\bf F}_{\rm model}\simeq k_{\rm total}(1-p\,k_{\rm mask}){\bf F}_{\rm calc}. \eqno(48)]

Obviously, Fmodel is invariant for any combination of scale factors ktotal and kmask satisfying the condition

[{k}_{\rm total} (1-p\,k_{\rm mask}) = {\rm const}. \eqno(49)]

Since our new scaling procedure determines kmask and kisotropic (which are part of ktotal) simultaneously, without imposing constraints on their values, these scale factors may assume unusual values in the low-resolution range. However, we observe that in practice this only happens for a very small number of the test cases.
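The degeneracy expressed by condition (49) can be checked numerically. The sketch below assumes that the total model has the form Fmodel = ktotal(Fcalc + kmask Fmask) and that relation (47) holds exactly; the value of p, the constant and the structure factors are made-up illustrative numbers, not data from any PDB entry.

```python
def f_model(k_total, k_mask, f_calc, f_mask):
    """Total model, assuming F_model = k_total * (F_calc + k_mask * F_mask)."""
    return [k_total * (fc + k_mask * fm) for fc, fm in zip(f_calc, f_mask)]


def degenerate_pairs(p, const, k_mask_values):
    """(k_total, k_mask) pairs satisfying k_total * (1 - p * k_mask) = const."""
    return [(const / (1.0 - p * km), km) for km in k_mask_values]


# Hypothetical low-resolution structure factors obeying relation (47) exactly.
p = 0.8
f_calc = [complex(120.0, -35.0), complex(-60.0, 80.0), complex(15.0, 42.0)]
f_mask = [-p * fc for fc in f_calc]

pairs = degenerate_pairs(p, const=0.9, k_mask_values=[0.1, 0.35, 0.6])
models = [f_model(kt, km, f_calc, f_mask) for kt, km in pairs]

# Very different scale-factor pairs give the same model amplitudes.
for (kt, km), model in zip(pairs, models):
    print(f"k_total={kt:.4f}  k_mask={km:.2f}  |F_model[0]|={abs(model[0]):.4f}")
```

With real data relation (47) is only approximate, so the degeneracy is not exact, but the example shows why the individual scale factors can be poorly determined at very low resolution even when their combination is well defined.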

4. Discussion

A new method for overall anisotropic and bulk-solvent scaling of macromolecular crystallographic diffraction data has been developed. It is an improvement over the existing algorithm of flat (mask-based) bulk-solvent modeling and overall anisotropic scaling, versions of which are routinely used in various refinement packages such as CNS (Brunger, 2007), REFMAC (Murshudov et al., 2011) and phenix.refine (Afonine et al., 2012). In the process of developing this method, we concluded that the bulk-solvent scale factor kmask deviates quite significantly from the exponential model that has traditionally been used. The new method is approximately two orders of magnitude faster than the previous implementation and yields similar or often better R factors. Table 3 compares runtimes for a number of selected cases covering a broad range of resolutions and atomic model sizes. The computational speed of the new method makes it possible to robustly compute bulk-solvent and anisotropic scaling parameters even as part of semi-interactive procedures.

Table 3
Runtime comparison for selected PDB entries

Absolute runtimes for the new protocol range from a few hundredths of a second to a second.

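| PDB code | Resolution (Å) | No. of atoms | No. of reflections | Speed gain |
| --- | --- | --- | --- | --- |
| 1us0 | 0.66 | 3679 | 511265 | 105 |
| 1akg | 1.10 | 136 | 4471 | 132 |
| 1ous | 1.20 | 3784 | 104889 | 86 |
| 1yjp | 1.80 | 66 | 495 | 64 |
| 1f8t | 1.95 | 3593 | 28288 | 104 |
| 1av1 | 4.00 | 6588 | 16201 | 110 |
| 1jl4 | 3.99 | 4474 | 7428 | 78 |
| 2i07 | 4.0 | 12157 | 20412 | 126 |
| 2gsz | 4.2 | 16344 | 17131 | 166 |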

An inherent feature of the mask-based bulk-solvent model is that it relies on the existing atomic model to compute the mask. This in turn implies that any unmodeled (as atoms) parts of the unit cell are considered to belong to the bulk-solvent region. This may obscure weakly pronounced features in residual maps such as partially occupied solvent or ligands. This is common to all mask-based bulk-solvent modeling methods, leading to the development of algorithms to account for missing atoms (Roversi et al., 2000). In the future, improved maps may be obtained by combining this latter approach with the new fast overall anisotropic and bulk-solvent scaling method that we have presented.

The new method is implemented in the cctbx project (Grosse-Kunstleve et al., 2002) and has been used in a number of PHENIX applications since v.1.8 of the software, most notably phenix.refine (Afonine et al., 2005b, 2012), phenix.maps and phenix.model_vs_data (Afonine et al., 2010). The cctbx project is available at https://cctbx.sourceforge.net under an open-source license. The PHENIX software is available at https://www.phenix-online.org.

All results presented are based on PHENIX v.1.8.1.

APPENDIX A

Analytical derivation of a one-Gaussian approximation of a one-dimensional discrete data set

Our goal is to approximate a set of data points [\{Y(x_{j})\}_{j = 1}^{N}] with a Gaussian function,

[a \exp(-bx^{2}). \eqno (50)]

For this, we use the standard approach of minimizing a least-squares (LS) function,

[{\rm LS} = \textstyle \sum\limits_{j = 1}^{N}[Y(x_{j})-a \exp(-bx_{j}^{2})]^{2}. \eqno (51)]

If [Y(x_{j}) \gt 0] for all [x_{j}], j = 1, ..., N (so that the logarithms are defined), the minimization of LS can be replaced by the minimization of

[{\rm LSL} = \textstyle \sum \limits_{j = 1}^{N}\{\ln[Y(x_{j})]-\ln[a\exp(-bx_{j}^{2})]\}^{2}. \eqno (52)]

The minimum of this LSL function can be determined analytically,

[\eqalignno {{\rm LSL} &= \textstyle \sum \limits_{j = 1}^{N}\{\ln[Y(x_{j})]-\ln[a \exp(-bx_{j}^{2})]\}^{2} \cr &= \textstyle\sum \limits_{j = 1}^{N}\{\ln(a)-b{x}_{j}^{2}-\ln[Y(x_{j})]\}^{2}. & (53)}]

Defining [u = \ln(a)], [v_{j} = x_{j}^{2}] and [d_{j} = \ln Y(x_{j})], we obtain

[{\rm LSL} = \textstyle \sum \limits_{j = 1}^{N}(u-bv_{j}-d_{j})^{2}. \eqno (54)]

The variables {a, b} minimizing the LSL function are determined by the condition

[\cases { {\displaystyle {{\partial {\rm LSL}} \over {\partial u}}} = 0 \cr {\displaystyle {{\partial {\rm LSL}} \over {\partial b}}} = 0.} \eqno (55)]

This leads to

[\cases { - 2\textstyle \sum \limits_{j = 1}^N (u - bv_{j} - d_{j}) = 0 \cr - 2\textstyle \sum \limits_{j = 1}^N (u - bv_{j} - d_{j})v_{j} = 0} \eqno (56)]

and

[\cases {uN - b\textstyle \sum \limits_{j = 1}^N v_{j} - \textstyle \sum \limits_{j = 1}^N d_{j} = 0 \cr u\textstyle \sum \limits_{j = 1}^N v_{j} - b\textstyle \sum \limits_{j = 1}^N v_j^2 - \textstyle \sum \limits_{j = 1}^N v_{j}d_{j} = 0.} \eqno (57)]

Defining p = [\textstyle \sum _{j = 1}^{N}{d}_{j}], q = [\textstyle \sum _{j = 1}^{N}{v}_{j}], r = [\textstyle \sum _{j = 1}^{N}{v}_{j}^{2}] and s = [\textstyle \sum _{j = 1}^{N}{v}_{j}{d}_{j}], we obtain

[\cases {uN-bq-p = 0\cr uq-br-s = 0} \eqno (58)]

and

[\cases {u = {\displaystyle {{1}\over{N}}}(bq+p)\cr b = {\displaystyle {{1}\over{r}}}(uq-s).} \eqno (59)]

From this, we obtain

[u = {{p-{\displaystyle{{sq}\over{r}}}}\over{N-{\displaystyle{{{q}^{2}}}\over{r}}}},\quad b = {{1}\over{r}}\left(uq-s\right) \eqno (60)]

and finally

[a = \exp(u), \quad b = {{1}\over{r}}(uq-s). \eqno (61)]
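As an illustration, the closed-form solution (58)-(61) can be written as a short Python function. This is a sketch of the formulas above rather than the cctbx code; the function name and the error handling are assumptions, and the data must be strictly positive for the logarithms to exist.

```python
import math


def fit_gaussian_lsl(x, y):
    """Fit y ~ a * exp(-b * x**2) by minimizing the LSL function (52),
    using the analytical solution (60)-(61). Returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(yj <= 0.0 for yj in y):
        raise ValueError("all y values must be strictly positive")
    n = len(x)
    v = [xj * xj for xj in x]                  # v_j = x_j^2
    d = [math.log(yj) for yj in y]             # d_j = ln Y(x_j)
    p = sum(d)                                 # p = sum d_j
    q = sum(v)                                 # q = sum v_j
    r = sum(vj * vj for vj in v)               # r = sum v_j^2
    s = sum(vj * dj for vj, dj in zip(v, d))   # s = sum v_j d_j
    if r == 0.0 or n - q * q / r == 0.0:
        raise ValueError("x values do not determine both a and b")
    u = (p - s * q / r) / (n - q * q / r)      # equation (60)
    b = (u * q - s) / r
    return math.exp(u), b                      # equation (61)


# Usage: noise-free data generated with a = 2.5 and b = 3.0 are recovered.
xs = [0.1, 0.2, 0.3, 0.4, 0.5]
ys = [2.5 * math.exp(-3.0 * xj * xj) for xj in xs]
a_fit, b_fit = fit_gaussian_lsl(xs, ys)
print(a_fit, b_fit)
```

Because the fit is performed on logarithms, small Y values carry relatively more weight than in the LS function (51), so the two minima generally differ for noisy data.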

Acknowledgements

The authors thank the NIH (grant GM063210) and the PHENIX Industrial Consortium for support of the PHENIX project. This work was supported in part by the US Department of Energy under Contract No. DE-AC02-05CH11231.

References

Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221.
Afonine, P. V., Grosse-Kunstleve, R. W. & Adams, P. D. (2005a). Acta Cryst. D61, 850–855.
Afonine, P. V., Grosse-Kunstleve, R. W. & Adams, P. D. (2005b). CCP4 Newsl. Protein Crystallogr. 42, contribution 8.
Afonine, P. V., Grosse-Kunstleve, R. W., Chen, V. B., Headd, J. J., Moriarty, N. W., Richardson, J. S., Richardson, D. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2010). J. Appl. Cryst. 43, 669–676.
Afonine, P. V., Grosse-Kunstleve, R. W., Echols, N., Headd, J. J., Moriarty, N. W., Mustyakimov, M., Terwilliger, T. C., Urzhumtsev, A., Zwart, P. H. & Adams, P. D. (2012). Acta Cryst. D68, 352–367.
Badger, J. (1997). Methods Enzymol. 277, 344–352.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535–542.
Brunger, A. T. (2007). Nature Protoc. 2, 2728–2733.
Fenn, T. D., Schnieders, M. J. & Brunger, A. T. (2010). Acta Cryst. D66, 1024–1031.
Fokine, A. & Urzhumtsev, A. (2002a). Acta Cryst. A58, 72–74.
Fokine, A. & Urzhumtsev, A. (2002b). Acta Cryst. D58, 1387–1392.
Giacovazzo, C. (1992). Fundamentals of Crystallography. Oxford University Press.
Grosse-Kunstleve, R. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 477–480.
Grosse-Kunstleve, R. W., Sauter, N. K., Moriarty, N. W. & Adams, P. D. (2002). J. Appl. Cryst. 35, 126–136.
Grosse-Kunstleve, R. W., Wong, B., Mustyakimov, M. & Adams, P. D. (2011). Acta Cryst. A67, 269–275.
Jiang, J. S. & Brünger, A. T. (1994). J. Mol. Biol. 243, 100–115.
Kostrewa, D. (1997). CCP4 Newsl. Protein Crystallogr. 34, 9–22.
Liu, D. C. & Nocedal, J. (1989). Math. Program. 45, 503–528.
Moews, P. C. & Kretsinger, R. H. (1975). J. Mol. Biol. 91, 201–225.
Murshudov, G. N., Skubák, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.
Nye, J. F. (1957). Physical Properties of Crystals. Oxford: Clarendon Press.
Phillips, S. E. (1980). J. Mol. Biol. 142, 531–554.
Roversi, P., Blanc, E., Vonrhein, C., Evans, G. & Bricogne, G. (2000). Acta Cryst. D56, 1316–1323.
Savitzky, A. & Golay, M. J. E. (1964). Anal. Chem. 36, 1627–1639.
Shakked, Z. (1983). Acta Cryst. A39, 278–279.
Sheriff, S. & Hendrickson, W. A. (1987). Acta Cryst. A43, 118–121.
Tronrud, D. E. (1997). Methods Enzymol. 277, 306–319.
Urzhumtsev, A. G. (2000). CCP4 Newsl. Protein Crystallogr. 38, 38–49.
Urzhumtsev, A., Afonine, P. V. & Adams, P. D. (2009). Acta Cryst. D65, 1283–1291.
Urzhumtsev, A. G. & Podjarny, A. D. (1995). Jnt CCP4/ESF–EACMB Newsl. Protein Crystallogr. 31, 12–16.
Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H. J. & Sheldrick, G. M. (1999). Acta Cryst. D55, 1158–1167.
Vassylyev, D. G., Vassylyeva, M. N., Perederina, A., Tahirov, T. H. & Artsimovitch, I. (2007). Nature (London), 448, 157–162.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
