Scrutinizing the protein hydration shell from molecular dynamics simulations against consensus small-angle scattering data

Biological macromolecules in solution are surrounded by a hydration shell, whose structure differs from the structure of bulk solvent. While the importance of the hydration shell for numerous biological functions is widely acknowledged, it remains unknown how the hydration shell is regulated by macromolecular shape and surface composition, mainly because a quantitative probe of the hydration shell structure has been missing. We show that small-angle scattering in solution using X-rays (SAXS) or neutrons (SANS) provide a protein-specific probe of the protein hydration shell that enables quantitative comparison with molecular simulations. Using explicit-solvent SAXS/SANS predictions, we derived the effect of the hydration shell on the radii of gyration Rg of five proteins using 18 combinations of protein force field and water model. By comparing computed Rg values from SAXS relative to SANS in D2O with consensus SAXS/SANS data from a recent worldwide community effort, we found that several but not all force fields yield a hydration shell contrast in remarkable agreement with experiments. The hydration shell contrast captured by Rg values depends strongly on protein charge and geometric shape, thus providing a protein-specific footprint of protein–water interactions and a novel observable for scrutinizing atomistic hydration shell models against experimental data.


Introduction
Water molecules play key roles in protein functions such as folding, molecular recognition, enzymatic activity, and proton transfer. [1][2][3] During such functions, water interacts with the geometrically rough and chemically heterogeneous protein surface by the formation of hydrogen bonds with polar and ionic groups as well as by long-ranged Coulomb and Van-der-Waals forces. Protein-water interactions together with water-internal interactions lead to the formation of a water layer with different structural and dynamic properties as compared to bulk water, termed protein hydration shell. The modified water dynamics in the hydration shell have been studied by NMR and Terahertz spectroscopy, time-dependent fluorescence Stokes shift, inelastic neutron scatting, molecular dynamics (MD) simulations, and several other techniques. [4][5][6][7][8][9][10][11][12][13] These data revealed that geometric constraints and the hydrogen bond network lead to a mild slowdown of water dynamics by factors of 3-5 and to an increased water ordering as compared to bulk water. 14 The importance of protein-water interactions is further augmented in crowded cellular environments, where macromolecules typically adopt 25% to 40% of the total volume. 15 In such environments, up to 70% of the total water is part of a biomolecular hydration shell, 3 demonstrating that biology largely involves non-bulk-like water. 16 Whereas the dynamics of the protein hydration shell has been investigated in great quantitative detail by spectroscopy, the overall structure and contrast of the hydration shell is far less understood. Small-angle scattering (SAS) with X-rays (SAXS) or neutrons (SANS)

Results
Explicit-solvent SAS calculations reveal the hydration shell effect on R g The three-dimensional solvent density around xylanase is illustrated in Fig. 1B, computed from a simulation carried out with the ff99SBws protein force field and the TIP4P/2005s water model. 36,39 The density reveals the first hydration layer, which is structured by the formation of favorable interactions with the protein surface (orange mesh), as well as the second hydration layer (blue mesh), which is more dispersed. By averaging the density over the protein surface, the solvent density is obtained as a function of the distance from the Van-der-Waals surface of xylanase, revealing, in addition to the pronounced first and second hydration layer a shallow third layer at a distance of ∼7Å (Fig. 1C), as reported by many previous MD studies (Ref. 19 and references therein). Thus, explicit-solvent MD simulations yield the structure of the hydration shell that differs from the structure of bulk solvent and, thereby, manifests as a modified radius of gyration R g detected by SAS experiments. 17 Using explicit-solvent SAS calculations, 29,40 we computed from MD simulations SAXS curves, SANS curves in H 2 O, and SANS curves in D 2 O as function of momentum transfer q, where q = 4π sin(θ)/λ with the scattering angle 2θ and the wavelength λ of the X-ray beam (Fig. 1D). Two approaches may be used to extract R g from the SAS intensity curves  Figure 1: (A) Simulation of xylanase obtained with ff99SBws and TIP4P/2005s water. Water molecules within the envelope (blue surface) contribute to SAS calculations. Water outside of the envelope is not shown for clarity. The protein is shown in green cartoon, water as red/white sticks. (B) Electron density of solvent inside the envelope in shades from light grey (bulk water) to blue to orange, revealing the first (orange) and the second (mostly blue) hydration layers. (C) Solvent density versus distance R from the VdW surface of the protein, averaged over the protein surface, revealing two pronounced and a third weak hydration shell. (D) Calculated intensity curves for SAXS (purple), SANS in H 2 O (orange), and SANS in D 2 O (blue) obtained from MD simulations. Curves are shown in absolute units of e 2 for SAXS and squared neutron scattering lengths (nsl 2 ) for SANS. Inset: Guinier plots of SAS curves (colored lines) and linear fits (dotted black lines) used to obtain the SASderived radii of gyration R g . (E) Difference between SAS-derived R g relative to the R g of the pure protein (R Prot g ) for SAXS, SANS/H 2 O, and SANS/D 2 O (color code as in panel D). R g differences were computed from simulations with restrained heavy atoms (left), restrained backbone (middle), or from free MD without restraints (right). (F) Differences between R g from SAXS relative to SANS/H 2 O (pink), and from SAXS relative to SANS/D 2 O (grey). All R g differences are a footprint of the protein hydration shell. the forward scattering intensity (Fig. 1D, inset); or (ii) via the the pair distance distribution function (PDDF), also referred to as P (r) function, which is obtained from the SAS curve via a regularized inverse Fourier transform, 41,42 providing the radius of gyration via R 2 g = r 2 P (r) dr/ 2 P (r) dr . From the simulations, we report R g obtained with the Guinier fit, yet we validated the agreement with the R g obtained from the PDDF. Both, R g and I 0 are influenced by contrast of the hydration shell relative to the bulk solvent. However, because the experimental uncertainties of R g are by far smaller as compared to uncertainties of I 0 , we validated MD simulations against experimental R g values in this study.
Since our SAXS and SANS calculations take explicit water molecules in the hydration shell into account, the R g and also I 0 are fully controlled by the water and protein force fields (together with MD parameters such as cutoffs). We quantified the effect of the hydration shell on the R g by computing the difference ∆R g = R SAS g − R Prot g between the R g from the SAS curve, R SAS g , and R g calculated from the atomic positions of protein atoms, R Prot g .
The ∆R g values calculated from simulations of xylanase with restraints on heavy atoms or on backbone atoms or from free MD simulations are shown in Fig. 1E (upper panel), demonstrating that the hydration shell modulates R g of xylanase by up to 0.9 Å.
R g difference from SAXS relative to SANS/D 2 O enables quantitative comparison

between MD simulations and SAS experiments
The effect of the hydration shell on R g is different in SAXS as compared to SANS experiments ( Fig. S2). Since X-rays scatter at the electrons whereas neutrons scatter at the nuclei, SAXS curves report on the electron density contrast, whereas SANS curves report on the contrast of the neutron scattering length density. Many globular proteins exhibit a hydration shell with an increased electron density relative to the bulk solvent. 17,29,43 For such proteins, both, the protein and the hydration shell exhibit a positive electron density contrast relative to the bulk (Fig. S2A), leading in a SAXS experiment to an increased R g (∆R g > 0, Fig. 1E, purple bars). For SANS in D 2 O, the protein exhibits a negative contrast of the neutron scattering length density whereas the hydration shell exhibits a positive contrast relative to bulk, resulting typically in a decreased R g (∆R g < 0, Fig. S2B, Fig. 1E, blue bars). For SANS in H 2 O, the contrast of the protein is positive whereas the contrast of the hydration shell is close to zero, leading to a small influence by the hydration shell on R g (∆R g ≈ 0,  Table S1.
Comparison of the hydration shell from 18 force field combinations with consen-

sus SAS data
Next, we studied the effect of 18 different combinations of force fields for protein and water on the hydration shell, as quantified by ∆R g and ∆R SAS g values. We considered widely used force field combinations such as CHARMM36m-TIP3P 31,44 as well as, to dissect effects of the protein force field relative to the water model, uncommon combinations such as CHARMM36m-SPC/E 31,45 (Table S1). Figure Experiment D2O  SPC/E  cTIP3P  TIP3P  TIP3P-FB  OPC3  OPC  TIP4P-D  a99SBdisp-water  TIP4P/2005  TIP4P/ suggesting that the simulation with CHARMM36m-OPC adopted an unusual conformation, as confirmed by visual inspection of the trajectory (Fig. S6)     Together, these observations suggest that ∆R SAS g values represent a footprint of proteinwater interactions that is independent of previously considered observables, thus providing an additional observable for validating and further improving protein-water interactions in MD simulations.
Irrespective of the applied force field, ∆R SAS g values differed considerably between different proteins, in agreement with the data of the round-robin SAS study. 26 The highly anionic glucose isomerase exhibited the largest ∆R SAS g among the five proteins considered in this study, indicative of a tightly packed hydration shell. These findings are in line with a SAS study of a highly anionic GFP variant 20 and demonstrate that the anionic aspartate and glutamate residues impose a densely packed hydration shell. Among the four proteins with zero or with a small net charge, lysozyme exhibited larger ∆R SAS g values as compared to urate oxidase, xylanase, and RNaseA (Fig. 4B). Urate oxidase exhibits the shape of a hollow cylinder with a large solvent-filled cavity, which may explain the low ∆R SAS g values (Fig. 4B) as well as a nearly vanishing ∆R g for SANS/D 2 O, in contrast to all other proteins (Fig. 4A).
Thus, variations of ∆R SAS g are experimentally accessible footprints of protein-specific hydration shells reflecting specific geometric shapes or distributions of charged and polar moieties on the protein surface.
Since the ∆R SAS g values are in the range of only 1 Å to 2.5 Å, the comparisons presented here require highly accurate SAS data. Considering that SAS data may be subject to minor systematic errors, which may be difficult to detect, SAS data obtained at a single instrument may not yield the required accuracy, even if data collection and analysis follows established quality controls. 24,25 Instead, the use of consensus data collected at different SAS instruments, if possible, by independent researchers, 26 is a rigorous means for obtaining data with unprecedented accuracy and, thereby, enables quantitative validation of the hydration layer as shown here. To validate the hydration shell of other biomolecules such as RNA or protein/RNA complexes, future benchmark studies similar to the round-robin study designed by Trewhella, Vachette, and coworkers would be of utmost value. 26 To enable quantitative comparison with the experiments, the MD simulations should match the experimental conditions and require control calculations. We carefully evaluated the effects of (i) protein flexibility (Figs. 1E/F and S3), (ii) use of salt as compared to use of only counter ions (Fig. S9-S12), (iii) refined sodium-carboxylate interaction parameters ( Fig. S13) and (iv) Lennard-Jones cutoff settings (Fig. S14, see Supplementary Methods).
We found that these factors modulate ∆R SAS g only by a small fraction of an Ångström.
Nevertheless, since such effects are clearly detectable in explicit-solvent SAS predictions of ∆R SAS g , they require consideration upon comparison with experiments.

Conclusions
We showed that the hydration shell contrast, as reported by SAS-derived R g values, strongly depends on the geometric shape and surface composition of proteins, thus providing a probe of protein-specific protein-solvent interactions. As readout of the hydration shell structure, we focused on R g from SAXS relative to SANS experiments in D 2 O (∆R SAS g ), which we computed from MD simulations with explicit-solvent SAS calculations to enable quantitative comparison with experimental SAS data. For many force fields, ∆R SAS g values from MD simulations revealed excellent agreement with consensus SAS data from a recent worldwide round-robin study, 26 suggesting that simulations accurately capture the hydration shell contrast relative to the bulk. Since we furthermore observed differences among force fields, our calculations provide the basis for further improving the accuracy of protein-water interactions in molecular simulations. This study establishes the combination of high-precision SAS experiments with explicit-solvent calculations as tool for scrutinizing atomistic models of the protein hydration shell.

Simulation setup and parameters
Initial structures of lysozyme, RNaseA, xylanase, glucose isomerase, and urate oxidase were taken from the protein data bank (PDB codes: 2VB1, 60  Each protein was simulated using 18 combinations of protein force field and water model (Table S1). Interactions of the proteins were described with one of the following force fields: All MD simulations were carried out with the GROMACS software, version 2020. 3. 70 After 400 steps of minimization with the steepest decent algorithm, the systems were equili-brated for 100 ps with harmonic position restraints applied to the heavy atoms of the proteins (force constant 1000 KJ mol −1 nm −2 ). Subsequently, the production runs were started without restraints on the atoms or with restraints applied to the heavy atoms (force constant 2000 KJ mol −1 nm −2 ) or applied to the backbone atoms (force constant 2000 KJ mol −1 nm −2 ) of the protein. The equations of motion were integrated using a leap-frog algorithm. 71 The temperature was controlled at 298.15 K, using velocity rescaling (τ =1 ps). 72 The pressure was controlled at 1 bar with the Berendsen thermostat (τ =1 ps) 73 and with the Parrinello-Rahman thermostat (τ =5 ps) 74 during equilibration and productions simulation, respectively. The geometry of the water molecules was constrained with the SETTLE algorithm 75 and LINCS 76 was used to constrain all other bond length. A time step of 2 fs was used. Dispersive interactions and short-range repulsion were described by a Lennard-Jones potential.
For simulations with AMBER variants, LJ interactions were cut off at 1 nm. For simulations with CHARMM36m, LJ forces were gradually switched off between 1 nm and 1.2 nm, if not stated otherwise. In simulations with AMBER variants, the pressure and energy were corrected of missing dispersion corrections beyond the cut-off. Neighbor lists were updated with the Verlet scheme. Coulomb interactions were computed with the smooth particle-mesh Ewald (PME) method. 77, 78 We used a Fourier spacing of approx. 0.12 nm, which was optimized by the GROMACS mdrun module at the beginning of each simulation. Systems with restraints on heavy atoms or on the backbone were simulated for 50 ns. Free simulations were carried out for 230 ns.
The 3D solvent density shown in Fig. 1B was computed with the rerun functionality of GROMACS-SWAXS using the environment variable GMX_WAXS_GRID_DENSITY=1 and GMX_WAXS_GRID_DENSITY_MODE=2. 79

Explicit-solvent SAS calculations
The SAXS and SANS calculations were performed with GROMACS-SWAXS (version 2021.5), a modified version of the GROMACS simulation software that implements explicit-solvent SAXS 29 and SANS calculations. 80 GROMACS-SWAXS is furthermore used by the web server WAXSiS for automated explicit-solvent SAXS predictions 30 and is freely available at GitLab (https://gitlab.com/cbjh/gromacs-swaxs). For more details on the rationale behind explicitsolvent SAS calculations including differences relative to implicit-solvent SAS calculations, we refer to previous reviews. 79,81 A spatial envelope (Fig. 1A) was constructed at a distance of 9 Å from all protein atoms. Solvent atoms (water and ions) inside the envelope contributed to the calculated SAXS/SANS curves, thereby taking the hydration shell into account. The buffer subtraction was carried out using 2251 simulations frames of pure solvent simulation box, which was simulated for 50 ns and large enough to enclose the envelope. The orientational average was carried out using 200 q-vectors for each absolute value of q, and the solvent electron density was corrected to the experimental water density of 334 e/nm 3 , as described previously. 29 In this study, a small number of only 200 q-vectors per absolute value of q was sufficient because we computed the SAS curves only up to small angles to carry out the Guinier analysis. The density correction is required because certain water models, in particular TIP3P or cTIP3P, lead to solvent densities that differ from the experiential value.
No fitting parameters owing to the hydration layer or excluded solvent were used, implying that the radius of gyration was not adjusted by the fitting parameters but fully imposed by the force field (together with other MD parameters such as cutoffs, temperature, etc.). SAXS and SANS curves were computed from 2251 simulation frames taken from the time interval between 5 ns and 50 ns or between 30 ns and 230 ns for restrained and free simulations, respectively. Statistical errors of calculated SAS curves were obtained by binning the trajectories into 10 time blocks of 4.5 ns or 20 ns for simulations with or without restraints, respectively. Likewise, the pure-solvent simulations were binned into independent blocks.
Then the SAS curves were computed from independent pairs of solute and pure-solvent trajectories. Critically, the use of independent pure-solvent trajectory blocks is mandatory to exclude correlations between the SAS curves computed from time blocks. Reported error bars denote one standard error. SAXS data reported by the round-robin benchmark revealed only a marginal effect (if any) upon replacing H 2 O with D 2 O in SAXS experiments. 26 Thus, for the prediction of SANS/D 2 O curves, we did not use force fields for heavy water 82 or force fields that would account for the deuteration of amino acids. Instead, we assigned the neutron scattering length of deuterium to water hydrogen atoms and to polar protein atoms. Hydrogen atoms of the amine backbone groups were assumed to be deuterated with a probability of 90%.
To test whether computed R g values from Guinier analysis agree with R g values from the P (r) function, we computed one SAXS curve of xylanase up to q = 3 nm −1 , obtained the P (r) with GNOM, 83 and computed R g from P (r). The R g values from Guinier and P (r) analysis were identical and equaled 1.59 nm. Thus, we used computed R g from Guinier analysis for the remainder of this study for simplicity.

Supporting Information Available
Supplementary Discussion on the choice of SAS data for comparison with MD simulations, comparison with R g values from P (r) vs. Guinier analysis. Supplementary Method with additional control SAS calculations on roles of counter ions outside of the envelope, use of salt vs. only counter ions, cutoff settings for Lennard-Jones parameters, refined sodiumcarboxylate Lennard-Jones parameters (CUFIX), importance of bulk solvent density corrections for R g predictions. Supplementary Figures S1-S14 and Tables S1-S7.