Abstract
Microcrystal electron diffraction (MicroED) is a powerful technique utilizing electron cryo-microscopy (cryo-EM) for protein structure determination of crystalline samples too small for X-ray crystallography. Electrons interact with the electrostatic potential of the sample, which means that scattered electrons carry informing about the charged state of atoms and can provide strong contrast for visualizing hydrogen atoms. Accurately identifying the positions of hydrogen atoms, and by extension the hydrogen bonding networks, is of importance for drug discovery and electron microscopy can enable such visualization. Using subatomic resolution MicroED data obtained from triclinic hen egg-white lysozyme, we identified hundreds of individual hydrogen atom positions and directly visualize hydrogen bonding interactions and the charged states of residues. Over a third of all hydrogen atoms are identified from strong difference peaks, the most complete view of a macromolecular hydrogen network visualized by electron diffraction to date. These results show that MicroED can provide accurate structural information on hydrogen atoms and non-covalent hydrogen bonding interactions in macromolecules. Furthermore, we find that the hydrogen bond lengths are more accurately described by the inter-nuclei distances than the centers of mass of the corresponding electron clouds. We anticipate that MicroED, coupled with ongoing advances in data collection and refinement, can open further avenues for structural biology by uncovering and understanding the hydrogen bonding interactions underlying protein structure and function.
Main
Microcrystal electron diffraction (MicroED) has been successful in structure determination of crystalline biological specimens using electron cryo-microscopy (cryo-EM) (Nannenga, Shi, Leslie et al., 2014; Nannenga, Shi, Hattne et al., 2014; Yonekura et al., 2015), including novel structures (Rodriguez et al., 2015; Sawaya et al., 2016; Xu et al., 2019; Clabbers et al., 2021), as well as difficult to crystallize membrane proteins in detergents and lipids (Liu & Gonen, 2018; Martynowycz et al., 2020; Martynowycz, Shiriaeva et al., 2021). As electrons interact more strongly with matter than X-rays (Henderson, 1995), the crystal volume required for useful diffraction is typically about a million times smaller. Electrons are scattered by the electrostatic potential and the strength of scattering depends on the charged state of atoms (Cowley, 1995). The effects of charge distribution are already apparent at moderate to low resolution (Yonekura et al., 2015, 2018), and the charged state of residues in macromolecules has previously been investigated using electron crystallography (Kimura et al., 1997; Mitsuoka et al., 1999; Yonekura et al., 2015).
Electrostatic potential maps obtained from electron scattering can provide strong contrast for identifying hydrogen atoms, which has enabled localizing hydrogens in electron diffraction structures of small molecule organics and peptide fragments (Rodriguez et al., 2015; Sawaya et al., 2016; Dorset, 1995; Palatinus et al., 2017; Gruene et al., 2018; Jones et al., 2018; Clabbers et al., 2019; Takaba et al., 2021). Identifying the positions of hydrogen atoms and visualizing their resulting hydrogen bonding networks are crucial for understanding protein structure and function such as resolving precise drug or ligand binding interactions (Purdy et al., 2018; Clabbers et al., 2020; Martynowycz, Shiriaeva et al., 2021) or elucidating mechanisms for substrate transfer in membrane protein transporters and channels (Gonen et al., 2005; Liu & Gonen, 2018). In single-particle cryo-EM imaging, individual hydrogen atom positions were localized from reconstructions of apoferritin at 1.2 Å resolution (Nakane et al., 2020; Maki-Yonekura et al., 2021) and for the GABAA receptor at 1.7 Å resolution (Nakane et al., 2020). Here, hydrogen atoms were identified by omitting them from the model and inspecting the peaks in a calculated Fo – FC difference map following refinement in Servalcat based on crystallographic refinement routines implemented in REFMAC5 (Murshudov et al., 2011; Yamashita et al., 2021). Since resolution is a local feature in cryo-EM, the accuracy of hydrogen identification varies across the map.
Visualizing hydrogen atoms in macromolecular crystallography generally requires (sub-) atomic resolution data, and the accuracy of localizing hydrogens varies with local structural flexibility which is reflected by the temperature factors. Typically, crystals of macromolecules are more disordered than peptides or small molecules and have a much higher solvent content. Therefore, in absence of atomic resolution data, identification of hydrogen atoms in macromolecular MicroED structures has remained elusive.
Identifying hydrogen atoms in MicroED data
Recently, we reported the structure of triclinic hen egg-white lysozyme at 0.87 Å resolution using electron-counted MicroED data (Martynowycz, Clabbers, Hattne et al., 2021). MicroED data were collected from 16 crystal lamellae and the structure was phased ab initio as described previously (Supplementary Fig. 1) (Martynowycz, Clabbers, Hattne et al., 2021). Following density modification individual atoms could be resolved at sub-Ångström resolution, enabling automated model building of the entire structure without reference to a previously determined homologous model (Martynowycz, Clabbers, Hattne et al., 2021). The improvement in data accuracy and resolution in this study compared to previous efforts was realized by combining focused ion-beam milling to produce approximately 300 nm thin crystalline lamellae ideal for cryo-EM at 300 kV (Martynowycz, Clabbers, Unge et al., 2021), and collecting data in electron-counting mode at a significantly reduced exposure of only 0.64 e-.Å-2 per crystal dataset (Martynowycz, Clabbers, Hattne et al., 2021). A low exposure rate is required for electron counting as it ensures that the rate of scattered electrons remains within the linear range of the camera. Lowering the total exposure also reduces the effects of radiation damage that can affect the structural integrity of the protein and the ability to localize hydrogen atoms (Hattne et al., 2018; Leapman & Sun, 1995).
Crystallographic quality indicators and data completeness plotted as function of resolution for triclinic lysozyme at 0.87 Å resolution (see Supplementary Table 1).
We set out to further refine the ab initio model resulting from automated building against the subatomic resolution MicroED data to closely examine individual hydrogen atom positions. First, the structural model was refined using electron scattering factors, isotropic atomic displacement parameters, and the default riding hydrogen model in REFMAC5 (Murshudov et al., 2011). Twelve alternate side-chain conformations were modeled upon visual inspection using Coot (Emsley et al., 2010), and their occupancies were refined. The model was then refined using anisotropic B-factors until convergence (Supplementary Table 1). A crystallographic mFO – DFC difference map was calculated using a model without hydrogen atoms (Yamashita et al., 2021). Peaks in the difference hydrogen omit map at ≥2.0σ were then identified using PEAKMAX (Winn et al., 2011), and those within 0.5 Å distance from any idealized riding position were identified as potential hydrogen atoms. In this manner, we located 376 out of 1067 possible hydrogen atoms corresponding to about 35% of the entire structure. Lowering the threshold to 1.0σ revealed a total of 562 hydrogen atom positions, approximately 53%. At contour levels below 2.0σ, the difference map is nosier, increasing the chance of false positives and making it more challenging to unambiguously identify peaks as hydrogen atoms. Nevertheless, these results are the most complete hydrogen bonding network visualized to date by macromolecular MicroED (Table 1).
Visualizing hydrogens and hydrogen bonding networks
Overall, the protein main chain is expected to be more rigid than the side chains; we consequently expect more hydrogen atoms to be found in the backbone than in the protein side chains. At the 2.0σ threshold, we identified 61 out of 141 possible Cα-H hydrogens and 76 out of 127 peptide N-H hydrogen bonds corresponding to approximately 43 and 60% of the entire structure, respectively (Table 1, Supplementary Tables 2, 3, and 8). The backbone hydrogen atoms are structurally important and can be involved in forming and stabilizing secondary structural elements via non-covalent hydrogen-bonding interactions. For example, the structure of lysozyme has two short antiparallel ß-strands and we could identify three strong difference peaks at >3.0σ indicating the positions of those hydrogen atoms involved in hydrogen bonding interactions (Fig. 1a, Supplementary Video 1). The average N-H distance in the β-strands is 1.14(26) Å distance, and the distance between the amide group hydrogen donor and carbonyl acceptor is 2.76(9) Å (Table 2). Interestingly, whereas the Asp52 and Gly54 N-H distances are close to the idealized positions, the difference peak for the Asn44 N-H is located at an almost equal distance shared between the donor and Asp52 carbonyl acceptor (Fig. 1a, Table 2). The structure of lysozyme is further composed of several short helices, and we could identify a total of 15 hydrogen bonding interactions in the three standard α-helices (Table 2). For example, in the longest 12-residue α-helix we identified 6 out of 10 possible hydrogen bonds based on strong difference peaks at >2.7σ (Fig. 1b, Supplementary Video 2). The average hydrogen atom peptide N-H distance for the α-helices is 0.97(14) Å with an average distance between donor and acceptor of 2.84(13) Å (Table 2).
Difference peaks for individual hydrogen atoms are displayed as green spheres with their σ values shown for (a) two short anti-parallel β-strands (residues 42-45 and 51-54, respectively), and (b) an α-helix (residues 88-101). Hydrogen atoms were assigned from a hydrogen-only omit map for peaks at ≥2.0σ that are within 0.5 Å distance from their idealized position. Hydrogen bonding interactions are indicated by dashed black lines and their respective bond distances and angles are listed in Table 2. Electrostatic potential 2mFo-DFc maps are contoured at 4.0σ (blue) and mFo-DFc difference maps are shown at 2.5σ (green and red for positive and negative, respectively). Carbon atoms are shown in brown, nitrogen in blue, and oxygen in red.
Higher flexibility and alternate conformations can affect localizing hydrogen atoms in the side chains. Nevertheless, we could successfully localize side-chain hydrogen atoms in the data and identify several hydrogen-bonding interactions between side-chain atoms (Fig. 2, Supplementary Table 2). For example, a difference peak at 2.4σ can be resolved between His15-NE2 and Thr89-OG1 indicating a possible shared hydrogen bond between both side chains (Fig. 2a). As expected at pH 4.5, the data shows the solvent-exposed histidine to be protonated at ND1, although the hydrogen distance and angle are different from idealized geometry (Fig. 2a). Another example of hydrogen bonding interactions is illustrated for Tyr53-OH acting as a hydrogen donor to Asp66-OD1 with a strong difference peak at 3.4σ (Fig. 2b, Supplementary Table 2).
Hydrogen atoms (green difference peaks) are shown with their σ values for the side chains of different residues and for several water molecules. Hydrogen atoms were assigned from a hydrogen-only omit map for peaks at ≥2.0σ and within 0.5Å from their idealized positions. (a) Strong difference peaks indicate hydrogen atom positions for His15, as well as a possible hydrogen bond interaction between His15-NE2 and a neighboring Thr89-OG1. The histidine residue appears to be protonated at ND1 which is consistent with pH 4.5 of the crystallization condition. (b) Hydrogen atoms are indicated by difference peaks for two residues, as well as a potential hydrogen bonding interaction between Tyr53-OH and Asp66-OD1. (c) Acidic side-chains showing well resolved atoms. Strong difference peaks for side chain hydrogen atoms can be observed in asparagine and glutamine residues. (d) Illustration of a hydrogen bonding network involving water molecules and several protein residues. The inset shows the hydrogen bond distances for the water molecules. Electrostatic potential 2mFo-DFc maps are contoured at (a) 2.5σ (blue) and (b-d) at 3.0σ (blue), mFo-DFc difference maps are shown at 2.3σ (green and red for positive and negative, respectively). Carbon atoms are shown in brown, nitrogen in blue, and oxygen in red.
In single-particle cryo-EM, it was previously observed that acidic side chains were poorly resolved at moderate to low resolution owing to radiation damage and due to the rapid fall off of the electron scattering factors for negatively charged atoms at lower scattering angles (Yonekura et al., 2015, 2018; Maki-Yonekura et al., 2021). In the MicroED data, the acidic aspartate and glutamate residues and their negatively charged side-chain carboxyl groups are generally well resolved (Fig. 2c). Additionally, clear difference peaks at >2.3σ were identified in the data for the amide side-chain nitrogen for asparagine and glutamine residues, making it possible to clearly distinguish between the nitrogen and oxygen atoms of the side-chain amide group (Fig. 2c).
Difference peaks were also identified for several water molecules that are involved in hydrogen bonding interactions with the protein backbone and side chains (Fig. 2d, Supplementary Video 3). Such hydrogen bonding networks can act as long-range proton transfer wires. For example, a water molecule is coordinated with the adjacent Ser91, Leu56, and Tyr53 residues and shows two strong difference peaks at ≥2.7σ (Fig. 2d). Two additional water molecules show hydrogen atom peaks at ≥2.2σ and are involved in hydrogen bonding interactions with each other and residues of the neighboring protein backbone (Fig. 2d). The O-H hydrogen bond lengths and angles of the water molecules are reasonably close to ideal values, except for one O-H distance for w1001 which is significantly shorter at 0.64 Å. The hydrogen bond distance between the w1001-O proton donor and the Tyr53-O proton acceptor is however close to ideal values at 2.75 Å.
Hydrogen bond distances
The sheer numbers of hydrogens visualized in this study allow us to measure and report hydrogen bond distances in a way previously not possible in cryoEM (Supplementary Tables 2–10; Figure 3). Electrons are scattered by the potential field generated from electron clouds and the nuclei; similar to neutron diffraction. The peaks in an electrostatic potential map are therefore expected to reflect the inter-nuclei distances more than distances between centers of mass of electron clouds as observed in X-ray diffraction. We refined the structure using the default riding hydrogen model based on hydrogen distances between the electron cloud centroids using restraints derived from X-ray scattering. We analyzed the identified hydrogen atom difference peaks in the data at ≥2.0σ and calculated the average distance for each of the hydrogen bond types (Table 1, Figure 3, Supplementary Tables 2–10). The number of observations for some bond types is insufficient for a rigorous statistical analysis. We do however find an average Cα-H distance for the main chain of 1.11(13) Å for 61 hydrogen bonds, compared to idealized values of 0.98 and 1.10 Å for X-ray and neutron diffraction, respectively (Table 1, Figure 3, Supplementary Table 3). The average distance for all N-H bonds is 1.03(16) Å for 83 observations, compared to idealized values of 0.98 and 1.10 Å for X-ray and neutron diffraction, respectively (Supplementary Tables 2 and 8). Interestingly, the distances for the amide N-H bonds that are involved in hydrogen bonding interactions with neighboring residues are slightly longer compared to those that are not involved in such electrostatic interactions (Table 1, Figure 3).
Hydrogen bond distances in Å are shown as histogram plots with a normal distribution fitted to the data. Idealized hydrogen bond lengths between electron cloud centroids used in X-ray diffraction are indicated by a teal dotted line, idealized inter-nuclei hydrogen bond lengths used in neutron diffraction are indicated by the orange dotted line (see also Table 1, Supplementary Tables 2, 3, 6–8) (Gruene et al., 2014).
These results suggest an elongation of the hydrogen bond lengths compared to the electron cloud centroid distances assumed in the riding hydrogen mode, although the number of observations for each type is rather limited and the standard deviations from the mean value are quite large (Table 1, Figure 3). Nevertheless, we find an overall trend that the Cα-H and N-H bond lengths are closer inter-nuclei distances (Gruene et al., 2014; Williams et al., 2018). This observation agrees with previous electron diffraction and imaging experiments that show an apparent elongation of the hydrogen bond lengths compared to X-ray diffraction (Clabbers et al., 2019; Takaba et al., 2021; Nakane et al., 2020; Maki-Yonekura et al., 2021). Refinement of structural models derived from electron scattering would therefore benefit from more appropriate restraints specific for electrons, including a more accurate riding hydrogen model, as well as taking the electrostatic potential of the crystal into account (Yonekura et al., 2015, 2018; Murshudov et al., 2011; Yamashita et al., 2021; Gruene et al., 2014; Williams et al., 2018).
Conclusions
The results demonstrate that hydrogen atom positions can be accurately identified in macromolecular MicroED data. As with X-ray crystallography, this will typically require atomic resolution data or better (Walsh et al., 1998; Howard et al., 2004; Wang et al., 2007; Eriksson et al., 2013; Ogata et al., 2015). In comparison, the structure of triclinic lysozyme was determined previously using X-ray diffraction at 120 K and room temperature to 0.93 and 0.95 Å resolution, respectively (Walsh et al., 1998). The single-crystal low-temperature structure is of high quality and generally has more clearly visible hydrogen atoms than the room temperature model merged from three crystal datasets. Difference maps contoured at 1.9σ visualize hydrogen atoms in residues within the better-defined regions of the structure, and at 1.8σ contour level, 77 out of 127 peptide N-H atoms (61%) are identified (Walsh et al., 1998). The number of hydrogen atoms localized in the low-temperature structure is similar to the MicroED structure at comparable resolution, even though the intensity and model statistics are worse (Supplementary Table 1, Supplementary Fig. 1) (Martynowycz, Clabbers, Hattne et al., 2021). The higher level of inaccuracy in the MicroED data can in part be attributed to non-isomorphism from merging of 16 crystal datasets and lower completeness in the highest resolution shells (Supplementary Fig. 1). Additional factors that contribute to the errors are inelastic scattering and inaccurate modeling of the electron form factors and the electrostatic potential in structure refinement (Yonekura et al., 2015, 2018; Murshudov et al., 2011; Yamashita et al., 2021). Compared to X-ray diffraction, electrons are expected to provide better contrast for identifying hydrogen atoms at a similar resolution as the scattering factors fall off less steeply with decreasing atomic number. The lighter hydrogen atoms are therefore expected to be better resolved next to the heavier atoms, which might explain why we can identify many hydrogen atoms even though the MicroED data appear noisier. This is further supported by a comparison between apoferritin models from X-ray crystallography and single-particle cryo-EM showed that hydrogen atoms were visibly more clearly in the latter (Yamashita et al., 2021). More recently, a significantly higher resolution structure of triclinic lysozyme was solved ab initio at 0.65 Å by X-ray diffraction (Wang et al., 2007). At this resolution, approximately 31% of all hydrogen atoms in main and side chains could be identified at 3.0σ or higher. We would anticipate major improvements in hydrogen atom localization in MicroED data upon further increasing the resolution.
Previously, hydrogen atoms were successfully identified in protein complexes by single-particle cryo-EM. In comparison to the results presented here, these studies reported that about 70% of the expected number of hydrogen atoms could be identified above a threshold level of 2.0σ using hydrogen-only omit maps from atomic resolution reconstructions of apoferritin (Yamashita et al., 2021; Maki-Yonekura et al., 2021). Remarkably, about 17% of possible hydrogen atoms could be identified from data as low as 1.84 Å resolution (Yamashita et al., 2021). In imaging, the phase information is retained and during reconstruction, images are filtered to remove noise and to select a specific conformational state. The resolution is therefore a local feature of the map whereas the B-factor is a global parameter applied in map sharpening or blurring. This is unlike a crystallographic map, where the resolution is a global feature of the entire dataset and structural flexibility or disorder is modeled locally using alternate conformation and per atom refined B-factors. In crystallography, resolving detailed features such as hydrogen atoms is affected by local disorder. In the MicroED structure, twelve residues are modeled with alternate side-chain conformations at low occupancy, making it more challenging to identify hydrogen at these positions. The mean B-factor over all atoms in the model is 11.98 Å2, and the majority of outliers are on the outside of the protein facing the solvent. Lower completeness in the higher resolution shells and non-isomorphism from merging data of multiple crystals could both contribute to increased B-factors. Especially the last two C-terminal residues have high temperature factors, these were also poorly resolved in the high-resolution X-ray diffraction structure (Walsh et al., 1998). Indeed, most hydrogens can be identified within the more stable core of the protein relative to the residues on the outside facing the solvent having higher flexibility and B-factors.
In all, 377/1067 (35%) hydrogen atoms could be located at ≥2.0σ and we illustrate several examples of well resolved hydrogen atom positions and hydrogen bonding interactions between protein residues and solvent molecules. This is the most complete hydrogen network map for macromolecular MicroED data to date, and these results provide a glimpse of the information that can be obtained by electron scattering, opening up new avenues for further experiments investigating hydrogen bonding networks in protein structures. At the current stage, the difference map becomes increasingly noisy at contour levels below 2.0σ, making it more challenging to unambiguously identify hydrogen peaks. Future efforts that can enhance the localization of hydrogen atoms should be focused on improving data accuracy and resolution even further. Energy filtration can improve data quality by discarding inelastically scattered electrons, improving the detection of weak peaks at high resolution and at the lower scattering angles that are shaded by the direct beam (Yonekura et al., 2015, 2019). It would also mean the exposure could be lowered even further without losing the weak signal from high-resolution reflections to the noise of the background. Energy filtering does not exclude multiple elastic scattering which may affect the measured kinematic intensities (Fujiwara, 1959; Cowley, 1995). For any typical hydrated protein crystal, these effects are suggested to be far less detrimental to data quality compared to inelastic scattering (Latychevskaia & Abrahams, 2019; Martynowycz, Clabbers, Unge et al., 2021). Dynamical structure refinement can enhance the localization of hydrogen atoms in small molecule structures (Palatinus et al., 2017), but its implementation is computationally expensive and has yet to be extended to macromolecules that include bulk solvent that cannot be modeled. In recent experiments, recording MicroED data using a direct electron detector in electron counting mode significantly improved data quality, and we expect further benefits from faster readout and better electron-counting algorithms using electron-event representation (Guo et al., 2020; Nakane et al., 2020).
Methods
Crystallization and sample preparation
Crystalline lamellae of triclinic lysozyme were prepared as described previously (Martynowycz, Clabbers, Hattne et al., 2021). Briefly, crystals of hen egg-white lysozyme (Gallus gallus) were grown by dissolving 10 mg/ml protein in a solution of 0.2 M sodium nitrate and 50 mM sodium acetate at pH 4.5. After incubation overnight at 4 °C an opaque suspension was observed. After further incubation for one week at room temperature a crystalline slurry appeared containing microcrystals. Samples were prepared by depositing 3 μl of the crystalline slurry onto a glow-discharged EM grid (Quantifoil, Cu 200 mesh, R2/2 holey carbon). Excess liquid was blotted away and the sample was vitrified using a Leica GP2 vitrification robot. Grids were transferred to an Aquilos dual-beam FIB/SEM (Thermo Fisher) and crystals were milled to lamellae with an optimal thickness of approximately 300 nm as described previously (Martynowycz, Clabbers, Hattne et al., 2021; Martynowycz, Clabbers, Unge et al., 2021).
Data collection and processing
Electron-counted MicroED data were collected on a Titan Krios 3Gi TEM (Thermo Fisher) operated at 300 kV as described previously (Martynowycz, Clabbers, Hattne et al., 2021). Briefly, the TEM was set up for low exposure data collection using a 50 μm C2 aperture, spot size 11, and a beam diameter of 25 μm. A 100 μm SA aperture was used, corresponding to an area of 2 μm diameter on the specimen. Crystal lamellae were continuously rotated over a range of 84° at a rotation speed of 0.2°/s over 420s with a total exposure of approximately 0.64 e-.Å-2 per dataset. Data were recorded on a Falcon 4 direct electron detector (Thermo Fisher) in electron counting mode operating at an internal frame rate of 250 Hz. Data of 16 crystal lamellae were integrated, scaled and merged using XDS (Kabsch, 2010) and AIMLESS (Evans & Murshudov, 2013). The structure was phased ab initio by placing a three-residue idealized α-helix fragment using PHASER (McCoy et al., 2007) followed by density modification in ACORN (Foadi et al., 2000). The entire structure was built automatically using BUCCANEER (Cowtan, 2006) and refined in REFMAC5 (Murshudov et al., 2011) using electron scattering factors.
Identification of hydrogen atoms
The structure was manually inspected and remodeled using Coot (Emsley et al., 2010), and re-refined with REFMAC5 (Murshudov et al., 2011) using electron scattering factors. Hydrogen atoms were added in idealized riding positions. A hydrogen-only omit was calculated from the final structural model by REFMAC5 (Murshudov et al., 2011). Peaks in the mFo – DFC difference map at a threshold above 2.0σ were identified and listed using PEAKMAX in the CCP4 software package (Winn et al., 2011). Difference peaks that fell within 0.5 Å of the idealized distance for the known positions were assigned as hydrogen atoms.
Figure preparation
Figures were prepared using ChimeraX and assembled in powerpoint and photoshop.
Data availability
Coordinates and structure factors have been deposited to the PDB.
Acknowledgements
This study was supported by the National Institutes of Health P41GM136508. The Gonen laboratory is supported by funds from the Howard Hughes Medical Institute.