Locating macromolecular assemblies in cells by 2D template matching with cisTEM

For a more complete understanding of molecular mechanisms, it is important to study macromolecules and their assemblies in the broader context of the cell. This context can be visualized at nanometer resolution in three dimensions (3D) using electron cryo-tomography, which requires tilt series to be recorded and computationally aligned, currently limiting throughput. Additionally, the high-resolution signal preserved in the raw tomograms is currently limited by a number of technical difficulties, leading to an increased false-positive detection rate when using 3D template matching to find molecular complexes in tomograms. We have recently described a 2D template matching approach that addresses these issues by including high-resolution signal preserved in single-tilt images. A current limitation of this approach is the high computational cost that limits throughput. We describe here a GPU-accelerated implementation of 2D template matching in the image processing software cisTEM that allows for easy scaling and improves the accessibility of this approach. We apply 2D template matching to identify ribosomes in images of frozen-hydrated Mycoplasma pneumoniae cells with high precision and sensitivity, demonstrating that this is a versatile tool for in situ visual proteomics and in situ structure determination. We benchmark the results with 3D template matching of tomograms acquired on identical sample locations and identify strengths and weaknesses of both techniques, which offer complementary information about target localization and identity.


Abstract 23
Over the last decade, single-particle electron cryo-microscopy has become one of the main 24 techniques contributing to the growing library of high-resolution structures of macromolecules 25 and their assemblies. For a full understanding of molecular mechanisms, however, it is important 26 to place them into the broader context of a cell. Traditionally, this context can be visualized in 27 3D by electron cryo-tomography, and more recently, has also been studied by template matching 28 of 2D images of cells and viruses. A current limitation of the latter approach is the high 29 computational cost that limits the throughput and widespread adoption of this method. We 30 describe here a GPU-accelerated implementation of 2D template matching in the image 31 processing software cisTEM that allows for easy scaling and improves the accessibility of this 32 approach. We apply 2D template matching to identify ribosomes in images of frozen-hydrated 33 Mycoplasma pneumoniae cells and demonstrate that it can function as a versatile tool for in situ 34 visual proteomics and in situ structure determination. We compare the results with 3D template 35 matching of tomograms acquired on identical sample locations. We identify strengths and 36 weaknesses of both techniques which offer complementary information about target localization 37 and identity. 38

Introduction 39
A major goal in structural biology over the last 70 years has been to understand the molecular 40 mechanisms of biological processes that occur inside cells by studying the underlying proteins 41 and their assemblies, collectively referred to here as complexes, and the many biochemical 42 reactions catalyzed by them. X-ray crystallography and electron microscopy, in particular 43 electron cryo-microscopy (cryo-EM), have generated high-resolution density maps of these 44 complexes that could be interpreted by atomic models, allowing a detailed description of 45 molecular mechanisms (Berman et al., 2002). Of these two structural techniques, cryo-EM has 46 emerged as the more versatile, being applicable to 2D and 3D crystals, helical assemblies, non-47 crystalline material (single particles), cells and tissues. All cryo-EM techniques have greatly 48 benefited from technical advances over the last decade, including direct electron detectors with 49 improved image quality and speed allowing for movie data collection (Brilot et  algorithms (Lyumkis et al., 2013;Scheres, 2012), leading to the so-called "resolution revolution" 52 (Kuhlbrandt, 2014). Using single-particle cryo-EM, it has become easier to obtain near-atomic 53 resolution structures of a broad range of complexes, following earlier resolution breakthroughs 54 with 2D crystals, helical assemblies, and highly symmetrical virus particles (Grigorieff & 55 Harrison, 2011; Hasler et al., 1998;Sachse et al., 2008). A particular strength of the single 56 particle technique, as opposed to crystallography, is its ability to sort particles according to their 57 3D structure using image classification techniques, and to deliver structures of several distinct 58 states of a complex observed under near-native conditions (Abeyrathne et al., 2016). This 59 structural "deconvolution revolution" has been equally important in driving the success of cryo-60 EM, and often yields a more complete picture of a mechanism than can be obtained from the best 61 resolved structure of a single state. 62 While the list of high-resolution structures is rapidly growing, a full understanding of molecular 63 mechanisms, and of the role these complexes play within the host organism, requires the broader 64 context of the cell (Alberts, 1998). This context is necessary to understand the functional 65 coupling of different complexes, for example by transient interaction between them, by efficient 66 shuttling of substrates and products, or by a common regulatory mechanism (e.g., transcription-67 translation coupling (O'Reilly et al., 2020), turnover of nuclear pore complexes (Allegretti et al.,68 2020) or co-translational protein transport (Braunger et al., 2018)). More native conditions can 69 be maintained in a sample by employing rapid purification of cell lysate (Behrmann et al., 2015). 70 The rapid purification aims to better preserve transient interactions and states compared with 71 traditional purification techniques, but at the expense of purity. The "impure" sample can then be 72 purified in silico by modern image classification techniques that are now commonplace in cryo-73 EM. To achieve close-to native conditions, cryo-EM can also be used to image molecules and 74 complexes directly inside frozen-hydrated cells and tissue at high resolution, making it one of the 75 most promising approaches to add cellular context to structures that have been visualized in 76 vitro. To date, the most developed cryo-EM technique to visualize the "molecular sociology" of 77 cells (Beck & Baumeister, 2016) is electron cryo-tomography (cryo-ET) (Oikonomou & Jensen,78 2017). 79 Similar to single-particle cryo-EM, cryo-ET has seen significant development over the past 80 decade (Wan & Briggs, 2016), including the extension of subtomogram averaging to sub-nm 81 resolution (Schur et al., 2013). By adapting techniques from single-particle cryo-EM to work on 82 small 3D volumes within a tomogram, subtomogram averaging can boost the resolution of 83 reconstructions of complexes visualized in situ, thus avoiding the difficulties related to sample 84 purification (A. Bartesaghi  in the tomogram, which is still one of the major bottlenecks for the method (Pfeffer & Mahamid,91 2018; Zhang, 2019). This can be accomplished by matching with a 3D template density 92 (Frangakis et al., 2002), which may be obtained from a small preliminary set of manually 93 selected subtomograms, or from a different experiment, for example single-particle cryo-EM. 3D 94 template matching (3DTM) has been particularly successful in the study of large cytoplasmic 95 complexes, such as ribosomes, proteasomes and chaperonins (Eibauer et al., 2012;Pfeffer et al., 96 2018) and even proteasomes inside the nucleus, close to nuclear pore complexes (Albert et al.,97 2017). However, the nucleus as a whole has proven to be more challenging, due to molecular 98 crowding and the higher density produced by nucleic acids in cryo-EM images (Spahn et al., 99 2000). This points to a more fundamental problem in cryo-ET: density annotation depends 100 critically on recognizing the overall shape, i.e., the particle envelope, which may be obscured in 101 regions of the cell with high molecular density, or in regions where several complexes connect to 102 form a single continuous density (Grünewald et al., 2002). 3DTM is also susceptible to false-103 positive detections as the limited resolution of a tomogram (~20 Å (Frank, 2006)) makes 104 correlation coefficients relatively insensitive to internal structure. 105 Recently, we described a 2D template matching (2DTM) technique (Rickgauer et al., 2017) that 106 may overcome some of the limitations of 3DTM. It matches projections of 3D templates to 107 features found in single-exposure (2D) images of nominally untilted specimens. Avoiding 108 multiple exposures and high specimen tilt angles helps preserve the high-resolution signal in 109 these 2D images (Brilot et al., 2012), and therefore, 2DTM can utilize this signal to detect 110 complexes with high specificity, as well as high angular and positional in-plane accuracy (x,y 111 coordinates). This added signal comes at the expense of increased structural noise in the images 112 due to overlapping density from other molecules in the cell, and a relatively large error in 113 localizing the depth of the targets within the sample (z coordinate). 2DTM also requires a fine-114 grained angular search composed of millions of reference projections and correlation maps to be 115 calculated, therefore making the computational workload of a 2D template search relatively high 116 compared to a more coarse-grained search that is normally done with 3DTM. We describe here 117 an implementation of 2DTM in the software package cisTEM (Grant et

GPU-accelerated 2DTM implemented in cisTEM 131
The large search space required for 2DTM makes this method computationally demanding; a 132 single 1850 x 1850 pixel image required 1000 CPU-hours for a search in the proof-of-principle 133 MATLAB implementation (Rickgauer et al., 2017). To make 2DTM accessible to more users 134 and a broader range of biological questions, we implemented 2DTM in cisTEM (Grant et al.,135 2018) (Figure 1a) using C++ and achieved a roughly 23x speed-up compared to the MATLAB 136 implementation. The core 2DTM algorithm is unchanged from its original description as 137 depicted in the flowchart in Figure 1 -figure supplement 1. The cisTEM implementation may 138 be run as a standalone from the command line interface, or alternatively, using the cisTEM 139 graphical user interface (GUI). In addition to a user-friendly interface, the GUI affords several 140 advantages: Firstly, project metadata and image-specific information, such as CTF estimation, 141 are tracked in a database; secondly, the search is easily divided over many CPUs or computers 142 using cisTEM's MPI-like dispatch via run-profiles; and thirdly, the results are displayed in an 143 interactive manner allowing for easy interpretation and comparison across multiple search 144 conditions. A screenshot showing the results of a template matching search in the cisTEM GUI is 145 shown in Figure 1a. The GUI displays the image searched (Figure 1b), maximum intensity 146 projection (MIP) (Figure 1c) and the plotted results (Figure 1d) sphere to divide the search space (Figure 2a). The GPU implementation exploits further 157 parallelism via threading, which combined with CUDA streams allows for multiple kernels to 158 execute on the GPU simultaneously (Figure 2b). This in turn allows us to create a dynamic load 159 balancing that results in full occupancy of the GPU over a wide range of problem sizes. 160 To evaluate the performance improvements of our GPU 2DTM implementation relative to the 161 CPU 2DTM implementation, we compared two high-end GPUs (Nvidia GV100) against two 162 high-end 28-core CPUs (Intel Xeon Platinum 8280) installed in the same general-purpose 163 workstation, with all other hardware and variables unchanged. By this metric, the GPU-164 accelerated implementation of 2DTM achieved an 8.5x speed up relative to the CPU-only 165 implementation (Figure 2c). Using IEEE 754 half-precision floating point values (FP16) for the 166 arrays used to track search statistics, namely the pixel-wise sum and sum-of-squares over all 167 orientations, resulted in further acceleration and a reduced memory footprint. The total speed-up 168 was 10.5x (Figure 2c). The algorithm scales nearly linearly with the number of GPUs used, as 169 shown for different NVIDIA GPU architectures in Figure 2d. The relative computational cost of 170 each step of the GPU-accelerated 2DTM inner loop algorithm is detailed in Figure 2 - Figure  189 3a) and identified 6,558 50S large ribosomal subunits in 220 2D images. This search did not 190 distinguish between isolated subunits and subunits bound to the 30S small ribosomal subunit. 191 The output from a 2DTM search are SNR values that depend on both the agreement between the 192 template and the target as well as noise in the image (Sigworth, 2004). The noise is 193 predominantly shot noise, as well as background generated by molecules and other cellular 194 material overlapping the target in projection. A target is detected when the SNR value exceeds a 195 threshold at which the average number of false positives per search is set to a user-specified 196 number, usually one (Rickgauer et al., 2017). To improve the match between template and 197 recorded signal, the template can be low-pass filtered to approximate physical blurring of the 198 image due to radiation damage and beam-induced motion, as well as discrepancies between the 199 target in the template, for example due to conformational differences. To low-pass filter the 200 template, we applied a range of B-factors to the initial map generated using pdb2mrc (( Tang et  201 al., 2007), see Methods) and found that the average SNR was maximized using a B-factor of 202 about 85 Å 2 (Figure 3 -figure supplement 1a). Using this value, we found that searching an 203 image composed of exposure-weighted frames (32 e -/Å 2 ) (Grant & Grigorieff, 2015) increased 204 the average SNR compared to using only the first 8 frames (12.8 e -/Å 2 ) with or without exposure 205 weighting (Figure 3 -figure supplement 1b (Figure 3 -figure supplement 1c). In addition to increasing the number of detected 212 targets, the defocus search also provides a rough estimate of the out-of-plane position of each 213 50S. Earlier simulations predicted that protein background would result in reduced target 214 detection in images collected at higher defocus (Rickgauer et al., 2017). In the present 215 experiment, we did not observe a consistent relationship between mean SNR and image defocus 216 over the range of ~500 to ~2200 nm underfocus (Figure 3 -figure supplement 1d), suggesting 217 that detection of complexes as large as 50S ribosomal subunits is not prevented by collecting 218 images further from focus. 219 220 De novo structure determination using 2DTM 221 In our current 2DTM implementation, each detected target is assigned an x,y location, three 222 Euler angles and a defocus value (z coordinate) (Figure 3a). These parameters can be used to 223 calculate a 3D reconstruction using standard single-particle methods. It is well known that a 224 reconstruction calculated from particles that were identified using a template may suffer from 225 significant template bias (Henderson, 2013), reproducing only features of the template and not 226 the targets. However, as shown by Rickgauer et al. (2017) and discussed below, applying an 227 absolute (rather than relative) threshold based on a known noise distribution, also limits template 228 bias and new features not present in the template may be visible in a reconstruction derived from 229 detected targets. To test this, we calculated a reconstruction using the results from searching 220 230 images of M. pneumoniae cells (see above), selecting targets only from the best images that had 231 more than nine detected targets and at least one target with an SNR value above 9. Using the 232 5,080 targets that met these criteria, out of the total of 6,558 detected targets, we calculated a 3D 233 reconstruction (Figure 3b). The 20 Å-filtered reconstruction reproduces the 50S template as 234 expected, but also shows clear, albeit weaker density for the 30S subunit that was not present in 235 the template, and thus cannot be due to template bias (Figure 3b,c). 236 The difference map calculated using diffmap (Grigorieff, 2021b), between the M. pneumoniae 237 50S template and the 3D reconstruction shows density in regions of the 50S that have been 238 shown to be flexible from in vitro reconstructions of the ribosome, specifically, around the L1 239 stalk and the L7/L12 stalk (Figure 3c). The Escherichia coli L1 stalk moves ~45 -60Å relative 240 to 50S during translocation (reviewed in (Ling & Ermolenko, 2016)), and the C-terminal domain 241 of L10 and N-terminal domain of L12 are known to be flexible (Diaconu et al., 2005). Moreover, 242 we observed density in each of the three tRNA binding sites on the small subunit, which were 243 not present in the template (Figure 3d). The density is consistent with tRNAs representing 244 multiple sates and may also reflect the binding of other factors, such as translational GTPases in 245 the A-site. It is therefore likely that the calculated reconstruction represents an average of 246 different conformations adopted by the ribosome in vivo, as expected in actively growing cells. 247 This average differs therefore from the single conformation used for the template, giving rise to 248 the features observed in the difference map. Smaller features in the difference map may also 249 correspond to noise in the reconstruction, as well as features that result from inaccuracies in the 250 resolution scaling of the template against the reconstruction before subtraction. More accurate 251 scaling may be achieved by scaling according to local resolution estimates, which were not 252 obtained here, as well as masking to exclude parts in both maps that do not overlap. 253

2DTM reveals species-specific structural features 255
To test the specificity of 2DTM and to evaluate whether species-specific structural differences to ~4.7Å. Using B. subtilis 50S as a template, we identified 2,874 50S locations, less than half 263 that were identified using M. pneumoniae template (Figure 4b), and with significantly lower 264 2DTM SNR values (P <0.0001, Mann-Whitney U test, Figure 4c). Again, to limit targets to the 265 best images, we included only images that had more than two detected targets and at least one 266 target with an SNR value above 9. Using the 1,172 targets from these images, we calculated a 3D 267 reconstruction (Figure 4d). As before, the 20 Å-filtered reconstruction reproduces the 50S 268 template with clear density for the 30S subunit and shows features consistent with translating 269 ribosomes (Figure 4d-e). However, unlike before, the 3D reconstruction also shows extensive 270 differences in the 50S relative to the B. subtilis 50S template (Figure 4e). Since this 271 reconstruction was generated using ~5 fold fewer particles, some of the additional density likely 272 reflects supplement 1c). We conclude that a B. subtilis 50S template can be used as a homology model 284 to identify M. pneumoniae ribosomes and distinguish species-specific features, and that 285 including high-resolution signal in 2DTM does not overly bias the 3D reconstruction when a 286 high SNR threshold is used. These results further demonstrate the reliability of the 2DTM and its 287 potential to directly resolve complex structure in situ. 288 289

Detection of 50S ribosomal subunits by 3DTM 290
Cryo-ET combined with 3DTM is currently one of the most commonly used approaches for 291 locating molecules within cells using available structural information. Rather than using high-292 resolution templates to search 2D images, as is done in 2DTM, 3DTM locates molecules in 293 tomograms using templates filtered to 30 Å or lower (Himes & Zhang, 2018). To compare the 294 detection of 50S ribosomal subunits by 2DTM and 3DTM, we collected 19 2D images of M. 295 pneumoniae cells followed by tomograms of an overlapping area. These tomograms were 296 collected after the 32 e -/Å 2 exposure for images used with 2DTM, using standard protocols 297 (Figure 5a). The pre-exposure of 32 e -/Å 2 affects the signal in the tomograms at higher 298 resolution but is expected to have only a small effect in the resolution range relevant for 3DTM, 299 i.e., 20 Å and lower (Grant & Grigorieff, 2015). To identify 50S subunits with 3DTM, we 300 followed established protocols using PyTOM and a 30 Å low-pass filtered 50S template 301 generated in a previous study (O'Reilly et al., 2020). The found targets were ranked according to 302 their cross-correlation score and the top 600 hits were selected for each tomogram, followed by 303 alignment and 3D classification with RELION 3.0.8 (Zivanov et al., 2018). The selection of the 304 top 600 hits ensured that more than 90% of potential targets in one tomogram, including free 50S 305 and 50S in assembled 70S, were included. Combining RELION classification results with cross-306 correlation ranking (Figure 5 -figure supplement 1) shows that the hits with the lowest scores 307 contained fewer than 10% of 50S and 70S targets. 308 We aligned the coordinates of the 652 50S subunits identified by 2DTM with 983 50S subunits 309 identified by 3DTM (see Methods), both containing 50S targets corresponding to single 50S 310 subunits and 70S ribosomes (Figure 5b). Within the area common to the 2D images and 311 tomograms, we identified 576 2DTM targets that were within a 100-Å distance in the x,y plane, 312 and within 20° in each Euler angle of a 3DTM target. These limits included ~95% of all paired 313 targets (Figure 5c-d). The paired targets represent ~90% of all 2DTM targets and ~60% of all 314 3DTM targets in this area (Figure 5c, upper). We found that the proportion of 2DTM targets 315 with a corresponding 3DTM target was consistent across the images examined (Figure 5d). In 316 contrast, the proportion of the 3DTM targets detected in the 2DTM search was variable and 317 showed a negative correlation with sample thickness (Figure 5e). This is consistent with our and 318 prior observations that 2DTM is particularly sensitive to sample thickness (Figure 5 - figure  319 supplement 2) (Rickgauer et al., 2017(Rickgauer et al., , 2020, which likely contributes to the high false negative 320 rate of 2DTM relative to 3DTM. 321 The number of targets detected in 2DTM depends on the SNR threshold, which is determined by 322 the desired false-positive rate. Assuming a Gaussian noise distribution, we calculated a threshold 323 SNR of 7.61 to allow for a false positive rate of one per image (Rickgauer et al., 2017). To test if 324 a lower SNR threshold improves the agreement between 2DTM and 3DTM, we determined the 325 proportion of detected 2DTM 50S targets with a matched 3DTM target at different SNR 326 thresholds (Figure 5f). Decreasing the SNR threshold to 7.00 increases the number of detected 327 targets >2-fold (from 652 to 1442), but the number of 2DTM targets with a corresponding 328 3DTM target increased only ~1.2-fold (from 576 to 714) (Figure 5c, lower). The plot of 329 matching targets in Figure 5f shows that at SNR thresholds >7.6, the proportion of 2DTM / 330 3DTM paired targets was ~90-100%, while lowering the SNR threshold below 7.6 resulted in a 331 sharp decrease, indicating that below this threshold, more non-matching targets were found than 332 matching targets. The value of 7.6 agrees closely with the 7.61 threshold at which one false 333 positive per image is expected (Figure 5f, dashed line). We conclude that while lowering the 334 SNR threshold below this value increases the number of true targets identified by 2DTM, based 335 on validation by matching 3DTM targets (Figure 5c, lower), the lower threshold also introduces 336 more non-matching targets, many of them likely false positives. This experimentally validates 337 our use of a Gaussian noise model and the Neyman-Pearson threshold criterion and shows that 338 the SNR threshold can be used to set a desired false positive rate suitable for a particular 339 experimental design. 340 At a threshold of 7.61, the number of 2DTM targets not detected in the 3DTM search was 76, >4 341 times higher than the expected 19 false positives for the 19 searched images and representing 342 ~12% of the 2DTM peaks (Figure 5g). This is consistent with the estimated 3DTM false 343 negative rate of ~10% ( Figure 5 -figure supplement 1), and suggests that these may represent 344 false negatives in the 3DTM search. We also noted that in samples of ~100 nm we detected 345 ~90% of the 50S identified with 3DTM (Figure 5e) i.e., larger than the rotational range of the 30S relative to the 50S. Of the 700 50S locations 370 searched, 314 (~45%) had detectable 30S (Figure 6b), fewer than the ~70% of the 3DTM targets 371 that were classified as 70S using RELION. We speculate that this is due to structural 372 discrepancies that likely exist between the 30S template and specific instances of the small 373 ribosomal subunit in our images. 30S is known to undergo significant conformational changes 374 during the functional cycle of the ribosome in addition to inter-subunit rotation (see Discussion). 375 We also compared the x,y locations (Figure 6c), z coordinates (Figure 6e) and orientations 376 (Figure 6d) assigned by 2DTM to those assigned by 3DTM. Comparing the 2DTM results with 377 the refined 3DTM results shows that ~95% (576/603) of the targets aligned in x and y had an 378 angular distance of 20° or smaller, with a median of 7° (using Eq. (1)) (Figure 6d). The median 379 in-plane distance of 12 Å is likely due to deformation of the sample under the electron beam, 380 which is more pronounced with the higher electron dose used in tilt series, and to a lesser extent 381 due to the lower resolution of 3DTM. In contrast to the x,y locations, which were calculated in 382 steps of one pixel (1.27 Å), the defocus values of 2DTM were sampled in 200 Å steps and, 383 correspondingly the median out-of-plane distance between detected 2DTM and 3DTM targets 384 was much higher at +/-84 Å (Figure 6e). We used the program refine_template (see Methods) to 385 perform a local defocus refinement, which reduced the median out-of-plane distance to +/-59 Å 386 (Figure 6e), less than 1/3 the diameter of the 50S. The 5-fold greater error in z relative to the x,y 387 plane is consistent with the previously reported weaker dependence of the 2DTM SNR values on 388 defocus (Rickgauer et al., 2017). 389 We note that some of the 50S subunits detected by 2DTM overlap in the image and were found 390 in comparable locations by 3DTM (e.g., Figure 6f- advantage of using a template that matches the structure of the particle to be reconstructed is 443 more discriminatory picking, and the possibility to exclude contaminants and other features in an 444 image that do not represent valid particles. Using a template for picking can also lead to template 445 bias, i.e., a 3D reconstruction that reproduces the template rather than the true particle structure 446 (Henderson, 2013;Subramaniam, 2013;Van Heel, 2013). It is therefore important to validate 447 features in a reconstruction derived from particles identified by template matching. This is 448 commonly done by observing features in the reconstruction that were not present in the template, 449 and that are known to be true. For example, templates can be low-pass filtered to 20 Å or lower 450 (Scheres & Chen, 2012), and the emergence of high-resolution features such as secondary 451 structure or amino acid side chains will then validate the reconstruction. In the current release of 452 cisTEM (Grant et al., 2018), a Gaussian blob is used for particle picking, and recognizable low 453 and high-resolution features visible in the reconstruction serve as validation. Since 2DTM uses 454 high-resolution features to identify targets, reconstructions have to be validated differently, for 455 example by identifying additional a priori-known density features that were not present in the 456 template. In the case of using the 50S templates for 2DTM, the reconstructions showed clear 457 density for the 30S subunit, as well as at tRNA binding sites, both of which validate the 458 reconstructions (Figure 3c-d) there are also several smaller differences between the template and the reconstruction within the 467 region of the 50S subunit that can be related to conformational changes occurring during 468 translation (Figure 3c-d To realize an approach that combines 2DTM and 3DTM, a better understanding of how 473 resolution affects 2D and 3D template searches will be needed. In this study, we found that 474 applying a B-factor of 85 Å 2 to the 50S template maximized the mean detection SNR in this 475 dataset (Figure 3 -figure supplement 1a).  Figure 6b). It is unlikely 533 that this low detection rate is due to a bias of 2DTM towards detecting isolated 50S subunits 534 because (i) a 3D reconstruction based on detected 50S subunits shows clear density for the 30S 535 subunit (Figure 3d) and (ii) there was no significant difference in the detection of 50S and 70S 536 targets, as validated by 3DTM (Figure 6 -figure supplement 1). Since the 30S subunit is highly 537 dynamic and can adopt multiple conformations, it is likely that our false negatives arose from 538 conformations that are not sufficiently similar to the 30S template that we used in our search. 539 Below an SNR of 7.2, the number of detected 2DTM targets without a 3DTM match is less than 540 statistically expected (Figure 5g). This suggests that the statistics, which are based on a Gaussian 541 noise model, might suffer from bias, for example because the correlation coefficients calculated 542 for each search location are not completely independent from each other. While further 543 investigation will be needed to model the noise in correlation maps more accurately, the current 544 bias due to residual correlations between search locations results in a slight overestimation of the 545 number of false positives at a given SNR. Incorporating more accurate noise models, and 546 properly accounting for uncertainty in the model (reference structure) will require the 547 replacement of the SNR values currently used to evaluate 2DTM results by a more general 548 probabilistic framework. The use of likelihood ratios, for example, will make 2DTM more 549 robust, as well as open up new avenues to explore molecular heterogeneity. 550

2DTM as a tool to investigate diverse species 552
One of the limitations of 2DTM is a reliance on pre-existing high-resolution structures. Outside 553 of a few model organisms, high-resolution structures are not available for the vast majority of 554 species. We show that, despite differences in their structures, a B. subtilis 50S template can 555 detect 50S in M. pneumoniae cells (Figure 4). Thus, it is possible to use structures from related 556 organisms to identify the locations and orientations of complexes with 2DTM. Mismatches 557 between the template and cellular target resulting from species-specific structures reduce the 558 2DTM SNR (Figure 4c). By comparing the 2DTM SNRs of a series of structures from different 559 species, it may be possible to infer evolutionary relationships in a manner analogous to DNA 560 sequence comparison. Structural comparison with 2DTM would present the additional advantage 561 of defining structural conservation which may not be evident by sequence comparison alone 562 without the need to build a detailed molecular model of homologs in each species. 563 We have shown that it is possible to generate in situ 3D reconstructions of ribosomes from 2D 564 images of cells, without the need to collect a tilt series (Figures 3&4). Template bias (discussed 565 above) becomes especially pertinent when using a template from a different species when there 566 is no existing structure to validate the obtained map. In a 3D reconstruction from 50S located by 567 2DTM with a B. subtilis 50S template we identified density corresponding to M. pneumoniae 568 specific protein structures, and failed to detect density corresponding to a B. subtilis specific 569 protein (Figure 4f-g). This demonstrates that by using a sufficiently high threshold to prevent a 570 preponderance of false positives, targets identified using 2DTM templates from different species 571 can be averaged to generate in situ 3D reconstructions with minimal template bias. Many 572 biologically important organisms are difficult to grow at scales necessary for protein-purification 573 and in vitro structure determination. Cryo-ET is also often limited by computationally 574 demanding image processing, thus hindering routine in situ structure determination. 2DTM could 575 be used as an alternative tool to accelerate in situ structure determination in non-model 576 organisms. 577 578 2DTM and 3DTM complement each other 579 A significant challenge in using cryo-EM to study complexes in situ arises from the thick and 580 crowded nature of the sample. The structural noise from neighboring particles is only correlated 581 at low-resolution in projections recorded at different sample tilts. 3DTM is able to benefit from 582 this indirectly by searching reconstructions which essentially "disentangle" the targets from this 583 structural noise. While overlapping density cannot be removed in single-tilt images used for 584 2DTM, overlapping targets can still be separated based on their different x,y coordinates and 585 defocus values (z coordinate). For example, using 2DTM we were able to discern overlapping 586 50S ribosomes, as validated by comparison with the 3DTM results (Figure 6f-i). Thus, 587 separating overlapping particles does not require collecting a tilt series. Furthermore, because the 588 structural noise is only correlated at low-resolution, a fine-grained search as in current versions 589 of 2DTM, carried out with two or three tilted images may provide a better detection strategy to 590 be implemented in the future. 591 While the results of 3DTM do not seem to be correlated with sample thickness within the tested 592 range (Figure 5 -figure supplement 2), it is difficult to tell if this is due to a better overall 593 performance of 3DTM in thicker samples, for example due to the separation of overlapping 594 densities, or if 3DTM is simply less sensitive to the signal degradation associated with thicker 595 samples, for example by multiple and inelastic scattering. The latter seems more likely since 596 3DTM depends primarily on low-resolution signal, which in cryo-EM data is usually 597 substantially stronger than high-resolution signal and therefore remains detectable in 598 thicker samples despite signal degradation. While the stronger reliance on low-resolution 599 features makes 3DTM less sensitive to sample thickness and image-degrading factors, its high 600 false positive rate requires extensive expert curation, often exceeding the computational run time 601 of the original search. Even where time is not a concern, manual curation is difficult if not 602 impossible in dense regions of a cell, like the nucleus, or for particles smaller than a ribosome 603 that are not easily visually discernible. Particle classification approaches such as those 604 implemented in RELION can be useful for removing false positives. However, their performance 605 on noisy subtomograms of relatively small complexes remains problematic. 606 Using the full electron dose in a single exposure allows for the inclusion of high-resolution 607 information in a 2DTM search, which improves its precision and enables detection in dense 608 molecular environments. However, as we show, this comes at the expense of lower recall in thick 609 samples, and the lack of multiple sample tilts lowers the positional accuracy perpendicular to the 610 image plane. In principle, 3DTM could also take advantage of high-resolution information, but in 611 practice this is currently not achieved, possibly due to increased specimen motion on multiple this has not yet been tested, machine learning may also improve the performance of 2DTM for 619 similar reasons. Better overall performance of template matching may additionally be achieved 620 by combining the high precision of 2DTM with the high recall of 3DTM, making this approach 621 an effective and informative strategy for visual proteomics. In such a strategy, a zero-tilt image 622 with higher dose is collected before collecting a tilt series. The molecules of interest are then 623 identified in both 2D image and 3D tomogram, the 2D search is used to validate hits in the 624 tomogram, and the 3D coordinates provide context in 3D space and higher accuracy in the z 625 coordinates (e.g., Figure 5a- faster than tomography, and may therefore be more suitable for high-throughput studies. Finally, 655 the need for averaging may limit detailed interrogation of molecular structures to more abundant 656 complexes. The spatial organization, structures, composition and functional states of rare 657 complexes can still be studied in cells by 2DTM, provided they are detectable in 2D images. 658 With further improvements, we expect 2DTM to reveal new insights into the molecular 659 mechanisms of biological processes in their native cellular context. were grown on Quantifoil gold grids in modified Hayflick medium and the grid was quickly 666 washed with PBS buffer with 10 nm gold fiducial beads (Aurion, Germany) before plunge-667 freezing. The grids were imaged in a 300 keV Krios TEM (ThermoFisher) equipped with a direct 668 detector (Gatan) and a quantum post-column energy filter (Gatan). Data collection was done 669 using SerialEM (Mastronarde, 2005). 670 For the 2D images paired with tilt-series data, 19 cells were first imaged following the SPA 671 standard, with the magnification of 215,000 (pixel size 0.65 Å) and exposure time of 2 seconds 672 (20 frames, total exposure dose 32 e -/Å 2 ). The target defocus of 0.2 μm to 0.5 μm. Tilt-series 673 were collected using the dose-symmetric scheme (Hagen et al., 2017) with the following 674 settings: magnification 81,000 (pixel size 1.7 Å), tilt range -60° to 60° with 3° increment, 675 exposure time per tilt 1 second, total exposure dose ~129 e -/Å 2 . The target defocus remained the 676 same within tilt-series, and ranged from 1.5 μm to 3.5 µm between tilt-series. 677 A dataset of 220 2D images of M. pneumoniae were collected on a K3 camera (Gatan) and at a 678 magnification of 81,000 (pixel size 1.503 Å). The K3 camera was run in non-CDS mode with the 679 following settings: exposure time 1.678 second, 24 frames, total exposure dose ~31 e -/Å 2 . 680

2D template generation 682
To generate templates for 2DTM, we used the computer program pdb2mrc (Tang et al., 2007) to 683 convert PDB-formatted atomic coordinates into 3D density. To reduce potential aliasing, we 684 initially generated an over-sampled density map at half the pixel size of the image to be 685 searched. The sampling rate was then halved to the final value pixel size by Fourier cropping 686 using the program resample from the cisTEM software package (Grant et al., 2018). The 687 resulting 3D density was low-pass filtered with the standalone program bfactor (Grigorieff,688 2021a) using a B-factor of 85 Å 2 and placed into a 256 x 256 x 256 voxel box, about twice the 689 size of the 50S ribosomal subunit it contained. The final pixel size for searching the 19-image 690 dataset of M. pneumoniae that was compared with tomograms was of 1.27 Å. For computational 691 efficiency, we used a final pixel size of 1.5 Å to search the 220-image dataset. As template 692 models, we used the M. pneumoniae ribosome (PDB: in progress; EMDB 11999) and the B. 693 subtilis ribosome (PDB: 3J9W). We separated the small and large ribosomal subunits and, for the 694 B. subtilis ribosome, removed the mRNA, tRNAs and MifM protein from the coordinate file 695 using custom Perl scripts. The resulting coordinates were aligned with USCF Chimera (Pettersen 696 & Goddard, 2004) to the 50S template used for 3DTM to place them in a common coordinate 697 system. This ensured that the 3D positions and Euler angles found by 2DTM and 3DTM referred 698 to the same coordinate system and could be directly compared. For local 30S searches, the 699 atomic coordinates corresponding to the body of the M. pneumoniae 30S were used to generate a 700 template as described above. For this we combined the results of searches with two templates, 701 one with the proteins and rRNA sequence corresponding to the 30S head and one without, to 702 account for the greater conformational variability of the small subunit relative to the large 703 subunit. 704

2D template matching in cisTEM 705
Movie frames were aligned with unblur using the "optimal exposure filter" (Grant & Grigorieff,706 2015) and including all frames in the final sum, unless otherwise indicated. Defocus was 707 determined by CTFFIND4 (Rohou & Grigorieff, 2015) from within the cisTEM GUI (Grant et  708 al., 2018). Templates for 2DTM were imported into cisTEM as 3D volumes and 2DTM was 709 performed on all images, including a defocus search over a 2400 Å range centered on the 710 average defocus determined by CTFFIND4 with a 200 Å step, using a 2.5° out of plane angular 711 search step and a 1.5° in plane angular search step, assuming C1 symmetry and a minimum 712 target -target distance (peak radius) of 10 pixels (~13 Å). As previously described, the SNR 713 values resulting from 2DTM were further normalized by subtracting the mean of the SNR values 714 for all orientations at each location, and dividing by their standard deviation (Rickgauer et al.,715 2017, 2020). The best peak radius to use in a search depends on the B-factor affecting the signal 716 in the images and may, therefore, vary for different experiments. 717 718

Detection of 30S by local search using 50S 2DTM coordinates 719
To detect 30S subunits bound to 50S subunits, we wrote refine_template, a computer program 720 that is accessible through cisTEM's GUI, or as a command line tool. refine_template reads the 721 3D template and the output files of a template search, including the MIP and alignment 722 parameters for detected targets. Using these data, it calculates template projections and performs 723 a local refinement of the projection parameters. This refinement can be used, for example, to find 724 the defocus values that maximize the cross-correlation coefficient between projection and image. 725 It can also be used to detect molecules and complexes bound to already detected targets. In this 726 case, refine_template is used with the search results obtained for the detected targets, but with 727 the bound complex to be detected as the input template. The locally refined alignment 728 parameters found for the bound complex will be close to those found for the original template, 729 provided both the original template used in the search and the bound complex in the refinement 730 share the same coordinate system. 731 To ensure the coordinate systems of the M. pneumoniae 30S and 50S subunits were aligned, the 732 coordinates for the 30S and 50S subunits were separated from the 70S into two coordinate files 733 and used to generate 3D templates for 2DTM for each of them (see above). For this last step, it 734 was important not to use the -center flag of pdb2mrc to make sure the 3D densities remained in 735 their correct locations relative to the other subunit. 50S targets were then detected using the 50S 736 template, followed by local refinement with refine_template and the 30S template. For the 737 refinement, we searched within an angular range of +/-12.5° using a 2.5° step. A 30S was 738 deemed to be present when the refined x,y coordinates were within 20 Å of the original 739 coordinates found for the 50S template. 740

3D reconstruction using 50S 2DTM coordinates 741
We wrote a new computer program called make_particle_stack, as part of the cisTEM image 742 processing package . It reads the results of a template search, including the MIP with the peaks  743 indicating detected targets, as well as the Euler angles of the detected targets, the original image 744 that was searched, and other imaging parameters (defocus, amplitude contrast, lens aberrations, 745 beam energy and pixel size). On output, it will generate a stack of boxed-out targets and a list of 746 alignment parameters that can be used for 3D reconstruction with cisTEM. Using 747 make_particle_stack and cisTEM, we generated a reconstructions from 5,080 and 1,172 50S 748 targets detected in 220 images of M. pneumoniae cells, using the Euler angles and x,y locations 749 assigned by the M. pneumoniae and B. subtilis 50S template search, respectively. The 750 reconstructions had nominal resolutions of 4.3 Å and 5.5 Å, respectively (FSC threshold of 751 0.143, (Rosenthal & Henderson, 2003), Figure 3 -figure supplement 1e, Figure 4 - figure  752 supplement 1b). However, visible density of known ribosome features (not shown) suggested 753 that this resolution estimate was unrealistically high, as expected due to the well-known effect of 754 template bias (Stewart & Grigorieff, 2004 particles that could not be aligned (Figure 5 -figure supplement 1). 775 776

Comparison of 2DTM and 3DTM 777
To compare targets found by 2DTM and 3DTM, we wrote a new computer program, 778 align_coordinates, as part of cisTEM (Grant et al., 2018), to identify and align target 779 coordinates. align_coordinates assumes that the z-axes of the coordinate systems of the 2DTM 780 and 3DTM searches are aligned, i.e., they refer to the same sample tilt. Therefore, to find the 781 angular alignment and x,y offset between a set of reference coordinates and a second set of 782 sample coordinates, only the x,y coordinates need to be considered. In a first step, 783 align_coordinates calculates all possible vectors between any two x,y coordinates in a coordinate 784 set. A rotation matrix is applied to the sample coordinate vectors, and the vector with the 785 smallest vector difference within a given tolerance is found with respect to each reference vector. 786 The algorithm performs a systematic search of the rotation angle in 0.5° steps to find the rotation 787 that produces the largest number of matching vectors while giving a higher weight to better 788 matching vectors. The result of this systematic search is a rough rotation angle and a list of 789 corresponding coordinates that define a coordinate transform. The search also identifies 790 coordinates in each set that do not have corresponding coordinates in the other set. The 791 coordinate transform is then refined, and the list of corresponding coordinates is updated using a 792 finer local search with angular step of 0.01° and 1-Å steps along the x and y axes. The best 793 coordinate transform is selected based on a least-squares fit between the corresponding 794 coordinates. 795 In the comparison of 2DTM and 3DTM coordinates, the latter were used as a reference set, and 796 the distance threshold for correspondence was set to 110 Å. The final list of corresponding 797 targets was further reduced by limiting distances to 100 Å and below. By providing the 798 dimensions of the field of view of the images used for 2DTM, it was also possible to identify the 799 corresponding area within the larger field of view of tomograms analyzed by 3DTM. For the 800 calculation of percentages of corresponding targets identified by each search, only targets in the 801 overlapping areas were counted. 802 To compare the angular orientations for a given coordinate pair, the 2DTM template was first 803 aligned relative to the 3DTM template (see above), and the Euler angles of the detected targets 804 were recorded. The total angular distance between each orientation, represented by the rotation 805 matrices 2 and 3 , given by the Euler angles, was calculated using the equation (Huynh, 2009)   threads may be requested to subdivide each angular group to maximize occupancy on the GPU. 856 Each host thread queues up a series of GPU kernels into its respective stream, and then returns to 857 calculate the next projection and initiates its transfer to the GPU (green box). This way, close to 858 100% of the CPU and GPU is used during computation. (c) GPU acceleration relative to 859 optimized CPU-based calculation, of which 85% is spent on MKL-based FFTs. Kernel-fusion 860 using cuFFT callbacks and custom data structures combined with flexible kernel launch 861 parameters ensure the GPUs stay saturated, enabling an 8x speedup. A total of 10.5x speedup is 862 achieved by optimizing data throughput using the vectorized FP16 format for storing results. (d) 863 The code scales nearly linearly with the number of GPUs and tracks with the total memory 864 bandwidth of a given model. All timing was obtained using a padded K3 image with 4096 x 865 5832 pixels, and searching one defocus plane with 2.5°/1.5° angular steps. 866    based on the automatic load balancing due to the combination of CUDA streams and mixed 941 kernel launch configurations that restrain some low-complexity operations to a small subset of 942 available streaming multi-processors via grid-stride loops. 943