PAM binding ensures orientational integration during Cas4-Cas1-Cas2 mediated CRISPR adaptation

Adaptation in CRISPR-Cas systems immunizes bacteria and archaea against mobile genetic elements. In many DNA-targeting systems, the Cas4-Cas1-Cas2 complex is required for selection and processing of DNA segments containing PAM sequences, prior to integration of these “prespacer” substrates as spacers in the CRISPR array. We determined cryo-EM structures of the Cas4-Cas1-Cas2 adaptation complex from the type I-C system that encodes standalone Cas1 and Cas4 proteins. The structures reveal how Cas4 specifically reads out bases within the PAM sequence and how interactions with both Cas1 and Cas2 activate Cas4 endonuclease activity. The Cas4-PAM interaction ensures tight binding between the adaptation complex and the prespacer, significantly enhancing integration of the non-PAM end into the CRISPR array and ensuring correct spacer orientation. Corroborated by our biochemical results, Cas4-Cas1-Cas2 structures with substrates representing various stages of CRISPR adaptation reveal a temporally resolved mechanism for maturation and integration of functional spacers into the CRISPR array.


Introduction
CRISPR (clustered regularly interspaced short palindromic repeats) arrays and Cas (CRISPR-associated) proteins constitute adaptive and heritable immune systems that rely on molecular recordings of pathogen invasion, enabling rapid and specific responses to future infections (Barrangou et al., 2007; Brouns et al., 2008; Marraffini and Sontheimer, 2008). Adaptation is achieved through the integration of small fragments, called spacers, acquired from invasive nucleic acids (Jackson et al., 2017; Lee and Sashital, 2022). These spacers provide the basis for RNA-guided interference, leading to degradation of the foreign nucleic acid upon subsequent exposure (Marraffini, 2015). The CRISPR array consists of a leader sequence followed by direct repeats interspersing the acquired spacers, with new spacers added almost invariably at the junction of the leader and the first repeat (

halodurans (previously Bacillus halodurans) revealed that Cas4 acts as an endonuclease and cleaves specifically and precisely upstream of PAM sequences within 3′ overhangs of prespacers

stranded overhang through the RecB nuclease domain (Figure 3A). Within the PAM, the two adenine bases adopt a syn conformation, with rotation of the N-glycosidic bond placing the adenine bases above the deoxyribose. Polar and non-polar residues, including Asn37, Thr40, Glu119, Gln123, Leu195 and Ile28, aid in shape recognition of the PAM nucleotides and form a tight PAM recognition channel (Figure 3B). Each nucleotide of the PAM potentially participates in hydrogen bonding interactions (Figures 3B and 3C). The first nucleotide of the PAM, G29, is in proximity to the polar residues His17 and Gln44, with potential formation of a hydrogen bond between His17 and N7 of the G29 purine ring.
The second nucleotide of the PAM, A30, is recognized by Ser194, which forms two potential hydrogen bonds with N6 and N7 of the purine ring (Figures 3B and 3C). The PAM is further specified by the last nucleotide, A31, which forms two bidentate interactions facilitated by the syn conformation of the adenine base: N1 and N6 of the adenine are positioned to hydrogen bond with Gln16, and N7 and N6 with Gln24 (Figures 3B and 3C). Two hydrophobic residues, Phe20 and Trp34, sit across the phosphodiester backbone. Phe20 is placed after A31 and Trp34 lies between G29 and A30 (Figures 3B and 3C), providing stacking interactions that stabilize the two purine bases and would be weaker with pyrimidines.

To test the importance of these interactions for PAM recognition and prespacer processing, we introduced alanine substitutions at each of these potentially interacting residues (Figure 3D). All substitutions ablated cleavage activity, with the exception of Q44A and S194A, which only partially reduced cleavage. Gln44 is located within helix α4, on the opposite face of the helix that interacts with helix α2 of Cas2 (Figure 2C). The minimal cleavage defect of the Q44A substitution suggests that Gln44 is not involved in PAM recognition but potentially facilitates widening of the PAM recognition channel to allow correct positioning of the PAM. The partial reduction in cleavage with S194A further indicates that shape recognition mediated by neighboring residues, rather than hydrogen bonding, plays the major role in recognition of the second nucleotide of the PAM, A30 (Figure 3D). Additionally, we performed saturation mutagenesis to validate the importance of individual nucleotides in the PAM sequence (Figure 3E).
We observed that cleavage is severely compromised when the first or second nucleotide of the PAM (G29 or A30) is replaced, and is completely lost when the last nucleotide of the PAM, A31, is changed. Together, these results strongly suggest that the bidentate recognition of A31 by Gln16 and Gln24 is essential for specific PAM recognition by Cas4 (Figures 3C-3E).

We compared the PAM recognition motifs of the type I-C Cas4-Cas1-Cas2 and type I-G Cas4/1-Cas2 complexes (Hu et al., 2021) to understand differences that might affect the overall specificity of the two systems (Figure 3F). The type I-G adaptation complex lacks specificity at the first position of the PAM, recognizing a 5′-NAA-3′ PAM sequence, in contrast to the relative specificity for 5′-GAA-3′ observed in type I-C (Almendros et al., 2019; Rao et al., 2017). Consistent with this, the residue equivalent to His17 of I-C is Glu18 in I-G, which is positioned slightly further from the guanine but could form a hydrogen bond with N1 of either adenine in the PAM upon protonation of the Glu18 carboxylate (Figure 3F). Similarly, Trp34, which stacks with the guanine in I-C, is replaced with Phe35 in I-G. This substitution may decrease the stacking interaction energy and reduce specificity for a purine at this position of the PAM in I-G (Rutledge et al., 2006). Recognition of the third position of the PAM also differs between the two structures. While the adenine in the type I-C structure is stabilized in the syn conformation by dual bidentate interactions with Gln16 and Gln24, the type I-G structure positions the equivalent adenine in an anti conformation, enabling the aforementioned potential hydrogen bonding interaction with Glu18 (Figure 3F). Gln16 of I-C is substituted with Asn17 in I-G.
The shorter side chain and slightly more distant position of Asn17 prevent recognition of the adenine through the specific hydrogen bonding interactions observed in type I-C. Gln24 of I-C is substituted with Leu25 in I-G, which mediates van der Waals contacts with the adenine for shape readout but does not enable specific hydrogen bonding. Overall, these differences suggest that Cas4 may confer higher specificity for a GAA PAM in the type I-C system.
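The difference between the two consensus PAMs can be made concrete by enumerating the nine single-nucleotide variants of 5′-GAA-3′ (the kind of set a single-position saturation mutagenesis covers) and testing each against the type I-C (GAA) and type I-G (NAA) consensus strings. This is a minimal sketch: the function and constant names are hypothetical, and exact string matching is only a crude proxy for Cas4 binding and cleavage.

```python
from typing import List

BASES = "ACGT"


def single_substitutions(pam: str) -> List[str]:
    """Return every sequence that differs from `pam` at exactly one position."""
    variants = []
    for i, wild_type in enumerate(pam):
        for base in BASES:
            if base != wild_type:
                variants.append(pam[:i] + base + pam[i + 1:])
    return variants


def matches_consensus(consensus: str, seq: str) -> bool:
    """True if `seq` fits `consensus`; 'N' in the consensus matches any base."""
    return len(consensus) == len(seq) and all(
        c == "N" or c == s for c, s in zip(consensus, seq)
    )


# Hypothetical names; the consensus strings are the 5'->3' PAMs given in the text.
TYPE_I_C = "GAA"
TYPE_I_G = "NAA"

variants = single_substitutions("GAA")
for v in variants:
    print(v, "I-C:", matches_consensus(TYPE_I_C, v),
          "I-G:", matches_consensus(TYPE_I_G, v))
```

Only the three first-position variants (AAA, CAA, TAA) still fit the I-G consensus, the sequence-level counterpart of the missing first-position readout described above. The I-C consensus string accepts none of the nine, whereas the measured I-C cleavage defects were graded rather than all-or-none.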

Cas4-Cas1-Cas2 complex specifies the length of the prespacer duplex and overhang
In addition to the 22 bp duplex prespacer, we also solved the cryo-EM structure of a complex with a 24 bp duplex bearing 3′ overhangs with PAMs starting at the sixth position (Figures S1A, S1B, S1G, S1H and S5A-G). The reconstruction has weaker density for one of the active Cas1 subunits and stronger density for its partner subunit, suggesting conformational heterogeneity (Figures S5H and S5I). Additionally, the density for the prespacer corresponds to a 22 bp duplex, suggesting that the terminal base pair at each end of the duplex is unwound, creating a 6 nt overhang up to the PAM sequence (Figure S5I). A conserved tyrosine in Cas1 plays a role in defining the prespacer duplex in other systems, such as type I-E (Nuñez et al., 2015a). Tyr49 is positioned similarly in the Cas4-Cas1-Cas2 structure, suggesting a similar role in duplex definition in type I-C systems (Figure 4A).

The two Cas4-Cas1-Cas2 structures indicate that a 6 nt single-stranded overhang is necessary to span the distance from the end of the duplex to the Cas4 PAM recognition motif (Figures 1F and S5I). We previously observed that prespacer substrates with 4 nt between the duplex and the PAM were

5A, S1A, S1B, S1D, S1H and S6) substrate or a PAM/processed substrate (Figures 5B, S1A, S1B, S1E, S1H and S7). Both substrates contained a 22-bp duplex with one 15-nt overhang containing a PAM and either a second 15-nt overhang without a PAM (PAM/NoPAM substrate) (Figures 5A and S1B) or a 6-nt overhang simulating a processed non-PAM end (PAM/processed substrate) (Figures 5B and S1B).

Single-particle analysis of these complexes yielded 3.9 Å and 3.3 Å reconstructions for the PAM/NoPAM and PAM/processed complexes, respectively (Figures 5A and 5B).
For both maps, we observed only partial density for one of the two Cas4 subunits, resulting in asymmetric reconstructions, as we previously observed by negative stain (Lee et al., 2019). Because Cas4 binds specifically to PAM-containing sequences, we concluded that the lobe with stronger Cas4 density corresponds to the PAM end and the lobe with partial density to the non-PAM end of the substrate. We did not observe any significant differences in Cas4 conformation on the PAM end between the PAM/PAM complex and the PAM/NoPAM or PAM/processed complexes.

On the non-PAM end, we observed stronger density for a second Cas4 subunit when the longer 3′ overhang was present in the PAM/NoPAM substrate (Figure 5A). In contrast, the PAM/processed complex almost entirely lacked Cas4 density (Figure 5B). In the PAM/NoPAM reconstruction, density is present for the iron-sulfur cluster domain and the C-terminal helix (Figure 5A), but it is weaker in regions that do not directly contact one of the Cas1 subunits. Notably, on the non-PAM end we observed no density for Cas4 α4, which typically interacts with Cas2 α2 (Figure 5C). This lack of density suggests that α4 is conformationally flexible and does not interact stably with Cas2 α2 in the absence of PAM binding.

As on the PAM end, single-stranded DNA density on the non-PAM end can be traced from the end of the duplex toward the Cas4 active site when the 15 nt overhang is present in the PAM/NoPAM substrate. However, no density is present past residue 28 of the DNA, consistent with a lack of stabilizing interactions with Cas4 in the absence of a PAM (Figure S8A). With the 6 nt overhang in the PAM/processed substrate, we could trace only 2 nt of ssDNA overhang density (Figure S8B).
Overall, these structures suggest that longer DNA overhangs are necessary to anchor Cas4 to the Cas1-Cas2 complex, although Cas4 associates most stably when a PAM is present in the overhang.

Based on these structural observations, we hypothesized that Cas4 remains partially associated with Cas1-Cas2 in the presence of long 3′ overhangs and may affect trimming of the non-PAM end by cellular exonucleases. To test this, we used a commercially available DnaQ-like exonuclease, ExoT, to trim the non-PAM end of the PAM/NoPAM substrate in the presence of Cas1-Cas2 or Cas4-Cas1-Cas2 and a DNA substrate containing a leader-repeat-spacer to mimic a CRISPR array for integration (Figures 5D and S8C). ExoT trimming of the non-PAM end of the Cas1-Cas2-bound prespacer produced products of similar length to a pre-processed prespacer control, although trimming was substantially slowed in the presence of Cas4 (Figure 5D). The trimmed products were integrated into the CRISPR DNA, producing an integration product of similar size to that of the pre-processed control, with a slight enhancement of integration in the presence of Cas4. Overall, these results suggest that unstable association of Cas4 at the non-PAM end offers partial protection from exonucleolytic trimming, slowing down processing of the non-PAM end.

To determine how the repeat interacts with the type I-C adaptation complex, we solved a 4 Å cryo-EM structure of the Cas4-Cas1-Cas2 complex bound to a half-site intermediate (HSI) (Figures 6A, 6B, S1A, S1B, S1F, S1H and S9). We used the Cas1 active-site mutant E166A to prevent disintegration of the prespacer strand integrated at the leader site of the HSI mimic. Strong density for the CRISPR DNA could be observed in the reconstruction (Figures 6A and 6B). The DNA helix bends sharply near the leader-repeat junction, with another slight bend near the Cas4-Cas2 interface.
The repeat spans the side of the complex opposite the prespacer duplex. Unlike in the type I-G structure, where the repeat appears to contact the inactive Cas1 domain of the Cas4/1 subunit, in the type I-C structure the repeat contacts the C-terminal helix of Cas4 (Figure 6C). Specifically, two arginine residues in the Cas4 C-terminal helix, Arg207 and Arg211, point directly at the terminal sequence of the repeat, interdigitating within the minor groove at the repeat end of the CRISPR (Figure 6C). This suggests a possible role for Cas4 in correctly orienting the complex on the CRISPR array, positioning the PAM end of the prespacer close to the repeat-spacer site for integration following PAM cleavage and Cas4 dissociation.

In the type I-G system, PAM cleavage by the Cas4 domain of Cas4/1 is activated following integration of the prespacer at the leader site, potentially due to the interaction between the repeat and the inactive Cas1 domain (Hu et al., 2021). Cleavage is followed by integration of the PAM end at the spacer site. Overall, this mechanism ensures insertion of a correctly oriented spacer. However, we previously observed that type I-C Cas4 cleavage activity is not altered in the presence of a CRISPR array (Lee et al., 2019), suggesting that Cas4 is not activated through interactions with the CRISPR following integration of the non-PAM end. To further investigate whether type I-C Cas4 is activated following half-site integration, we compared Cas4 cleavage of the PAM end within an HSI substrate with cleavage of the PAM/NoPAM and PAM/processed substrates (Figures 6D and 6E). Consistent with our previous results, we did not observe any significant differences between the substrates in the fraction cleaved at various time points, suggesting that Cas4 processing does not depend on integration of the non-PAM strand at the leader site.
To test whether interactions between the Cas4 C-terminal helix and the repeat affect PAM-end processing following leader-site integration, we tested a quadruple Cas4 mutant (Q) in which the two C-terminal arginine residues and two nearby lysine residues (Lys206 and Lys210) were substituted with alanine (Figures 6D and 6E). Mutating these residues had no effect on PAM cleavage within the HSI substrate. Overall, these results strongly suggest that, in contrast to the type I-G system, Cas4 is not activated through interactions with the repeat following non-PAM integration in the type I-C adaptation complex.

Cas4 enhances integration of the prespacer non-PAM end
As noted above, in the exonuclease trimming assays we observed slightly more integration in the presence of Cas4, despite slower exonucleolytic cleavage in this condition (Figure 5D). This led us to hypothesize that Cas4 may enhance integration at the leader site. To measure this effect more precisely, we used a PAM/processed substrate with the processed strand radiolabeled to monitor its integration at the leader site in the absence and presence of Cas4 (Figures 6F and S8C). We observed substantial enhancement of integration in the presence of Cas4 at multiple temperatures (Figures 6F and 6G), while Cas4 had no detectable effect on Cas1 disintegration activity (Figure S10). These observations suggest an additional role for Cas4 within the Cas4-Cas1-Cas2 complex in enhancing integration at the leader site.

We next tested whether the enhanced integration results from PAM cleavage by Cas4 or from interactions between the Cas4 C-terminal helix and the repeat. Neither a Cas4 active-site mutant nor the quadruple Cas4 C-terminal mutant (Q) substantially affected the enhancement of integration (Figure 6F).
Similarly, substrates containing phosphorothioate substitutions at the scissile phosphate, which prevent PAM cleavage, still displayed enhanced integration in the presence of Cas4 (Figure S11). Notably, a substrate containing processed 6-nt overhangs on both ends was poorly integrated regardless of whether Cas4 was present (Figure S11B). Overall, these results strongly suggest that Cas4 enhances integration of the processed end at the leader site through its tight binding to a PAM-containing substrate rather than through its PAM cleavage activity or its interaction with the CRISPR repeat.

Discussion
Our structural and biochemical results support a kinetic model for prespacer processing and integration by the type I-C Cas4-Cas1-Cas2 complex (Figure 7). During prespacer capture, Cas4 is essential for binding prespacers containing a PAM sequence (Figure 7A) (Figures 7A and 7B). Nevertheless, trimming of the non-PAM end gives rise to an asymmetric substrate because of the slow kinetics of Cas4 PAM cleavage. Following trimming, Cas4 dissociates from the non-PAM end (Figures 7B and 7C). Tight binding of Cas4 at the PAM end increases the likelihood of integration of the non-PAM end at the leader site, enabling rapid formation of a half-site integration intermediate after exonucleolytic trimming (Figures 7C and 7D). Following PAM cleavage, Cas4 dissociates from the complex, allowing the mature PAM end to be integrated at the spacer site and resulting in insertion of a polarized, functional spacer (Figures 7E and 7F).

Cas4. The data points were fit to a one-phase exponential association. The polyacrylamide gel for the cleavage assay is shown in Figure S4C.
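The legend fragment above notes that cleavage time courses were fit to a one-phase exponential association. A minimal sketch of such a fit follows, assuming the fraction cleaved starts at zero (Y0 = 0), so the model reduces to Y = Plateau·(1 − e^(−kt)). The function names, the log-spaced grid search over k, and the synthetic data are illustrative assumptions, not the fitting software used for the figure; a nonlinear least-squares routine would normally be used instead.

```python
import math
from typing import List, Tuple


def one_phase_association(t: float, plateau: float, k: float) -> float:
    """Fraction of product at time t, assuming Y0 = 0: Y = plateau * (1 - exp(-k t))."""
    return plateau * (1.0 - math.exp(-k * t))


def fit_one_phase(times: List[float], fractions: List[float],
                  k_min: float = 1e-4, k_max: float = 10.0,
                  n_grid: int = 2000) -> Tuple[float, float]:
    """Least-squares fit of (plateau, k) by scanning k on a log-spaced grid.

    For a fixed k the model is linear in the plateau, so the best plateau has
    a closed form: sum(y * f) / sum(f * f), with f = 1 - exp(-k t).
    """
    best_sse, best_plateau, best_k = float("inf"), 0.0, 0.0
    log_lo, log_hi = math.log(k_min), math.log(k_max)
    for i in range(n_grid):
        k = math.exp(log_lo + (log_hi - log_lo) * i / (n_grid - 1))
        f = [1.0 - math.exp(-k * t) for t in times]
        denom = sum(fi * fi for fi in f)
        if denom == 0.0:
            continue  # all time points at t = 0 carry no information about k
        plateau = sum(y * fi for y, fi in zip(fractions, f)) / denom
        sse = sum((y - plateau * fi) ** 2 for y, fi in zip(fractions, f))
        if sse < best_sse:
            best_sse, best_plateau, best_k = sse, plateau, k
    return best_plateau, best_k


# Synthetic, noise-free example data (not measurements from this study).
times_min = [0, 1, 2, 5, 10, 20, 30, 60]
observed = [one_phase_association(t, 0.8, 0.1) for t in times_min]
plateau, k = fit_one_phase(times_min, observed)
half_time = math.log(2) / k
print(f"plateau={plateau:.2f}, k={k:.3f} per min, t1/2={half_time:.1f} min")
```

Profiling out the plateau analytically keeps the search one-dimensional, which is why a plain grid over k is enough for a sketch; the grid spacing limits the precision of k to well under 1% here.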

Cloning, protein expression and purification
Previously described constructs for Cas1, Cas2 and Cas4 expression were used to express and purify the individual proteins (Lee et al., 2018, 2019). For co-expression with Cas4, the sufABCDSE genes were amplified from the pSUF plasmid and cloned into pACYC using Gibson assembly. All primers used for cloning the various constructs are listed in Table S1. Cas1 and Cas2 were overexpressed in E. coli BL21(DE3) and grown at 37 °C to an OD600 of 0.5 in LB

Following motion correction and CTF estimation, blob picking was performed with minimum and maximum particle diameters of 100 Å and 300 Å, respectively, for the first data set. Particles were then subjected to multiple rounds of 2D classification with a 200 Å circular mask diameter, and the resulting classes were used for template picking with a particle diameter of 300 Å. Template-picked particles were extracted with a box size of 384 pixels and binned to a box size of 128 pixels for processing until the homogeneous refinement steps. Extracted particles were subjected to iterative 2D classification. Selected 2D classes were then used to build an ab initio model. All particles were refined by homogeneous refinement against the ab initio model and then subjected to heterogeneous refinement into 3-5 classes, depending on the number of particles in the dataset. Homogeneous and non-uniform refinements (Punjani, Zhang and Fleet, 2020) were then used, with CTF refinement of particles performed either before the refinement step or on the fly during refinement. Figures S2, S5

For quantification, band intensities were measured by densitometry using ImageJ (Schneider, Rasband and Eliceiri, 2012). The fraction cleaved was calculated by dividing the intensity of the product band by the sum of the intensities of both bands. Values from three or four replicates were averaged, and error is reported as the standard deviation between replicates.
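The quantification described above can be sketched as follows. This is a minimal illustration, assuming each lane yields one product and one substrate band intensity; the densitometry values are hypothetical, and the use of the sample standard deviation (n − 1 denominator, as computed by `statistics.stdev`) is an assumption, since the text does not state which form was used.

```python
import statistics
from typing import List, Tuple


def fraction_cleaved(product: float, substrate: float) -> float:
    """Product band intensity divided by the summed intensity of both bands."""
    total = product + substrate
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return product / total


def summarize(replicates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean and sample standard deviation of fraction cleaved across replicates."""
    fractions = [fraction_cleaved(p, s) for p, s in replicates]
    return statistics.mean(fractions), statistics.stdev(fractions)


# Hypothetical (product, substrate) densitometry values for three replicate lanes.
lanes = [(300.0, 700.0), (400.0, 600.0), (350.0, 650.0)]
mean_fc, sd_fc = summarize(lanes)
print(f"fraction cleaved = {mean_fc:.3f} ± {sd_fc:.3f}")
```

Normalizing each lane to its own total makes the measurement insensitive to lane-to-lane differences in loading, which is why the ratio is computed per lane before averaging.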
For comparison of the PAM/NoPAM, PAM/Proc and HSI substrates (Figure 6D), reactions were performed at a variety of temperatures and Cas1, Cas2 and Cas4 concentrations. We did not observe differences in activity across protein concentrations, and we did not observe processing activity at mesophilic temperatures for any substrate tested. Concentrations of 100 nM Cas1, 100 nM Cas2 and 50 nM Cas4 at 65 °C were used for the experiment shown in Figure 6D. Cas1 purified from both the His6-MBP-Cas1 construct, which retains additional N-terminal GAGS residues following TEV cleavage, and the His6-SUMO-Cas1 construct, which contains no additional sequence at either the N- or C-terminus, was tested; the assays reported in Figures 6D-E used His6-SUMO-Cas1. Cas4 purified with both N- and C-terminal His tags was tested, and the cleavage assays reported in Figures 6D-E used Cas4 with a C-terminal His tag.
All oligonucleotides used for the assays are listed in Table S2.

Exonuclease trimming assays and integration assays using PAM/Proc substrates
Reactions were performed in buffer containing 20 mM HEPES (pH 7.5), 100 mM KCl, 5% glycerol, 10 mM MgCl2, 5 mM MnCl2 and 2 mM DTT. Purified Cas1 (100 nM), Cas2 (50 nM) and Cas4 (50 nM) were pre-incubated with 5 nM 5′-radiolabelled prespacer DNA on ice for 15 min. For the exonuclease trimming assays, ExoT was added at a concentration of 1 U/µL, and the reactions were further incubated at 37 °C for 5 min, 15 min and 30 min. To observe integration, 1 µM of the linear mini-CRISPR array (consisting of a 20 bp leader segment, a 32 bp repeat sequence and a 5 bp spacer sequence; Table S2) was added to the reaction prior to incubation at 37 °C. For the exonuclease trimming assays, the mini-CRISPR DNA contained phosphorothioate groups at the 3′ ends to prevent degradation by the exonuclease. For the time course reactions, a master mix was prepared and a 10 µL aliquot was taken at each time point. Reactions were quenched with 2X loading dye at the indicated time points. The samples were denatured at 95 °C for 5 minutes and analyzed by 10% urea-PAGE, as described above. Oligonucleotides used for the assays are listed in Table S2.
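The reaction setup above implies a few derived quantities that are easy to check: the total length of the mini-CRISPR array, where the leader-repeat (leader site) and repeat-spacer (spacer site) junctions fall within it, the molar excess of CRISPR DNA over prespacer, and the minimum master-mix volume for a time course. This is a small worked sketch; the variable names are hypothetical, and the volume calculation assumes the three trimming-assay time points (5, 15 and 30 min), no zero-time sample and no pipetting overage.

```python
# Values taken from the methods text; variable names are illustrative.
LEADER_BP = 20
REPEAT_BP = 32
SPACER_BP = 5

PRESPACER_NM = 5.0        # 5'-radiolabelled prespacer
MINI_CRISPR_NM = 1000.0   # 1 uM linear mini-CRISPR array
ALIQUOT_UL = 10.0         # volume withdrawn per time point
TIME_POINTS_MIN = [5, 15, 30]

# Total array length and the positions of the two integration junctions,
# counted in bp from the leader end of the array.
array_length_bp = LEADER_BP + REPEAT_BP + SPACER_BP
leader_repeat_junction = LEADER_BP               # leader site: after bp 20
repeat_spacer_junction = LEADER_BP + REPEAT_BP   # spacer site: after bp 52

# Molar excess of CRISPR DNA over prespacer in the integration reactions.
fold_excess = MINI_CRISPR_NM / PRESPACER_NM

# Minimum master-mix volume, assuming no zero-time sample and no overage.
min_master_mix_ul = ALIQUOT_UL * len(TIME_POINTS_MIN)

print(array_length_bp, (leader_repeat_junction, repeat_spacer_junction),
      fold_excess, min_master_mix_ul)
```

The 200-fold excess of CRISPR DNA means the prespacer, not the target, is limiting, so the fraction of labelled prespacer converted to integration product reports on the prespacer-bound complex.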