Abstract
Cas9 is a CRISPR-associated endonuclease capable of RNA-guided, site-specific DNA cleavage1-3. The programmable activity of Cas9 has been widely utilized for genome editing applications4-6. Despite extensive studies, the precise mechanism of target DNA binding and on-/off-target discrimination remains incompletely understood. Here we report cryo-EM structures of intermediate binding states of Streptococcus pyogenes Cas9 that reveal domain rearrangements induced by R-loop propagation and PAM-distal duplex positioning. At early stages of binding, the Cas9 REC2 and REC3 domains form a positively charged cleft that accommodates the PAM-distal duplex of the DNA substrate. Target hybridisation past the seed region positions the guide-target heteroduplex into the central binding channel and results in a conformational rearrangement of the REC lobe. Extension of the R-loop to 16 base pairs triggers the relocation of the HNH domain towards the target DNA strand in a catalytically incompetent conformation. The structures indicate that incomplete target strand pairing fails to induce the conformational displacements necessary for nuclease domain activation. Our results establish a structural basis for target DNA-dependent activation of Cas9 that advances our understanding of its off-target activity and will facilitate the development of novel Cas9 variants and guide RNA designs with enhanced specificity and activity.
Main
Cas9 enzymes rely on a dual guide RNA structure consisting of a CRISPR RNA (crRNA) guide and a trans-activating CRISPR RNA (tracrRNA) coactivator to cleave complementary DNA targets. The archetypical Cas9 ortholog from Streptococcus pyogenes (SpCas9) has found widespread use as a programmable DNA targeting tool in genome editing and gene targeting applications4-6. Target DNA binding by SpCas9 is predicated on initial recognition of an NGG protospacer-adjacent motif (PAM) downstream of the target site2,7-9, which triggers local DNA strand separation to initiate its directional hybridization with a 20-nt spacer segment in the guide crRNA to form an R-loop structure7,10,11. This process is facilitated by structural pre-ordering of nucleotides 11-20 of the crRNA, termed the seed sequence, in an A form-like conformation8,12. Upon formation of a full R-loop, the Cas9 HNH and RuvC nuclease domains become activated to cleave the target (TS) and non-target (NTS) DNA strands, respectively, generating a double-strand DNA break three base pairs (bp) upstream of the PAM2,8,13. Although highly specific, SpCas9 can nevertheless cleave off-target genomic sites with imperfect complementarity to the guide RNA14-18. The resulting off-target activity is dependent on the number, type, and positioning of base mismatches within the guide-target heteroduplex15,19-21. PAM-proximal mismatches within the seed region are discriminated against through substantially increased dissociation rates11,19,21,22. In contrast, PAM-distal mismatches are compatible with stable binding, but trap the enzyme in a cleavage-incompetent, dead-end complex13,23,24. Structural, biophysical and computational studies of SpCas9 have shed light on the mechanism of guide RNA binding, PAM recognition, and nuclease activation, revealing that the enzyme undergoes extensive conformational rearrangements in the process.
In particular, high-resolution structures of the fully-bound target DNA complex of SpCas925-28 have revealed a target DNA-dependent conformational rearrangement of the Cas9 REC-lobe that is necessary for cleavage activation. However, our structural understanding of the mechanisms that govern conformational activation of SpCas9 and on-/off-target discrimination during R-loop formation remains incomplete.
Cryo-EM analysis of R-loop formation
To investigate the mechanism of SpCas9 R-loop formation, we first determined the minimal extent of target DNA complementarity necessary for stable binding using fluorescence-coupled size exclusion chromatography. This revealed that six complementary nucleotides in the PAM-proximal region of the target DNA heteroduplex are sufficient for stable association with the SpCas9-guide RNA complex (Extended Data Fig. 1). Subsequently, catalytically-inactive Cas9 (dCas9) was reconstituted with a single-molecule guide RNA (sgRNA) and partially-matched DNA substrates containing 6, 8, 10, 12, 14, and 16 complementary nucleotides (Fig. 1a, Extended Data Fig. 2), and the resulting complexes were analysed by cryo-EM, yielding molecular reconstructions at resolutions of 3.0–4.1 Å (Extended Data Fig. 3, Extended Data Table 1). The conformational heterogeneity within each partially complementary DNA complex was examined using 3D variability analysis29. Most of the detected variability within each complex can be attributed to the PAM-distal duplex and the REC2, REC3, and HNH domains (Extended Data Fig. 4), suggestive of conformational equilibrium sampling. The resulting structural models are representative of the most abundant conformational state of each complex (Extended Data Fig. 5).
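The substrate series described above (6 to 16 PAM-proximal complementary nucleotides) can be illustrated with a short sketch. This is not the authors' substrate design procedure: the 20-nt spacer, the TGG PAM, the function names, and the rule used to introduce PAM-distal mismatches (replacing each base with its Watson-Crick complement) are all assumptions made purely for illustration.

```python
# Illustrative sketch only. The spacer sequence, the PAM, and the mismatch rule
# (swap each PAM-distal base for its Watson-Crick complement) are assumptions,
# not taken from the study.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def partial_match_protospacer(spacer_dna: str, n_match: int, pam: str = "TGG") -> str:
    """Return a non-target strand (5'->3'): protospacer followed by the PAM.

    The n_match positions next to the PAM are identical to the spacer, so the
    target strand can pair with the guide there. The remaining PAM-distal
    positions are altered so that the target strand cannot pair with the guide.
    """
    if not 0 <= n_match <= len(spacer_dna):
        raise ValueError("n_match must be between 0 and the spacer length")
    n_distal = len(spacer_dna) - n_match
    distal = "".join(COMPLEMENT[base] for base in spacer_dna[:n_distal])
    return distal + spacer_dna[n_distal:] + pam


def count_matches(spacer_dna: str, protospacer: str) -> int:
    """Count positions at which the protospacer is identical to the spacer."""
    return sum(a == b for a, b in zip(spacer_dna, protospacer))


spacer = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt spacer, written as DNA
substrates = {n: partial_match_protospacer(spacer, n) for n in (6, 8, 10, 12, 14, 16)}

for n, sequence in substrates.items():
    print(f"{n:>2}-nt match: {sequence}")
```

Each generated sequence is 23 nt long (20-nt protospacer plus 3-nt PAM), and only the PAM-proximal block grows from one substrate to the next, mirroring the directional R-loop propagation probed in the structures.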
a, Top, Schematic representation of the domain composition of the Streptococcus pyogenes Cas9 nuclease, BH: bridge helix; Bottom, Schematic depicting DNA-bound complexes with increasing complementarity to guide RNA. b, Structural comparison of the SpCas9 binary complex (left) and the 6-nt match complex (right). c, Zoom-in view of the seed region of the sgRNA-TS DNA heteroduplex in the 6-nt match complex. Tyr450 stacks between the 5th and 6th nucleotide, counting from the PAM-proximal end of the heteroduplex. d, Zoom-in view of the seed region of the sgRNA-TS DNA heteroduplex in the 8-nt match complex. e, Structural comparison of the 6- and 8-nt match complexes. Arrows indicate inferred domain repositioning during the 6-nt to 8-nt transition.
Structural superpositions of each of the partially-bound complexes with the guide RNA-bound binary SpCas9 complex12 and the catalytically active states of SpCas928 provide a reference frame for the reconstruction of the DNA binding mechanism, revealing stepwise domain rearrangements coupled to R-loop formation. All complexes exhibit almost identical conformations of the bridge helix, REC1, RuvC, and PAM interaction domains, as well as the PAM-proximal dsDNA duplex and the sgRNA up to the seed region (Extended Data Fig. 6a). Conformational differences are observed in the positioning of the REC2, REC3, and the HNH domain relative to the emerging R-loop, consistent with the 3D variability analysis.
Initial DNA binding by REC2/REC3 domains
The structure of the 6-nucleotide complementary substrate (6-nt match) complex shows a 5-bp heteroduplex formed by the sgRNA seed sequence and TS DNA (Fig. 1b). Comparisons with the structure of the Cas9-guide RNA binary complex reveal that target strand hybridization is associated with a displacement of the REC2 domain out of the central binding channel (Fig. 1b) and the formation of a positively charged cleft between the REC2 and REC3 domains that accommodates the PAM-distal substrate DNA duplex (Extended Data Fig. 6b). The duplex is stabilized by interactions of the REC2 residues Ser219, Thr249 and Lys263 with the NTS backbone (Extended Data Fig. 6c), and of the REC3 residues Arg586 and Thr657 with the TS backbone (Extended Data Fig. 6d). Consequently, the NTS is positioned parallel to the guide RNA-TS DNA heteroduplex within the central binding channel (Fig. 1b). The 5’ end of the sgRNA could not be precisely modelled due to conformational flexibility, but residual cryo-EM density suggests its placement in a positively charged cleft located between the HNH and PAM-interaction domains (Extended Data Fig. 6e).
In the 6-nt match complex, hybridization beyond the fifth seed sequence nucleotide is precluded by base stacking with the side chain of Tyr450, which was previously observed in the structure of the Cas9-sgRNA binary complex12 (Fig. 1c). The structure of the 8-nucleotide complementary substrate (8-nt match) complex reveals that expansion of the R-loop heteroduplex past Tyr450 forces further repositioning of the REC2 and REC3 domains to widen the binding channel as the PAM-distal duplex shifts deeper inside (Fig. 1d-f; Extended Data Fig. 6f). Together, these observations suggest that the seed sequence of the Cas9 guide RNA is bipartite and that its hybridization with target DNA proceeds in two steps, consistent with the existence of a short-lived intermediate state observed in FRET studies11,30. R-loop propagation and PAM-distal duplex displacement result in the formation of new intermolecular contacts, with Cas9 contacting the PAM-distal duplex backbone through REC2 domain residues Ser217, Lys234 and Lys253, and REC3 residues Arg557 and Arg654 (Extended Data Fig. 6g,h). Mutation of these residues, which would further destabilise this intermediate state and thus promote off-target dissociation, presents an opportunity to generate novel high-fidelity SpCas9 variants. As most off-targets are only bound but not cleaved19-21,31, these variants could prove to be valuable for biotechnological applications that rely on the fidelity of Cas9 target binding32-35, such as transcriptional regulation or base editing.
R-loop propagation and remodelling
Further guide RNA-TS hybridisation to form a 10-bp heteroduplex causes a rearrangement of the REC2 and REC3 domains and repositioning of the PAM-distal DNA duplex into the positively charged central binding channel formed by the REC3, RuvC, and the HNH domains (Fig. 2a). Although the PAM-distal dsDNA duplex could not be accurately modelled, residual cryo-EM density observed within the channel suggests that it forms a continuous base stack with the sgRNA-TS heteroduplex (Fig. 2b). The displaced NTS is positioned underneath the HNH domain and continues to run parallel to the extending guide RNA-TS DNA heteroduplex (Extended Data Fig. 7c). X-ray crystallographic analysis of the 10-nt match complex at a resolution of 2.8 Å (Extended Data Table 2) confirmed that the TS and NTS remain hybridised at the PAM-distal end of the DNA substrate (Extended Data Fig. 7a). The PAM-distal duplex is wedged between the REC3 and RuvC domains, and the L1 HNH linker (Extended Data Fig. 7a-c). The relocation of the PAM-distal duplex causes REC2 to shift closer to the binding channel and occlude the cleavage site in TS DNA (Fig. 2a). This shift also establishes a new electrostatic interaction between a negatively charged helix in REC2 (Glu260, Asp261, Asp269, Asp272, Asp273, Asp274, Asp276) and a positively charged helix in REC3 (Lys599, Arg629, Lys646, Lys649, Lys652, Arg653, Arg654, Arg655), hereafter referred to as the DDD and RRR helices, respectively (Fig. 2c), which are highly conserved across Cas9 orthologs that contain a REC2 domain (Extended Data Fig. 7d).
a, Zoom-in view of the R-loop bound in the 10-nt match complex. b, Zoom-in view of the PAM-distal duplex (white density) in the 10-nt match complex. The cryo-EM map is coloured according to the schematic in Fig. 1a. c, Zoom-in view of the interaction between the REC2 domain DDD helix and the REC3 RRR helix.
R-loop completion and Cas9 activation
R-loop propagation past the seed region to form a 12-bp heteroduplex does not result in major REC lobe rearrangements (Fig. 3a), with the PAM-distal duplex remaining stacked onto the guide RNA-TS DNA heteroduplex. However, the HNH domain, which in the 6-nt, 8-nt and 10-nt match complexes is docked on the RuvC and PI domains with its active site buried at the interface of the three domains (Fig. 3b), becomes disordered along with surrounding RuvC and PI loops in the 12-nt match complex (Fig. 3c, Extended Data Fig. 4, Extended Data Fig. 5). The REC lobe conformation is maintained upon extension of the R-loop heteroduplex to 14 bp and the RuvC and PI loops responsible for HNH docking remain structurally disordered (Fig. 3d, Extended Data Fig. 8a). Residual density is observed for the HNH domain, due to a contact of the paired heteroduplex with the L2 linker (Extended Data Fig. 8a). The PAM-distal region of the substrate DNA becomes disordered, likely due to strand separation, and the NTS can be modelled only a few nucleotides past the PAM region (Fig. 3d, Extended Data Fig. 8a). Further extension of the R-loop from 14 to 16 base pairs preserves the orientations of the REC2 and REC3 domains (Fig. 3a), but causes a large translation of the HNH domain towards the target heteroduplex within the central binding channel (Fig. 4a). Facilitated by the formation of the PAM-distal part of the R-loop, a RuvC domain loop (residues 1030-1040) restructures into a helical conformation, establishing an interaction with the L2 linker of the HNH domain (Extended Data Fig. 9b). This interaction alters the positioning of the L2 linker and shifts the HNH domain on top of the heteroduplex to seal off the central binding channel (Fig. 4a). This opens up a positively charged cleft between the HNH, RuvC, and PI domains to accommodate the NTS (Extended Data Fig. 8c). 
Although no residual cryo-EM density can be observed for the NTS in the 16-nt match complex, its positioning could be traced to the same positively charged cleft observed in the catalytically active conformation of Cas928 (Extended Data Fig. 8d).
a, Structural overlay of the REC2 and REC3 domains in the 10-, 12-, 14-, and 16-nt match complexes. b, Position of the HNH catalytic site in the 6-, 8-, and 10-nt match complexes. c, Overview of the 12-nt match complex model overlaid with unsharpened cryo-EM map. Residual density corresponding to the PAM-distal duplex (white) is highlighted; the adjacent density is presumed to correspond to the NTS. HNH density is disordered. d, Overview of the 14-nt match complex model overlaid with unsharpened cryo-EM map, showing residual density corresponding to the HNH domain. No density is visible for NTS. Cryo-EM maps are coloured according to the schematic in Fig. 1a.
a, Structural comparison of the 16-nt match complex (left) and the catalytic state of SpCas9 (right). b, Detailed views of the HNH and RuvC nuclease domains in the 16-nt match complex (left) and the catalytic state (right).
Notably, the orientation of the REC2 and REC3 domains in the 12-, 14- and 16-nt match complexes is consistent with a catalytically inactive state of Cas9 proposed in biophysical studies23,24,36. The catalytic site of the HNH domain remains positioned away from the TS cleavage site in the 16-bp heteroduplex complex (Fig. 3d). To reach the catalytically active state28, the Cas9 REC2 domain must dissociate from the TS cleavage site, which is facilitated by concerted movements of the HNH and REC3 domains upon full R-loop formation (Fig. 4a), consistent with prior biophysical, structural, and computational studies23,24,37,38. REC2 dissociation enables a ∼140° rotation of the HNH domain towards the scissile phosphate in the TS (Fig. 4b), which is accompanied by the extension of the L2 linker helix and leads to activation of the RuvC domain for NTS cleavage13. HNH domain repositioning is likely induced by the proximity of the extended R-loop heteroduplex, as supported by observations that PAM-distal end positioning allosterically modulates HNH domain conformation36.
Conclusions
In sum, our structural analysis of SpCas9 along its DNA binding pathway reveals a mechanism whereby R-loop formation is allosterically and energetically coupled to domain rearrangements necessary for nuclease activation (Extended Data Fig. 9). Directional formation of the guide RNA-target DNA heteroduplex causes a rearrangement of the REC2 and REC3 domains and repositions the HNH nuclease domain towards the TS DNA substrate, facilitated by a network of electrostatic and hydrogen bonding interactions between the DNA substrate and Cas9, and between individual protein domains. Incomplete target pairing fails to conformationally activate the HNH domain for cleavage, explaining why off-target complexes remain trapped in an inactive state. This model further highlights the importance of maintaining guide-target complementarity and proper heteroduplex conformation, as domain mispositioning occludes interaction surfaces necessary for downstream rearrangements and nuclease activation, consistent with biophysical and computational studies showing that the conformation of the R-loop heteroduplex strongly affects off-target binding11,39. These findings thus have important implications for ongoing experimental and computational studies aiming to uncover the effects of off-target mismatches on Cas9 activity, and could inform the development of new high-fidelity SpCas9 variants and guide RNA designs.
Methods
Expression and purification of Cas9 proteins
Catalytically inactive Streptococcus pyogenes Cas9 (D10A/H840A mutant) was expressed in E. coli Rosetta 2 (DE3) (Novagen) for 16 hours at 18 °C as a fusion protein with an N-terminal His6-MBP-TEV tag. Bacterial pellets were resuspended and lysed in 20 mM HEPES-KOH pH 7.5, 500 mM KCl, 5 mM imidazole, and protease inhibitors. Cell lysates were clarified using ultracentrifugation and loaded on a 15 ml Ni-NTA Superflow column (QIAGEN), which was washed with 7 column volumes of 20 mM HEPES-KOH pH 7.5, 500 mM KCl, 5 mM imidazole. Tagged Cas9 was eluted with 10 column volumes of 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 200 mM imidazole. Salt concentration was adjusted to 250 mM KCl and the protein was loaded on a 10 ml HiTrap Heparin HP column (GE Healthcare) equilibrated in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT. The column was washed with 5 column volumes of 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT, and dCas9 was eluted with 15 column volumes of 20 mM HEPES-KOH pH 7.5, 1.5 M KCl, 1 mM DTT, in a 0-50% gradient (peak elution around 500 mM KCl). The His6-MBP tag was removed by TEV protease cleavage overnight at 4 °C with gentle shaking. The untagged protein was concentrated and further purified on a Superdex 200 16/600 gel filtration column (GE Healthcare) in 20 mM HEPES-KOH pH 7.5, 500 mM KCl, 1 mM DTT. Pure fractions were concentrated to 10 mg/ml, flash frozen in liquid nitrogen and stored at -80 °C.
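As a quick sanity check on the heparin gradient, the salt concentration at any point can be estimated by linear mixing. The sketch below assumes that the 0-50% gradient refers to the fraction of the 1.5 M KCl buffer mixed into the 250 mM KCl wash buffer; that reading, and the function names, are assumptions rather than details stated in the protocol.

```python
# Sketch under stated assumptions: buffer A is the 250 mM KCl wash buffer,
# buffer B is the 1.5 M KCl elution buffer, and the gradient percentage is the
# volume fraction of buffer B in a linear mix.
BUFFER_A_KCL_MM = 250.0
BUFFER_B_KCL_MM = 1500.0


def kcl_at_fraction_b(fraction_b: float) -> float:
    """KCl concentration (mM) for a given volume fraction of buffer B."""
    return BUFFER_A_KCL_MM + fraction_b * (BUFFER_B_KCL_MM - BUFFER_A_KCL_MM)


def fraction_b_for_kcl(target_mm: float) -> float:
    """Volume fraction of buffer B needed to reach a target KCl concentration."""
    return (target_mm - BUFFER_A_KCL_MM) / (BUFFER_B_KCL_MM - BUFFER_A_KCL_MM)


gradient_end_mm = kcl_at_fraction_b(0.50)    # salt at the end of the gradient
peak_fraction_b = fraction_b_for_kcl(500.0)  # where ~500 mM KCl is reached

print(f"Gradient end: {gradient_end_mm:.0f} mM KCl")
print(f"Peak elution (~500 mM KCl) at about {peak_fraction_b:.0%} buffer B")
```

Under these assumptions the gradient ends at 875 mM KCl, and the reported peak near 500 mM KCl falls at roughly 20% buffer B, well inside the gradient window.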
sgRNA in vitro transcription
The single guide RNA (sgRNA) was transcribed from a dsDNA template in a 5 ml transcription reaction (30 mM Tris-HCl pH 8.1, 25 mM MgCl2, 2 mM spermidine, 0.01% Triton X-100, 5 mM CTP, 5 mM ATP, 5 mM GTP, 5 mM UTP, 10 mM DTT, 1 µM DNA transcription template, 0.5 units inorganic pyrophosphatase (Thermo Fisher), 250 µg T7 RNA polymerase). The transcription reaction was incubated at 37 °C for 5 hours, after which the dsDNA template was degraded for 30 minutes with 15 units of RQ1 DNase (Promega). The transcribed sgRNA was PAGE purified on an 8% denaturing polyacrylamide gel containing 7 M urea, ethanol precipitated and dissolved in DEPC-treated water.
Gel filtration binding assay
The dCas9-gRNA complex was assembled by incubating 371 picomoles of dCas9 with 400 picomoles of the sgRNA in 20 mM HEPES-KOH pH 7.5, 200 mM KCl, 2 mM MgCl2 for 10 minutes at room temperature. Then 250 picomoles of Cy5-labeled dsDNA substrate was added and incubated another 15 minutes. The volume was adjusted up to 100 µl with reaction buffer and the mixture was centrifuged to remove possible precipitates. Individual reactions were transferred to a 96-well plate and analysed using a Superdex 200 Increase 5/150 GL gel filtration column (GE Healthcare) attached to an Agilent 1200 Series Gradient HPLC system. The 260 nm, 280 nm, and Cy5 signals were exported and plotted as a function of the retention volume in GraphPad Prism 9.
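The final concentrations in each 100 µl binding reaction follow directly from the stated amounts, since pmol per µl is numerically equal to µM. The short sketch below works this out; the variable and function names are illustrative only.

```python
# Amounts and final volume are taken from the protocol above; names are illustrative.
FINAL_VOLUME_UL = 100.0
amounts_pmol = {"dCas9": 371.0, "sgRNA": 400.0, "Cy5-dsDNA": 250.0}


def micromolar(pmol: float, volume_ul: float) -> float:
    """Convert an amount in pmol and a volume in µl to µM (pmol/µl == µmol/l)."""
    return pmol / volume_ul


concentrations_um = {
    name: micromolar(pmol, FINAL_VOLUME_UL) for name, pmol in amounts_pmol.items()
}
protein_to_dna_ratio = amounts_pmol["dCas9"] / amounts_pmol["Cy5-dsDNA"]
rna_to_protein_ratio = amounts_pmol["sgRNA"] / amounts_pmol["dCas9"]

for name, conc in concentrations_um.items():
    print(f"{name}: {conc:.2f} µM")
print(f"dCas9:DNA molar ratio = {protein_to_dna_ratio:.2f}")
print(f"sgRNA:dCas9 molar ratio = {rna_to_protein_ratio:.2f}")
```

The labelled DNA is the limiting component (2.5 µM against 3.71 µM dCas9 and 4.0 µM sgRNA), so a shift of the Cy5 signal into the complex peak reports on binding rather than on a shortage of protein.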
Crystallisation and X-ray structure determination
The 10-nt complementary ternary complex of dCas9 was assembled by first incubating dCas9 with the sgRNA in a 1:1.5 molar ratio, and pre-purifying the binary complex on a Superdex 200 16/600 gel filtration column (GE Healthcare) in 20 mM HEPES-KOH pH 7.5, 500 mM KCl, 1 mM DTT. The binary complex was diluted in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT to 2.5 mg/ml and the partially complementary dsDNA substrate was added in 1:1.5 molar excess. For crystallisation, 1 µl of the ternary complex (1.5-2.5 mg/ml) was mixed with 1 µl of the reservoir solution (0.1 M sodium cacodylate pH 6.5, 0.8-1.2 M ammonium formate, 12-14% PEG4000) and crystals were grown at 20 °C using the hanging drop vapour diffusion setup. Crystals were harvested after 3-4 weeks, cryoprotected in 0.1 M sodium cacodylate pH 6.5, 1.0 M ammonium formate, 13% PEG4000, 20% glycerol, 2 mM MgCl2, and flash-cooled in liquid nitrogen. Diffraction data were measured at the beamline PXIII of the Swiss Light Source (Paul Scherrer Institute, Villigen, Switzerland) at a temperature of 100 K and processed using the autoPROC and STARANISO package with anisotropic cut-off43. Phases were obtained by molecular replacement using the Phaser module of the Phenix package44, with the NUC lobe of PDB ID 5FQ5 as the initial search model. The crystals belonged to the P1 space group and contained two copies of the complex in the asymmetric unit.
Cryo-EM sample preparation and data acquisition
To assemble the 6-, 8-, 10-, 12-, 14-, and 16-nt match complexes, dCas9 protein was mixed with the sgRNA in a 1:1.5 molar ratio, and incubated at room temperature for 10 minutes in buffer 20 mM HEPES-KOH pH 7.5, 150 mM KCl, 1 mM DTT. The respective partially complementary dsDNA substrate was then added in a 1:3 Cas9:DNA molar ratio and incubated another 20 minutes at room temperature. The complexes were then purified using a Superdex 200 Increase 10/300 GL gel filtration column (GE Healthcare) and eluted in 20 mM HEPES-KOH pH 7.5, 250 mM KCl, 1 mM DTT. Concentration of the monomeric peak was determined using the Qubit 4 Fluorometer Protein Assay, and then diluted to 0.275 mg/ml in 20 mM HEPES-KOH pH 7.5, 250 mM KCl cold buffer. 3 µl of diluted complex was applied to a glow discharged 200-mesh holey carbon grid (Au 1.2/1.3 Quantifoil Micro Tools), blotted for 1.5-2.5 s at 90% humidity, 20 °C, plunge frozen in liquid propane/ethane mix (Vitrobot, FEI) and stored in liquid nitrogen. Data collection was performed on a 300 kV FEI Titan Krios G3i microscope equipped with a Gatan Quantum Energy Filter and a K3 direct detection camera in super-resolution mode. Micrographs were recorded at a calibrated magnification of 130,000 x with a pixel size of 0.325 Å. Data acquisition was performed automatically using SerialEM with three shots per hole at -0.8 µm to -2.2 µm defocus.
Cryo-EM data processing
Acquired cryo-EM data were processed using cryoSPARC45. Gain-corrected micrographs were imported and binned to a pixel size of 0.65 Å during patch motion correction. After patch CTF estimation, micrographs with a resolution estimation worse than 5 Å and full-frame motion distance larger than 100 Å were discarded. Initial particles were picked on denoised micrographs using Topaz 0.2.4 with the pre-trained ResNet16 (64 units) model46. Particles were extracted with a box size of 384 × 384 pixels, down-sampled to 192 × 192 pixels. After 2D classification, templates were generated using good classes and particle picking was repeated using the template picker. Particle picks were inspected and particles with NCC scores above 0.4 were extracted as before. After 2D classification, duplicate particles were removed and the remaining particles were used for ab initio 3D reconstruction. All partially bound complexes displayed several conformational states. After several rounds of 3D classification, classes with the most detailed features were re-extracted using the full 384 × 384 pixel box size and subjected to non-uniform refinement to generate high-resolution reconstructions47. Each map was sharpened using the appropriate B-factor value to enhance structural features, and local resolution was calculated and visualised using ChimeraX48.
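The pixel and box sizes quoted in the acquisition and processing steps are related by simple scaling, which the following sketch makes explicit. It is bookkeeping arithmetic only, not part of the cryoSPARC workflow, and the variable names are illustrative. The Nyquist limit is taken as twice the pixel size.

```python
# Bookkeeping sketch for the sampling parameters quoted in the Methods.
# Variable names are illustrative; no cryoSPARC functionality is used.
SUPER_RES_PIXEL_A = 0.325   # super-resolution pixel size at acquisition (Å)
BINNED_PIXEL_A = 0.65       # pixel size after binning during motion correction (Å)
BOX_PX = 384                # extraction box (pixels, at the binned pixel size)
DOWNSAMPLED_BOX_PX = 192    # box after down-sampling for early classification

binning_factor = BINNED_PIXEL_A / SUPER_RES_PIXEL_A
box_edge_a = BOX_PX * BINNED_PIXEL_A                      # physical box edge (Å)
downsampled_pixel_a = box_edge_a / DOWNSAMPLED_BOX_PX     # Å per pixel after down-sampling

# The Nyquist limit (finest resolvable spacing) is twice the pixel size.
nyquist_full_a = 2 * BINNED_PIXEL_A
nyquist_downsampled_a = 2 * downsampled_pixel_a

print(f"Binning factor: {binning_factor:.1f}x")
print(f"Box edge: {box_edge_a:.1f} Å")
print(f"Down-sampled pixel size: {downsampled_pixel_a:.2f} Å")
print(f"Nyquist limit: {nyquist_full_a:.2f} Å (full), {nyquist_downsampled_a:.2f} Å (down-sampled)")
```

The down-sampled particles (Nyquist limit of 2.6 Å) still sample finely enough for classification, while the 3.0–4.1 Å final reconstructions rely on re-extraction at the full 384-pixel box, where the Nyquist limit is 1.3 Å.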
Structural model building, refinement, and analysis
Map sharpening and density modification to facilitate X-ray and cryo-EM model building were performed using Phenix.auto_sharpen49. Manual Cas9 domain placement, model adjustment and nucleic acid building were completed using COOT50. Atomic model refinement was performed using Phenix.refine for X-ray data51 and Phenix.real_space_refine for cryo-EM data52. The quality of refined models was assessed using MolProbity53. Protein-nucleic acid interactions were analysed using the PISA web server54. Characterisation of the guide-protospacer duplex was performed using the 3DNA 2.0 web server55. Structural figures were generated using ChimeraX48.
Author contributions
M.P. and M.J. conceived the study and designed experiments. M.P. purified Cas9, performed in vitro cleavage assays, crystallized 10-bp heteroduplex complex, prepared cryo-EM samples and solved the structures. M.P. and M.J. performed structural analysis and wrote the manuscript.
Acknowledgements
This work was supported by the Swiss National Science Foundation Grant 31003A_182567 (to M.J.). M.J. is an International Research Scholar of the Howard Hughes Medical Institute and Vallee Scholar of the Bert L & N Kuggie Vallee Foundation. We thank Simona Sorrentino, Marta Sawicka, Luuk Loeff, Irma Querques, and Lena Muckenfuss for assistance with cryo-EM data collection, and Franziska Boneberg and Christelle Chanez for their help with preparing reagents. We thank members of the Jinek laboratory for discussion and critical reading of the manuscript. We are grateful to Josh Cosfsky, Katarzyna Soczek and Jennifer Doudna for sharing unpublished data and helpful comments.