Structural basis of target DNA recognition by CRISPR-Cas12k for RNA-guided DNA transposition

The type V-K CRISPR-Cas system, characterized by the Cas12k effector with a naturally inactivated RuvC domain and associated with a Tn7-like transposon for RNA-guided DNA transposition, is a promising tool for precise DNA insertion. To reveal the mechanism underlying target DNA recognition, we determined a cryo-EM structure of Cas12k from the cyanobacterium Scytonema hofmanni in complex with a single guide RNA (sgRNA) and a double-stranded target DNA. Combined with mutagenesis and an in vitro DNA transposition assay, our results reveal how Cas12k recognizes the GGTT PAM sequence and identify the structural elements of Cas12k critical for RNA-guided DNA transposition. These structural and mechanistic insights should aid in the development of type V-K CRISPR-transposon systems as tools for genome editing.


INTRODUCTION
The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (Cas) systems are adaptive immune systems that protect bacteria and archaea against mobile genetic elements (MGEs) and have been developed as tools for genome editing (Mohanraju et al., 2016; Sorek et al., 2013). These systems employ guide RNAs and effector proteins to specifically target MGEs for degradation. The arms race between prokaryotes and foreign MGEs has given rise to diverse CRISPR-Cas systems, which are divided into two classes (1 and 2) and six types (I-VI) (Makarova et al., 2015; Shmakov et al., 2015).
The class 2 type V system, characterized by a conserved C-terminal RuvC nuclease domain in its Cas12 effector proteins, is abundant and is further classified into 11 subtypes (V-A to V-K).
bioRxiv preprint doi: https://doi.org/10.1101/2021.07.07.451486; this version posted July 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

In vitro DNA transposition
We first purified Cas12k, sgRNA, and the transposition proteins (TnsB, TnsC, and TniQ) of the ShCAST system and tested their function using a previously established in vitro DNA transposition assay (Strecker et al., 2019b) (Fig. S1B,C). Our results suggest that TnsB, TnsC, and magnesium (required for transposon end cleavage and target joining (Skelding et al., 2002)) are strictly required for DNA transposition, whereas the additional components Cas12k, sgRNA, and TniQ are necessary for RNA-guided DNA transposition (Fig. S1D-G). Omitting any of the latter three components results in non-RNA-guided DNA transposition. Of ten randomly selected colonies from the assay containing all components, eight carried RNA-guided and two carried non-RNA-guided insertions (Fig. S1G). Both simple insertion and co-integration products were observed among RNA-guided insertions (Rice et al., 2020; Strecker et al., 2020), with co-integration being the major product in our experiments, as revealed by restriction enzyme digestion and DNA sequencing (Fig. S1C,H,I).

Overall structure of Cas12k-sgRNA-target DNA
To understand the mechanism of RNA-guided target DNA recognition, we assembled a Cas12k-sgRNA-target DNA ternary complex by incubating Cas12k, sgRNA, and a target DNA containing a GGTT PAM sequence (Fig. S1J and Table S1). Using cryo-EM, we reconstructed a map of this ternary complex at 3.6 Å resolution (Figs. 1B,C and S2, and Table 1), which allowed us to build an atomic model (Fig. S3A) except for residues 103-270 of Cas12k, the crRNA-target DNA heteroduplex beyond 10 bp from the PAM duplex, and small regions (e.g., nucleotides 1-8 of the sgRNA) that are not resolved in the map, most likely owing to flexibility.
The overall structure of Cas12k resembles that of other Cas12 proteins, with Cas12f as the closest match in a DALI search (Z-score 14.5) (Takeda et al., 2020; Xiao et al., 2021) (Fig. 2A,B). The 637-residue protein adopts a bilobed architecture, with the two lobes connected by a loop. The N-terminal lobe of Cas12k comprises the WED, REC1, and PI domains. The WED domain, which plays a major role in sgRNA recognition, contains seven β-strands (β1-7), with helix α5 inserted between β5 and β6 (Fig. 2C). The REC1 domain is inserted between β1 and β2 of the WED domain and consists of an N-terminal helical bundle, α1-4 (REC1(13-102)), and a C-terminal flexible region (REC1(103-270)) that is predicted to form 6-7 helices (Figs. 2D and S3B). Despite low sequence similarity, REC1(13-102) is structurally similar to the REC1 C domain of Cas12f (Figs. 2D and S3B,C), which forms the dimerization interface in Cas12f (Takeda et al., 2020; Xiao et al., 2021). However, the key hydrophobic residues of this interface in Cas12f (I118, Y121, Y122, I126) are not conserved in REC1(13-102) (Figs. 2D and S3B), which may explain why no Cas12k dimer is observed. The WED domain is followed by a PI domain composed of two helices, α6 and α7, which is absent in Cas12f but present in some other Cas12 proteins such as Cas12i (Zhang et al., 2020).
The C-terminal lobe of Cas12k comprises the RuvC and REC2 domains. Both the sequence and the structure of the RuvC domain are conserved between Cas12k and Cas12f; however, the triplet of acidic residues required for nuclease activity in Cas12f is replaced by serine or proline in Cas12k (Fig. S3B,C). A Cas12k mutant in which these catalytic residues were restored (S452D, P546E, and P619D) did not regain target DNA cleavage (Fig. S1K). Beyond the altered catalytic residues, the RuvC domain of Cas12k differs from that of Cas12f in two further respects. First, Cas12k lacks a Nuc domain (Figs. 2A,B,E and S3B). Nuc domains or equivalent domains are invariably present in all Cas12 proteins whose structures have been determined to date, including Cas12a (Dong et al., 2016; Gao et al., 2016; Nishimasu et al., 2017; Stella et al., 2017; Stella et al., 2018b; Swarts and Jinek, 2019; Swarts et al., 2017; Yamano et al., 2016; Yamano et al., 2017; Zhang et al., 2019), Cas12b (Liu et al., 2017; Wu et al., 2017; Yang et al., 2016), Cas12e (Liu et al., 2019), Cas12f (Takeda et al., 2020; Xiao et al., 2021), Cas12g (Li et al., 2021), and Cas12i (Huang et al., 2020; Zhang et al., 2020), and play an essential role in nuclease activity. Second, the lid motif of Cas12k is longer than that of Cas12f and adopts a closed conformation that covers the pseudo-nuclease site (Fig. 2A,C). Both features are consistent with the lack of nuclease activity in the Cas12k RuvC domain. Taken together, Cas12k is structurally closely related to Cas12f but may have evolved these features to meet the requirements of DNA transposition.

PAM recognition
The PAM duplex is enclosed in a positively charged groove formed by the REC1, WED, and PI domains (Fig. 4A). All three domains contribute residues that directly contact the bases of the PAM for sequence-specific recognition (Figs. 4B,C and S4D,E). Specifically, R78 of the REC1 domain forms two hydrogen bonds with base G(-3) of the non-target strand (NTS). The hydroxyl group of T287 of the WED domain forms two hydrogen bonds with base A(-2) of the target strand (TS). R421 of the PI domain interacts with both A(-2) of the TS and T(-1) of the NTS. In addition, a number of polar or positively charged residues recognize the PAM duplex through the phosphate backbone (Figs. 4C and S4D,E).
To test these structural observations, we mutated the three base-contacting residues (R78, T287, and R421) and two positively charged residues that bind the phosphate backbone, R350 of the WED domain and R428 of the PI domain. Alanine substitution of any of these residues reduced RNA-guided transposition activity in vitro, as assessed by PCR readout (Figs. 4D, S1E-G and S5).
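To make the PAM numbering used above concrete, the sketch below scans both strands of a DNA sequence for 5'-GGTT-3'. It assumes, consistent with the residue contacts described above, that the non-target strand reads 5'-G(-4)G(-3)T(-2)T(-1)-3' with the protospacer beginning immediately 3' of T(-1). The function names are hypothetical, and the scan matches only the exact GGTT sequence used in this study.

```python
# Hypothetical helper: locate 5'-GGTT-3' PAMs on either strand of a dsDNA sequence.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_ggtt_pams(seq: str) -> list[tuple[str, int]]:
    """Return (strand, index) for every 5'-GGTT-3' on either strand.

    The index is 0-based on the strand where the PAM reads 5'-GGTT-3'
    ("+" = the input sequence, "-" = its reverse complement). On that strand,
    index..index+3 correspond to PAM positions -4..-1 of the non-target
    strand, and the protospacer is assumed to start at index + 4.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        start = s.find("GGTT")
        while start != -1:
            hits.append((strand, start))
            start = s.find("GGTT", start + 1)
    return hits


if __name__ == "__main__":
    # Toy sequence: one PAM on the "+" strand and one on the "-" strand (AACC).
    print(find_ggtt_pams("TTGGTTACGTAACCTT"))
```

Because GGTT is not palindromic, a hit on the "-" strand corresponds to AACC in the input sequence.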
Because only a short portion of the crRNA-target DNA heteroduplex is stabilized in our structure, we set out to determine the minimum spacer length in sgRNA required for RNA-guided DNA transposition. We designed sgRNAs with spacer lengths of 6, 8, 10, 12, 14, 16, 18, and 20 nucleotides (Table S1). In our in vitro DNA transposition assay, a spacer of at least 14 nt was required for detectable DNA transposition, and at least 16 nt for optimal activity (Fig. 4E). This is consistent with a recent study showing that 16 nt is both sufficient and near the minimum length required for insertion in the type V-K system (Saito et al., 2021). These results suggest that formation of a crRNA-target DNA heteroduplex of 14-16 bp is one checkpoint for transposition in the type V-K system.
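The spacer-length series can be summarized with a small helper that derives a spacer from a target sequence and maps its length onto the outcome observed in Fig. 4E. This is a sketch under stated assumptions: the spacer is taken as the non-target-strand sequence immediately 3' of the GGTT PAM (written as RNA), shorter spacers are assumed to be truncated at the PAM-distal end, and the activity labels only summarize the tested range of 6-20 nt. The function names are hypothetical.

```python
# Hypothetical helpers summarizing the spacer-length series (tested range: 6-20 nt).
TESTED_LENGTHS = range(6, 21, 2)  # 6, 8, ..., 20 nt, as in the assay


def spacer_for_target(nts: str, pam_index: int, length: int = 20) -> str:
    """Return an RNA spacer of `length` nt matching the protospacer 3' of a GGTT PAM.

    `nts` is the non-target strand (5'->3') and `pam_index` is the 0-based
    position of the first G of 5'-GGTT-3'. The spacer has the same sequence as
    the non-target strand (T -> U) because it base-pairs with the target strand.
    Truncation is assumed to remove PAM-distal nucleotides.
    """
    nts = nts.upper()
    if nts[pam_index:pam_index + 4] != "GGTT":
        raise ValueError("no 5'-GGTT-3' PAM at pam_index")
    protospacer = nts[pam_index + 4:pam_index + 4 + length]
    if len(protospacer) < length:
        raise ValueError("sequence too short for the requested spacer length")
    return protospacer.replace("T", "U")


def observed_activity(length: int) -> str:
    """Map a tested spacer length to the in vitro outcome reported in Fig. 4E."""
    if length not in TESTED_LENGTHS:
        raise ValueError("length outside the tested series")
    if length >= 16:
        return "optimal"
    if length >= 14:
        return "detectable"
    return "not detected"


if __name__ == "__main__":
    target = "AAGGTT" + "ACGTACGTACGTACGTACGT" + "CC"
    for n in (12, 14, 16, 20):
        print(n, spacer_for_target(target, 2, n), observed_activity(n))
```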

Conformational changes induced by target DNA recognition
To understand the conformational changes in Cas12k upon target DNA recognition, we determined a cryo-EM structure of the Cas12k-sgRNA binary complex at 3.8 Å resolution (Figs. 5A,B and S6, and Table 1). Although still largely flexible, REC1(103-270) is better resolved in the binary complex, where it contacts the lid motif of the RuvC domain (Fig. 5C). Superimposition with the ternary structure reveals minimal conformational change in Cas12k, except for the REC1 domain, which shifts by ~6-8 Å to contact the target DNA (Fig. 5D,E). Notably, the lid motif adopts a similar closed conformation in both the ternary and binary complexes. In other Cas12 proteins, a closed-to-open transition of the lid motif upon target DNA recognition has been shown to be a conserved mechanism for activating RuvC nuclease activity (Stella et al., 2018a; Xiao et al., 2021; Zhang et al., 2020). Although the Cas12k lid motif does not undergo this transition, its deletion (Δ549-588GSGS) completely abolishes RNA-guided DNA transposition (Figs. 4D and S5), suggesting that it serves an essential function.

DISCUSSION
RNA-guided DNA transposition by the type V-K CAST system requires both the CRISPR module and the transposition module (Strecker et al., 2019b). The CRISPR module recognizes target DNA using guide RNAs and recruits the transposition machinery for DNA insertion. In this study, we determined the cryo-EM structure of Cas12k and described the mechanism of target DNA recognition by the CRISPR module of the ShCAST system.
Despite sharing a similar architecture with other Cas12 proteins, Cas12k displays considerable differences that may be related to its association with the Tn7-like transposon, including an inactive RuvC domain, the absence of a Nuc domain, and a longer lid motif in a closed conformation.
Although it does not undergo a closed-to-open conformational change upon target DNA recognition, the lid motif is retained in Cas12k and is essential for RNA-guided DNA transposition. Given this essential function and the lack of nuclease activity in Cas12k, we speculate that the lid motif might stabilize the structure or help recruit transposition proteins.
Four stem-loops (S4-7) within the sgRNA show no interactions with Cas12k, raising the question of their function. Interestingly, when sgRNA is omitted from the in vitro transposition assay, the number of colonies is significantly larger than under the other conditions (Fig. S1E); however, none of the tested colonies shows RNA-guided DNA insertion. This result suggests that sgRNA might inhibit non-RNA-guided DNA insertion by the transposition machinery. This would not be surprising: to direct the transposition machinery toward RNA-guided DNA transposition, the CRISPR-Cas system may have evolved a mechanism that suppresses the transposon's original activity.
Recent studies showed that the AAA+ protein TnsC plays a role in transposition target site selection (Park et al., 2021; Shen et al., 2021). In the ShCAST system, TnsC forms filaments on DNA that are capped by TniQ (Park et al., 2021). TniQ is likely associated directly with Cas12k, similar to previous observations that TniQ binds the CRISPR effector Cascade complex in the type I-F CAST system (Halpin-Healy et al., 2020; Jia et al., 2020; Li et al., 2020; Wang et al., 2020). Interactions between the TnsB transposase and TnsC could direct DNA insertion to a fixed position relative to the target DNA recognition site of the CRISPR module, which is 60-66 bp downstream of the PAM in the ShCAST system. The Cas12k structure reported here, together with these recent studies, is beginning to unravel the mechanism underlying RNA-guided DNA transposition.
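The fixed spacing between the Cas12k target site and the insertion point is what distinguishes RNA-guided from non-RNA-guided products, such as those scored among the colonies in the transposition assay. A minimal sketch of that classification follows, assuming the 60-66 bp distance is counted on the non-target strand from the first nucleotide 3' of the PAM to the insertion site; the exact reference points may differ between studies, and the function names are hypothetical.

```python
# Hypothetical classifier for insertion sites relative to a GGTT PAM (ShCAST).
# Assumption: distance is measured on the non-target strand from the first
# nucleotide 3' of the PAM (PAM positions -4..-1 are pam_index..pam_index+3).
PAM_LENGTH = 4
WINDOW_BP = (60, 66)  # insertion 60-66 bp downstream of the PAM


def distance_from_pam(pam_index: int, insertion_index: int) -> int:
    """Distance (bp) from the first protospacer nucleotide to the insertion site."""
    return insertion_index - (pam_index + PAM_LENGTH)


def is_rna_guided(pam_index: int, insertion_index: int,
                  window: tuple[int, int] = WINDOW_BP) -> bool:
    """True if the insertion falls inside the expected RNA-guided window."""
    lo, hi = window
    return lo <= distance_from_pam(pam_index, insertion_index) <= hi


if __name__ == "__main__":
    pam = 100
    for site in (166, 171, 400):
        print(site, distance_from_pam(pam, site), is_rna_guided(pam, site))
```

Keeping the window as a parameter makes it easy to rerun the classification if a different distance convention is adopted.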
Transposon-associated CRISPR-Cas systems are promising tools for gene insertion applications; however, possible off-target insertions raise concerns because they can cause genome instability.
In the ShCAST system, non-RNA-guided DNA insertion is observed both in the in vitro transposition assay (Fig. S1G) and in E. coli (Strecker et al., 2019b). To reduce or eliminate such unwanted insertions, further studies will be required to understand the detailed mechanisms of the ShCAST system, including the interactions between the Cas12k-sgRNA-target DNA module and the whole transposition machinery; this will be especially important for manipulating the system in genome editing applications.
Table S1. Sequences of RNAs and DNA oligonucleotides used in this study.
The gene fragment for Cas12k was cloned into the bacterial expression plasmid pET-His6- glycerol, and 1 mM DTT (0.5 mM EDTA was added to the buffer for TnsC).Fractions were concentrated and stored at -80°C.
To assemble the Cas12k-sgRNA binary complex, Cas12k was incubated with sgRNA (Table S1) at a ratio of 1:1.15 at 37°C for 30 min in buffer A (25 mM Tris-HCl, pH 7.6, 150 mM NaCl, 2 mM DTT, and 1 mM MgCl2). To reconstitute the Cas12k-sgRNA-target DNA ternary complex, Cas12k was incubated with sgRNA at 37°C for 30 min, followed by addition of target DNA synthesized by IDT (Table S1) at a ratio of 1:1.1.5:1.3. After 30 min, the mixture was further purified by size-exclusion chromatography (SEC) on a Superdex 200 column (Cytiva) equilibrated with buffer A.
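The component amounts for the binary-complex assembly follow directly from the stated ratio. The sketch below is a hypothetical calculator: the function name and stock concentrations are illustrative assumptions, and the ratio is treated as molar. It converts an amount of Cas12k and a target ratio into the volume of partner stock to add, using 1 µM = 1 pmol/µL.

```python
# Hypothetical calculator for complex assembly (ratios treated as molar).


def partner_volume_ul(protein_pmol: float, ratio: float, stock_um: float) -> float:
    """Volume (µL) of a partner stock giving `ratio` mol partner per mol Cas12k.

    Uses 1 µM = 1 pmol/µL, so volume = required pmol / stock concentration.
    """
    if stock_um <= 0:
        raise ValueError("stock concentration must be positive")
    return protein_pmol * ratio / stock_um


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 pmol Cas12k, sgRNA stock at 50 µM.
    print(f"sgRNA stock to add: {partner_volume_ul(1000, 1.15, 50):.1f} µL")
```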

sgRNA preparation
sgRNAs were produced by in vitro transcription using the HiScribe T7 High Yield RNA Synthesis Kit (NEB) with PCR-amplified gBlocks (IDT) as templates. sgRNAs were purified over a Resource Q column (Cytiva) and eluted with a linear NaCl gradient (50-1000 mM) in 25 mM Tris-HCl, pH 8.0. The eluted sgRNAs were concentrated and stored at -80°C.

Mutagenesis
Single amino acid substitutions were introduced using the QuikChange site-directed mutagenesis method. Mutations involving multiple amino acids were introduced by joining an inverse-PCR-amplified plasmid backbone with mutation-bearing DNA oligonucleotides using the In-Fusion Cloning Kit (Clontech).
All mutants were confirmed by Sanger sequencing.

In vitro transposition assay
Donor plasmid (pDonor) and target plasmid (pTarget) were gifts from Feng Zhang (Addgene #127924 and #127926, respectively). In vitro transposition reactions were conducted as previously described unless otherwise stated. All proteins were diluted to 2 µM with 25 mM Tris-

Electron Microscopy
Aliquots (4 μL) of the Cas12k-sgRNA binary complex (1 mg/mL) and the Cas12k-sgRNA-dsDNA ternary complex (1 mg/mL) were applied to glow-discharged UltrAuFoil holey gold grids (R1.2/1.3, 300 mesh). The grids were blotted for 2 s and plunged into liquid ethane using a Vitrobot Mark IV. Cryo-EM data were collected on a Titan Krios microscope operated at 300 kV, and images were acquired using Leginon (Suloway et al., 2005) at a nominal magnification of 81,000x (calibrated physical pixel size of 1.05 Å/pixel) with a defocus range of 0.8-2.0 μm. Images were recorded on a K3 direct electron detector in super-resolution mode, mounted after a GIF Quantum energy filter operated with a slit width of 20 eV. A dose rate of 20 electrons per pixel per second and an exposure time of 3.12 s were used, generating 40 movie frames with a total dose of ~54 electrons per Å². Statistics for cryo-EM data collection are listed in Table 1.
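The acquisition parameters above imply a few derived quantities that are useful when reprocessing the data. The sketch below (function name hypothetical) computes the per-frame dose, the super-resolution pixel size, and the Nyquist limit at the physical pixel size, which is the pixel size after the 2× binning in MotionCor2 described in the image processing section.

```python
# Hypothetical helper deriving quantities from the stated acquisition settings.


def acquisition_summary(total_dose_e_per_a2: float, n_frames: int,
                        physical_pixel_a: float) -> dict[str, float]:
    """Return per-frame dose, super-resolution pixel size and Nyquist limit."""
    return {
        # Total dose spread evenly over the movie frames.
        "dose_per_frame_e_per_a2": total_dose_e_per_a2 / n_frames,
        # K3 super-resolution mode samples at half the physical pixel size.
        "super_res_pixel_a": physical_pixel_a / 2,
        # Nyquist limit after 2x binning back to the physical pixel size.
        "nyquist_a": 2 * physical_pixel_a,
    }


if __name__ == "__main__":
    for key, value in acquisition_summary(54, 40, 1.05).items():
        print(f"{key}: {value:.3f}")
```

With the values reported here this gives about 1.35 e/Å² per frame and a Nyquist limit of 2.1 Å, comfortably beyond the 3.6-3.8 Å reconstructions.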

Image Processing
Movie frames were aligned using MotionCor2 (Zheng et al., 2017) with a binning factor of 2. The motion-corrected micrographs were imported into cryoSPARC (Punjani et al., 2017), and contrast transfer function (CTF) parameters were estimated using CTFFIND4 (Rohou and Grigorieff, 2015). A few thousand particles were auto-picked without a template to generate 2D averages for subsequent template-based auto-picking. The auto-picked and extracted particles were subjected to 2D classification, and false or poor particles that fell into 2D averages with poor features were excluded. An initial reconstruction was generated in cryoSPARC using 100,000 particles (Punjani et al., 2017). Heterogeneous refinement was then performed to separate particles by conformational heterogeneity. To further select homogeneous particles, 3D variability analysis (Punjani and Fleet, 2021) was performed, and the resulting maps representing different conformations (frame_000.mrc and frame_019.mrc) were used for supervised heterogeneous refinement. The homogeneous particle set was used for final 3D refinement with C1 symmetry, yielding a 3.65 Å reconstruction from 183,870 particles.
The Cas12k-sgRNA binary complex dataset was processed in the same way as the ternary complex dataset, and 114,383 particles were selected for a final reconstruction at 3.80 Å resolution. Cryo-EM image processing is summarized in Table 1.
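Resolution values such as the 3.65 Å and 3.80 Å reported here are read off half-map Fourier shell correlation (FSC) curves (Figs. S2I and S6F). The NumPy sketch below shows the basic calculation for two cubic half maps. It assumes the conventional 0.143 threshold and omits the masking and related corrections that packages such as cryoSPARC apply, so it illustrates the principle rather than reproducing the reported values. Function names are hypothetical.

```python
import numpy as np


def fsc_curve(half1: np.ndarray, half2: np.ndarray) -> np.ndarray:
    """FSC per integer-radius Fourier shell for two cubic half maps (no masking)."""
    if half1.shape != half2.shape or half1.ndim != 3 or len(set(half1.shape)) != 1:
        raise ValueError("half maps must be cubic 3D arrays of equal size")
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    freq = np.fft.fftfreq(n) * n  # integer frequency indices
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    fsc = np.zeros(n // 2)
    for s in range(n // 2):  # shells up to (but excluding) Nyquist
        in_shell = shell == s
        num = np.sum(f1[in_shell] * np.conj(f2[in_shell])).real
        den = np.sqrt(np.sum(np.abs(f1[in_shell]) ** 2) *
                      np.sum(np.abs(f2[in_shell]) ** 2))
        fsc[s] = num / den if den > 0 else 0.0
    return fsc


def resolution_at_threshold(fsc: np.ndarray, box_size: int, pixel_a: float,
                            threshold: float = 0.143):
    """Resolution (Å) of the first non-DC shell where FSC drops below threshold.

    Returns None if the curve never drops below the threshold.
    """
    for s in range(1, len(fsc)):
        if fsc[s] < threshold:
            return box_size * pixel_a / s
    return None


if __name__ == "__main__":
    n = 32
    rng = np.random.default_rng(0)
    k = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    lowpass = np.exp(-(kx**2 + ky**2 + kz**2) / (2 * 4.0**2))
    # Shared low-pass "structure" plus independent noise in each half map.
    signal = np.fft.ifftn(np.fft.fftn(rng.normal(size=(n, n, n))) * lowpass).real
    noise = 0.05 * signal.std()
    halves = [signal + rng.normal(scale=noise, size=signal.shape) for _ in range(2)]
    print(resolution_at_threshold(fsc_curve(*halves), n, pixel_a=1.05))
```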

Model building, refinement, and validation
De novo model building of the Cas12k-sgRNA-target DNA structure was performed manually in COOT (Emsley et al., 2010), guided by secondary structure predictions of Cas12k from PSIPRED (Jones, 1999) and a structure prediction of the sgRNA from RNAComposer (Biesiada et al., 2016). The models were refined against the corresponding maps using the phenix.real_space_refine tool in Phenix (version 1.19.2) (Afonine et al., 2018). For the Cas12k-sgRNA complex, the Cas12k-sgRNA-target DNA model, with the target DNA removed, was fitted into the cryo-EM map. The model was adjusted by all-atom refinement in COOT with self-restraints, and the resulting model was refined against the corresponding cryo-EM map using phenix.real_space_refine in Phenix.

Structure-based sequence alignment
The PROMALS3D program (Pei et al., 2008) was used to generate a structure-based sequence alignment of Cas12k and Cas12f, and the alignment was rendered using ESPript (Robert and Gouet, 2014). Sequence identities and similarities were calculated using the Sequence Manipulation Suite (Stothard, 2000). The root-mean-square deviation (RMSD) of Cα atoms was calculated using the cealign command in PyMOL.
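PyMOL's cealign first establishes residue correspondence with the combinatorial extension (CE) algorithm and then superimposes the structures. The final step can be sketched with the Kabsch algorithm in NumPy, assuming the Cα pairs are already matched (for example, from a structure-based alignment). The function name is hypothetical, and this is not a reimplementation of cealign's alignment search.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched coordinate sets (N x 3) after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.ndim != 2 or p.shape != q.shape or p.shape[1] != 3:
        raise ValueError("inputs must be matched N x 3 arrays")
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the 3 x 3 covariance matrix.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ca_a = rng.normal(scale=10.0, size=(50, 3))
    theta = np.deg2rad(30.0)
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    # Same coordinates, rotated and translated: RMSD should be ~0.
    ca_b = ca_a @ rot_z.T + np.array([5.0, -3.0, 2.0])
    print(f"RMSD after superposition: {kabsch_rmsd(ca_a, ca_b):.6f} Å")
```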

Structural visualization
Figures were generated using PyMOL and UCSF Chimera (Pettersen et al., 2004).

Figure 1. Overall structure of the Cas12k-sgRNA-target DNA complex. (A) Schematic of the domain organization of Cas12k based on the structure. (B) Cryo-EM map of the Cas12k-sgRNA-target DNA complex at 3.6 Å in two views, with each domain color coded as in A. The

Figure 2. Structure of Cas12k and comparison with Cas12f. (A,B) Atomic models of Cas12k (A) and Cas12f (PDB: 7L49) (B). Subunit A of Cas12f is shown in the same view as Cas12k, whereas subunit B is semi-transparent. (C-E) Comparison of the domains between Cas12k and Cas12f.

Figure 3. Overall structure of the sgRNA. (A) Structure of the sgRNA and target DNA in the Cas12k-sgRNA-target DNA complex in cartoon representation, with stem-loops (S1-8), pseudoknot, and AR:R1-2 duplexes color coded. (B) Schematic of the sgRNA and target DNA, color coded as in A. (C) Contacts between sgRNA and Cas12k. Stem-loops not in contact with Cas12k are not shown. Cas12k is shown in surface potential and cartoon representations in the left and right panels, respectively.

Figure 4. Target DNA recognition by Cas12k. (A) The PAM duplex is bound in a positively charged groove formed by the REC1, WED, and PI domains (shown in cartoon and surface potential representations in the left and right panels, respectively). (B) Detailed interactions between the PAM duplex and Cas12k. Interactions are indicated by red dashed lines. (C) Schematic of the interactions between the PAM duplex and Cas12k. (D) PCR results of the in vitro DNA transposition assay using wild-type Cas12k and various Cas12k mutants. The results shown are from three replicates. (E) PCR results of the in vitro DNA transposition assay using sgRNAs with different spacer lengths. The results shown are from two replicates.

Figure 5. Structure of the Cas12k-sgRNA complex. (A) Cryo-EM map of the Cas12k-sgRNA complex at 3.8 Å with each subunit color coded as in Fig. 1A. The unsharpened map is shown as grey mesh. (B) Atomic model of the Cas12k-sgRNA complex shown in cartoon representation in the same views as in A. (C) Cryo-EM density of REC1(103-270) (circled) in the binary complex. (D) Structural superimposition of the Cas12k-sgRNA (color coded as in A) and Cas12k-sgRNA-target DNA (magenta) complexes. (E) Structural superimposition of Cas12k protein in the two states shown in D.

Figure S2. Cryo-EM data processing for the Cas12k-sgRNA-target DNA ternary complex. (A) A representative raw cryo-EM micrograph of the Cas12k-sgRNA-target DNA complex from a total of 2204 micrographs. (B) Representative good 2D class averages from a total of 100 images. (C) Three 3D reconstructions from heterogeneous refinement. (D-F) Three rounds of supervised heterogeneous refinement using two maps from 3D variability analysis as templates. Variable regions are indicated by red circles. (G) Homogeneous refinement of the final particle set. (H) Local refinement of the final particle set using the indicated mask. The unsharpened map is shown on the left and the sharpened map on the right. (I) Half-map FSC plots.

Figure S3. Detailed cryo-EM density map and structure-based sequence alignment. (A) Fit between the cryo-EM map of the Cas12k-sgRNA-target DNA complex and the atomic model. (B) Structure-based sequence alignment of Cas12k and Cas12f. Residue numbers and secondary structures are labeled according to Cas12k. Arrows mark key residues for Cas12f dimerization (yellow) and catalytic residues in the RuvC domain of Cas12f (red). Each domain is indicated by background colors as in Fig. 1A. (C) Sequence identity and similarity based on the alignment in B.

Figure S4. Interactions between Cas12k and bound nucleic acids. (A-C) Contacts between Cas12k and sgRNA. The AR:R1 duplex is clamped between the N-lobe and C-lobe of Cas12k (A). S1 and the pseudoknot contact the C-lobe (B), whereas S3 and S8 contact the N-lobe (C). (D,E) Interactions between Cas12k and the PAM duplex of the target DNA. Left panels are in cartoon representation; right panels are in stick representation with cryo-EM density shown as mesh. Key residues involved in interactions are labeled, and interactions are indicated by red dashed lines. (F) Contacts between the N-lobe of Cas12k and the TS of the crRNA-target DNA heteroduplex. (G) Contacts between the C-lobe of Cas12k and the TS of the crRNA-target DNA heteroduplex.

Figure S5. In vitro DNA transposition assay for Cas12k mutants. (A) LB agar plates showing colonies after transformation of each transposition reaction. The graph on the right shows mean ± SD (n = 3). (B) PCR results using purified plasmids as templates. Ten colonies were randomly selected for plasmid extraction from each plate in A. Positions of the expected PCR product at ~350 bp are indicated by red dashed boxes. The plates and PCR results are from the same batch as those in Fig. S1E,F.

Figure S6. Cryo-EM data processing of the Cas12k-sgRNA binary complex. (A) A representative raw cryo-EM micrograph of the Cas12k-sgRNA complex from a total of 1849 micrographs. (B) Representative good 2D class averages from a total of 100 images. (C) Three 3D reconstructions from heterogeneous refinement. (D) Supervised heterogeneous refinement using two maps from 3D variability analysis as templates. Variable regions are indicated by red circles. (E) Homogeneous refinement of the final particle set. The unsharpened map is shown on the left and the sharpened map on the right. (F) Half-map FSC plots. (G) Fit between the cryo-EM map of the Cas12k-sgRNA complex and the atomic model.