Abstract
Bacteria have evolved adaptive immune systems encoded by Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and the CRISPR-associated (Cas) genes to maintain genomic integrity in the face of relentless assault from pathogens and mobile genetic elements [1–3]. Type I CRISPR-Cas systems canonically target foreign DNA for degradation via the joint action of the ribonucleoprotein complex Cascade and the helicase-nuclease Cas3 [4,5] but nuclease-deficient Type I systems lacking Cas3 have been repurposed for RNA-guided transposition by bacterial Tn7-like transposons [6,7]. How CRISPR- and transposon-associated machineries collaborate during DNA targeting and insertion has remained elusive. Here we determined structures of a novel TniQ-Cascade complex encoded by the Vibrio cholerae Tn6677 transposon using single particle electron cryo-microscopy (cryo-EM), revealing the mechanistic basis of this functional coupling. The quality of the cryo-EM maps allowed for de novo modeling and refinement of the transposition protein TniQ, which binds to the Cascade complex as a dimer in a head-to-tail configuration, at the interface formed by Cas6 and Cas7 near the 3’ end of the crRNA. The natural Cas8-Cas5 fusion protein binds the 5’ crRNA handle and contacts the TniQ dimer via a flexible insertion domain. A target DNA-bound structure reveals critical interactions necessary for protospacer adjacent motif (PAM) recognition and R-loop formation. The present work lays the foundation for a structural understanding of how DNA targeting by TniQ-Cascade leads to downstream recruitment of additional transposon-associated proteins, and will guide protein engineering efforts to leverage this system for programmable DNA insertions in genome engineering applications.
We previously demonstrated that a transposon derived from Vibrio cholerae Tn6677 undergoes programmable transposition in E. coli directed by a CRISPR RNA (crRNA), and that this activity requires four transposon- and three CRISPR-associated genes in addition to a CRISPR array (Fig. 1a) [7]. Whereas TnsA, TnsB, and TnsC exhibit functions that are consistent with their homologs from a related and well-studied cut-and-paste DNA transposon, E. coli Tn7 (reviewed in Peters, 2014), we showed that TniQ, a homolog of E. coli TnsD, forms a co-complex with the Cascade ribonucleoprotein complex encoded by the Type I-F variant CRISPR-Cas system. This finding suggested a role for TniQ distinct from that of EcoTnsD, which identifies target sites during Tn7 transposition. Specifically, we proposed that RNA-guided DNA targeting by Cascade could deliver TniQ to DNA in a manner compatible with downstream transpososome formation, and that TniQ might interact with Cascade near the 3’ end of the crRNA, consistent with RNA-guided DNA insertion occurring approximately 49 bp downstream from the PAM-distal edge of the target site.
To determine this unambiguously, we purified the V. cholerae TniQ-Cascade complex loaded with a native crRNA and determined its structure by cryo-EM. The overall complex adopts a helical architecture with protuberances at both ends (Fig. 1 and Extended Data Fig. 1 and 2). The global architecture is similar to previously determined structures of Cascade from I-E and I-F systems (Extended Data Fig. 3) [8–11], with the exception of a large mass of additional density attributable to TniQ (see below). Maximum likelihood classification methods implemented in Relion3 [12] allowed us to identify significant dynamics in the entire complex, which appears to “breathe”, widening and narrowing the distance between the two protuberances (Extended Data Fig. 1d and Supplementary Movie 1).
The large subunit, a natural Cas8-Cas5 fusion protein (hereafter referred to simply as Cas8), forms one protuberance and recognizes the 5’ end of the crRNA via base- and backbone-specific contacts (Extended Data Fig. 4, 5a-c, 6a), akin to the canonical roles played by Cas8 and Cas5 (Extended Data Fig. 3). Cas8 exhibits two primary subdomains formed mainly by α-helices, along with a third domain of approximately 100 residues (residues 277 to 385) that is predicted to form three α-helices but could not be built in our maps due to its intrinsic flexibility (Fig. 1c). However, low-pass filtered maps revealed that this flexible domain connects with the TniQ protuberance at the opposite end of the crescent-shaped complex (Extended Data Fig. 2e). Additionally, there seemed to be a loose coupling between the Cas8 flexible domain and overall “breathing” of the complex, as stronger density for that domain could be observed in the closed state (Extended Data Fig. 1d and Supplementary Movie 1). Six Cas7 subunits protect much of the crRNA by forming a helical filament along its length (Fig. 1b and d), similar to other Type I Cascade complexes (Extended Data Fig. 3) [8–11]. A “finger” motif in Cas7 clamps the crRNA at regular intervals, causing every sixth nucleotide (nt) of the 32-nt spacer to flip out while leaving the flanking nucleotides available for DNA recognition (Extended Data Fig. 4f). These bases are pre-ordered in short helical segments, with a conserved phenylalanine stacking below the first base of every segment. Cas7.1, the monomer furthest away from Cas8, interacts with Cas6 (also known as Csy4), which is the ribonuclease responsible for processing of the precursor RNA transcript derived from the CRISPR locus. The Cas6-Cas7.1 interaction is mediated by a β-sheet formed by a β-strand from Cas6 and the two β-strands that form the “finger” of Cas7.1 (Extended Data Fig. 5f).
Cas6 also forms extensive interactions with the conserved stem-loop in the repeat-derived 3’ crRNA handle (Fig. 1 and Extended Data Fig. 5d and e), with an arginine-rich α-helix (residues 110 to 128) docked in the major groove, positioning multiple basic residues within interaction distance of the negatively charged RNA backbone. The interaction established between Cas6 and Cas7.1 forms a continuous surface where TniQ is docked, forming the other protuberance of the crescent. The intrinsic flexibility of the complex rendered lower local resolutions in this area of the maps, which we overcame using local alignments masking the area comprising TniQ, Cas6, Cas7.1 and the crRNA handle (Extended Data Fig. 7). The enhanced maps allowed for de novo modeling and refinement of TniQ, for which no previous structure or homology model has been reported (Fig. 2). Notably, TniQ binds to Cascade as a dimer in a head-to-tail configuration (Fig. 2), a surprising result given the expectation that EcoTnsD functions as a monomer during Tn7 transposition [13]. TniQ is composed of two domains: an N-terminal domain of approximately 100 residues formed by three short α-helices and a second, larger domain of approximately 300 residues with the signature sequence of the TniQ family. A DALI search [14] using the refined TniQ model as a probe yielded significant structural similarity of the N-terminal domain to proteins containing Helix-Turn-Helix (HTH) domains (Extended Data Fig. 8). This domain is often involved in nucleic acid recognition, although there are reported examples where it has been repurposed for protein-protein interactions [15]. The remaining C-terminal TniQ-domain is formed by 10 α-helices of variable length and is predicted to contain two tandem zinc finger motifs, though this region was poorly defined in the maps (Fig. 2). Overall, the two-domain composition of TniQ results in an elongated structure, bent at the junction of the HTH and the TniQ-domain (Fig. 2).
The HTH domain of one monomer engages the TniQ-domain of the other monomer via interactions between α-helix 3 (H3) and α-helix 11 (H11), respectively, in a tight protein-protein interaction (Fig. 2c). This reciprocal interaction is complemented by multiple interactions established between the TniQ-domains from both monomers (up to 45 non-covalent interactions as reported by PISA [16]). Tethering of the TniQ dimer to Cascade is accomplished by specific interactions established with both Cas6 and Cas7.1 (Fig. 3). One monomer of TniQ interacts with Cas6 via its C-terminal TniQ-domain, while the other TniQ monomer contacts Cas7.1 through its N-terminal HTH domain (Fig. 2b, 3). The loop connecting α-helices H6 and H7 of the TniQ-domain of the first TniQ monomer is inserted in a hydrophobic cavity formed at the interface of two α-helices of Cas6 (Fig. 3b, d). The TniQ histidine residue 265 is involved in rearranging the hydrophobic loop connecting H6 and H7 (Fig. 3d), which is inserted in the hydrophobic pocket of Cas6 formed by residues L20, Y74, M78, Y83 and F84. The HTH domain of the other TniQ monomer interacts with Cas7.1 through a network of interactions established mainly by α-helix H2 and the linker connecting H2 and H3 (Fig. 3c, e). Thus, both the HTH domain and the TniQ-domain exert dual roles to drive TniQ dimerization and dock onto Cascade.
In order to explore the structural determinants of DNA recognition by the TniQ-Cascade complex, we determined the structure of the complex bound to a double-stranded DNA (dsDNA) substrate containing the 32-bp target sequence, 5’-CC-3’ PAM, and 20 bp of flanking dsDNA on both ends (Fig. 4 and Extended Data Fig. 9). Density for 28 nucleotides of the target strand (TS) and 8 nucleotides of the non-target strand (NTS) could be confidently assigned in the reconstructed maps (Fig. 4c). As with previous I-F Cascade structures, Cas8 recognizes the double-stranded PAM within the minor groove (Extended Data Fig.
10 [10]), and an arginine residue (R246) establishes a stacking interaction with a guanine nucleotide on the TS, which acts like a wedge to separate the double-stranded PAM from the neighboring unwound DNA where base-pairing with the crRNA begins (Fig. 4b). Twenty-two nucleotides of the TS within the 32-bp target showed clear density, but surprisingly, the terminal nine nucleotides were not ordered. The TS base-pairs with the spacer region of the crRNA in short, discontinuous, helical segments, as observed previously for I-E and I-F DNA-bound Cascade complexes [10,11], with every sixth base flipped out of the heteroduplex by the insertion of a Cas7 finger (Extended Data Fig. 6b). The observed 22-bp heteroduplex is stabilized by the four Cas7 monomers proximal to the PAM (Cas7.6-7.3), but even after local masked refinements, no density could be observed for any TS nucleotides that would base-pair with the 3’ end of the crRNA spacer bound by Cas7.2 and Cas7.1. These two Cas7 monomers are proximal to Cas6 and lie in the region previously described to exhibit dynamics due to the interaction of the Cas8 flexible domain with the inner face of the TniQ dimer. In addition, the disordered nucleotides correspond to positions 25-28 of the target site, where RNA-DNA mismatches are detrimental for RNA-guided DNA integration [7]. Thus, we propose that the partial R-loop structure we observed may represent an intermediate conformation refractory to integration, and that additional structural rearrangements may be critical for stabilizing an open conformation, possibly driven by recruitment of the TnsC ATPase.
Here we present the first cryo-EM structures of a CRISPR-Cas effector complex bound to the transposition protein TniQ, with and without target DNA.
These structures reveal the unexpected presence of TniQ as a dimer that forms bipartite interactions with Cas6 and Cas7.1 within the Cascade complex, forming a likely recruitment platform for downstream-acting transposition proteins [18] (Fig. 4d). Our structures furthermore reveal a possible fidelity checkpoint, whereby formation of a complete R-loop requires conformational rearrangements that may depend on extensive RNA-DNA complementarity and/or downstream factor recruitment; this proofreading step could account for the highly specific RNA-guided DNA integration we previously reported for the V. cholerae transposon [7]. In light of recent work demonstrating exaptation of Type V-K CRISPR-Cas systems by similar Tn7-like transposons that also encode TniQ [17,18], it will be interesting to determine whether tethering of TniQ to evolutionarily distinct CRISPR RNA effector complexes (Cascade or Cas12k) is a general theme of RNA-guided transposition.
Methods
TniQ-Cascade purification
Protein components of TniQ-Cascade were expressed from a pET-derivative vector containing the native V. cholerae tniQ-cas8-cas7-cas6 operon with an N-terminal His10-MBP-TEVsite fusion on TniQ. The crRNA was expressed separately from a pACYC-derivative vector containing a minimal repeat-spacer-repeat CRISPR array encoding a spacer from the endogenous V. cholerae CRISPR array. The TniQ-Cascade complex was overexpressed and purified as described previously [7], and was stored in Cascade Storage Buffer (20 mM Tris-Cl, pH 7.5, 200 mM NaCl, 1 mM DTT, 5% glycerol).
Sample preparation for electron microscopy
For negative staining, 3 μL of purified TniQ-Cascade ranging from 100 nM to 2 μM was incubated with plasma-treated (H2/O2 gas mix, Gatan Solarus) CF400 carbon-coated grids (EMS) for 1 minute. Excess solution was blotted and 3 μL of 0.75% uranyl formate was added for an additional minute. Excess stain was blotted away and grids were air-dried overnight. Grid screening for both negative staining and cryo conditions was performed on a Tecnai-F20 microscope (FEI) operated at 200 keV and equipped with a Gatan K2-Summit direct detector. Microscope operation and data collection were carried out using the Leginon/Appion software. Initial negative staining grid screening allowed determination of a suitable concentration range for cryo conditions. Several grid geometries were tested in the 1-4 μM concentration range for cryo conditions using a Vitrobot Mark-II operated at 4 °C, 100% humidity, blot force 3, drain time 0, waiting time 15 seconds, and blotting times ranging from 3-5 seconds. The best ice distribution and particle density were obtained with 0.6/1 UltrAuFoil grids (Quantifoil).
Electron microscopy
A preliminary dataset of 300 cryo-EM images was collected with the Tecnai-F20 microscope using a pixel size of 1.22 Å/pixel with illumination conditions adjusted to 8 e-/pixel/second with a frame window of 200 ms. Preprocessing and image processing were carried out entirely in Relion3 [12], with CTF estimation integrated via a wrapper to Gctf [19]. An initial model computed using the SGD algorithm [20] implemented in Relion3 was used as the initial reference for a 3D refinement job that generated a sub-nanometer reconstruction with approximately 10,000 selected particles. Clear secondary structure features could be identified in the 2D averages and the 3D reconstruction. For the DNA-bound TniQ-Cascade complex, we pre-incubated two complementary 74-nt oligonucleotides (NTS: 5’-TTCATCAAGCCATTGGACCGCCTTACAGGACGCTTTGGCTTCATTGCTTTTCAGCTTCGCCTTGACGGCCAAAA-3’, TS: 5’-TTTTGGCCGTCAAGGCGAAGCTGAAAAGCAATGAAGCCAAAGCGTCCTGTAAGGCGGTCCAATGGCTTGATGAA-3’) for 5 minutes at 95 °C in hybridization buffer (20 mM Tris-Cl, pH 7.5, 100 mM KCl, 5 mM MgCl2) to form dsDNA, which was subsequently aliquoted and flash frozen. Complex formation was performed by incubating a 3x molar excess of dsDNA with TniQ-Cascade at 37 °C for 5 minutes prior to vitrification, which followed the conditions optimized for the apo complex (defined as TniQ-Cascade with crRNA but no DNA ligand). High resolution data for the apo complex were collected in a Tecnai-Polara-F30 microscope operated at 300 keV equipped with a K3 direct detector (Gatan). A 30 μm C2 aperture was used with a pixel size of 0.95 Å/pixel and illumination conditions in microprobe mode adjusted to a fluence of 16 e-/pixel/second. Four-second images with a frame width of 100 ms (1.77 e-/Å²/frame) were collected in counting mode. For the DNA-bound complex, high resolution data were collected in a Titan Krios microscope (FEI) equipped with an energy filter (20 eV slit width) and a K2 direct detector (Gatan) operated at 300 keV.
A 50 μm C2 aperture was used with a pixel size of 1.06 Å/pixel and illumination conditions adjusted in nanoprobe mode to a fluence of 8 e-/pixel/second. Eight-second images with a frame width of 200 ms (1.42 e-/Å²/frame) were collected in counting mode.
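The per-frame values quoted above follow from the dose rate, the frame duration and the pixel size. The short Python sketch below reproduces that arithmetic as a consistency check; the function names are illustrative and not part of any acquisition software, and the total-exposure figures it prints are derived here rather than stated in the text.

```python
def fluence_per_frame(dose_rate_e_per_px_s, pixel_size_A, frame_time_s):
    """Electrons per square angstrom delivered during one movie frame."""
    electrons_per_pixel = dose_rate_e_per_px_s * frame_time_s
    pixel_area_A2 = pixel_size_A ** 2
    return electrons_per_pixel / pixel_area_A2


def total_fluence(dose_rate_e_per_px_s, pixel_size_A, exposure_s):
    """Electrons per square angstrom accumulated over the whole exposure."""
    return dose_rate_e_per_px_s * exposure_s / pixel_size_A ** 2


# Values taken from the text: (dose rate, pixel size, frame time, exposure time)
datasets = {
    "apo (K3)": (16.0, 0.95, 0.100, 4.0),
    "DNA-bound (K2)": (8.0, 1.06, 0.200, 8.0),
}

for name, (rate, pixel, frame, exposure) in datasets.items():
    per_frame = fluence_per_frame(rate, pixel, frame)
    total = total_fluence(rate, pixel, exposure)
    n_frames = round(exposure / frame)
    print(f"{name}: {per_frame:.2f} e-/A^2/frame, "
          f"{n_frames} frames, {total:.1f} e-/A^2 total")
```

With the stated parameters this gives 1.77 and 1.42 e-/Å²/frame for the apo and DNA-bound datasets, matching the values in parentheses above.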
Image processing
Motion correction was performed for every micrograph applying the algorithm described for MotionCor2 [21] implemented in Relion3, with 5 by 5 patches for the K2 data and 7 by 5 patches for the K3 data. Parameters of the contrast transfer function for each motion-corrected micrograph were obtained using Gctf integrated in Relion3. Initial particle picking on a randomly chosen subset of 200 images was performed with the Laplacian tool of the Auto-picking module of Relion3, using an estimated size for the complex of 200 Å. 15,000 particles were extracted in a box of 300 pixels and binned 3 times for an initial 2D classification job. Selected 2D averages from this job were used as templates for Auto-picking of the full dataset. The full dataset of binned particles was subjected to a 2D classification job to identify particles able to generate averages with clear secondary structure features. The subgroup of binned particles selected after 2D classification was refined against a 3D volume obtained by SGD with the F20 data. This “consensus” volume was inspected to localize areas of heterogeneity, which were clearly identified at both ends of the crescent shape characteristic of this complex. Both ends were then individually masked using soft masks of around 20 pixels that were subsequently used in classification jobs without alignments in Relion3. The T parameter used for this classification job was 6 and the total number of classes was 10. This strategy allowed us to identify two main populations of particles, which correspond to an “open” and a “closed” state of the complex. Particles from both subgroups were separately re-extracted to obtain unbinned datasets for further refinement. New features implemented in Relion3, namely Bayesian polishing and CTF parameter refinement, allowed the extension of the resolution to 3.4, 3.5 and 2.9 Å for the two apo reconstructions and the DNA-bound complex, respectively.
Post-processing was performed with a soft mask of 5 pixels, with the B-factor estimated automatically in Relion3 following standard practice. A final set of local refinements was performed with the masks used for classification. The locally aligned maps exhibit very good quality for the ends of the C-shape. These maps were used for de novo modeling and initial model refinement.
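The extraction and binning parameters given above fix the sampling available to the initial 2D classification. The following Python sketch works through that arithmetic under two assumptions that the text does not state explicitly: that "binned 3 times" means a 3-fold reduction in sampling, and that the 300-pixel box applies to either dataset. The function name is illustrative only.

```python
def binned_sampling(pixel_size_A, box_px, bin_factor):
    """Box extent and sampling limits for a binned particle stack.

    Assumes bin_factor is a linear downsampling factor (3 means every
    3 x 3 block of pixels becomes one pixel).
    """
    binned_pixel_A = pixel_size_A * bin_factor
    return {
        "box_extent_A": pixel_size_A * box_px,   # physical width of the box
        "binned_box_px": box_px // bin_factor,   # box size after binning
        "binned_pixel_A": binned_pixel_A,        # pixel size after binning
        "nyquist_A": 2 * binned_pixel_A,         # best resolution representable
    }


# Pixel sizes from the data collection section; box and binning from the text.
for label, pixel_size in (("K3 apo data", 0.95), ("K2 DNA-bound data", 1.06)):
    s = binned_sampling(pixel_size, box_px=300, bin_factor=3)
    print(f"{label}: box {s['box_extent_A']:.0f} A, "
          f"binned to {s['binned_box_px']} px at {s['binned_pixel_A']:.2f} A/px, "
          f"Nyquist {s['nyquist_A']:.2f} A")
```

Under these assumptions the binned stacks are limited to roughly 6 Å, which is sufficient for resolving secondary structure in 2D averages and explains why particles were re-extracted unbinned before the high-resolution refinements. The box also comfortably contains the estimated 200 Å particle.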
Model building and refinement
For the Cas7 and Cas6 monomers, the E. coli homologs (PDB accession code 4TVX) were initially docked with Chimera [22] and transformed to poly-alanine models. Substantial rearrangement of the finger region of the Cas7 monomers, as well as of other secondary structure elements of Cas6, was performed manually in COOT [23] before amino acid substitution of the poly-alanine model. Well-defined bulky side chains of aromatic residues allowed a confident assignment of the register. The crRNA was also well defined in the maps and was traced de novo with COOT. For Cas8 and TniQ in particular, no published structure showed sufficient similarity to explain our densities. Locally refined maps using soft masks at both ends of the crescent-shaped complex rendered well-defined maps below 3.5 Å resolution. These maps were used for manual de novo tracing of a poly-alanine model in COOT that was subsequently mutated to the V. cholerae sequences. Bulky side chains for aromatic residues showed excellent density and were used as landmarks to adjust the register of the sequence. For refinement, an initial step of real space refinement against the cryo-EM maps was performed with the real space refinement tool of the Phenix package [24], with secondary structure restraints activated. A second step of reciprocal space refinement was performed in Refmac5 [25], with secondary structure restraints calculated with Prosmart [26] and LibG [27]. The weight of the geometry term versus the experimental term was adjusted to avoid overfitting of the model into the cryo-EM map, as previously reported [30]. Model validation was performed in Molprobity [28].
Data availability
Maps and models have been deposited in the EMDB (accession codes 20349, 20350 and 20351) and the PDB (accession codes 6PIF, 6PIG and 6PIJ).
Acknowledgements
We acknowledge Bob Grassucci and Zhening Zhang for technical assistance in cryo-EM data acquisition. Part of this work was performed at the Simons Electron Microscopy Center and National Resource for Automated Molecular Microscopy located at the New York Structural Biology Center, supported by grants from the Simons Foundation (SF349247), NYSTAR, and the NIH National Institute of General Medical Sciences (GM103310).