Pervasive off-target and double-stranded DNA nicking by CRISPR-Cas12a

Cas12a (formerly Cpf1) is an RNA-guided endonuclease in the CRISPR-Cas immune system that can be easily programmed for genome editing. Cas12a can bind and cut dsDNA targets with high specificity in vivo, making it an ideal candidate for precise genome editing applications. This specificity is at odds with the natural role of Cas12a as an immune effector against rapidly evolving phages. However, the native cleavage specificity and activity remain to be fully understood. We employed high-throughput in vitro cleavage assays to determine and compare the native specificities of three Cas12a orthologs. Surprisingly, we observed pervasive nicking of randomized target libraries, with strong nicking activity observed against targets with up to four mismatches. Nicking and cleavage activities are dependent on mismatch type and position, and vary depending on the Cas12a ortholog and crRNA sequence. Our high-throughput and biochemical analyses further reveal that Cas12a has robust activated non-specific nicking and weak non-specific dsDNA degradation activity in trans. Together, our findings reveal Cas12a cleavage activities that could be beneficial in the context of bacterial CRISPR-Cas immunity but may be detrimental for genome editing technology.


In this study, we developed a high-throughput in vitro method to determine the native specificity and cleavage activity of Cas12a orthologs. We show that Cas12a nicks target sequences containing up to four mismatches, even though linearization does not always occur.
We further show that Cas12a has robust activated non-specific nicking activity in trans, which could result in unpredictable off-target nicking events during genome editing. Activated Cas12a also has weak dsDNA degradation activity for both target and non-specific DNA. Our results report several cleavage activities of Cas12a including cis (target-dependent) and activated trans (target-independent) dsDNA nicking, and cis and trans dsDNA degradation. These non-specific cleavage activities of Cas12a may aid bacteria containing type V-A CRISPR-Cas systems to fight against a variety of rapidly evolving phages.

Cleavage activity of Cas12a against a target library
Genome editing activity and specificity of Cas12a have previously been characterized in vivo (D. Kim et al. 2016; Kleinstiver et al. 2016; H. K. Kim et al. 2017). These studies show that Cas12a has low or no tolerance for mismatches in the target sequence in eukaryotic cells.
However, the native cleavage specificity of Cas12a remains unclear, given that eukaryotic genomic structure may sequester potential off-targets (Hinz, Laughery, and Wyrick 2015; Isaac et al. 2016; Yarrington et al. 2018) and that most off-target analyses only account for the creation of double-strand breaks (Tsai et al. 2015, 2017). To directly observe the cleavage activity and specificity of Cas12a, we performed in vitro cleavage assays using a plasmid library (Fig. 1A).
The libraries were designed to contain target sequences with 0 to 7 mismatches, with targets containing 2 or 3 mismatches maximally represented in the pool ("Pool Design, Complexity, and Purification" n.d.). The 0 mismatch "perfect" target was spiked in as an internal control (Supplementary Fig. 1A). We tested the cleavage activity of three Cas12a orthologs: FnCas12a, LbCas12a and AsCas12a (referred to collectively as Cas12a hereafter). For each Cas12a ortholog, we used two or three different crRNA sequences and corresponding negatively supercoiled plasmids containing the perfect target (pTarget) or target libraries (pLibrary) (Supplementary Table 1). The three crRNA and library sequences were designed based on the protospacer 4 sequence from the Streptococcus pyogenes CRISPR locus (55% G/C), the EMX1 gene target sequence (80% G/C) and the CCR5 gene target sequence (20% G/C, tested for FnCas12a and AsCas12a), henceforth referred to as pLibrary PS4, EMX1 and CCR5, respectively.
In cleavage assays, Cas12a completely linearized the pTarget within the time course, indicating complete cleavage of both DNA strands (Fig. 1B, Supplementary Fig. 1E). In contrast, only a fraction of pLibrary was linearized and a substantial amount of plasmid remained supercoiled, indicating that many sequences within the pool could not be cleaved by Cas12a in the time span tested (Fig. 1B, Supplementary Fig. 1E). Surprisingly, we also observed a nicked fraction for pLibrary that persisted through the longest time point tested (3 hours) (Fig. 1B, Supplementary Fig. 1E).
To determine which sequences were uncleaved and nicked, we extracted the plasmid DNA from gel bands for each of these fractions, PCR amplified the target region and performed high-throughput sequencing (HTS) followed by bioinformatic analysis (Fig. 1A, Supplementary Fig. 2). Although we could not analyze sequences present in the linearized DNA sample, we assumed that sequences that were absent from both the supercoiled and nicked fractions were linearized. We generated mismatch distribution curves for the supercoiled and nicked fractions of the pLibraries subjected to Cas12a cleavage over time.
Interestingly, Cas12a orthologs have differential cleavage activities against the three libraries tested. For pLibraries PS4 and EMX1, we observed rapid depletion of sequences with up to two mismatches from the supercoiled fractions within the first time point (1 min), indicating that all three Cas12a orthologs can tolerate up to 2 mismatches in these target sequences. We also consistently observed more depletion of sequences with three mismatches for LbCas12a and AsCas12a than for FnCas12a, suggesting that FnCas12a is less tolerant of these mismatches. Target sequences with 1 or 2 mismatches were depleted more rapidly for pLibrary CCR5 than for the other two pLibraries, suggesting that mismatches can be tolerated better for A/T-rich sequences (Supplementary Fig. 3B). As with pLibraries PS4 and EMX1, sequences with three mismatches were depleted more rapidly from the pLibrary CCR5 supercoiled fraction for AsCas12a than for FnCas12a.
The mismatch distribution of the nicked fractions was not a uniform shift as observed for the supercoiled fractions ( Fig 1C, Supplementary Fig. 3A, B). For most pLibraries subjected to FnCas12a and LbCas12a cleavage, target sequences with 3 or fewer mismatches were enriched in the nicked fraction in the first few time points but depleted over time, demonstrating that some of the mismatched sequences were eventually linearized. However, AsCas12a displayed less linearization activity against nicked targets, as target sequences with 2 or 3 mismatches were depleted more slowly or remained enriched in the nicked fraction over the entire time course for all pLibraries. We were also surprised to observe that FnCas12a and LbCas12a displayed strong nicking activity against target sequences containing 4 or more mismatches based on the enrichment of these sequences in the nicked fractions at the last time point tested (3 hours) for all pLibraries (Fig. 1C, Supplementary Fig. 3A, B). Overall, these results suggest that FnCas12a and LbCas12a are more tolerant of mismatches in the target sequences, and may be more prone to generating double-strand breaks at off-target sites than AsCas12a.

Sequence determinants of Cas12a cleavage activity
We next looked at the sequences that were present in the supercoiled and nicked fractions to determine the effects of mismatch position and type (Supplementary Fig. 2). The heatmaps for the supercoiled fractions are shown in Fig. 2 and Supplementary Fig. 4-10. Some mismatches in the PAM-proximal "seed" region were enriched in the supercoiled fraction and/or depleted slowly, indicating that these mismatches were not tolerated by Cas12a. We observed a short seed region of ~6 nucleotides for most pLibraries and Cas12a orthologs. This is in agreement with previously reported in vivo and in vitro specificity of Cas12a (D. Kim et al. 2016; Kleinstiver et al. 2016; H. K. Kim et al. 2017; Swarts, Oost, and Jinek 2017), and is shorter than the ~10 nucleotide seed of SpCas9 (Hsu et al. 2013; Liu et al. 2016; Sternberg et al. 2014). A single G substitution in the seed region is highly deleterious for cleavage by Cas12a, while C or T mismatches slowed the rate of cleavage or depletion from the supercoiled fractions to a lesser degree (Fig. 2, Supplementary Fig. 4-10). In contrast, target sequences with a single A mismatch in the seed are generally tolerated for cleavage outside of the first PAM-proximal position for most pLibraries. Outside of the seed, most target sequences containing a single mismatch were rapidly depleted from the supercoiled fraction, indicating that any type of single mismatch outside the seed can be similarly tolerated. However, in some cases single mismatches throughout the target sequence slow the rate of cleavage, most notably for EMX1 pLibrary cleavage by LbCas12a and FnCas12a (Supplementary Fig. 6-8).
A similar seed-dependent cleavage trend was observed for target sequences with 2, 3 and 4 mismatches (2 MM, 3 MM and 4 MM, respectively) (Fig. 2, Supplementary Fig. 4-10). In general, G substitution is most deleterious for cleavage at all positions in the target sequence with 3 and 4 mismatches. T and C substitutions also slow the rate of cleavage by Cas12a, particularly in pLibrary CCR5 (Supplementary Fig. 9, 10). For LbCas12a, most target sequences with 2 and 3 mismatches in the PAM-distal regions were eventually depleted in pLibraries PS4 and EMX1 (Fig. 2, Supplementary Fig. 7). FnCas12a tolerates most transition and transversion substitutions in the PAM-distal region of target sequences with 2 and 3 mismatches across all three pLibraries (Supplementary Fig. 4, 6, 9). AsCas12a tolerates up to 3 mismatches in the PAM-distal region for pLibrary CCR5 (Supplementary Fig. 10), as seen in the library mismatch distribution of the supercoiled fraction (Supplementary Fig. 3B). For sequences with 5 and 6 mismatches, we observed a steady enrichment of all sequences regardless of mismatch position or type, indicating that Cas12a does not have sequence-specific cleavage activity against these targets (Fig. 2, Supplementary Fig. 4-10).
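The position-and-type analysis described in this section can be illustrated with a small tally. This is a minimal sketch, not the pipeline used in this study: the function name `mismatch_matrix`, the toy target and the toy counts are all hypothetical, and it sums raw read counts rather than the relative enrichment values plotted in the heatmaps. Index 0 is assumed to be the PAM-proximal position.

```python
BASES = "ACGT"

def mismatch_matrix(seq_counts, target, n_mismatches):
    """Sum read counts by (position, substituted base) over all sequences
    that differ from `target` at exactly `n_mismatches` positions."""
    matrix = [{base: 0 for base in BASES} for _ in target]
    for seq, count in seq_counts.items():
        if len(seq) != len(target):
            continue  # skip reads of unexpected length
        diffs = [(i, b) for i, (a, b) in enumerate(zip(target, seq)) if a != b]
        if len(diffs) == n_mismatches:
            for i, base in diffs:
                matrix[i][base] += count
    return matrix

# Hypothetical toy data (not from the study).
target = "ACGTAC"
seq_counts = {"ACGTAC": 10, "GCGTAC": 5, "ACGTAG": 3, "GCGTAG": 2}

single = mismatch_matrix(seq_counts, target, 1)
for pos, row in enumerate(single, start=1):
    print(pos, row)
```

Each row of the returned matrix corresponds to one target position and each key to the base found at that position, which is the layout a position-by-nucleotide heatmap would need.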

Sequence determinants of Cas12a nickase activity
To determine sequences that were preferentially nicked by Cas12a, we analyzed the nicked fraction in the same way as the supercoiled fraction. The heatmaps for the nicked fractions are shown in Fig. 3 and Supplementary Fig. 11-17. While LbCas12a rapidly linearized target sequences with most single mismatches, FnCas12a and AsCas12a displayed slower kinetics for double-stranded cleavage for some single mismatch sequences. Interestingly, several of these deleterious mismatches were located in the region where the non-target strand is cut (position 16-18 from the 5' end of the target on the non-target strand) (Zetsche et al. 2015; Strohkendl et al. 2018) (Supplementary Fig. 11, 12, 15, 17). Similarly, target sequences with 2 and 3 mismatches (2 MM and 3 MM, respectively) in the PAM-distal region were enriched in the nicked fraction at early time points (Fig. 3, Supplementary Fig. 11-17). For LbCas12a and FnCas12a, these 2 and 3 mismatch targets were eventually depleted, indicating that these mismatches are tolerated for linearization (Fig. 3, Supplementary Fig. 11, 13, 14, 16). For AsCas12a cleavage of PS4 and EMX1 pLibraries, target sequences with 2 and 3 mismatches in the PAM-distal region remained highly enriched throughout the time course (Supplementary Fig. 12, 15, 17), indicating that mismatches in the PAM-distal region block the second cleavage step for this Cas12a ortholog.
Taken together, these data suggest that LbCas12a and FnCas12a can linearize most target sequences with 2 and 3 mismatches while AsCas12a can only nick these target sequences. These results support our observations that FnCas12a and LbCas12a are more tolerant to mismatches than AsCas12a for double-stranded cleavage.
FnCas12a and LbCas12a can nick target sequences with 4 or more mismatches, observed as a strong enrichment of these target sequences for almost all pLibraries tested (Fig. 3, Supplementary Fig. 11-17). Similar to the supercoiled fraction, target sequences with 5 and 6 mismatches were uniformly enriched in the nicked fraction irrespective of the mismatch position or type for most pLibraries and Cas12a orthologs (Fig. 3, Supplementary Fig. 11-17). However, AsCas12a did not nick target sequences with 4 or more mismatches for pLibraries PS4 and EMX1 (Supplementary Fig. 12, 15), but exhibited weak nicking of these target sequences in pLibrary CCR5 (Supplementary Fig. 17).

Cas12a has non-specific nicking and dsDNA degradation activity
Our HTS data suggest that Cas12a can nick sequences with several mismatches. To validate this observation, we selected sequences from pLibrary PS4 that were relatively enriched in the nicked fraction at the longest time point (3 hours). We cloned sequences containing 2 to 8 mismatches and individually tested Cas12a nicking activity against each plasmid. Notably, heatmaps for target sequences with 5 and 6 mismatches indicated that enrichment of these sequences in the nicked fraction was sequence non-specific (Fig. 3, Supplementary Fig. 11, 13, 14, 16, 17). This led us to hypothesize that Cas12a may have target-activated non-specific nicking activity against targets with low or no homology to the crRNA. In the pLibrary cleavage assays, the mixed pool of sequences contains the perfect target sequence, which may activate Cas12a for non-specific nicking activity (Chen et al. 2018; Li et al. 2018).
To test this, we used a short dsDNA oligonucleotide activator that was fully complementary to the crRNA to activate Cas12a. We formed a complex containing Cas12a, crRNA and the dsDNA activator and tested for cleavage activity against empty plasmid in three forms: negatively supercoiled (nSC), nicked (n) and linear (li). Surprisingly, we observed robust non-specific, trans nicking and partial linearization of the empty negatively supercoiled plasmid by Cas12a (Fig. 5A, Supplementary Fig. 19A). To confirm that the non-specific nicking activity was not due to an artifact in our protein purification, we also tested commercially available LbCas12a (New England Biolabs, NEB LbCas12a) and AsCas12a (Integrated DNA Technologies, IDT AsCas12a). We observed that the non-specific, activator-mediated nicking activity was reproducible with these commercial enzymes (Supplementary Fig. 19A). This activity was also reproducible with other crRNA-activator pairs, although the activation of the trans nicking activity varied with the crRNA-activator pair (Supplementary Fig. 20). Taken together, these results demonstrate that Cas12a can be activated for non-specific nicking of dsDNA targets.
While both FnCas12a and LbCas12a have strong non-specific nicking activity, AsCas12a is not strongly activated as a nickase upon target binding (Supplementary Fig. 19A). The lack of AsCas12a activated nickase activity is evident in the HTS data, where sequences containing more than 4 mismatches were not strongly enriched in the nicked fraction (Supplementary Fig. 12, 15, 17). The reduced activation could be due to slower rates of PAM-distal product release after cleavage, where the cleaved products hinder the RuvC domain from accessing other dsDNA substrates (Swarts and Jinek 2019; Singh et al. 2018).
In addition to non-specific nicking by activated Cas12a, we also observed linearization of the nicked dsDNA over time and degradation of empty linear and nicked plasmid (Fig. 5A, Supplementary Fig. 19A). To further investigate this activity, we first tested cleavage of pTarget by Cas12a and performed the cleavage assay for longer time points (up to 24 h). We observed slow, processive degradation of the DNA target plasmid after 4 hours of incubation with Cas12a (Fig. 5B, Supplementary Fig. 19B). Similarly, target-activated Cas12a can degrade non-specific dsDNA after nicking and linearizing it (Fig. 5A, Supplementary Fig. 19A). Like the non-specific nicking and ssDNase activity (Chen et al. 2018; Li et al. 2018), Cas12a can be activated by crRNA-complementary ssDNA binding in a PAM-independent, RuvC-domain dependent manner for non-specific dsDNA nicking and degradation (Fig. 5C, Supplementary Fig. 19C).
We next tested whether mismatched target sequences that were present in the pLibrary cleavage assays could also act as activators. Interestingly, Cas12a was activated by some of these mismatched targets as well, especially those that were partially linearized by Cas12a (Supplementary Fig. 22A). However, mismatched targets that were only nicked by Cas12a (Fig. 4B, Supplementary Fig. 18B, C) were weak activators, indicating that double-strand cleavage of the target activator is important for activated dsDNA nicking, as has been previously observed for target-activated ssDNA degradation (Chen et al. 2018; Li et al. 2018; Swarts and Jinek 2019).

Discussion
Cas12a has become a widely used tool for various biotechnological applications such as genome editing and diagnostics (Chen et al. 2018; Gootenberg et al. 2018). Several reports show that Cas12a and engineered orthologs are highly specific for RNA-guided dsDNA cleavage activity (D. Kim et al. 2016; Kleinstiver et al. 2016; H. K. Kim et al. 2017). Despite these studies on Cas12a specificity, the cleavage activity and specificity outside of a eukaryotic setting remain unclear. The apparent high specificity of Cas12a in genome editing studies is paradoxical given its natural role as an immune system effector. Phages evolve rapidly and can escape from CRISPR-Cas immunity via mutations (Deveau et al. 2008; Tao, Wu, and Rao 2018). The high specificity of Cas12a may also limit targeting of closely related phages (Andersson and Banfield 2008).
Here we show that Cas12a has additional dsDNA nicking and degradation activities apart from previously described crRNA-mediated cis cleavage of dsDNA targets (Zetsche et al. 2015) and activated trans cleavage of non-specific ssDNA substrates (Chen et al. 2018; Li et al. 2018). Our results demonstrate that Cas12a can nick and, in some cases, create double-strand breaks in targets with up to four mismatches. Similarly, a recent study (Fu et al. 2019) demonstrated that Cas12a and Cas9 have target-dependent nicking activity against targets with one or two mismatches. We also establish that Cas12a has non-specific dsDNA nicking activity upon binding to a crRNA-complementary DNA. While this manuscript was in preparation, a complementary study reported similar observations, demonstrating that these activities are reproducible in vitro (Fig. 6). It is also interesting to note that several proteins in the Cas12 family have strong nickase activities against target DNA, but can only weakly linearize dsDNA targets (Yan et al. 2019; Strecker et al. 2019).
Cas12a has a single active site, and it cleaves the dsDNA target in a sequential order, with nicking of the non-target strand (NTS) followed by the target strand (TS) [...] attributed to the fraying of DNA at the exposed sites (Andreatta et al. 2006; Fei and Ha 2013).
The exposed ssDNA regions are likely degraded via the ssDNase activity of Cas12a (Chen et al. 2018; Li et al. 2018). Multiple events of the trans nicking activity may also cause the DNA to fall apart. This may be true in the case of nicked dsDNA substrates where we observed direct degradation rather than an intermediate linear product. This suggests that a cumulative effect of all the trans activities of Cas12a eventually leads to complete degradation of nucleic acid substrates (Fig. 6).
Cas12a has been successfully used for gene editing in vivo without any deleterious off-target effects (D. Kim et al. 2016; Kleinstiver et al. 2016; H. K. Kim et al. 2017; Tang et al. 2018; Moon et al. 2018). Our high-throughput pLibrary cleavage analysis indicates that Cas12a can bind and cleave sequences with up to four mismatches in vitro. However, the cellular context also plays a role in Cas effector binding and cleavage. SpCas9 can bind to targets containing several mismatches depending on DNA breathing and supercoiling, both in vitro and in vivo (Newton et al. 2019; Wu et al. 2014). Cas12a can stably bind to targets with mismatches in vitro (Singh et al. 2018), but in vivo studies suggest low or no off-target binding.
This could reflect the inability of Cas12a to unwind and bind DNA in varying topological and cellular contexts which may result in overall lower off-target editing rates by Cas12a.
While off-target sites can be predicted and avoided by careful design of crRNAs, the robust non-specific nicking activity we observed in vitro may lead to off-target editing as nicked DNA can recruit DNA repair machinery in vivo (Kuzminov 2001; Vriend et al. 2016; Krawczyk 2017), although nicks may be repaired by error-free DNA repair pathways (Fukui 2010). Nevertheless, the unpredictable nature of the non-specific nicking activity makes it difficult to detect the outcomes of nicking. In addition, the commonly used methods to verify off-target editing do not detect nicks (Tsai et al. 2015, 2017), meaning that detection of potential off-target effects due to non-specific nicking could require whole genome sequencing. Use of Cas12a orthologs, such as AsCas12a, that display reduced non-specific nickase activity may reduce these unpredictable effects during genome editing experiments. Notably, our in vitro specificity analysis also suggests that AsCas12a is less prone to creating double-strand breaks at sites with two or three mismatches, suggesting that this ortholog may be less prone to off-target cleavage at highly homologous sites. The activities reported in our study add to the growing number of specific and non-specific Cas12a cleavage activities (Fig. 6) and may provide a possible explanation of how Cas12a compensates for its highly specific targeted cleavage activities as an immune effector.
The target-dependent non-specific nicking and degradation activities, along with previously described dsDNA cleavage (Zetsche et al. 2015) and trans ssDNase activity (Chen et al. 2018;Li et al. 2018), could allow Cas12a to mount a strong defense against different types of invading phages. In the event of phage evolution via mutations, Cas12a may tolerate some mutations and still nick or fully cleave phage DNA. Cas12a could also be activated by the evolved target region of the phage DNA, enabling non-specific nicking and degradation activities. The nonspecific nature of the nicking and degradation activities may also be harmful to the host bacteria.
In CRISPR-Cas systems like type III and type VI, Cas nucleases can be activated for nonspecific cleavage of RNA (Hille et al. 2018). Perhaps, this is a means to prevent phage proliferation by initiating programmed cell death (PCD), abortive infection or dormancy in order to save the bacterial population (Koonin and Zhang 2017;Meeske, Nakandakari-Higa, and Marraffini 2019). Further studies are required to investigate the cost-benefit relation of such nonspecific activities of Cas12a to the bacteria.

Acknowledgements
We thank all the former and current members of the Sashital Lab for helpful discussions and suggestions on various aspects of the project. We thank Michael Baker from the DNA Facility for assistance with HTS data collection, and the Protein Facility for providing access to

Competing interests
Authors declare no competing interests.

Data Availability
HTS data and processed data files from this study have been deposited in the Iowa State University Library's DataShare, and can be found at https://doi.org/10.25380/iastate.8178938.
All other information and data are available from the authors upon request.

Cas12a expression and purification
Purification protocols were adapted from previously established Cas12a purification methods (Mohanraju et al. 2018). All Cas12a proteins were expressed in Escherichia coli BL21

Library creation
To generate a pool of sequences containing mismatches, the library was partially randomized ("Pool Design, Complexity, and Purification" n.d.). The following probability distribution function was used to determine the randomization/doping frequency:

$$P(n, L, f) = \frac{L!}{n!\,(L-n)!}\, f^{n}\,(1-f)^{L-n}$$

where P is the fraction of the population, L is the sequence length, n is the number of mutations per template and f is the probability of mutation per position (doping level or frequency). The three library sequences tested were a modified protospacer 4 sequence from the Streptococcus pyogenes CRISPR locus (55% GC), the EMX1 gene target sequence (80% GC) and the CCR5 gene target sequence (20% GC) (see Supplementary Table 1 for target sequences). A randomization/doping frequency (f) of 15% was selected so that sequences with 2 or 3 mismatches would be the most abundant in the library.
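The effect of the doping frequency can be checked numerically. The sketch below treats the number of mutations per template as binomially distributed, which is what the definitions of P, L, n and f above imply. The target length of 20 nucleotides is an assumption made for illustration only (the actual lengths are given in Supplementary Table 1), and `doping_distribution` is a hypothetical helper name.

```python
from math import comb

def doping_distribution(length: int, f: float) -> list[float]:
    """Return P(n) for n = 0..length mutations per template, assuming each
    position is mutated independently with probability f (binomial model)."""
    return [
        comb(length, n) * f**n * (1 - f) ** (length - n)
        for n in range(length + 1)
    ]

L = 20    # ASSUMED target length, for illustration only
f = 0.15  # doping frequency stated in the text

dist = doping_distribution(L, f)
for n in range(8):
    print(f"{n} mismatches: {dist[n]:.3f}")
```

Under these assumptions the two most abundant classes are the 2- and 3-mismatch sequences, consistent with the design goal stated above; a different target length would shift the distribution accordingly.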

Library preparation
Single-stranded oligonucleotides with 15% doping frequency in the target region were ordered from Integrated DNA Technologies (IDT

Plasmid DNA Cleavage Assay
The protocol was adapted from previously described methods (Anders and Jinek 2014).

Library preparation for HTS
The library plasmid cleavage products were run on an agarose gel as described above to separate the cleaved (linear and nicked) and uncleaved (supercoiled) products. The bands from the nicked and supercoiled fractions from various time points were excised and gel purified using QIAquick Gel Extraction Kit (Qiagen). PCR was used to add Nextera Adapters, followed by another round to add unique indices/barcodes for each sample. Sample size and quality were verified using DNA 1000 kit and Agilent 2100 Bioanalyzer. Samples were sent for MiSeq or NextSeq for paired-end reads of 150 cycles to Iowa State DNA Facility or Admera Health, LLC (New Jersey, USA). 15% PhiX was spiked in.

HTS analysis
HTS data were obtained as compressed fastq files and were processed with custom bash scripts (see associated GitHub repository https://github.com/sashital-lab/Cas12a_nickase). A simple workflow of the analysis is described in Supplementary Fig. 2. Briefly, the files were renamed based on the sample information (pLibrary name, replicate and Cas12a ortholog), stored in separate folders identified by the library, and the target sequences were extracted. Bash scripts were used for obtaining the counts of the extracted target sequences, determining the number of mismatches, calculating the fractions in each replicate, as well as preparing summary tables for total counts of each mismatched target sequence. Once processing on the command line was complete, the results were imported into Microsoft Excel for plotting and summarizing.
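The counting steps described above can be sketched in a few lines. This is an illustrative Python sketch and not the bash scripts used in this study (those are in the repository cited above); the function names, the toy target and the toy reads are hypothetical, and the sketch assumes the target region has already been extracted from each read at a fixed length.

```python
from collections import Counter

def count_mismatches(seq: str, target: str) -> int:
    """Number of positions at which seq differs from the perfect target."""
    if len(seq) != len(target):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq, target))

def summarize(extracted_targets, target):
    """Return (counts per unique sequence, total counts per mismatch number)."""
    seq_counts = Counter(extracted_targets)
    mm_counts = Counter()
    for seq, count in seq_counts.items():
        mm_counts[count_mismatches(seq, target)] += count
    return seq_counts, mm_counts

# Hypothetical toy data (not from the study).
target = "ACGTACGTAC"
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA", "TCGTACGTAA", "ACGAACGTAC"]

seq_counts, mm_counts = summarize(reads, target)
print(dict(mm_counts))
```

Counting unique sequences first and then assigning a mismatch number to each keeps the per-sequence counts available for the position and type analysis as well as for the per-mismatch summary tables.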

The fraction of target sequences containing 'n' mismatches (MM) (Fn-MM) in the pool was calculated as: The relative change (enrichment and/or depletion) (Rc) of a sequence 'S' containing 'n' mismatches at each time point 'x' compared to the control pLibrary was calculated as:
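The two quantities defined above can be illustrated as follows. This is a sketch under stated assumptions rather than the formulas used in the study: Fn-MM is taken as the counts of sequences with n mismatches divided by the total counts in the pool, and Rc is assumed to be the ratio of a sequence's fraction at time point x to its fraction in the control pLibrary. The function names and toy counts are hypothetical.

```python
def mismatch_fraction(mm_counts: dict[int, int], n: int) -> float:
    """Fn-MM: fraction of all counted target sequences that have n mismatches."""
    total = sum(mm_counts.values())
    return mm_counts.get(n, 0) / total

def relative_change(seq: str, sample_counts: dict[str, int],
                    control_counts: dict[str, int]) -> float:
    """Rc (ASSUMED form): fraction of `seq` in the sample at a given time
    point divided by its fraction in the control pLibrary.
    Values above 1 indicate enrichment, values below 1 indicate depletion."""
    control_frac = control_counts.get(seq, 0) / sum(control_counts.values())
    if control_frac == 0:
        raise ValueError("sequence is absent from the control library")
    sample_frac = sample_counts.get(seq, 0) / sum(sample_counts.values())
    return sample_frac / control_frac

# Hypothetical toy counts (not from the study).
control = {"AAAA": 50, "AAAT": 50}
nicked_at_x = {"AAAA": 10, "AAAT": 90}

print(mismatch_fraction({0: 2, 1: 2, 2: 1}, 1))
print(relative_change("AAAA", nicked_at_x, control))
print(relative_change("AAAT", nicked_at_x, control))
```

Normalizing to the control library accounts for sequences that start at different abundances in the pool, so that enrichment or depletion reflects the cleavage reaction rather than the library composition.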