High-throughput in vitro specificity profiling of natural and high-fidelity CRISPR-Cas9 variants

Cas9 is an RNA-guided endonuclease in the bacterial CRISPR-Cas immune system and a popular tool for genome editing. The most commonly used Cas9 variant, Streptococcus pyogenes Cas9 (SpCas9), is relatively non-specific and prone to off-target genome editing. Other Cas9 orthologs and engineered variants of SpCas9 have been reported to be more specific than wild-type (WT) SpCas9. However, systematic comparisons of the cleavage activities of these Cas9 variants have not been reported. In this study, we employed our high-throughput in vitro cleavage assay to compare cleavage activities and specificities of two natural Cas9 variants (SpCas9 and Staphylococcus aureus Cas9) and three engineered SpCas9 variants (SpCas9 HF1, HypaCas9, and HiFi Cas9). We observed that all Cas9s tested were able to cleave target sequences with up to five mismatches. However, the rate of cleavage of both on-target and off-target sequences varied based on the target sequence and Cas9 variant. For targets with multiple mismatches, SaCas9 and engineered SpCas9 variants are more prone to nicking, while WT SpCas9 creates double-strand breaks (DSBs). These differences in cleavage rates and DSB formation may account for the varied specificities observed in genome editing studies. Our analysis reveals mismatch position-dependent, off-target nicking activity of Cas9 variants, which has been underreported in previous in vivo studies.


Introduction
Cas9 is the well-studied effector protein of type II CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR associated) bacterial immune systems (Jiang and Doudna, 2017;Makarova et al., 2020). Cas9 is an endonuclease that uses a dual CRISPR RNA (crRNA) and transactivating-crRNA (tracrRNA) to bind dsDNA targets that are complementary to the guide region of the crRNA and adjacent to a short, conserved protospacer-adjacent motif (PAM) sequence (Gasiunas et al., 2012). Two nuclease domains in Cas9, HNH and RuvC, cut the target and non-target strand respectively, generating a double-stranded break (DSB) in the dsDNA (Jinek et al., 2012) with little post-cleavage trimming (Stephenson et al., 2018). The dual RNAs can be combined into a single guide-RNA (sgRNA) and the targeting region can be varied, making Cas9-sgRNA a readily programmable, two component system for use in various biotechnological applications (Doudna, 2020;Jinek et al., 2012). In particular, DSB formation followed by DNA repair can lead to changes in genomic DNA sequence, enabling genome editing following Cas9 cleavage (Cong et al., 2013;Mali et al., 2013).
Cas9 can tolerate mismatches between the crRNA and the target DNA, which is consistent with its role as a bacterial immune system effector in facilitating defense against rapidly evolving phages (Fu et al., 2016, 2014, 2013; Hsu et al., 2013; Pattanayak et al., 2013).
Cas9 generally tolerates multiple mismatches in the PAM-distal region, while PAM-proximal "seed" mismatches reduce the cleavage activity. This low fidelity leads to off-target activity when used for genome editing applications, as Cas9 can create DSBs at sites with limited homology to the intended target. Many strategies have been developed to reduce off-target activity of Cas9 (Vakulskas and Behlke, 2019). Among these, various natural and engineered variants of Cas9 have been reported to have reduced off-target cleavage activity. While the commonly used wild-type (WT) Streptococcus pyogenes Cas9 (SpCas9) can tolerate multiple mismatches in the target sequence, other naturally occurring Cas9 orthologs from Staphylococcus aureus (SaCas9), Neisseria meningitidis and Campylobacter jejuni are reported to have higher specificity in genome editing compared to SpCas9 (Amrani et al., 2018; Friedland et al., 2015; Kim et al., 2017; Ran et al., 2015). SpCas9 has also been engineered to improve the fidelity of target cleavage activity. Some mutations were designed to reduce DNA target interactions, making the requirement for complete complementarity with the crRNA more stringent (Kleinstiver et al., 2016; Slaymaker et al., 2016). Mutations rationally introduced in the REC domain of SpCas9 prevent conformational changes required for nuclease domain activation when a target sequence with mismatches is encountered (Dagdas et al., 2017).
Bacterial screens have also been used to select high-fidelity SpCas9 variants that maintain on-target cleavage but have reduced off-target cleavage activity (Hu et al., 2018; Lee et al., 2018; Vakulskas et al., 2018).
Several methods have been developed to detect and study off-target activities of Cas9 (Tsai et al., 2017, 2015; Vakulskas and Behlke, 2019). However, methods that measure Cas9 off-target editing in eukaryotic cells are limited because cellular factors and DNA accessibility sequester potential cleavage sites (Horlbeck et al., 2016; Yarrington et al., 2018). DNA accessibility can also vary depending on cellular processes, which may change the outcome and detection of potential Cas9 off-target editing events. These methods also rely on DSBs in the DNA generated by Cas9 or on post-cleavage DNA repair and indel formation, which can vary among cell types and experiments (Schmid-Burgk et al., 2020; Tsai et al., 2017, 2015). Differences in the Cas9/sgRNA delivery methods and cell lines have resulted in discrepancies in the reported specificities of high-fidelity Cas9 variants (Kleinstiver et al., 2016; Vakulskas et al., 2018).
To avoid these pitfalls, specificity studies can be performed in vitro to detect the native cleavage activities of Cas9 variants (Fu et al., 2019, 2016, 2014; Huston et al., 2019; Jones et al., 2019; Pattanayak et al., 2013; Zhang et al., 2020). Here, we used a previously established plasmid-based high-throughput in vitro cleavage assay to compare the native cleavage specificity of different Cas9 variants (Murugan et al., 2020). Our method enables the detection of target sequences that may be nicked by Cas9. We tested the cleavage activity of two WT Cas9 orthologs, SpCas9 and SaCas9, and three engineered SpCas9 variants, SpCas9 HF1 (Kleinstiver et al., 2016), hyper-accurate Cas9 (HypaCas9) and Alt-R® S.p. HiFi Cas9 (Vakulskas et al., 2018), against two different target library sequences. Each of these three variants represents a version of high-fidelity Cas9 developed via one of the different strategies discussed above. We show that SpCas9 rapidly cleaves target sequences with up to five mismatches. While the high-fidelity Cas9 variants retained cleavage activity against targets with multiple mismatches, they have reduced rates of cleavage compared to SpCas9. High-fidelity Cas9 variants also have higher nicking activity against target sequences with mismatches, resulting in incomplete DSB formation. Overall, our study reveals target-sequence dependent nicking activity that may account for the lower off-target cleavage events observed in genome editing studies with high-fidelity Cas9 variants.

Cleavage activity of Cas9 against target library
We sought to compare the cleavage activity and specificity of Cas9 and engineered variants in a high-throughput manner. We performed a previously established in vitro plasmid library (pLibrary) cleavage assay with Cas9 and dual RNAs (tracrRNA and crRNA) followed by high-throughput sequencing analysis (Fig. 1A, S1, S2A, B) (Murugan et al., 2020). The pLibraries contained a distribution of target sequences with between zero and ten mismatches to the crRNA guide sequence, with a maximum representation of target sequences with two to four mismatches in the library (Fig. S2B). We tested the cleavage activity of WT SpCas9, WT SaCas9 and three high-fidelity variants of SpCas9: SpCas9 HF1, HypaCas9, and Alt-R® S.p. HiFi Cas9 (HiFi Cas9) (Kleinstiver et al., 2016; Vakulskas et al., 2018). The three high-fidelity variants of SpCas9 will be collectively referred to as HF Cas9 hereafter. We used two different crRNA sequences and generated corresponding negatively supercoiled (nSC) plasmids containing the perfect target (pTarget) or target library (pLibrary) (see methods section, Plasmid and nucleic acid preparation) (Fig. S2B). The two crRNA and library sequences were designed based on the protospacer 4 sequence from the Streptococcus pyogenes CRISPR locus (55% G/C) and the EMX1 gene target sequence (80% G/C), referred to as pLibrary PS4 and EMX1 respectively. We used the differential migration of the nicked (n) and linear (li) cleavage products of nSC dsDNA plasmid on an agarose gel to analyze Cas9 cleavage activity (Oppenheim, 1981). Cleavage rates of pTarget were variable depending on both target sequence and Cas9 variant (Fig. 1B, S2C). SpCas9 HF1 and HypaCas9 cleaved pTarget PS4 relatively slowly (Fig. 1B) but were comparable to SpCas9 and HiFi Cas9 for pTarget EMX1 (Fig. S2C). While cleavage of pTarget PS4 by SaCas9 was comparable to SpCas9, cleavage of pTarget EMX1 by SaCas9 was substantially reduced in comparison to other Cas9 variants (Fig. S2C). Cleavage rates of the pLibrary were consistent with those observed for pTarget. While SpCas9 HF1 and HypaCas9 cleaved pLibrary PS4 relatively slowly, SaCas9 had the slowest rate of cleavage for pLibrary EMX1 (Fig. 1B, C, S2C, D). SpCas9 rapidly cleaved more than 50% of both negatively supercoiled pLibraries, with the vast majority of product DNA becoming linearized (Fig. 1C, S2D). In contrast, for HF Cas9 variants and SaCas9, we observed accumulation of the nicked pool, especially for pLibrary PS4 (Fig. 1C, S2D).
We also checked whether cleavage occurred outside of the target region during pLibrary cleavage by testing the cleavage activity of Cas9 against the empty plasmid backbone with and without the different crRNAs (Fig. S2E). The empty plasmid was minimally cleaved by Cas9-tracrRNA:crRNA, except in the case of SpCas9-EMX1 crRNA, where a substantial nicked product was observed at the three-hour time point. However, we did not observe similar amounts of nicking of the pLibrary EMX1 by Cas9 (Fig. S2D), and further analysis indicated that pLibrary nicking is, in part, target-sequence dependent (see below).
To determine which sequences were cleaved by Cas9 variants, we extracted the plasmid DNA from the supercoiled and nicked pools, performed barcoded-PCR amplification and multiplexed, high-throughput sequencing (HTS) (Fig. 1A, S1). Target sequences cleaved by Cas9 were depleted from the supercoiled pool while those nicked by Cas9 were enriched in the nicked pool. Although we were unable to sequence the linearized pool using PCR-based HTS, for our analysis, we assumed that target sequences absent in both the supercoiled and nicked pools were linearized. We normalized the counts of the target sequences in the HTS data with the fraction of DNA present in the pool at a given time point (Fig. 1C, S2D), and then to the original library to obtain relative, normalized counts (see methods section -HTS analysis).
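The two-step normalization described above can be illustrated with a short sketch. It is one plausible reading of the procedure, not the authors' pipeline (the exact formula is given in the methods section, HTS analysis): read counts are converted to within-pool frequencies, scaled by the gel-derived fraction of DNA in that pool, and divided by the frequency in the uncleaved library. The function names, the example reads, and the pool fractions are hypothetical.

```python
def normalized_counts(pool_reads, library_reads, pool_fraction):
    """Relative, normalized abundance of each target sequence in one pool.

    pool_reads / library_reads: dict mapping sequence -> HTS read count.
    pool_fraction: fraction of total DNA in this pool at this time point
                   (assumed to come from gel quantification).
    """
    pool_total = sum(pool_reads.values())
    lib_total = sum(library_reads.values())
    result = {}
    for seq, lib_count in library_reads.items():
        lib_freq = lib_count / lib_total
        pool_freq = pool_reads.get(seq, 0) / pool_total
        # Scale by how much DNA the pool holds, then compare to the input library.
        result[seq] = pool_freq * pool_fraction / lib_freq
    return result


def inferred_linear(supercoiled, nicked):
    """Assume whatever is absent from both sequenced pools was linearized."""
    return {
        seq: max(0.0, 1.0 - supercoiled[seq] - nicked.get(seq, 0.0))
        for seq in supercoiled
    }


# Hypothetical toy data: two sequences, equally represented in the input library.
library = {"perfect": 100, "mm5": 100}
sc = normalized_counts({"perfect": 10, "mm5": 90}, library, pool_fraction=0.5)
nk = normalized_counts({"perfect": 50, "mm5": 50}, library, pool_fraction=0.1)
lin = inferred_linear(sc, nk)
print(sc, nk, lin)
```

In this toy case the perfect target is mostly missing from both sequenced pools, so it is inferred to be largely linearized, while the five-mismatch sequence remains almost entirely supercoiled.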
We generated mismatch distribution curves to compare the cleavage of target sequences containing varying numbers of mismatches with the crRNA across Cas9 variants at each time point or across time points for each Cas9 variant (Fig. 2, S3). As expected, the perfect target (zero mismatch) was quickly depleted from the supercoiled pool of the pLibrary (Fig. 2A, C, S3A, C). HF Cas9 variants cleaved all target sequences more slowly than SpCas9 for pLibrary PS4 but had comparable cleavage activity for G/C rich pLibrary EMX1. SpCas9 quickly cleaved sequences with up to four mismatches (i.e. target sequences containing four mismatches with the crRNA guide sequence) in the first time point tested for both pLibraries. While HF Cas9 variants cleaved target sequences relatively slowly in pLibrary PS4, they also eventually cleaved target sequences with up to four mismatches (Fig. 2A, C). Sequences with six or more mismatches were not notably depleted from the supercoiled pool, indicating that these sequences were not cleaved by Cas9 (Fig. 2A, C, S3A, C).
SaCas9 and HF Cas9 variants nicked target sequences with two to four mismatches, based on the accumulation of these sequences in the nicked pool over time (Fig. 2B, D, S3B, D).
Some target sequences with one and two mismatches were initially nicked but subsequently linearized over time. Sequences with three to five mismatches accumulated in the nicked pool for pLibrary PS4 while those in pLibrary EMX1 did not change notably over time (Fig. 2B, D).

Specificity comparison for Cas9 variants
Our HTS data allow us to compare the overall cleavage efficiency and specificity of the Cas9 variants. To analyze differences in cleavage efficiencies, we generated a specificity score that reports the relative efficiency of cleavage of on- and off-target sequences over time for the Cas9 variants against the two pLibraries (Fig. 3E, F) (see methods section, HTS analysis). Thus, high specificity score values indicate relatively high specificity, while low values indicate relatively low specificity. SpCas9 scored the lowest for both pLibraries, consistent with its expected low specificity. At early time points, specificity scores of Cas9 variants were highly variable. While SaCas9 initially appeared more specific than SpCas9 at early time points for pLibrary PS4, at later time points both orthologs scored similarly. HF Cas9 variants generally scored higher than SpCas9 for pLibrary PS4, with SpCas9 HF1 and HypaCas9 scoring highest (Fig. 3A, B). In contrast, SpCas9 HF1 scored similarly to SpCas9 and SaCas9 for pLibrary EMX1, indicating that increased specificity for this variant may be target dependent. In general, we observed more spread in specificity scores for pLibrary PS4 than for pLibrary EMX1, suggesting that target dependence is universal to the specificity of all Cas9 variants. Interestingly, specificity scores for HF Cas9s changed over time and eventually converged with SpCas9 and SaCas9. This change in specificity score over time indicates that prolonged exposure to HF Cas9 variants can lead to off-target cleavage activity.

Sequence determinants of Cas9 cleavage activity
To characterize the effects of the mismatch position and type on Cas9 cleavage, we analyzed the sequences present in both the supercoiled and nicked pools (Fig. S1). We generated heatmaps showing the relative abundance of target sequences containing one to six mismatches over time (Fig. 4, 5, S4, S5). These heatmaps show the effect of all possible nucleotides at each position of the sequence in the supercoiled pool when present in a target sequence as a single mismatch or the cumulative effect of multiple mismatches (see methods section -HTS analysis).
The heatmaps revealed a strong PAM-proximal "seed" dependent cleavage activity, as expected for all Cas9s, especially for target sequences with two to four mismatches. The seed region is eight to ten nucleotides from the PAM for Cas9, as previously reported (Liu et al., 2016; Sternberg et al., 2014) (Fig. 4, 5, S4, S5). Although target sequences with a single mismatch were depleted from the supercoiled pool by SpCas9 for both pLibraries, mismatches in the seed region slowed down the rate of depletion (Fig. 4, S4). For SaCas9 and HF Cas9 variants, single mismatches in the PAM-distal region were generally more deleterious than for SpCas9. A single mismatch throughout the target sequence in pLibrary PS4 was more deleterious for cleavage by SpCas9 HF1 and HypaCas9 (Fig. 4). HF Cas9 variants also better tolerated single mismatches outside the seed in G/C rich pLibrary EMX1 (Fig. S4).
Similarly, SpCas9 was more tolerant of two to four mismatches outside the seed while supercoiled pool (Fig. 4, S4). In pLibrary EMX1, C and G mismatches slow the rate of depletion of these target sequences for all Cas9s (Fig. S4). The heatmaps also revealed the depletion of some target sequences containing five and some with six mismatches in the PAM-distal end, particularly by SpCas9 and SaCas9 in pLibrary PS4 (Fig. 4).
Cas9 also had high target-sequence dependent nicking activity for target sequences with multiple mismatches (Fig. 5, S5). Cas9 nicked and slowly linearized target sequences with mismatches across the length of the target, although we observed a marked effect of PAM-distal mismatches (Fig. 5, S5). In particular, target sequences with two to four mismatches in the PAM-distal region accumulated over time in the nicked pool. The nicked pool of pLibrary PS4 also contained target sequences with five and six mismatches in the PAM-distal region (Fig. 5). These results indicate that SpCas9 is more promiscuous than previously thought, in that it can also nick target sequences with three to six mismatches that were not recorded in previous in vitro and in vivo specificity studies (Fu et al., 2019, 2016, 2014, 2013; Hsu et al., 2013; Pattanayak et al., 2013). SaCas9 and HF Cas9 variants are also more prone to nicking sequences with several mismatches in the PAM-distal region. For pLibrary EMX1, HiFi Cas9 eventually linearized some of the target sequences with multiple mismatches (Fig. S5).

Effect of two mismatches as function of position and separation on Cas9 cleavage activity
Our library was designed to have a high representation of target sequences containing two mismatches with the crRNA. To further understand the effects of two mismatches on Cas9 cleavage, we generated heatmaps of the relative abundance of these target sequences as a function of mismatch location and distance between the two mismatches (see methods section, HTS analysis).
There are 190 possible combinations in which two mismatches can occur in a 20-nucleotide sequence, in which the two mismatches can be separated by between 0 and 18 nucleotides. We observed that target sequences with two mismatches that are separated by four nucleotides or fewer are generally depleted slowly from the supercoiled pool by all Cas9 variants (Fig. 6). However, SpCas9 is more tolerant of two mismatches compared to the other Cas9 variants tested, even when the mismatches are spaced close together (Fig. 6A, B). SpCas9 HF1 initially nicked most sequences with two mismatches separated by any distance in both pLibraries. Over time, most target sequences with two mismatches separated by a distance of six or fewer nucleotides remained nicked, while sequences with mismatches that were further apart were linearized by SpCas9 HF1 in pLibrary EMX1.
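The counts quoted above follow from simple enumeration. The sketch below lists every pair of mismatch positions in a target of a given length and tallies the number of matched nucleotides between them; the function name is hypothetical and the code is illustrative only, not part of the published analysis.

```python
from collections import Counter
from itertools import combinations


def mismatch_pair_separations(length):
    """Tally position pairs by the number of nucleotides between the two mismatches."""
    return Counter(j - i - 1 for i, j in combinations(range(length), 2))


full_target = mismatch_pair_separations(20)
print(sum(full_target.values()))            # total number of position pairs
print(min(full_target), max(full_target))   # smallest and largest separation

# The same enumeration restricted to a 10-nucleotide region (seed or PAM-distal).
half_target = mismatch_pair_separations(10)
print(sum(half_target.values()), max(half_target))
```

Closely spaced pairs are the most numerous (19 adjacent pairs in a 20-nucleotide target), while only one pair has the maximum separation, which is worth keeping in mind when comparing heatmap rows that aggregate different numbers of sequences.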
We next looked at the positional effects when the two mismatches were present within the 10-nucleotide PAM-proximal seed or the 10-nucleotide PAM-distal region. Here, the two mismatches could be separated by between 0 and 8 nucleotides. All Cas9s best tolerated mismatches that were separated by six or more nucleotides within the seed (Fig. S6A, B). The cleavage tolerance of two mismatches in the seed was similar among the HF Cas9 variants for pLibrary PS4 (Fig. S6A). However, there were notable differences for pLibrary EMX1 (Fig. S6B). While all Cas9 variants had defects for cleaving sequences with two mismatches separated by two to five nucleotides, SpCas9 HF1 and HypaCas9 depleted these sequences more rapidly from the supercoiled pool. In addition, all Cas9s except HypaCas9 accumulated sequences with two seed mismatches in the nicked pool (Fig. S6B). Together, these data suggest that HypaCas9 tolerated double mismatches in the seed for pLibrary EMX1 more than any other Cas9 variant.
Outside the seed region, most target sequences with two mismatches separated by any distance were eventually depleted from the supercoiled pool over time (Fig. S6C, D). For HF Cas9 variants, two mismatches separated by six or seven nucleotides in pLibrary PS4 were depleted more slowly from the supercoiled pool. Target sequences with two mismatches outside the seed separated by a distance of four nucleotides or less were more prone to nicking by SpCas9 HF1 for both pLibraries (Fig. S6C, D).

Validating the nicking activity against mismatched targets
We observed an enrichment of target sequences with multiple mismatches in the nicked pool of the pLibraries upon Cas9 cleavage. We sought to validate this observation by verifying cleavage of target sequences present in the nicked pool of pLibrary PS4 subjected to cleavage by Cas9 at the longest time point (three hours). We tested cleavage activity of Cas9 variants against target sequences with two to five mismatches (MM). Similar to the HTS data, we observed varying degrees of nicking and linearization of target sequences containing different mismatches for different Cas9 variants (Fig. 7A). We quantified the supercoiled, linearized and nicked pools at two different time points, ten minutes and three hours for the perfect and mismatched target sequences (Fig. 7B). We observed that all Cas9s eventually fully linearized the perfect target (0 MM). Although the two and three mismatch target sequences tested contained two mismatches in the seed sequence, Cas9 linearized these targets with only about 20% remaining nicked after three hours of incubation. We observed little linearization and some nicking of the target sequences with seed mismatches, as in the case of pTarget 4.1 MM, indicating decreased cleavage activity of Cas9. However, Cas9 largely nicked target sequences with mismatches in the PAM-distal region, as in pTarget 4.2 MM. Interestingly, we also observed strong nicking of pTarget 5 MM by all Cas9s. Overall, all Cas9s showed more linearization activity against target sequences with up to three mismatches and nicking activity against targets with more mismatches.

Discussion
Independent studies have reported the specificity of Cas9 orthologs and engineered variants separately (Hsu et al., 2013; Huston et al., 2019; Kleinstiver et al., 2016; Vakulskas et al., 2018; Zhang et al., 2020). Here, we directly compared the cleavage activity of five different Cas9 variants against two plasmid libraries. By studying the cleavage activities in vitro, we establish a comparative understanding of the native specificity of Cas9 variants. We show that Cas9 has target sequence-dependent linearization and nicking activities against targets with multiple mismatches. In agreement with Fu et al.'s in vitro studies (Fu et al., 2019, 2016, 2014), WT SpCas9 has sequence-dependent cleavage activity against targets with one or two mismatches. In addition, we establish that SpCas9 can nick and/or linearize sequences with up to five mismatches. This low-fidelity cleavage activity of SpCas9 has also been recorded in in vivo specificity studies (Fu et al., 2013; Hsu et al., 2013).
In parallel to SpCas9, we tested the cleavage activities of a natural ortholog, SaCas9, and three engineered SpCas9 variants: SpCas9 HF1, HypaCas9 and HiFi Cas9. Our analysis revealed that the high specificity of SaCas9 and HF Cas9 variants observed in previous studies may stem from the lack of detection of nicking by Cas9. Most studies test for DSBs at target and off-target sites and/or indel formation (Huston et al., 2019; Kleinstiver et al., 2016; Schmid-Burgk et al., 2020; Vakulskas et al., 2018; Zhang et al., 2020). While SpCas9 linearized most target sequences with multiple mismatches as previously observed, SaCas9 and HF Cas9 variants often only nicked these sequences. Although HiFi Cas9 had similar on-target cleavage activity to SpCas9, it had higher off-target nicking activity.
Recent in vitro and in vivo studies profiled kinetic properties and genome editing activities of natural and high-fidelity Cas9 variants (Jones et al., 2019; Schmid-Burgk et al., 2020). In these studies, SpCas9 had high mismatch tolerance and, in turn, more off-target cleavage activity, whereas HF Cas9 variants had reduced off-target cleavage activity. While the in vitro study concluded that SpCas9 HF1 had high cleavage specificity (reduced off-target cleavage rate compared to on-target cleavage), the in vivo study found that HiFi Cas9 had the best on-target activity and specificity ratio. Neither of the two studies was designed to address nicking of the target sequence. Nicked DNA may be subject to error-prone DNA repair, but nicks are more likely repaired by error-free DNA repair pathways (Fukui, 2010; Kuzminov, 2001; Vriend and Krawczyk, 2017). It is also important to note that off-targets that can be nicked by Cas9 may also be sequestered by chromatin structure in vivo (Horlbeck et al., 2016; Yarrington et al., 2018). Nevertheless, new methods developed to detect nicks may be used to detect potential off-target nicking activity that went unreported in the previous studies (Cao et al., 2019; Elacqua et al., 2019).
Cas9 cleavage specificity depends on both the position of mismatches (within or outside the seed) and the target sequence (Huston et al., 2019; Liu et al., 2016; Zhang et al., 2020). We tested two different pLibraries that show the effects of target-crRNA sequence and various nucleotide substitutions across the target region. Our heatmaps show that mismatches in the PAM-distal region of the target sequence in both pLibraries contribute to target nicking rather than DSBs. While SpCas9 had a similar specificity score for both pLibraries, the specificity of other Cas9 variants depended not only on the target sequence but also on the time of exposure. SaCas9 had a specificity score similar to that of SpCas9 for both pLibraries despite the differences in the cleavage activities against these pLibraries. SpCas9 HF1 and HypaCas9 had specificity scores similar to that of SpCas9 at longer incubation times. HiFi Cas9, however, had similar cleavage activity against both pLibraries tested but was more specific compared to SpCas9.
Cas9 undergoes conformational changes in the REC domain upon binding to its guide RNA (Jinek et al., 2014; Nishimasu et al., 2014). Following target recognition and binding, further domain rearrangements occur. Cas9 cleavage activity is limited by R-loop formation upon target recognition (Gong et al., 2018; Singh et al., 2018). HF Cas9 variants were reported to have higher target sequence unwinding specificity compared to SpCas9 (Okafor et al., 2019). HF Cas9 variants can also discriminate against cleavage of target sequences with mismatches due to decreased rates of cleavage, which may result in off-target DNA release rather than cleavage (Liu et al., 2019). SpCas9 HF1 was designed to disrupt contacts made by Cas9 and the target strand (Kleinstiver et al., 2016). This alters target binding, imposing a stringent requirement of base-pairing between the crRNA and target DNA. This possibly leads to R-loop destabilization and, in turn, discrimination against targets with mismatches for cleavage, as seen in in vivo studies. In vitro binding and cleavage analysis of an oligonucleotide library by HiFi Cas9 showed reduced cleavage of target sequences with mismatches compared to SpCas9. Our pLibrary cleavage assays indeed showed slower rates of cleavage of target sequences with mismatches by HF Cas9 variants. Although not all target sequences with mismatches were fully cleaved by HF Cas9 variants compared to SpCas9, the reduced rate of linearization and nicking may be due to R-loop collapse or premature release of these targets following nicking.
Cas9 cleaves dsDNA using the HNH and RuvC domains that cleave the crRNA-complementary target strand and non-target strand, respectively (Chen et al., 2014; Jinek et al., 2012). Upon binding to a target, the HNH domain is re-positioned for target strand cleavage. However, some mismatches in the target sequence may prevent HNH domain movement (Sternberg et al., 2015). RuvC domain activation is allosterically controlled by HNH conformational changes, but HNH nuclease activity is not a prerequisite. HypaCas9 was designed with mutations in the REC domain that prevent HNH domain activation and movement. Both SpCas9 HF1 and HiFi Cas9 have a single mutation in the REC domain (Kleinstiver et al., 2016; Vakulskas et al., 2018). Upon binding to a target sequence with mismatches, sufficient HNH domain movement may trigger cleavage of the non-target strand by the RuvC domain without activation of the HNH domain for target strand cleavage, leading to nicking of the non-target strand.
Cas9 generates DSB in the target with little post-cleavage trimming of the non-target strand (Jinek et al., 2012;Stephenson et al., 2018). Unlike other CRISPR systems where a Cas nuclease has processive cleavage activity to clear the invading phage nucleic acid upon recognition and/or cleavage, Cas9-containing host bacteria may rely on host nucleases to clear the Cas9-targeted phage DNA (Hille et al., 2018). Despite the lack of any strong post-cleavage activities, Cas9 provides robust protection against phages (Barrangou et al., 2007). This is possibly due to the high mismatch tolerance of Cas9 which may enable better protection against closely related and rapidly evolving phages. Additionally, the ability to nick target sequences with multiple mismatches may also contribute to the immunity against phages. Off-target, sequence-dependent nicking or non-specific nicking have been recently reported for several other Cas effector proteins (McMahon et al., 2020;Murugan et al., 2020;Yan et al., 2019). Here, target sequence-dependent nicking activity may help to slow down phage replication and fight against phages that develop mutations in the target region (McMahon et al., 2020;Tao et al., 2018). Further studies are required to demonstrate the benefits of target sequence-dependent nicking activity in vivo.

Cas9 expression and purification
All Cas9 proteins were expressed in Escherichia coli BL21 (DE3) cells. Overnight cultures of the cells carrying the expression plasmid were used to inoculate 2X TY broth supplemented with the corresponding antibiotics at a 1:100 ratio. Cultures were grown at 37 °C to an optical density (600 nm) of 0.5 to 0.6, and IPTG was added to a final concentration of 0.2 mM to induce protein expression. The incubation was continued at 18 °C overnight (~16 to 18 hours), and cells were harvested the next day for protein purification.
SpCas9 was purified by the following protocol (Chen et al., 2014). Cells were resuspended in Lysis Buffer I (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 10 mM imidazole, and 10% glycerol) supplemented with PMSF. A sonicator or homogenizer was used to lyse the cells and the lysate was centrifuged to remove insoluble material. The clarified lysate was applied to a Alt-R ® S.p. HiFi Cas9 was provided by Integrated DNA Technologies (IDT) -HiFi Cas9.

Library creation
As previously described (Murugan et al., 2020), the library was partially randomized to generate a pool of sequences containing mismatches (Pollard et al., 2000). The following probability distribution function was used to determine the randomization/doping frequency,

$$P(n, L, f) = \frac{L!}{n!\,(L-n)!}\, f^{n}\, (1-f)^{L-n}$$

where P is the pool of the population, L is the sequence length, n is the number of mutations/template and f is the probability of mutation/position (doping level or frequency). A randomization/doping frequency (f) of 15% was selected to optimize the library to contain a maximum of sequences with 2 to 4 mismatches.
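As a quick check of the choice of doping frequency, the sketch below evaluates a binomial doping model, in which each of the L positions is independently mutated with probability f. The binomial form is an assumption consistent with the symbols defined above, the function name is hypothetical, and the code is illustrative only.

```python
from math import comb


def doping_distribution(length, f):
    """Expected fraction of the pool carrying n mutations, for n = 0..length.

    Assumes each position is mutated independently with probability f.
    """
    return [comb(length, n) * f**n * (1 - f) ** (length - n)
            for n in range(length + 1)]


dist = doping_distribution(20, 0.15)
for n, p in enumerate(dist[:7]):
    print(f"{n} mismatches: {p:.3f}")

# The three most abundant mismatch classes under this model.
top3 = sorted(range(len(dist)), key=dist.__getitem__, reverse=True)[:3]
print(sorted(top3))
```

Under this model, with L = 20 and f = 0.15, sequences with two, three and four mismatches are the three most abundant classes, in line with the stated design goal.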
The number of different mutation combinations (MM_c) for a given number of mutations, n, and sequence length, L, regardless of the doping level/frequency, is determined by

$$MM_c = \frac{L!}{n!\,(L-n)!} \times 3^{n}$$

The total number of unique target sequences with a single mismatch is 60, with 2 mismatches is 1,710, and with 3 mismatches is 30,780, etc. We used two library sequences that we previously tested (Murugan et al., 2020), a modified protospacer 4 sequence from the Streptococcus pyogenes CRISPR locus (55% GC) and the EMX1 gene target sequence (80% GC) (see Supplementary Table 1 for target sequence).
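The totals quoted above can be reproduced by counting the ways to choose n mismatched positions out of L = 20 and multiplying by the three alternative nucleotides available at each mismatched position. The sketch below is illustrative; the function name and its default arguments are hypothetical.

```python
from math import comb


def mismatch_combinations(n, length=20, alternatives=3):
    """Number of unique target sequences with exactly n mismatches.

    Choose which n positions are mismatched, then one of the
    `alternatives` non-matching nucleotides at each of them.
    """
    return comb(length, n) * alternatives**n


for n in (1, 2, 3):
    print(n, mismatch_combinations(n))
```

For a 20-nucleotide target this gives 60, 1,710 and 30,780 unique sequences for one, two and three mismatches, matching the values stated in the text.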

Plasmid and nucleic acid preparation
All DNA oligonucleotides used in this study were synthesized by IDT or Thermo Scientific. RNAs (tracrRNA and crRNA) and single-stranded library oligonucleotides with 15% doping frequency in the target region were ordered from IDT. Supplementary Table 1 has the sequences of DNA oligonucleotides and RNA used in this study.
Gibson assembly was used to generate libraries and target plasmids. The oligonucleotides for the libraries, target and mismatched targets were diluted to 0.2 µM in 1X NEBuffer 2. The pUC19 vector was amplified by PCR using primers listed in Supplementary Table 1 to insert homology arms. The PCR reaction was subjected to DpnI digestion and PCR cleanup (Promega Wizard SV Gel and PCR Clean-Up System). 30 ng of PCR-amplified pUC19, 5 µL of oligonucleotide (0.2 µM) and ddH2O to bring the volume to 10 µL were mixed with 10 µL of 2X NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs) and incubated at 50 °C for 1 hour. NEB Stable competent cells were transformed with 2 µL of the assembled product, as per the manufacturer's protocol. After the recovery step, all of the cells in the outgrowth medium were used to inoculate 50 mL of LB supplemented with ampicillin and incubated overnight at 37 °C. The following precautions were taken to ensure that the plasmid was intact and remained supercoiled after extraction, as described before (Murugan et al., 2020). Cells were cooled on ice before harvesting for plasmid extraction using the QIAGEN Plasmid Midi Kit. All the initial steps of plasmid extraction, from lysis to neutralization (pTarget, pLibrary and empty plasmid), were performed on ice with minimal mechanical stress. Plasmids were stored as aliquots that were used for up to 10 freeze-thaw cycles. Two different pLibrary assembly reactions and preparations were used for the two replicates of the high-throughput in vitro cleavage assays (Fig. S2A).
For controls, target plasmids and empty pUC19 were linearized by restriction enzyme digestion using BsaI-HF and nicked using a nicking enzyme Nt.BspQI (New England Biolabs).
All restriction digestion reactions were carried out as per the manufacturer's protocols. All sequences were verified by Sanger sequencing (Eurofins Genomics, Kentucky, USA). The topology of the extracted and restriction-digested plasmids was verified on an agarose gel before use in cleavage assays (Fig. S2A).

In vitro cleavage assay and analysis
The protocol was adapted from previously described methods.
Briefly, the Cas9:tracrRNA:crRNA complex was formed by incubating Cas9 and tracrRNA:crRNA (1:1.5 ratio) in 1X reaction buffer (20 mM HEPES, pH 7.4, 100 mM potassium chloride, 5 mM magnesium chloride, 1 mM dithiothreitol, and 5% glycerol) at 37 °C for 10 min. The Cas9 RNP complex was mixed with pTarget, pLibrary or empty plasmid (150 ng) to initiate cleavage reactions, which were incubated at 37 °C. Phenol-chloroform was used to quench reactions at different time points. The aqueous layer was extracted, separated on a 1% agarose gel by electrophoresis and stained with SYBR Safe or RedSafe stain for dsDNA visualization. Excess tracrRNA:crRNA was used in cleavage assays to prevent any RNA-independent cleavage activity (Sundaresan et al., 2017). For library, mismatched target plasmid and empty plasmid cleavage assays, 100 nM Cas9 and 150 nM tracrRNA:crRNA were used. pLibrary, pTarget and empty pUC19 were each used at 150 ng per 10 µL reaction (8.6 nM).
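The conversion from 150 ng per 10 µL to 8.6 nM can be reproduced as follows. This is a rough sketch that assumes an average mass of 650 g/mol per base pair and uses the length of unmodified pUC19 (2,686 bp) as a stand-in for the plasmids; the constructs with inserts are slightly longer, so the value is approximate. The function name `dsdna_nanomolar` is hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; a common approximation (assumption)


def dsdna_nanomolar(mass_ng: float, volume_ul: float, length_bp: int) -> float:
    """Molar concentration (nM) of a double-stranded DNA of the given length."""
    grams_per_litre = (mass_ng * 1e-9) / (volume_ul * 1e-6)
    mol_per_litre = grams_per_litre / (length_bp * AVG_BP_MASS)
    return mol_per_litre * 1e9


# 150 ng of plasmid in a 10 µL reaction, assuming a 2,686 bp plasmid (pUC19)
print(round(dsdna_nanomolar(150, 10, 2686), 1))
```

At 100 nM Cas9, the RNP is therefore in roughly 12-fold molar excess over the plasmid substrate.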
FR_SC, FR_N and FR_L were determined for each time point 't'. FR for time point 0 was determined for the negative control pLibrary (i.e., pLibrary run on a gel after preparation, as represented in Fig. S2A).

Library preparation for HTS
Agarose gel electrophoresis (as described above) was used to separate the library plasmid cleavage products into cleaved (linear and nicked) and uncleaved (supercoiled) products. The bands from the nicked and supercoiled pools from various time points were excised separately and individually gel purified using the QIAquick Gel Extraction Kit (Qiagen). Nextera Adapters (NEA) designed to amplify the target region in the pLibrary and standard Nextera unique indices/barcodes to multiplex the samples were added in two rounds of PCR (see Supplementary Table 1

HTS data analysis
The analysis of HTS data was done as previously described (Murugan et al., 2020).
Briefly, the library HTS data were processed with custom bash scripts (see associated GitHub repository: https://github.com/sashital-lab/Cas9_specificity). A simple workflow of the analysis is described in Supplementary Figure 1, adapted from our previous study on Cas12a (Murugan et al., 2020). Target sequences were extracted along with their counts and the number of mismatches in each sequence. After command-line processing, target sequence information was imported into Microsoft Excel for plotting and summarizing.
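The counting step can be illustrated with a short sketch. This is not the authors' pipeline, which uses bash scripts in the repository cited above; it is a Python illustration that assumes the target regions have already been extracted from the reads and are the same length as the perfect target. The reference sequence and reads are made-up placeholders, and the function names are hypothetical.

```python
from collections import Counter


def count_mismatches(seq: str, reference: str) -> int:
    """Number of positions at which seq differs from the perfect target."""
    if len(seq) != len(reference):
        raise ValueError("sequence and reference must have the same length")
    return sum(a != b for a, b in zip(seq, reference))


def tally_by_mismatch(reads: list[str], reference: str):
    """Return per-sequence counts and total counts per mismatch number."""
    seq_counts = Counter(reads)
    per_mm = Counter()
    for seq, count in seq_counts.items():
        per_mm[count_mismatches(seq, reference)] += count
    return seq_counts, per_mm


# Placeholder data, not an actual target sequence from the study
reference = "ACGTACGTAC"
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "TCGTACGTAA"]
seq_counts, per_mm = tally_by_mismatch(reads, reference)
print(dict(per_mm))
```

Dividing each entry of `per_mm` by the total read count of the pool gives the fraction of sequences with a given number of mismatches.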
In each pool, the fraction of target sequences containing 'n' mismatches (MM) (F_n-MM) was calculated as follows. The RA for the perfect target sequence (0 MM), RA_0MM, was calculated using the above equation. The RA for target sequences with 1 to 5 MM at each time point 't', RA_1-5MM-t, was calculated by summing the EA for 1 to 5 MM, EA_1-5MM-t, and normalizing to the sum of the EA for 1 to 5 MM in the negative control pLibrary, EA_1-5MM-0 (i.e., pLibrary run on a gel after preparation, as represented in Fig. S2A). The relative cleaved fraction of counts or abundance (RA_FR) was determined by subtracting these RA_0MM or RA_1-5MM values from 1 and was plotted against time. The specificity score for Cas9 cleavage was calculated by dividing RA_0MM by RA_1-5MM (RA_0MM / RA_1-5MM). The cleavage rates differed among the Cas9 variants and among target sequences with varying numbers of mismatches. Therefore, each RA value was normalized to the maximum RA value present within each Cas9 and mismatch set in either the supercoiled or nicked pool, scaling the relative abundance from 0 to 1 for the heatmaps. For the analysis of target sequences with two mismatches, the sequences with 2 mismatches were extracted. The distance between the two mismatches and the total counts for sequences separated by that distance were determined. The counts were normalized to the number of possible ways the two mismatches can occur, and the normalized RA was calculated similarly to the heatmaps.

[Figure caption fragment] (pLibrary PS4, right) by Cas9 variants, resulting in linear (li) and/or nicked (n) products. Time points at which the samples were collected are 1 min, 5 min, 30 min, 1 h, and 3 h.
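The scaling and two-mismatch steps of the analysis can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: RA values are taken as already computed, and the "number of possible ways" two mismatches can occur at a given distance d is assumed to be the number of position pairs, L − d (the constant factor for nucleotide identity is ignored because it is the same for every distance). All function names and the toy data are hypothetical.

```python
def specificity_score(ra_0mm: float, ra_1_5mm: float) -> float:
    """Specificity score as described in the text: RA_0MM / RA_1-5MM."""
    return ra_0mm / ra_1_5mm


def scale_to_max(values: list[float]) -> list[float]:
    """Scale a set of RA values to the range 0..1 by dividing by the maximum."""
    maximum = max(values)
    return [v / maximum for v in values]


def two_mismatch_counts_by_distance(seq_counts: dict[str, int], reference: str) -> dict[int, float]:
    """Total counts of 2-MM sequences per mismatch distance, normalized by L - d."""
    L = len(reference)
    totals: dict[int, int] = {}
    for seq, count in seq_counts.items():
        positions = [i for i, (a, b) in enumerate(zip(seq, reference)) if a != b]
        if len(positions) == 2:
            d = positions[1] - positions[0]
            totals[d] = totals.get(d, 0) + count
    # Assumption: L - d position pairs exist for two mismatches d apart
    return {d: total / (L - d) for d, total in totals.items()}


# Toy data: a 10-nt placeholder reference and made-up counts
reference = "AAAAAAAAAA"
seq_counts = {"CAAAAAAAAC": 2, "CCAAAAAAAA": 9, "ACCAAAAAAA": 9, "CAAAAAAAAA": 5}
print(two_mismatch_counts_by_distance(seq_counts, reference))
print(scale_to_max([1.0, 2.0, 4.0]))
```

Normalizing by the number of position pairs keeps adjacent mismatches, which can occur at many more positions than widely separated ones, from dominating the comparison.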