Efficient strategies to detect genome editing and integrity in CRISPR-Cas9 engineered ESCs

CRISPR-mediated genome engineering provides a powerful tool to study the function of genes and proteins. In the past decades, the advances in genome and transcriptome sequencing techniques have shed light on the genetic causes underlying many human diseases, such as neurodevelopmental disabilities or cancer. Sometimes, a single point-mutation in a protein coding gene has been identified as the primary cause of the disease. CRISPR-Cas offers the possibility to introduce or remove such a mutation of interest to understand disease mechanisms and even bears therapeutic potential. We describe the adaptation of an experimental strategy that allows the mutation of protein residues in mouse embryonic stem cells (ESCs) and propose a new screening method, Mismatch-qPCR, to reliably detect editing events in clonal cell lines as an alternative to restriction digest or Sanger sequencing. Finally, we show that RNA-Sequencing (RNA-Seq) data or low-coverage genomic sequencing data can be used to detect large chromosomal deletions and rearrangements that frequently occur at the CRISPR-targeting site.

Introduction

CRISPR-Cas is an adaptive immune system that protects bacteria and archaea against foreign DNA (Jinek et al., 2012; Makarova et al., 2011). In recent years, components of this system have been modified and made applicable for genome engineering in mammalian cells (Charpentier and Doudna, 2013; Ran et al., 2013). The main components are the endonuclease Cas9, which can cleave double-stranded DNA molecules, and a single-guide RNA (sgRNA). The sgRNA acts as a scaffold and directs Cas9 to a genomic site of interest by a short 20-nucleotide complementary guide sequence. The requirement for Cas9 to bind and cleave the targeted genomic sequence is a protospacer adjacent motif (PAM) in the DNA, most commonly a 5'-NGG motif, where N is any nucleotide followed by two guanine nucleotides. Cas9 introduces double-strand breaks (DSBs) into the DNA, which can be repaired by the non-homologous end joining (NHEJ) or the homology-directed repair (HDR) pathway. NHEJ ligates the DNA strands in an error-prone way that results in insertion or deletion (indel) mutations at the repair site. Indels can cause frameshifts and the formation of premature stop codons, resulting in gene knock-out. The more precise HDR pathway repairs the DNA according to a repair template, which can be a chromosome, an exogenously supplied plasmid or single-stranded DNA with homology to the DSB site. This pathway allows precise gene editing by introducing nucleotide changes of interest. Previous studies showed that the frequency of CRISPR-editing via the HDR pathway is low (Ran et al., 2013). Thus, a robust method for high-throughput screening of many clonal cell lines is required. So far, screening for genome editing is mostly done by restriction digest (Ran et al., 2013) or Sanger sequencing. Restriction digest requires the introduction of nucleotide changes that give rise to a restriction site to help identify edited clones.
This procedure can be tedious, and it is not always feasible to change nucleotides in a way that creates a restriction site while maintaining the amino acid sequence of the encoded protein. Alternatively, Sanger sequencing is the most reliable technique to identify editing events in cells, but it is expensive if applied to large numbers of clonal cell lines.

In the field of epigenetics, systematic histone mutation studies have already delivered direct clues for the functional significance of histone residues in individual organisms such as yeast and fly (Dai et al., 2008; Hodl and Basler, 2012; Pengelly et al., 2013). In mammalian cells, CRISPR-mediated precise genome editing of histones will help in understanding the role of specific residues and their post-translational modifications for gene regulation in stem cells and during development. Thus, we tested and adapted the CRISPR-Cas9 system developed by Ran et al. (2013) for mutation of histone variant H3.3 in ESCs, specifically to exchange lysine residues with alanine, but our approach can be extended to other genes of interest. For this purpose, we developed a new screening method, Mismatch-qPCR, to reliably detect editing events in clonal cell lines. The new strategy proved to be more time-effective than screening by restriction digest, reduced the costs of screening many cell lines compared to Sanger sequencing and abolished the need to insert a restriction site into the genome. Finally, we address the issue of large chromosomal deletions and rearrangements that have been reported to occur during CRISPR editing (Lee and Kim, 2018; Zhang et al., 2015). For this, we tested the feasibility of leveraging functional genomic data frequently employed in the downstream analysis of CRISPR-generated cell lines. Specifically, using sequencing data of either Chromatin

CRISPR system developed by Ran et al. (2013).
The plasmid encodes a fusion protein of Cas9 and GFP, which allows cell selection by flow cytometry, and a single-guide RNA for targeting of Cas9 to a genomic site. The delivery of this plasmid and a repair template is crucial for successful editing, but primary cells, including ESCs, are difficult to transfect using traditional transfection methods such as liposomal reagents. We tested whether electroporation with a nucleofector system (nucleofection) is suitable for the delivery of the CRISPR plasmid into ESCs and analyzed the efficiency by flow cytometry. Overall, around 5% of all sorted cells were GFP-positive and therefore successfully transfected (Fig. 1; the gating strategy for this experiment is depicted in Supplementary Fig. S1). This proportion is sufficiently high to obtain the required cell numbers for gene editing. Next, we continued with optimizing the conditions for gene editing of endogenous H3.3 in ESCs using this CRISPR-Cas9 system.

Figure 1. Transfection efficiency in ESCs with a plasmid-based CRISPR system using nucleofection. Nucleofected ESCs were analyzed by flow cytometry for GFP expression, which represents successful delivery of the Cas9-encoding plasmid. Displayed are GFP signal (x-axis) against RFP signal (y-axis); rectangles indicate areas of positive cells expressing the analyzed Cas9-GFP fusion protein.
Scr7 promotes CRISPR-Cas9 mediated gene editing in ESCs

Gene editing through the HDR pathway occurs at lower frequencies than gene knockout via the NHEJ pathway, but treatment with small molecules has been proposed to increase the frequency of HDR in cells. We tested the efficiency of two small molecules, Scr7 and L755,507, to promote gene editing of the H3.3 genes (H3f3a or H3f3b). Scr7 has been reported to promote editing via HDR by inhibiting the activity of DNA ligase IV, an important enzyme in the competing NHEJ pathway (Chu et al., 2015; Maruyama et al., 2015). L755,507 is a β3-adrenergic receptor partial agonist reported to enhance gene editing, but its mode of action is unknown (Yu et al., 2015). Treatment with the individual small molecules at concentrations between 1 and 10 µM did not visibly reduce cell viability, but to minimize potential toxicity the cells were treated with the small molecules for only 36 hours of the culture (12 hours before and 24 hours after delivery of the Cas9 plasmid and single-stranded repair template by nucleofection).

In untreated ESCs and in ESCs treated with L755,507, we did not obtain edited cell lines carrying the mutation of interest. Using the Scr7 inhibitor, we obtained edited cell lines that had incorporated nucleotide changes according to a supplied repair template. Thus, treatment with Scr7 resulted in a higher frequency of editing events than no treatment or treatment with L755,507 (Table 1). Overall, the editing frequency of either H3f3a or H3f3b was around 0-2% of transfected cells for homozygous editing and 0-10% for heterozygous editing.

Figure 2. Scr7 promotes CRISPR gene editing in ESCs. Cells were transfected with CRISPR-Cas9 plasmids targeting the H3f3b gene and repair templates carrying a restriction site for genomic insertion. A bulk population of cells expressing Cas9-GFP was selected by flow cytometry. The targeted H3f3b locus was amplified by PCR and subjected to restriction digest. The digestion pattern was analyzed by agarose gel electrophoresis. Successful integration of the restriction site results in cleavage of the PCR product (blue arrowhead) and the occurrence of smaller digestion products (orange arrowheads). HDR frequency was calculated as the ratio of band intensities.
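The Figure 2 caption states that HDR frequency was calculated as a ratio of band intensities but does not give the formula. The following is a minimal sketch of one common convention, assuming the frequency is the summed intensity of the digested bands divided by the total intensity of digested plus undigested product. The function name `hdr_frequency` and the example intensities are hypothetical.

```python
from typing import List


def hdr_frequency(uncut_intensity: float, cut_intensities: List[float]) -> float:
    """Estimate the fraction of PCR product carrying the new restriction site.

    Assumption (not stated in the text): frequency = cut / (cut + uncut),
    using background-corrected band intensities from the gel image.
    """
    cut = sum(cut_intensities)
    total = uncut_intensity + cut
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return cut / total


# Hypothetical densitometry values: one undigested band, two digestion products.
print(f"{hdr_frequency(90.0, [6.0, 4.0]):.1%}")
```

Because smaller fragments bind less dye per molecule, intensity ratios of this kind are best read as rough estimates for comparing treatments rather than as absolute allele frequencies.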

Mismatch-qPCR as a high-throughput screening method to detect gene editing

The systematic exchange of multiple protein residues in mammalian cells can only be achieved if a reliable method allows high-throughput screening of many clonal cell lines at reduced costs. Whereas Sanger sequencing is a fast and precise method for screening, it is expensive if applied to many cell lines. Screening by restriction digest, in contrast, is inexpensive but tedious and requires the insertion of a restriction site into the targeted genomic locus.

We tested whether CRISPR-editing events can be detected in a quantitative PCR (qPCR) by designing mutation-specific primers that recognize the inserted nucleotide changes of interest (Fig. 3a,b), an approach referred to as Mismatch-qPCR.

Figure 3. Mismatch-qPCR screen detects CRISPR-mediated point mutations in H3.3B. (a) Guide sequences were designed to direct Cas9 close to the mutation site of interest. The repair template contains nucleotide changes to introduce a target mutation, and 3 additional synonymous mutations in the guide binding site or PAM to prevent re-cleavage after repair. Optionally, synonymous mutations can give rise to a new restriction site used to validate clones. The mutation-specific primer recognizes, at its very 3' end, the nucleotide changes that arise after CRISPR-editing. The wild-type (WT) primer recognizes the same, but unmodified, genomic site. (b) Examples of two mutation-specific primers for Mismatch-qPCR that detect editing of lysine 4 to alanine in H3.3B by recognizing either the K4A mutation or the synonymous mutations inside the guide. (c) Mismatch-qPCR screen of CRISPR cell lines using mutation-specific and wild-type primers. Successful amplification results in an increase of the fluorescent signal (y-axis) at lower cycle numbers (x-axis). DNA of homozygously edited clones is amplified only with mutation-specific primers, whereas heterozygous clones are also amplified using the wild-type primer. (d) Confirmation of editing events by restriction digest using a restriction site newly introduced by CRISPR targeting. DNA of wild-type cells (WT) and of positive clones predicted by Mismatch-qPCR screening was used for PCR amplification followed by restriction digest with BanI. The digestion pattern was analyzed by agarose gel electrophoresis. Digestion of the PCR product (red arrowhead) of wild-type DNA results in a larger product (blue arrowhead) than digestion of edited DNA (orange arrowheads), which carries an additional integrated restriction site. Restriction digestion confirms the editing events detected by qPCR.
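The primer layout described in Figure 3a can be illustrated with a short sketch. It assumes the edit consists of substitutions only, so that the wild-type and edited sequences have equal length, and it places the last mutated base at the 3' end of the mutation-specific primer. The function name, the 20-nt primer length and the toy sequences are hypothetical and are not the primers used in this study.

```python
from typing import Tuple


def mismatch_primers(wt_seq: str, edited_seq: str,
                     last_mut_index: int, length: int = 20) -> Tuple[str, str]:
    """Return (mutation-specific primer, wild-type primer).

    Both primers cover the same genomic window and end (3' end) at
    `last_mut_index`, the 0-based position of the last edited base.
    Assumes substitutions only, i.e. equal-length sequences.
    """
    if len(wt_seq) != len(edited_seq):
        raise ValueError("sketch handles substitutions only (equal lengths)")
    start = last_mut_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    window = slice(start, last_mut_index + 1)
    return edited_seq[window], wt_seq[window]


# Toy example: a single A>T substitution at position 24 of a made-up sequence.
wt = "ACGT" * 7
edited = wt[:24] + "T" + wt[25:]
mut_primer, wt_primer = mismatch_primers(wt, edited, last_mut_index=24)
print(mut_primer, wt_primer)
```

Placing the discriminating bases at the 3' end matters because polymerase extension is most sensitive to mismatches there, which is what lets the two primers distinguish the alleles.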

Using this method, we were able to separate edited clones from wild type clones by shifts to lower cycle threshold numbers (rounds of amplification) (Fig. 3c). In combination with a primer that recognizes the unchanged wild type allele, it was possible to distinguish heterozygous from homozygous clones. Heterozygous clones with one mutant and one wild type allele amplify in a qPCR reaction with both primer sets, while homozygous clones only amplify using the mutation-specific primer. Using restriction digest, we confirmed homozygosity and heterozygosity of the clonal lines, which can be identified by the complete or incomplete digestion of a PCR product (Fig. 3d), and the results were in agreement with the results from the qPCR screen. After identification of potential candidate clones by Mismatch-qPCR, the exact genotype of the edited clones has to be determined by Sanger sequencing to confirm that the mutation was introduced correctly in both alleles. In this way, we confirmed the successful exchange of lysine 4 or 36 in H3.3B (Fig. S2). For some candidate clones that were detected by screening, we observed incomplete repair resulting in additional small deletions around the guide binding site. Only clonal lines that have incorporated nucleotide changes correctly from the repair template can be used for downstream analysis.
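The calling logic described above can be sketched as a small decision function. The cycle-threshold cutoff of 30 is an illustrative assumption, not a value from this study, and the function name is hypothetical. A missing Ct (no amplification) is represented as `None`.

```python
from typing import Optional


def call_genotype(ct_mutant: Optional[float], ct_wild_type: Optional[float],
                  ct_cutoff: float = 30.0) -> str:
    """Classify a clone from the Ct values of the two primer sets.

    A primer set counts as 'amplified' if a Ct was measured and it lies at or
    below `ct_cutoff` (hypothetical cutoff; calibrate on control DNA).
    """
    mutant_amplified = ct_mutant is not None and ct_mutant <= ct_cutoff
    wt_amplified = ct_wild_type is not None and ct_wild_type <= ct_cutoff
    if mutant_amplified and wt_amplified:
        return "heterozygous candidate"
    if mutant_amplified:
        return "homozygous candidate"
    if wt_amplified:
        return "wild type"
    return "no call"


# Hypothetical clones: (Ct with mutation-specific primer, Ct with WT primer)
for clone, cts in {"A1": (24.1, 36.5), "A2": (25.0, 24.8), "A3": (None, 23.9)}.items():
    print(clone, call_genotype(*cts))
```

The labels say "candidate" on purpose: as noted above, clones with additional small deletions can pass this screen, so Sanger sequencing remains the final confirmation.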

CRISPR off-target analysis for gene copy number alterations

Double-strand cleavage by Cas9 can cause unintended off-target effects that affect genome integrity (Lee and Kim, 2018; Zhang et al., 2015). As a next step, we wanted to confirm that the integrity of the genome was not affected in selected clones during clonal selection and CRISPR targeting, e.g. by chromosomal rearrangements.

Gene expression data has been previously demonstrated to be predictive of somatic gene copy number alterations in the absence of accompanying genomic data in cancer cells (Ben-David et al., 2016; Fehrmann et al., 2015). Since gene expression data is regularly generated in contemporary genomic studies from CRISPR-edited or knock-out cell lines, we tested whether it can be exploited to confirm their genomic integrity after targeting. The advantage would be to exclude clonal cell lines with chromosomal deletions or duplications, which could otherwise complicate downstream analysis. Using mRNA-Seq, we determined gene expression changes in CRISPR cell lines relative to their wild type ESC line of the same genetic background. The gene expression changes were compared to the genomic coordinates of the respective genes. A run of down- or up-regulated genes that are located in proximity to each other indicates large-scale chromosomal abnormalities (Fig. 4a). Using this strategy, we found that incomplete repair of a chromosome can result in large copy number alterations (frequently chromosome arm losses), typically beginning at the CRISPR target site and spanning the rest of the chromosome arm (Fig. 4b,c,d). Such events result in hemizygous loss of hundreds of genes. We observed that chromosome arm losses can occur independently of the exact CRISPR guide sequence and on different chromosomes, given that they were detected during the targeting of H3f3a on chromosome 1 or H3f3b on chromosome 11 (Fig. 4b,c). Both H3.3-encoding genes are located at the periphery of chromosomes 1 and 11, respectively, and it is possible that CRISPR-targeting of genes at the ends of chromosomes is more likely to result in hemizygous chromosome-arm loss, since larger changes in gene copy number that would result from a mid-chromosome cut may be less well tolerated.
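The idea of spotting runs of co-regulated neighbouring genes can be sketched with a rolling median over genes ordered by chromosomal position. This is an illustration under stated assumptions and not the analysis used for Fig. 4: the window of 5 genes and the threshold of 0.5 on log2 fold change are hypothetical, and the function names are invented for this example.

```python
from statistics import median
from typing import List, Tuple


def rolling_median(values: List[float], window: int) -> List[float]:
    """Median over a centred window; windows are truncated at the ends."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def flag_segments(genes: List[Tuple[int, float]], window: int = 5,
                  threshold: float = 0.5) -> List[Tuple[int, int, str]]:
    """Find runs of neighbouring genes with consistently shifted expression.

    genes: (position, log2 fold change) pairs for ONE chromosome.
    Returns (start position, end position, 'loss' or 'gain') per run.
    """
    genes = sorted(genes)
    positions = [pos for pos, _ in genes]
    smoothed = rolling_median([lfc for _, lfc in genes], window)
    segments, current = [], None  # current = (state, index where run began)
    for i, value in enumerate(smoothed):
        state = "loss" if value <= -threshold else "gain" if value >= threshold else None
        if current is not None and current[0] != state:
            segments.append((positions[current[1]], positions[i - 1], current[0]))
            current = None
        if state is not None and current is None:
            current = (state, i)
    if current is not None:
        segments.append((positions[current[1]], positions[-1], current[0]))
    return segments


# Toy chromosome: 10 unchanged genes (one noisy outlier), then 10 genes at about -1.
lfcs = [0.0] * 10 + [-1.0] * 10
lfcs[3] = -2.0
toy = [((i + 1) * 1_000_000, lfc) for i, lfc in enumerate(lfcs)]
print(flag_segments(toy))
```

The median is used instead of the mean so that a single strongly regulated gene does not trigger a call; only a stretch of neighbouring genes shifting together is flagged.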
It should be noted that such deletions are not detectable by traditional Sanger sequencing, because only the intact allele is amplified in a PCR reaction. Additionally, we also observed rearrangements of chromosomes that were not targeted by CRISPR, e.g. of chromosome 6 (Fig. 4d). These rearrangements may be CRISPR off-target effects, but may also have occurred spontaneously during clonal selection.

To increase the confidence in genome integrity predictions from RNA-Seq data, we tested whether the predicted chromosomal deletions/duplications can be confirmed on the genomic level by using low-coverage genomic sequencing data, such as ChIP-Seq Inputs. From RNA-Seq data of a chosen cell line, a cluster of up-regulated genes and a cluster of down-regulated genes between the CRISPR target site and the chromosome end were detected (Fig. 5a), perhaps indicative of a simple breakage fusion bridge cycle initiated by a DNA break (Bignell et al., 2007). The same chromosomal abnormality was also detectable using ChIP-Seq Input data (Fig. 5b), which confirmed that the gene expression changes were the result of a duplication-deletion rearrangement on the genomic level. Compared to the genomic analysis using ChIP-Seq Input data, RNA-Seq data yields a lower resolution because the predictions are dependent on the gene density per chromosome, which is rather sparse considering that only 62% of the genome is transcribed, and an even smaller fraction of this corresponds to coding exons (5.5%) (Consortium, 2012). Thus, low-coverage genomic sequencing data (e.g. ChIP-Seq) allows a more detailed analysis of chromosomal abnormalities with higher precision and confidence, but with RNA-Seq data it is possible to make similar predictions, especially in the case of large chromosomal abnormalities.

reliably identify CRISPR-edited clones. By direct comparison with the restriction digest method (Ran et al., 2013), Mismatch-qPCR proved to be a faster screening method and did not require the insertion of a restriction site into the genome. The read-out can be observed during the qPCR reaction without requiring subsequent analysis steps. Sanger sequencing was required to exclude false positive clones and to confirm the precise genotype of the clonal cell lines.

Nevertheless, sequencing a few candidate clones after screening by Mismatch-qPCR was more economical than sequencing all generated clonal cell lines. The main limitation of this approach is the requirement for suitable primer pairs for screening. Depending on the DNA sequence and GC content of the targeted locus, it is not always possible to design primers that fall into the recommended property range (e.g. melting temperature), which is predicted to result in less efficient PCR amplification. However, the read-out of Mismatch-qPCR is qualitative rather than quantitative and should not necessarily be compromised by less efficient primers.
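Whether a candidate primer falls into a recommended property range can be checked quickly before ordering. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides, and the acceptance ranges (55-65 °C, 40-60% GC) are illustrative assumptions rather than values taken from this study.

```python
from typing import Tuple


def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def within_range(primer: str,
                 tm_range: Tuple[float, float] = (55, 65),
                 gc_range: Tuple[float, float] = (0.4, 0.6)) -> bool:
    """True if both estimates fall inside the (hypothetical) target ranges."""
    return (tm_range[0] <= wallace_tm(primer) <= tm_range[1]
            and gc_range[0] <= gc_content(primer) <= gc_range[1])


# Made-up 20-mers: one balanced, one GC-rich as found at some target loci.
for primer in ("ACGTACGTACGTACGTACGT", "GGGGGGGGGGCCCCCCCCCC"):
    print(primer, wallace_tm(primer), f"{gc_content(primer):.0%}", within_range(primer))
```

For Mismatch-qPCR the 3' end of the primer is fixed by the mutation, so in practice only the primer length and the partner primer can be varied to move these values.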

Following the generation of CRISPR-edited clones, we wanted to confirm that their genomic

Guides were designed with homology to a sequence close to the mutation site of interest using MIT's Optimized CRISPR design tool. As a general guideline, the guide binding site should ideally be less than 30 nucleotides away from the mutation site of interest, and can also overlap the mutation site. If the mutation site is close to an intron, it is recommended to use an intronic guide sequence in case additional indels occur at the CRISPR cutting site, but this is optional. Guide sequences with an aggregate score of greater than 50% were selected and cloned into pSpCas9(BB)-2A-GFP (PX458, Addgene) or pSpCas9(BB)-2A-RFP (modified from PX458) according to instructions by Ran et al. (2013). For this purpose, phosphorylated DNA oligos (5'-Phos) were ordered from Eurofins according to this scheme:

CACC + G + guide sequence forward
AAAC + guide sequence reverse + C

pSpCas9(BB)-2A-GFP was digested with BbsI, followed by dephosphorylation using Antarctic Phosphatase (NEB), and separated from undigested plasmid by 1% agarose gel electrophoresis. Digested plasmid was extracted from the gel (Gel extraction Kit, Qiagen). Complementary guide oligos were annealed and cloned into the pSpCas9(BB)-2A-GFP/RFP plasmid, from here on referred to as Cas9-GFP-guide or Cas9-RFP-guide plasmid.
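The oligo scheme above can be written as a small helper. The sketch assumes that "guide sequence reverse" means the reverse complement of the 20-nt guide, which is what makes the two oligos anneal with CACC and AAAC overhangs. The function names and the example guide are hypothetical and are not the guides used here.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def guide_oligos(guide: str) -> Tuple[str, str]:
    """Return (forward oligo, reverse oligo), both written 5' to 3'.

    forward: CACC + G + guide
    reverse: AAAC + reverse complement of guide + C
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("expected a 20-nt guide consisting of A, C, G, T")
    forward = "CACC" + "G" + guide
    reverse = "AAAC" + reverse_complement(guide) + "C"
    return forward, reverse


# Made-up guide sequence for illustration only.
fwd, rev = guide_oligos("GATTACAGATTACAGATTAC")
print(fwd)
print(rev)
```

After annealing, the duplex part (everything after the 4-nt overhangs) is fully complementary, and the extra G at the start of the guide is the base added in the scheme to support transcription from the U6 promoter, as described by Ran et al. (2013).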

We used guides with the following sequences:

For gene editing, 2 × 10^6 ESCs were transfected with 2 µg Cas9-GFP-guide plasmid and 5 µl of

nucleofection to promote gene editing (Chu et al., 2015; Maruyama et al., 2015; Yu et al., 2015). Growing ESC colonies were dispersed 6 days after sorting by trypsinization. After dispersal, ESC media was changed every day in wells with growing ESC colonies. Dispersed clones were split 9 days after sorting into 2 replicate 96-well plates. One plate was used for freezing in DMSO-containing medium and the second was used for lysis and screening of clones. To detect gene editing events, clonal cell lines were lysed directly in one of the replicate 96-well plates. For cell lysis, medium was removed from wells and 70 µl of lysis buffer was added to each well.

CRISPR off-target effects in the form of chromosomal duplications/deletions were ruled out by RNA-sequencing. RNA was extracted from approx. 1 × 10^6 cells using the RNeasy Kit (Qiagen), followed by DNase digestion using TURBO DNase (Thermo Fisher). mRNAs were isolated from 1 µg of total RNA using a PolyA selection kit (NEB) and sequencing libraries were prepared following instructions from NEB's Ultra Library Preparation Kit for Illumina. All samples were barcoded, pooled and sequenced on a HiSeq2000 Sequencer (Illumina) using a 50 bp single-end run. Sequencing reads were mapped to the mouse reference genome (mm10 assembly) using the TopHat2 aligner with default settings for single-end reads. Reads per gene were counted using htseq-count in union or intersection-nonempty mode. We used the Ensembl gene annotation Mus_musculus.GRCm38.83. Differential RNA-Seq analysis between each clone and wild type cells was performed using the DESeq2 package (Bioconductor; Love et al., 2014) to obtain log2(FoldChanges) per gene.
Genomic coordinates per gene were obtained using biomaRt (Bioconductor) and log2(FoldChanges) were plotted over chromosome position to obtain distribution profiles of gene expression changes. Cell lines that displayed deletions or duplications of chromosome regions, as seen by concomitant up- or down-regulation of close-by genes, were discarded and not used for analysis.

CRISPR off-target analysis by RNA-Seq analysis. Purified DNA was fragmented into 500 bp fragments by sonication using a Bioruptor Pico (Diagenode). Sequencing libraries were prepared using the DNA Ultra II library preparation kit (NEB) according to the manufacturer's instructions and sequenced either on Illumina's HiSeq2000 Sequencer (50 bp single-end mode) or NextSeq 500 Sequencer (75 bp single-end mode). Sequencing reads were aligned to the mouse reference genome (mm10 assembly) using Bowtie2 (Langmead and Salzberg, 2012). Only non-duplicated, uniquely mapped reads were retained for further analysis.
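Once filtered reads are available, copy number differences between a clone and the wild-type line can be approximated by comparing read counts in fixed genomic bins. The sketch below is an illustration only and not the pipeline used in this study: the 1 Mb bin size, the pseudocount and the function names are assumptions, and the input is simply a list of read start positions for one chromosome.

```python
import math
from collections import Counter
from typing import Dict, List


def bin_counts(read_starts: List[int], bin_size: int) -> Counter:
    """Count reads per fixed-size bin, keyed by bin index."""
    return Counter(pos // bin_size for pos in read_starts)


def log2_ratio_per_bin(clone_starts: List[int], wt_starts: List[int],
                       bin_size: int = 1_000_000,
                       pseudocount: float = 1.0) -> Dict[int, float]:
    """log2(clone / wild type) coverage per bin after library-size scaling.

    The pseudocount avoids division by zero in empty bins; with a
    pseudocount of 0 every bin must contain wild-type reads.
    """
    if not clone_starts or not wt_starts:
        raise ValueError("both samples need at least one read")
    clone = bin_counts(clone_starts, bin_size)
    wt = bin_counts(wt_starts, bin_size)
    scale = len(wt_starts) / len(clone_starts)  # equalise sequencing depth
    return {
        b: math.log2((clone[b] * scale + pseudocount) / (wt[b] + pseudocount))
        for b in sorted(set(clone) | set(wt))
    }


# Toy data: the clone has half the wild-type coverage in the second 1 Mb bin.
wt_reads = list(range(0, 100)) + list(range(1_000_000, 1_000_100))
clone_reads = list(range(0, 100)) + list(range(1_000_000, 1_000_050))
print(log2_ratio_per_bin(clone_reads, wt_reads))
```

A hemizygous deletion is expected to appear as a run of neighbouring bins whose ratio sits about one log2 unit below the rest of the chromosome. Scaling by total read count shifts all bins slightly when a large region is lost, so the difference between regions is more informative than the absolute values.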