Abstract
The dCasMINI protein is a hyper-compact, nuclease-inactivated CRISPR-Cas system engineered for transcriptional modulation and epigenetic editing [Xu et al., 2021]. The small size of dCasMINI (529 amino acids), less than half the size of comparable Cas9 molecules, makes it ideal for AAV-based therapies, which are frequently limited by AAV’s small cargo capacity. Unlike for Cas9 or Cas12a, there are no available computational tools for designing CasMINI guides. To facilitate and accelerate the development of dCasMINI-based applications, we synthesized knowledge regarding CasMINI guide design and built a website to help researchers design optimal guides for dCasMINI-based transcriptional inhibition (CRISPRi) and activation (CRISPRa) experiments, covering 99.7% of genes for CRISPRi and 99.9% of genes for CRISPRa. We experimentally characterized the importance of each nucleotide position in the guide RNA spacer for determining its activity. Based on this information, our tool offers more sensitive off-target mapping and reports alignment mismatches in the spacer seed region, which we experimentally determined to be critical for true binding events. The tool is freely available at casmini-tool.com.
The CRISPR-Cas revolution is upon us. Advances in CRISPR-Cas9-based therapeutics have resulted in transformational therapies for β-thalassemia [Frangoul et al., 2020], sickle cell disease [De Dreuzy et al., 2019], B-cell lymphoma [McGuirk et al., 2022], non-Hodgkin lymphoma [O’Brien et al., 2022], and hereditary transthyretin amyloidosis [Gillmore et al., 2021], among others. However, the large size of the Cas9 molecule presents challenges for therapeutic delivery. Its large size (in the range of 3-4kb) prevents its delivery in a single adeno-associated virus (AAV) vector, which has a packaging capacity below 4.7kb [Wu et al., 2010]. As a consequence, the vast majority of current CRISPR-Cas9 therapies are restricted to ex vivo or lipid nanoparticle-based delivery modalities, severely limiting their general application. The dCasMINI molecule [Xu et al., 2021] is extremely small and ideal for AAV-based delivery: at 529 amino acids (1587bp), it allows researchers to package the dCasMINI DNA sequence, guide RNA, associated promoter sequences, and modulator peptides capable of gene regulation into a standard AAV vector with a maximum cargo size of 4.7kb. Furthermore, traditional Cas9-based therapeutics are limited to the treatment of diseases that are ameliorated by gene knockouts and are thus unsuitable for a whole host of genetic diseases, such as those caused by haploinsufficiency. In contrast, dCasMINI has an inactivated nuclease domain and can therefore upregulate or downregulate a genetic locus when tethered to the appropriate modulator peptides. Beyond its smaller size and modulatory versatility, recent research has suggested that dCasMINI has a lower incidence of off-targets than Cas9 or Cas12a [Xin et al., 2022], making it an attractive candidate for therapeutic applications.
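The cargo arithmetic above can be checked with a short calculation. This is a minimal sketch that counts only the dCasMINI coding sequence (3bp per amino acid, stop codon excluded) against the 4.7kb limit cited above; the sizes of the guide RNA cassette, promoters, and modulator sequences that must fit in the remaining budget are not given here and are deliberately left out.

```python
AAV_CARGO_LIMIT_BP = 4700  # approximate AAV packaging limit cited in the text
DCASMINI_LENGTH_AA = 529   # dCasMINI protein length in amino acids


def coding_length_bp(n_amino_acids: int) -> int:
    """Coding sequence length in bp (3 bp per codon, stop codon excluded)."""
    return 3 * n_amino_acids


def remaining_cargo_bp(coding_bp: int, limit_bp: int = AAV_CARGO_LIMIT_BP) -> int:
    """Cargo left over for promoters, the gRNA cassette, modulators, etc."""
    return limit_bp - coding_bp


cds = coding_length_bp(DCASMINI_LENGTH_AA)
print(cds)                      # 1587
print(remaining_cargo_bp(cds))  # 3113
```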
To assist researchers in applying dCasMINI-based tools, we have developed a web-based database to rapidly search for optimal dCasMINI spacer sequences for CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) at a desired genetic locus (Figure 1). To improve upon pre-existing tools such as CHOPCHOP [Labun et al., 2019] or Casilico [Asadbeigi et al., 2022], we performed extensive computational mapping of each potential spacer sequence to extend the search for off-targets, and we provide the full off-target information for each guide so that researchers can decide which types and how many off-targets they can tolerate and/or test. To further characterize computationally predicted off-targets, we experimentally determined the dCasMINI guide seed region (the high-fidelity region at the 5’ end of the spacer that is intolerant to mismatches) and annotated potential off-targets with information about seed region mismatches. This allows researchers to enlarge the pool of candidate guides by including guides whose computationally predicted off-targets have mismatches in the seed region (and are thus unlikely to result in true binding events). We expect that our casmini-tool website portal will greatly simplify and accelerate the guide design workflow for researchers carrying out experiments with the dCasMINI platform.
Figure 1. Example workflow for designing guides targeting the Rett syndrome-associated gene MECP2 for CRISPRa.
Methods
Computational identification of on-target gRNA sequences
To identify on-target guide sequences for CRISPRi/a genome-wide, we first extracted the primary transcription start site (TSS) of each gene from the hg38 FANTOM5 database, a repository of genome-wide TSSs defined from human CAGE-seq data [FANTOM Consortium, 2014]. Based on in-house experiments [data not shown], we defined the optimal targeting region for each regulatory modality as follows:
CRISPRi: -200bp:+1000bp from the TSS;
CRISPRa: -1000bp:+200bp from the TSS.
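The two targeting windows above can be expressed as genomic intervals around each TSS. This is a minimal sketch, assuming that offsets are oriented relative to the gene's strand (so upstream, i.e. negative offsets, lies at lower coordinates for '+' strand genes and at higher coordinates for '-' strand genes) and using 0-based, half-open BED-style intervals; the function name and the exact boundary convention are illustrative assumptions, not the tool's implementation.

```python
from typing import NamedTuple


class Window(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


# Offsets relative to the TSS (negative = upstream), as defined in the text.
WINDOWS = {
    "CRISPRi": (-200, 1000),
    "CRISPRa": (-1000, 200),
}


def targeting_window(chrom: str, tss: int, strand: str, modality: str) -> Window:
    """Half-open interval covering relative offsets [lo, hi) around a 0-based TSS."""
    lo, hi = WINDOWS[modality]
    if strand == "+":
        start, end = tss + lo, tss + hi
    elif strand == "-":
        # On the minus strand, upstream lies at higher genomic coordinates.
        start, end = tss - hi + 1, tss - lo + 1
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return Window(chrom, max(0, start), end)  # clamp at the chromosome start
```

For example, `targeting_window("chrX", 10_000, "+", "CRISPRa")` gives `Window("chrX", 9000, 10200)`, while the same TSS on the minus strand gives `Window("chrX", 9801, 11001)`: both span 1,200bp but are mirrored around the TSS.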
Nucleotide sequences for each of these targeting regions were extracted genome-wide with the bedtools getfasta command via pybedtools [Dale et al., 2011]. We then identified all dCasMINI PAMs (TTTA/TTTG) in these sequences and extracted the 20 nucleotides immediately downstream (3’) of each PAM to generate the list of on-target spacer sequences.
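The PAM scan described above can be sketched as follows. This assumes that both strands of each window are scanned and that "downstream" means 3’ of the PAM on the strand on which the PAM is read; the function names are hypothetical and this is not the tool's own code. A regex lookahead is used so that overlapping PAM matches are all reported.

```python
import re

PAM_PATTERN = re.compile(r"(?=(TTT[AG]))")  # lookahead: finds overlapping TTTA/TTTG
SPACER_LEN = 20
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_spacers(seq: str, spacer_len: int = SPACER_LEN):
    """Yield (strand, pam_start, pam, spacer) for every TTTA/TTTG PAM followed by a
    full-length spacer. pam_start is 0-based on the scanned strand (for '-', that
    is the reverse-complemented sequence)."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in PAM_PATTERN.finditer(s):
            pam_start = m.start()
            spacer_start = pam_start + 4  # spacer begins right after the 4-nt PAM
            spacer = s[spacer_start:spacer_start + spacer_len]
            if len(spacer) == spacer_len:  # skip PAMs too close to the window edge
                yield strand, pam_start, m.group(1), spacer
```

Spacers whose 20 nucleotides would run past the end of the extracted window are skipped in this sketch; a real pipeline might instead extract slightly padded windows.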
Computational mapping of putative off-targets
We sensitively mapped each 20bp on-target spacer sequence to the hg38 genome with Bowtie2 [Langmead and Salzberg, 2012], tolerating up to 3 mismatches anywhere in the spacer sequence and a maximum of twenty thousand alternative mapping locations, using the following command:
For each putative off-target site, we examined the 5bp immediately upstream of the alignment and discarded any alignment without an appropriate PAM in this region (a 5bp window allows for a single-nucleotide bulge between the 4bp PAM and the spacer sequence). Given our experimental data on the importance of the 6bp seed region (see below), we further checked whether any mismatches were present in the first 6bp of the 20bp spacer alignment and annotated each off-target with this information.
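The PAM check and seed annotation described above can be sketched as below. This assumes the 5 upstream nucleotides are already read 5’ to 3’ on the same strand as the aligned spacer (for minus-strand alignments they would first be reverse-complemented) and that the alignment is ungapped, so spacer and target can be compared position by position; the function names are hypothetical.

```python
PAMS = ("TTTA", "TTTG")
SEED_LEN = 6  # seed = first 6 nt of the 20-nt spacer


def has_upstream_pam(upstream5: str) -> bool:
    """Accept a PAM flush against the protospacer (last 4 of the 5 nt) or
    separated from it by a single-nucleotide bulge (first 4 of the 5 nt)."""
    if len(upstream5) != 5:
        raise ValueError("expected exactly 5 nt of upstream sequence")
    u = upstream5.upper()
    return u[1:] in PAMS or u[:4] in PAMS


def seed_mismatches(spacer: str, target: str, seed_len: int = SEED_LEN) -> int:
    """Count mismatches in the first seed_len positions of an ungapped alignment."""
    return sum(a != b for a, b in zip(spacer[:seed_len].upper(), target[:seed_len].upper()))


def annotate_off_target(spacer: str, target: str, upstream5: str):
    """Return None if the site lacks a PAM (discarded), else a seed annotation."""
    if not has_upstream_pam(upstream5):
        return None
    n_seed = seed_mismatches(spacer, target)
    return {"seed_mismatches": n_seed, "seed_intact": n_seed == 0}
```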
Experimental determination of the dCasMINI seed region
To identify the high-fidelity seed region of dCasMINI spacer sequences, we synthesized a panel of gRNA-encoding gene fragments in which each individual nucleotide of a spacer sequence targeting CD2 was either mutated to each of the three other nucleotides (“single-mismatch”) or deleted (“single-deletion”). Each gRNA variant was cloned into the gRNA plasmid backbone downstream of the mU6 promoter. HEK293T cells were co-transfected in triplicate in a 96-well plate format with individual gRNA variant plasmids and our proprietary dCasMINI CRISPRa plasmid [Carosso et al., 2023]. Three days post-transfection, CD2 activation was quantified by cell surface staining of live cells with APC anti-human CD2 antibody (BioLegend, 309224), followed by flow cytometry of the cell population transfected with both the gRNA and dCasMINI plasmids (performed on a CytoFLEX and analyzed with FlowJo software). The gene activation achieved by each gRNA variant was normalized to the activity of the wild-type gRNA as a control. The seed region was defined as the spacer positions with minimal tolerance of single mismatches/deletions, using a cutoff of normalized activity below 0.2.
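The normalization and seed-calling step can be sketched as follows. This is illustrative only: it assumes normalized activity is the ratio of replicate means (variant over wild type), and it calls a position "seed" only if every variant at that position (all substitutions and the deletion) falls below the 0.2 cutoff; both rules are assumptions, since the exact aggregation is not specified above. The numbers in the usage check are toy values, not measured data.

```python
from statistics import mean

CUTOFF = 0.2  # normalized-activity threshold stated in the Methods


def normalize(variant_reps, wt_reps):
    """Mean variant signal divided by mean wild-type signal across replicates."""
    return mean(variant_reps) / mean(wt_reps)


def seed_positions(activity_by_position, cutoff=CUTOFF):
    """activity_by_position maps a 1-based spacer position to the normalized
    activities of all variants at that position. Returns positions where every
    variant is below the cutoff (an assumed all-variants rule)."""
    return sorted(
        pos for pos, acts in activity_by_position.items()
        if acts and max(acts) < cutoff
    )


# Toy example (not experimental data): position 1 is intolerant, 2 and 3 are not.
toy = {1: [0.05, 0.10, 0.02, 0.0], 2: [0.10, 0.15, 0.30, 0.05], 3: [0.9, 1.0, 0.8, 0.7]}
print(seed_positions(toy))  # [1]
```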
Results
Case study: generating CRISPRa guides against MECP2
As a case study, we queried our website tool for CRISPRa guides against MECP2, a gene frequently mutated in Rett syndrome, a severe and progressive X-linked neurodevelopmental disorder (Figure 1) [Lamonica et al., 2017]. Querying “Target Gene” = MECP2 and “Effect” = Activation on the website portal generated a list of on-target guide sequences situated -1000bp:+200bp around the TSS, the targeting window for transcriptional activation (Figure 2). On-target spacer sequences are annotated with chromosomal position and strand, mismatch count, edit distance, distance to the TSS, and off-target count. Guides are ranked by number of off-targets, from fewest to most, to prioritize the more therapeutically relevant guide sequences. For a spacer sequence of interest that has off-targets, users can retrieve further information about its putative off-target binding sites via the “Query Off Targets” button associated with the on-target guide (Figure 1).
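A minimal sketch of ranking guides so that those with the fewest off-targets come first, in line with the goal of prioritizing therapeutically relevant guides. The record fields and the tie-breaker (proximity to the TSS) are hypothetical choices for illustration, not a description of the website's actual sort.

```python
from dataclasses import dataclass


@dataclass
class Guide:
    spacer: str            # spacer sequence (placeholder labels in the example below)
    off_target_count: int  # number of putative off-target sites
    tss_distance: int      # signed distance to the TSS in bp (hypothetical field)


def rank_guides(guides):
    """Fewest off-targets first; ties broken by closeness to the TSS (assumed)."""
    return sorted(guides, key=lambda g: (g.off_target_count, abs(g.tss_distance)))
```

With placeholder guides `Guide("a", 3, -50)`, `Guide("b", 0, -400)`, and `Guide("c", 0, 100)`, the ranking is c, b, a: both zero-off-target guides come before the guide with three off-targets, and c wins the tie because it lies closer to the TSS.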
Figure 2. IGV genome browser visualization of CRISPRa guides targeting MECP2, annotated with the number of potential off-targets per guide.
A comprehensive database for computational identification of on-target gRNA sequences
Our tool provides on-target dCasMINI guide sequences for 22,498 human genes suitable for CRISPRi and/or CRISPRa (based on position relative to the primary TSS). In total, we identified 420,024 spacer sequences suitable for CRISPRi, targeting 22,429/22,498 = 99.7% of genes, and 477,655 spacer sequences suitable for CRISPRa, targeting 22,474/22,498 = 99.9% of genes (96,738 spacer sequences lie within ±200bp of the TSS and are thus appropriate for both CRISPRi and CRISPRa). For CRISPRi, genes have an average of 19 on-target spacer sequences suitable for transcriptional suppression (Figure 3). For CRISPRa, genes have an average of 21 on-target spacer sequences suitable for transcriptional activation (Figure 4).
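The coverage percentages and per-gene averages above follow directly from the reported counts. This sketch assumes the averages are taken over all 22,498 genes; dividing instead by the number of covered genes (22,429 and 22,474) gives 18.7 and 21.3, which round to the same values.

```python
TOTAL_GENES = 22_498


def pct(n_genes: int, total: int = TOTAL_GENES) -> float:
    """Percentage of genes covered, rounded to one decimal place."""
    return round(100 * n_genes / total, 1)


def mean_per_gene(n_spacers: int, total: int = TOTAL_GENES) -> int:
    """Average number of spacers per gene, rounded to the nearest integer."""
    return round(n_spacers / total)


print(pct(22_429), pct(22_474))                        # 99.7 99.9
print(mean_per_gene(420_024), mean_per_gene(477_655))  # 19 21
```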
Figure 3. Histogram displaying the number of on-target CRISPRi guides per gene genome-wide.
Figure 4. Histogram displaying the number of on-target CRISPRa guides per gene genome-wide.
dCasMINI seed region experiment
To inform computational off-target prediction, we first investigated the tolerance of dCasMINI to mismatched target sites and characterized the high-fidelity seed region of its spacer sequence [Slaymaker et al., 2016]. We systematically mutated the CD2 guide spacer sequence to introduce single-base mismatches and single-base deletions at each position, and then measured the activity of the dCasMINI-modulator fusion protein with these spacer variants in the context of CRISPRa. We identified a conserved seed sequence spanning positions +1 to +6, where single mismatches/deletions are minimally tolerated by dCasMINI, irrespective of the modulator used (Figure 5).
Figure 5. Heatmap illustrating single mismatch/deletion tolerance of dCasMINI spacer sequences targeting endogenous CD2 for transcriptional activation, 3 days post-transfection. The heatmap was generated by normalizing CD2 expression with a given single mismatch/deletion spacer to the maximum expression level of CD2 with the WT gRNA sequence. Lower expression indicates that the mismatch disrupts
Computational mapping of putative off-targets
To provide comprehensive off-target information for CRISPRi/a guides, we used Bowtie2 to sensitively map each on-target spacer sequence genome-wide (tolerating up to 3 mismatches) and annotated potential off-targets with information about whether mismatches were present in the high-fidelity 6bp seed region of the spacer. We anticipate this will allow researchers to prioritize testing guides with minimal predicted off-target effects, which is critical for developing safe and efficacious therapeutic products. In total, we mapped 228,370,545 potential off-target sites associated with CRISPRi spacer sequences and 454,352,553 potential off-target sites associated with CRISPRa spacer sequences.
We expect users to be most interested in selecting guides with a minimal number of potential off-target binding events. Analysis of the off-target database shows that genes have an average of 8 on-target guides with ≤5 potential off-target sites for both CRISPRi (Figure 6) and CRISPRa (Figure 7). Moreover, approximately 26% of these potential off-target sites contain one or more mismatches in the high-fidelity seed region of the spacer and are therefore extremely unlikely to result in true off-target binding events (Figure 5). Incorporating these seed-region fidelity data allows roughly one quarter of computationally predicted off-target sites to be deprioritized as likely false positives, thereby increasing the number of potentially therapeutically relevant on-target guides to test.
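A minimal sketch of how seed annotations could be used to deprioritize predicted off-targets: sites with any seed mismatch are excluded from each guide's count, and guides are then filtered by a maximum off-target count (≤5 in the text). The input format, one (spacer ID, seed mismatch count) pair per mapped site, and the function names are assumptions for illustration; the test data are toy values.

```python
from collections import Counter


def effective_off_targets(annotations):
    """annotations: iterable of (spacer_id, seed_mismatches), one per mapped site.
    Counts, per spacer, only sites whose 6-nt seed matches exactly."""
    counts = Counter()
    for spacer_id, n_seed_mm in annotations:
        if n_seed_mm == 0:
            counts[spacer_id] += 1
    return counts


def seed_mismatch_fraction(annotations):
    """Fraction of mapped sites carrying at least one seed mismatch."""
    annotations = list(annotations)
    if not annotations:
        return 0.0
    return sum(1 for _, n in annotations if n > 0) / len(annotations)


def passing_guides(spacer_ids, annotations, max_off_targets=5):
    """Guides whose seed-intact off-target count is at most max_off_targets."""
    counts = effective_off_targets(annotations)
    return [s for s in spacer_ids if counts[s] <= max_off_targets]
```

Guides with no mapped sites at all (absent from the Counter) count as zero and therefore pass the filter.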
Figure 6. Histogram displaying the number of on-target CRISPRi guides with ≤5 potential off-target sites per gene genome-wide.
Figure 7. Histogram displaying the number of on-target CRISPRa guides with ≤5 potential off-target sites per gene genome-wide.
Discussion
The simplicity and versatility of the CRISPR-Cas9 platform for genetic editing have led to its increasing use as a therapy for genetic diseases [Wang et al., 2016]. While extremely powerful, the traditional CRISPR-Cas9 platform suffers from several key drawbacks: its large size prevents it from being delivered in a single AAV, and its nuclease activity renders it unsuitable for treating a host of genetic disorders in which healthy genes are inactivated or more subtle downregulation is required. Since Cas9 was shown to be a programmable RNA-guided nuclease [Jinek et al., 2012], there has been an explosion of research into the diversification and optimization of other Cas molecules [Chavez et al., 2023]. Recently, a hyper-compact, nuclease-inactivated Cas molecule, termed dCasMINI, was engineered to be small enough for AAV delivery without compromising on-target efficacy or increasing off-target binding events, making it an ideal candidate for the development of AAV-based therapies [Xu et al., 2021]. However, different Cas molecules are governed by different guide design principles, making guide sequence design a frequent bottleneck in the dCasMINI workflow for CRISPRi/a.
Here we present a tool to easily design gRNAs against any human locus of interest for CRISPRi/a experiments, along with the first presentation of mismatch data for the high-fidelity seed region of dCasMINI gRNAs. We anticipate that this web-based tool will be of value to the CRISPR-Cas research community and will allow researchers to design CRISPRi/a experiments with dCasMINI more easily and quickly. We plan to add new functionality in future versions of the tool.
Authors’ Contributions
S.L. and R.W.Y. developed the backend database of on- and off-target guides. S.L. and N.J. built the frontend website. X.Y. developed and analysed the seed region experiments. D.O.H., L.S.Q., and T.P.D. supervised and directed this project. R.W.Y. and T.P.D. wrote the manuscript with input from all authors.
Author Disclosure Statement
L.S.Q. is the founder of Epicrispr Biotechnologies, and also serves as a scientific advisor for Laboratory of Genomics Research and Kytopen. S.L., R.W.Y., X.Y., D.O.H., L.S.Q., and T.P.D. hold provisional patents relating to this work, are employees of and acknowledge outside interest in Epicrispr Biotechnologies.
Acknowledgements
The authors would like to thank Yanxia Liu (Functional Genomics, Epicrispr Biotechnologies) for helpful conversation and for reviewing the manuscript.
Footnotes
spencer.lopp{at}epic-bio.com
robin.yeo{at}epic-bio.com
me{at}nishantjha.org
xiao.yang{at}epic-bio.com
dan.hart{at}epic-bio.com
slqi{at}stanford.edu
tim.daley{at}epic-bio.com