bioRxiv
New Results

FORCAST: a fully integrated and open source pipeline to design Cas-mediated mutagenesis experiments

Hillary Elrick (1, 2), Viswateja Nelakuditi (1), Greg Clark (2), Michael Brudno (1, 3, 4), Arun K. Ramani (1), Lauryl M.J. Nutter (2)
doi: https://doi.org/10.1101/2020.04.21.053090

(1) Centre for Computational Medicine, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
(2) The Centre for Phenogenomics, The Hospital for Sick Children, Toronto, ON M5T 3H7, Canada
(3) Department of Computer Science, University of Toronto, Toronto, ON M5T 3A1, Canada
(4) University Health Network, Toronto, ON, Canada

For correspondence: lauryl.nutter@sickkids.ca

Abstract

Cas-mediated genome editing has enabled researchers to perform mutagenesis experiments with relative ease. Effective genome editing requires tools for guide RNA selection, off-target prediction, and genotyping assay design. While independent tools exist for these functions, there is still a need for a comprehensive platform to design, view, evaluate, store, and catalogue guides and their associated primers. The Finding Optimizing and Reporting Cas Targets (FORCAST) application integrates existing open-source tools such as JBrowse, Primer3, BLAST, BWA, and Silica to create a complete allele design and quality assurance pipeline. FORCAST is fully integrated software that allows researchers performing Cas-mediated genome editing to generate, visualize, store, and share information related to guides and their associated experimental parameters. It is available from a public GitHub repository and as a Docker image for ease of installation and portability.

With the advent of Cas-mediated genome editing, a wide range of tools has been made available to aid the experimental design process. These include CRISPOR1, GuideScan2, and CHOPCHOP3 to design and evaluate guides, Cas-OFFinder4 to predict off-target sites, Benchling (https://benchling.com) to generate and store guides, and the original MIT website for scoring guides5 (now offline). While these tools assist Cas-mediated experimental design, each is tailored to an individual part of the process. There is therefore a need for a single free, versatile, and fully integrated software tool that allows researchers performing Cas-mediated genome editing to generate, visualize, store, and share information related to guides and their associated experimental parameters.

We developed the open-source tool Finding Optimizing and Reporting Cas Targets (FORCAST) to provide this integrated functionality. FORCAST can be used with any organism and applies rigorous criteria to generate and rank guides, perform quality assurance of alleles, and design genotyping primers. Because FORCAST maintains an internal database, users can save and retrieve existing design details as required. Users can also mark guides and primers that are no longer in use, based on results obtained from in silico, in vitro, or in vivo experiments. Specificity scores, predicted off-target sites, and genomic context can then inform the redesign of primers and guides for a particular target.

All information saved in FORCAST is stored only on local infrastructure, which gives users full control of their data and addresses security and privacy concerns. Additionally, Docker allows FORCAST to be deployed in a cloud environment and to leverage high-performance computing when available. FORCAST can thus act as a shared resource within a laboratory, preventing duplication of effort and facilitating coordination of Cas-mediated genome editing experiments.

Results

At the time of writing, FORCAST has been used to design and genotype over 176 successful Cas-mediated in vivo gene knockout experiments in mice. Model production teams have tested it extensively for bugs, and features have been added to improve their workflow. A typical design workflow using FORCAST is shown in Figure 1.

Figure 1. Performing a Cas-mediated experiment with FORCAST.

a) Any Ensembl genome can be selected for use in the tool. b) An interactive genome browser with default tracks (Genes, Transcripts, Regulatory Elements) and custom tracks such as Exon Splicing Enhancers allows users to explore their region of interest. c) Users refine their guide search by selecting an RNA-guided endonuclease (RGEN), protospacer length, and the types of potential off-target sites to consider. d) All guides in the search region are displayed in a ranked table with their available scores and potential off-target sites. Saved guides are displayed in the genome browser with a user-defined label and notes. e) Genotyping primers can be designed for the wild-type (WT) and endonuclease-mediated (EM) alleles. Each candidate primer is checked for specificity against the genome, and quality assurance is performed on guides to ensure that only transcripts of the selected gene are affected within the edited region. f) Design details can be exported in various formats. If a design fails, the experiment can be revisited in FORCAST and the existing guides or primers rejected and redesigned.
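Panel (e) notes that each candidate primer is checked for specificity against the genome; FORCAST does this with BLAST and Silica (see Methods). The sketch below is a much-simplified, exact-match stand-in and not Silica's algorithm: it assumes primers bind only where they match perfectly, lets either primer bind either strand, and reports every product up to a hypothetical `max_size`. A primer pair would be considered specific when exactly one product is predicted. All function names here are illustrative.

```python
"""Exact-match in silico PCR: list every product a primer pair could amplify."""
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> List[int]:
    """All start positions of needle in haystack, overlapping hits included."""
    hits: List[int] = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


def predicted_amplicons(template: str, forward: str, reverse: str,
                        max_size: int = 1000) -> List[Tuple[int, int]]:
    """Return sorted (start, end) half-open coordinates of predicted products.

    A product is predicted when a plus-strand binding site (extending
    rightwards) lies upstream of a minus-strand binding site (extending
    leftwards) and the product is at most max_size bp long.
    """
    template = template.upper()
    primers = [forward.upper(), reverse.upper()]
    # Plus-strand sites: the primer sequence occurs as written.
    plus_starts = [pos for p in primers for pos in find_all(template, p)]
    # Minus-strand sites: the primer's reverse complement occurs; the product
    # ends at the right edge of that site.
    minus_ends = [pos + len(p) for p in primers
                  for pos in find_all(template, reverse_complement(p))]
    products = {(s, e) for s in plus_starts for e in minus_ends
                if 0 < e - s <= max_size}
    return sorted(products)


if __name__ == "__main__":
    fwd, rev = "GATTACAGAT", "TGCATGCAAA"
    locus = "CCCC" + fwd + "C" * 10 + reverse_complement(rev) + "CCCC"
    print(predicted_amplicons(locus, fwd, rev))                   # [(4, 34)]
    print(predicted_amplicons(locus * 2, fwd, rev, max_size=50))  # [(4, 34), (42, 72)]
```

In the second call the locus is duplicated, so two products are predicted and the pair would be flagged as non-specific. Real tools also tolerate mismatches near the primer 5' end, which this sketch ignores.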

Although Streptococcus pyogenes Cas9 (SpCas9)6 remains the most commonly used system for genome editing, FORCAST enables researchers to use other RNA-guided endonucleases (RGENs), which offer advantages including reduced size7, an increased number of targetable sites8, and staggered cuts9,10, expanding the range of RGEN-mediated genome edits scientists are able to make. FORCAST comes preloaded with the following RGENs: Streptococcus pyogenes Cas9 (SpCas9), Acidaminococcus sp. Cas12a (AsCas12a/AsCpf1), Streptococcus canis Cas9 (ScCas9), and Staphylococcus aureus Cas9 (SaCas9). The relevant attributes of these RGENs, described in Table 1, are loaded into the application at setup. New RGENs can be added to the database, and the default specifications modified, as needed. Novel RGENs with genome editing capabilities are being discovered at a rapid pace, and FORCAST was designed so that researchers can quickly adopt these new technologies as they emerge.

Table 1: RNA-guided endonuclease (RGEN) attributes.

Attributes stored for each RGEN loaded into FORCAST include the protospacer adjacent motif (PAM), guide RNA sequence length, cleavage site, spacer seed region, known non-canonical PAMs, and any scores implemented for that RGEN.
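The attributes in Table 1 map naturally onto a small record type, and the PAM plus protospacer length is enough to scan a sequence for candidate target sites. The sketch below is a hypothetical data model, not FORCAST's database schema: `RGEN`, `find_targets`, and the example entries are illustrative names. The SpCas9 non-canonical PAMs are those named in this preprint, while the protospacer lengths and the AsCas12a PAM (TTTV) are commonly used literature values that may differ from Table 1. Only the plus strand is scanned, and cut site and seed region fields are omitted for brevity.

```python
"""Describe an RGEN and scan one strand of a sequence for its target sites."""
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


@dataclass
class RGEN:
    name: str
    pam: str                  # canonical PAM in IUPAC notation
    protospacer_length: int   # guide (spacer) length in nt
    pam_3prime: bool          # True if the PAM follows the protospacer (Cas9-like)
    noncanonical_pams: List[str] = field(default_factory=list)

    def pam_pattern(self) -> "re.Pattern":
        return re.compile("".join(IUPAC[b] for b in self.pam.upper()))


def find_targets(seq: str, rgen: RGEN) -> List[Tuple[int, str, str]]:
    """Return (protospacer_start, protospacer, pam) for plus-strand sites."""
    seq = seq.upper()
    n, p = rgen.protospacer_length, len(rgen.pam)
    pattern = rgen.pam_pattern()
    hits = []
    for i in range(len(seq) - n - p + 1):
        window = seq[i:i + n + p]
        if rgen.pam_3prime:   # 5'-protospacer-PAM-3'
            start, spacer, pam = i, window[:n], window[n:]
        else:                 # 5'-PAM-protospacer-3' (Cas12a-like)
            start, spacer, pam = i + p, window[p:], window[:p]
        if pattern.fullmatch(pam):
            hits.append((start, spacer, pam))
    return hits


# Illustrative entries only; FORCAST's own values are those listed in Table 1.
SPCAS9 = RGEN("SpCas9", "NGG", 20, True, ["NAG", "NCG", "NGA"])
ASCAS12A = RGEN("AsCas12a", "TTTV", 23, False)
```

Adding a newly described RGEN then amounts to creating another entry with its PAM, orientation, and spacer length; a complete search would also run `find_targets` on the reverse complement of the sequence.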

Performance

Benchmark tests were performed to compare FORCAST’s guide searching speed and accuracy to CRISPOR, GuideScan, and Cas-OFFinder, three frequently used tools in guide design (Table 2). Command-line versions of each tool were tested by an automated program (Supplementary Data) on a server with 8GB of RAM using randomly selected input search sequences.

Table 2: Benchmarking of FORCAST, CRISPOR, GuideScan, and Cas-OFFinder.
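The automated benchmarking described above (command-line tools run on randomly selected input sequences) can be approximated with a small timing harness. This is a hypothetical sketch, not the Supplementary Data program: `random_windows`, `time_command`, and `summarize` are illustrative names, and the command shown is a placeholder because the tools' actual invocations and flags are not given in the text.

```python
"""Time command-line tools on randomly chosen input sequences (sketch)."""
import random
import statistics
import subprocess
import sys
import time
from typing import Dict, List, Sequence


def random_windows(genome: str, count: int, length: int, seed: int = 0) -> List[str]:
    """Draw `count` random substrings of `length` bp; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(count)]
    return [genome[s:s + length] for s in starts]


def time_command(cmd: Sequence[str], repeats: int = 3) -> List[float]:
    """Run cmd `repeats` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Median, minimum, and maximum of a list of timings."""
    return {"median_s": statistics.median(durations),
            "min_s": min(durations), "max_s": max(durations)}


if __name__ == "__main__":
    # Placeholder command; substitute each tool's own CLI invocation here.
    print(summarize(time_command([sys.executable, "-c", "pass"])))
```

Reporting the median over repeats reduces the influence of caching and other one-off effects on a shared server.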

FORCAST is substantially faster than Cas-OFFinder and returns many more off-target sites than GuideScan. It is comparable in speed to CRISPOR while returning more off-target sites. Accurately reporting potential off-target sites is essential for reducing the risk of undesired edits, calculating scores, and performing quality control on the resulting organisms. Although GuideScan has the fastest running time, it reports only off-target sites with up to three mismatches. Cas-OFFinder was set to return off-targets with up to four mismatches for this test, though it allows up to nine. FORCAST and CRISPOR return off-targets with up to four mismatches, as off-target sites with up to four mismatches have been shown to produce undesired edits16. In addition, FORCAST reports potential off-targets adjacent to non-canonical PAMs (NAG, NCG, and NGA for SpCas9), and this list can be modified.

To increase speed, FORCAST by default processes and displays at most 1000 potential off-target sites per guide and skips scoring guides in repetitive regions. Users can disable these restrictions to display the full list of guides and their off-targets for a given region. FORCAST therefore lets users decide whether to prioritize speed or completeness when searching for and evaluating guides and off-targets.
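The default limits described above (at most four mismatches, canonical plus non-canonical PAMs, and a cap of 1000 reported sites) can be illustrated as a filter over candidate sites. This is a sketch under stated assumptions, not FORCAST's code: candidates are assumed to have already been found (for example by the BWA-based search in Methods) as hypothetical `(site_id, protospacer, pam)` tuples, only the `N` wildcard is supported in PAM patterns, and keeping the lowest-mismatch sites when truncating is an assumed policy. The intended on-target site itself should be removed by the caller.

```python
"""Filter candidate off-target sites by PAM and mismatch count (sketch)."""
from typing import Iterable, List, Tuple


def pam_matches(pam: str, pattern: str) -> bool:
    """True if pam fits pattern, treating 'N' as a wildcard (other IUPAC codes omitted)."""
    return len(pam) == len(pattern) and all(
        q == "N" or q == b for b, q in zip(pam.upper(), pattern.upper()))


def count_mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length protospacers."""
    if len(a) != len(b):
        raise ValueError("protospacers must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def filter_off_targets(protospacer: str,
                       candidates: Iterable[Tuple[str, str, str]],
                       allowed_pams: List[str],
                       max_mismatches: int = 4,
                       max_reported: int = 1000) -> Tuple[List[Tuple[str, int]], bool]:
    """Keep (site_id, mismatches) for sites within max_mismatches next to an allowed PAM.

    Returns the hits sorted by mismatch count and capped at max_reported,
    plus a flag saying whether the cap truncated the list.
    """
    kept = []
    for site_id, site_seq, site_pam in candidates:
        if not any(pam_matches(site_pam, p) for p in allowed_pams):
            continue
        mm = count_mismatches(protospacer, site_seq)
        if mm <= max_mismatches:
            kept.append((site_id, mm))
    kept.sort(key=lambda hit: hit[1])
    return kept[:max_reported], len(kept) > max_reported
```

For SpCas9 the allowed list would be `["NGG", "NAG", "NCG", "NGA"]`; lifting the cap corresponds to passing a very large `max_reported`.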

We also tested FORCAST on a region of the genome (chr19:10907072-10907187) in which several tools were reported to erroneously suggest guides with a high number of potential off-target sites17. Rather than rejecting these guides outright, FORCAST displays a warning about the high number of mismatches and lists them at the bottom of the ranked results table.
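The behaviour described above, keeping problematic guides but demoting and flagging them, can be expressed as a two-level sort key. This is an illustrative sketch only: the `Guide` record, the `warn_threshold` cut-off, the secondary ordering by specificity, and the warning text are all hypothetical, since the text does not state FORCAST's exact criterion or ranking rule.

```python
"""Rank guides, keeping high-off-target guides but moving them last with a warning."""
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Guide:
    guide_id: str
    specificity: float      # higher is better, e.g. an MIT-style 0-100 score
    off_target_count: int   # potential off-target sites found for this guide


def rank_guides(guides: List[Guide],
                warn_threshold: int) -> List[Tuple[Guide, Optional[str]]]:
    """Sort by (flagged, -specificity, off_target_count); flagged guides get a warning."""
    def is_flagged(g: Guide) -> bool:
        return g.off_target_count > warn_threshold

    ranked = sorted(guides, key=lambda g: (is_flagged(g), -g.specificity,
                                           g.off_target_count))
    return [(g, f"warning: {g.off_target_count} potential off-target sites"
             if is_flagged(g) else None) for g in ranked]
```

Because `False` sorts before `True`, flagged guides always land after every unflagged guide, regardless of their specificity score.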

Discussion

Advantages

FORCAST is available as an open-source, stand-alone application (see Availability), which offers several advantages over publicly accessible web or cloud-based tools, including security, privacy, and long-term data storage and integrity. Data saved by the tool are stored only on local or organization-owned cloud infrastructure, giving organizations full control of their data, including the ability to back up, export, and share experiment details. These qualities make FORCAST well suited to a Core Facility, where standard protocols and controlled access to data are essential. FORCAST is also distributed as a Docker image, so non-technical users can run it on a personal computer with minimal setup.

We also recognized that laboratories have specific needs and protocols for experimental design and validation. FORCAST was therefore developed with flexibility in mind: researchers can use specific genome versions, modify the available RGENs, include additional annotation data, and define custom primer design settings. This flexibility allows FORCAST to aid in the design of Cas-mediated genome editing experiments across many fields of biology.

Future Directions

FORCAST is under active development; planned features include the ability to design conditional alleles, mutations in non-coding genes, and point mutations (variants). Additional goals include incorporating genomic variant information associated with a reference genome and adding new scoring methods, including specificity scores for AsCas12a.

Methods

Implementation

Several open-source tools are integrated into FORCAST; they were chosen for their demonstrated reliability, accuracy, and ease of use. JBrowse18 is used to visualize genomic features such as genes, transcripts, regulatory information, guides, and primers. BWA19, BEDtools20, and SAMtools21 are used to find guides and their potential off-target sites. The published MIT5 and Cutting Frequency Determination22 (CFD) scores are used to evaluate guide specificity. Primer323 is used to generate PCR primers for genotyping, and BLAST24 and Silica (https://www.gear-genomics.com/silica) are used to evaluate primer specificity. FORCAST is written in Python and uses MongoDB to store genes, guide RNA spacer sequences with their associated RGENs, and PCR primers for quality control and genotyping.
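To make the MIT specificity score5 concrete, the sketch below computes the per-site hit score and the aggregate guide score for 20-nt SpCas9 protospacers. It is not FORCAST's code. The position weights are those reproduced in widely used open-source reimplementations such as CRISPOR's and should be checked against reference 5 before use; the mean-distance term averages gaps between consecutive mismatches, as those reimplementations do, which matches the published definition for two mismatches but may differ for more. The intended on-target site should be excluded from the hit scores passed to `mit_specificity`.

```python
"""MIT (Hsu et al. 2013) off-target hit score and aggregate guide specificity."""
from typing import Iterable

# Position-specific mismatch weights, position 1 (PAM-distal) to 20 (PAM-proximal).
MIT_WEIGHTS = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
               0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583]


def mit_hit_score(guide: str, off_target: str) -> float:
    """Score (0-100) for one 20-nt off-target protospacer; higher = more likely to cut."""
    if len(guide) != 20 or len(off_target) != 20:
        raise ValueError("MIT score is defined for 20-nt protospacers")
    mm_positions = [i for i, (a, b) in enumerate(zip(guide, off_target)) if a != b]
    n = len(mm_positions)
    # Term 1: product of (1 - weight) over mismatched positions.
    term1 = 1.0
    for i in mm_positions:
        term1 *= 1 - MIT_WEIGHTS[i]
    # Term 2: penalise clustered mismatches via the mean distance between them.
    if n < 2:
        term2 = 1.0
    else:
        gaps = [b - a for a, b in zip(mm_positions, mm_positions[1:])]
        mean_gap = sum(gaps) / len(gaps)
        term2 = 1.0 / (((19 - mean_gap) / 19) * 4 + 1)
    # Term 3: dampen by the square of the mismatch count.
    term3 = 1.0 if n == 0 else 1.0 / n ** 2
    return 100 * term1 * term2 * term3


def mit_specificity(hit_scores: Iterable[float]) -> float:
    """Aggregate guide specificity (0-100) from off-target hit scores."""
    return 100 * 100 / (100 + sum(hit_scores))
```

A single mismatch at position 14 (index 13) keeps only 1 - 0.851 of the weight, giving a hit score of about 14.9, whereas a mismatch at position 1 leaves the score at 100; this captures the tolerance of PAM-distal mismatches that the score was built to model.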

Detailed installation and setup instructions for FORCAST are provided in the GitHub repository (see Availability). Briefly, a shell script installs all required tools and programs, after which users can populate FORCAST with their genomes of interest using the included Python setup script. During setup, the genome sequence (FASTA format) and genome annotation (GFF3 format) files are downloaded programmatically from the Ensembl FTP site. Users can specify which Ensembl release to use; if none is specified, the latest release is used. A BED file categorizing the genome into intergenic, intronic, and exonic regions is generated from the annotation file, and gene symbols, identifiers, and chromosomal locations are extracted and stored in MongoDB. The genome sequence and annotation files are then loaded into JBrowse. Finally, BLAST and BWA indexes are built during setup, for which we recommend at least 8GB of RAM.
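The intergenic/intronic/exonic categorization described above can be sketched from the GFF3 annotation alone. This is an illustrative approach, not FORCAST's setup script: the set of gene-like feature types and the use of the Ensembl `Name` and `gene_id` attributes are assumptions, and the linear scan in `classify` is chosen for clarity. A production pipeline would more likely write these intervals to BED and use BEDtools interval operations.

```python
"""Derive gene/exon intervals from GFF3 and classify positions (sketch)."""
from typing import Dict, Iterable, Iterator, List, Tuple

# Feature types treated as genes (assumed list; check the GFF3 you download).
GENE_TYPES = {"gene", "ncRNA_gene", "pseudogene"}

Row = Tuple[str, int, int, str, str]  # chrom, start (0-based), end, kind, name


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 'key=value;key=value' attribute column into a dict."""
    pairs = (item.split("=", 1) for item in column9.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def gff3_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Yield BED-style rows for gene-like and exon features.

    GFF3 coordinates are 1-based and inclusive; BED is 0-based and half-open,
    so only the start needs shifting by one.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        chrom, _source, ftype, start, end, _score, _strand, _phase, attrs = cols
        if ftype in GENE_TYPES:
            info = parse_attributes(attrs)
            name = info.get("Name", info.get("gene_id", "."))
            yield chrom, int(start) - 1, int(end), "gene", name
        elif ftype == "exon":
            yield chrom, int(start) - 1, int(end), "exon", "."


def classify(chrom: str, pos0: int, rows: List[Row]) -> str:
    """Return 'exonic', 'intronic' or 'intergenic' for a 0-based position."""
    in_gene = False
    for r_chrom, r_start, r_end, kind, _ in rows:
        if r_chrom != chrom or not (r_start <= pos0 < r_end):
            continue
        if kind == "exon":
            return "exonic"
        in_gene = True
    return "intronic" if in_gene else "intergenic"
```

Any position inside a gene but outside all of its exons is labelled intronic, and anything outside every gene is intergenic.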

Availability

Project home page: https://ccmbioinfo.github.io/FORCAST

Demo: https://youtu.be/SJMDAuJRuDI

Operating system(s): The host machine must be Docker-compatible (most Linux distributions, macOS 10.12 and higher, Windows 10) or run Ubuntu 16.04 to host FORCAST natively. The web interface is operating-system independent and has been tested on Chrome, Firefox, and Opera.

Programming languages: Python, JavaScript, bash

Other requirements: host machine with at least 8GB of RAM (recommended)

License: GPLv3

Acknowledgements

Thanks to Lauri Lintott at The Centre for Phenogenomics for beta testing and feedback as well as beta testers Denise Lanza, Jason Heaney, Juan Gallegos, Kiran Rajaya, and Vivek Ramanathan at Baylor College of Medicine. Thanks to Mia Husic for constructive feedback and proofreading. Thank you also to Tobias Rausch for assistance integrating the in silico PCR tool Silica. This work was funded by Genome Canada and Ontario Genomics (OGI-137) and supported by the Canadian Centre for Computational Genomics (C3G), part of the Genome Innovation Network (GIN), funded by Genome Canada through Genome Quebec and Ontario Genomics.

Footnotes

  • https://github.com/ccmbioinfo/FORCAST

  • https://ccmbioinfo.github.io/FORCAST/

References

  1. Concordet, J.-P., & Haeussler, M. (2018). CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Research, 46(W1). doi: 10.1093/nar/gky354
  2. Perez, A. R., Pritykin, Y., Vidigal, J. A., Chhangawala, S., Zamparo, L., Leslie, C. S., & Ventura, A. (2017). GuideScan software for improved single and paired CRISPR guide RNA design. Nature Biotechnology, 35(4), 347–349. doi: 10.1038/nbt.3804
  3. Labun, K., Montague, T. G., Krause, M., Torres Cleuren, Y. N., Tjeldnes, H., & Valen, E. (2019). CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing. Nucleic Acids Research, 47(W1). doi: 10.1093/nar/gkz365
  4. Bae, S., Park, J., & Kim, J.-S. (2014). Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics, 30(10), 1473–1475. doi: 10.1093/bioinformatics/btu048
  5. Hsu, P. D., Scott, D. A., Weinstein, J. A., Ran, F. A., Konermann, S., Agarwala, V., … Zhang, F. (2013). DNA targeting specificity of RNA-guided Cas9 nucleases. Nature Biotechnology, 31(9), 827–832. doi: 10.1038/nbt.2647
  6. Jinek, M., Chylinski, K., Fonfara, I., Hauer, M., Doudna, J. A., & Charpentier, E. (2012). A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science, 337(6096), 816–821. doi: 10.1126/science.1225829
  7. Ran, F. A., et al. (2015). In vivo genome editing using Staphylococcus aureus Cas9. Nature, 520(7546), 186–191.
  8. Chatterjee, P., et al. (2018). Minimal PAM specificity of a highly similar SpCas9 ortholog. Science Advances, 4(10).
  9. Zetsche, B., et al. (2015). Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System. Cell, 163(3), 759–771.
  10. Yamano, T., et al. (2017). Structural Basis for the Canonical and Non-canonical PAM Recognition by CRISPR-Cpf1. Molecular Cell, 67(4).
  11. Fu, Y., Sander, J. D., Reyon, D., Cascio, V. M., & Joung, J. K. (2014). Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nature Biotechnology, 32(3), 279–284. doi: 10.1038/nbt.2808
  12. Chen, H., Choi, J., & Bailey, S. (2014). Cut Site Selection by the Two Nuclease Domains of the Cas9 RNA-guided Endonuclease. Journal of Biological Chemistry, 289(19), 13284–13294. doi: 10.1074/jbc.m113.539726
  13. Jiang, W., Bikard, D., Cox, D., Zhang, F., & Marraffini, L. A. (2013). RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature Biotechnology, 31(3), 233–239. doi: 10.1038/nbt.2508
  14. Semenova, E., Jore, M. M., Datsenko, K. A., Semenova, A., Westra, E. R., Wanner, B., … Severinov, K. (2011). Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proceedings of the National Academy of Sciences of the United States of America, 108(25), 10098–10103. doi: 10.1073/pnas.1104144108
  15. Kleinstiver, B. P., Tsai, S. Q., Prew, M. S., Nguyen, N. T., Welch, M. M., Lopez, J. M., … Joung, J. K. (2016). Genome-wide specificities of CRISPR-Cas Cpf1 nucleases in human cells. Nature Biotechnology, 34(8), 869–874. doi: 10.1038/nbt.3620
  16. Haeussler, M., Schönig, K., Eckert, H., Eschstruth, A., Mianné, J., Renaud, J.-B., … Concordet, J.-P. (2016). Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biology, 17(1). doi: 10.1186/s13059-016-1012-2
  17. Bradford, J., & Perrin, D. (2019). A benchmark of computational CRISPR-Cas9 guide design methods. PLOS Computational Biology, 15(8). doi: 10.1371/journal.pcbi.1007274
  18. Buels, R., Yao, E., Diesh, C. M., Hayes, R. D., Munoz-Torres, M., Helt, G., … Holmes, I. H. (2016). JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biology, 17(1). doi: 10.1186/s13059-016-0924-1
  19. Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760. doi: 10.1093/bioinformatics/btp324
  20. Quinlan, A. R., & Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), 841–842. doi: 10.1093/bioinformatics/btq033
  21. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. doi: 10.1093/bioinformatics/btp352
  22. Doench, J. G., Fusi, N., Sullender, M., Hegde, M., Vaimberg, E. W., Donovan, K. F., … Root, D. E. (2016). Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nature Biotechnology, 34(2), 184–191. doi: 10.1038/nbt.3437
  23. Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B. C., Remm, M., & Rozen, S. G. (2012). Primer3—new capabilities and interfaces. Nucleic Acids Research, 40(15). doi: 10.1093/nar/gks596
  24. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009). BLAST+: architecture and applications. BMC Bioinformatics, 10(1), 421. doi: 10.1186/1471-2105-10-421
Posted April 23, 2020.
Subject Area

  • Bioinformatics