bioRxiv
New Results

A systematic evaluation of data processing and problem formulation of CRISPR off-target site prediction

Ofir Yaish, Maor Asif, Yaron Orenstein
doi: https://doi.org/10.1101/2021.09.30.462534
Ofir Yaish
School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
Maor Asif
School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
Yaron Orenstein
School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
For correspondence: yaronore@bgu.ac.il

Abstract

The CRISPR/Cas9 system is widely used in a broad range of gene-editing applications. While this gene-editing technique is quite accurate in the target region, there may be many unplanned off-target edited sites. Consequently, a plethora of computational methods have been developed to predict off-target cleavage sites given a guide RNA and a reference genome. However, these methods are based on small-scale datasets (only tens to hundreds of off-target sites) produced by experimental techniques that detect off-target sites with a low signal-to-noise ratio. Recently, CHANGE-seq, a new in vitro experimental technique to detect off-target sites, was used to produce a dataset of unprecedented scale and quality (more than 200,000 off-target sites over 110 guide RNAs). In addition, the same study included GUIDE-seq experiments for 58 of the guide RNAs to produce in vivo measurements of off-target sites. Here, we fill the gap in previous computational methods by utilizing these data to perform a systematic evaluation of data processing and formulation of the CRISPR off-target site prediction problem. Our evaluations show that data transformation as a pre-processing phase is critical prior to model training. Moreover, we demonstrate the improvement gained by adding potential inactive off-target sites to the training datasets. Furthermore, our results point to the importance of adding the number of mismatches between the guide RNA and the off-target site as a feature. Finally, we present predictive off-target in vivo models based on transfer learning from in vitro data. Our conclusions will be instrumental to any future development of an off-target predictor based on high-throughput datasets.
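
As an illustration of two of the processing steps named above, the following minimal Python sketch computes the number of mismatches between a guide RNA and a candidate off-target site and applies a transformation to a read count. It is a sketch under stated assumptions, not the implementation used in the study: the function names `count_mismatches` and `log_transform` are hypothetical, the two sequences are examples chosen for illustration, and the log(1 + x) transformation is assumed here because the abstract does not state which transformation was evaluated.

```python
import math


def count_mismatches(guide: str, site: str) -> int:
    """Count positions where the guide and the candidate site differ.

    Assumes both sequences are aligned and of equal length
    (no bulges); this is the Hamming distance.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(g != s for g, s in zip(guide.upper(), site.upper()))


def log_transform(read_count: float) -> float:
    """Apply log(1 + x) to a read count.

    Assumption: the abstract does not specify the transformation;
    log(1 + x) is used only as a common example.
    """
    return math.log1p(read_count)


# Illustrative sequences (not taken from the paper's dataset).
guide = "GAGTCCGAGCAGAAGAAGAA"
site = "GAGTCCAAGCAGAAGAAGAG"

n_mismatches = count_mismatches(guide, site)
transformed = log_transform(150)

print(n_mismatches)
print(round(transformed, 3))
```

The mismatch count can be appended to whatever sequence encoding a model already uses, and the transformed read count can serve as a regression target in place of the raw count.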

Competing Interest Statement

The authors have declared no competing interest.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted September 30, 2021.
Subject Area

  • Bioinformatics