Abstract
Motivation Considerable research has been devoted to facilitating the use of CRISPR-Cas systems in genome editing, and deep learning-based methods, among others, have shown great promise in predicting gRNA efficiency. Accurate prediction of gRNA efficiency helps practitioners optimize their engineered gRNAs, maximizing on-target efficiency and minimizing off-target effects. However, the black-box predictions of deep learning-based methods do not adequately explain the factors that make a sequence efficient; rectifying this issue can help promote the use of CRISPR-Cas systems in numerous domains.
Results We put forward a framework for interpreting gRNA efficiency prediction, dubbed CRISPR-VAE, and apply it to CRISPR/Cpf1. The framework helps open the door to better interpretability of the factors that make a given gRNA efficient, and we further articulate these factors semantically as position-wise k-mer rules. The paradigm consists of building an efficiency-aware gRNA sequence generator trained on available real data and using it to generate a large number of synthetic sequences with favorable traits, on which the explanation of the gRNA prediction is based. CRISPR-VAE can also be used as a standalone sequence generator that gives the user low-level editing control. The framework can be readily integrated with different CRISPR-Cas tools and datasets, and its efficacy is confirmed in this paper.
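To make the notion of a position-wise k-mer rule concrete, the following minimal sketch compares how often each k-mer occurs at each position in two sets of equal-length sequences and reports those that are markedly more frequent in the high-efficiency set. It is an illustration only and not the CRISPR-VAE procedure itself: the function names, the `min_diff` threshold, and the toy sequences are all hypothetical, and in the framework described above the high-efficiency set would presumably come from the trained generator.

```python
from collections import Counter


def positional_kmer_counts(seqs, k):
    """Count every (start position, k-mer) pair across the sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[(i, s[i:i + k])] += 1
    return counts


def enriched_kmers(high, low, k=2, min_diff=0.5):
    """Return (position, k-mer, frequency difference) tuples for k-mers whose
    per-position frequency in `high` exceeds that in `low` by at least
    `min_diff` (a hypothetical threshold chosen for illustration)."""
    hc = positional_kmer_counts(high, k)
    lc = positional_kmer_counts(low, k)
    rules = []
    for (pos, kmer), n in hc.items():
        diff = n / len(high) - lc.get((pos, kmer), 0) / len(low)
        if diff >= min_diff:
            rules.append((pos, kmer, diff))
    # Strongest enrichment first; ties broken by position, then k-mer
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))


# Toy sequences, invented purely for illustration (not real gRNA data)
high = ["TTTAGC", "TTTAGG", "TTTCGC"]
low = ["TTTACA", "TTTCCA", "TTTGAA"]

rules = enriched_kmers(high, low, k=2, min_diff=0.5)
for pos, kmer, diff in rules:
    print(f"position {pos}: {kmer} enriched by {diff:.2f}")
```

Each returned tuple can be read as a candidate rule of the form "k-mer X at position p is associated with efficient sequences"; with real data, a statistical test would be a more defensible criterion than a fixed frequency threshold.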
Availability and implementation The source code will be shared publicly upon acceptance.
Contact ahmad.obeid{at}ku.ac.ae
Index Terms CRISPR, Explainable deep learning
Competing Interest Statement
The authors have declared no competing interest.