Abstract
CRISPR-Cas enzymes must recognize a protospacer-adjacent motif (PAM) to edit a genomic site, which significantly limits the range of targetable sequences in a genome. Machine learning-based protein engineering offers a powerful way to efficiently generate Cas protein variants tailored to recognize specific PAMs. Here, we present Protein2PAM, an evolution-informed deep learning model trained on a dataset of over 45,000 CRISPR-Cas PAMs. Protein2PAM rapidly and accurately predicts PAM specificity directly from Cas protein sequences across Type I, II, and V CRISPR-Cas systems. Using in silico deep mutational scanning, we show that the model can identify residues critical for PAM recognition in Cas9 without using structural information. As a proof of concept for protein engineering, we use Protein2PAM to computationally evolve Nme1Cas9, generating variants with broadened PAM recognition and up to a 50-fold increase in in vitro PAM cleavage rates relative to the wild-type enzyme. This work is the first successful use of machine learning to customize Cas enzymes for alternative PAM recognition, paving the way for personalized genome editing.
Competing Interest Statement
S.N., A.B., A.N., G.O.E., E.H., J.A.R., J.G., A.J.M., P.C., and A.M. are current or former employees, contractors, or executives of Profluent Bio Inc and may hold shares in Profluent Bio Inc. R.A.S. and B.P.K. are inventors on patents or patent applications filed by Mass General Brigham (MGB) that describe HT-PAMDA or genome engineering technologies related to the current study. B.P.K. is a consultant for Novartis Venture Fund, Foresite Labs, Generation Bio, and Jumble Therapeutics, and is on the scientific advisory boards of Acrigen Biosciences, Life Edit Therapeutics, and Prime Medicine. B.P.K. has a financial interest in Prime Medicine, Inc. B.P.K.'s interests were reviewed and are managed by Massachusetts General Hospital (MGH) and MGB in accordance with their conflict-of-interest policies.