Improved prediction of bacterial CRISPRi guide efficiency from depletion screens through mixed-effect modeling and data integration

CRISPR interference (CRISPRi), the targeting of a catalytically dead Cas protein to block transcription, is the leading technique to silence gene expression in bacteria. However, design rules for CRISPRi remain poorly defined, limiting predictable design for gene interrogation, pathway manipulation, and high-throughput screens. Here we develop a best-in-class prediction algorithm for guide silencing efficiency by systematically investigating factors influencing guide depletion in multiple genome-wide essentiality screens, with the surprising discovery that gene-specific features such as transcriptional activity substantially impact prediction of guide activity. Accounting for these features as part of algorithm development allowed us to develop a mixed-effect random forest regression model that provides better estimates of guide efficiency than existing methods, as demonstrated in an independent saturating screen. We further applied methods from explainable AI to extract interpretable design rules from the model, such as sequence preferences in the vicinity of the PAM distinct from those previously described for genome engineering applications. Our approach provides a blueprint for the development of predictive models for CRISPR technologies where only indirect measurements of guide activity are available.


Introduction
CRISPR interference (CRISPRi), in which a catalytically dead Cas protein incapable of DNA cleavage (dCas) is targeted to interfere with transcription of a gene of choice [1,2], is one of most widely used CRISPR technologies in bacteria. In contrast to eukaryotes, many bacteria lack the necessary repair pathways to survive genome editing by the double-stranded break induced by CRISPR-Cas9. Thus, CRISPR-Cas's main applications in bacteria have come from using dCas as a platform technology that can deliver effectors to a specific locus in a programmable fashion. CRISPRi is the simplest example, where the dCas protein itself serves as an effector to silence gene expression by physically blocking the binding or procession of the RNA polymerase.
The development of CRISPRi has opened up a range of biological applications, from silencing individual genes for genetic studies to performing genome-wide fitness screens or engineering genetic circuits [3,4]. As an alternative screening technology to transposon mutagenesis [5], CRISPRi can directly target particular genes of interest, avoiding the need for large mutant libraries to achieve gene saturation. Another area of application is engineering synthetic regulatory circuits [6] or metabolic networks [7,8], where collections of gRNAs are used to coordinately downregulate and upregulate associated genes and pathways. However, all of these applications critically depend on the efficiency of silencing provided by selected guides. Genetic screens already routinely employ tens of thousands of guides simultaneously, and it is impractical to individually test each guide's efficiency. This problem will only be accentuated as the scale of applications increases through the use of CRISPR array technologies that allows multiplexed expression of suites of guides simultaneously [9,10] to dissect and engineer increasingly complex phenotypes. Reliable prediction of guide efficiency will therefore become increasingly important as applications of CRISPRi become more ambitious.
Given the impact of CRISPR-based genome engineering in eukaryotes, significant effort has been expended in developing methods for predicting editing efficiency. The first attempts used classical machine learning methods on relatively small datasets comprising efficiency measurements for thousands of gRNAs. The applied methods include logistic regression [11], support vector machines [12,13], linear regression [14], and gradient-boosted decision trees [15]. As the amount of Cas9 editing data has increased, deep learning approaches have become increasingly popular. These approaches include convolutional neural networks [16], which apply a collection of adaptive filters to automatically extract local sequence features, and long short-term memory networks (LSTM) [17], which retain a memory that potentially allows for the detection of long-range interactions between sequence features. Newer methods have put substantial effort into engineering deep learning architectures to further boost performance [18]. It is important to note that many of these deep learning methods have been trained on tens of thousands of measurements of guide efficiency, and fusing datasets has played an important role in further increasing performance [19].
So far, relatively little attention has been paid to predicting guide efficiency for bacterial CRISPRi. The only study to date developed a LASSO regression model for predicting CRISPRi guide efficiency [20] with a limited sequence feature set using data from a single genome-wide CRISPRi screen in Escherichia coli [21], providing important proof-of-concept but also leading us to ask if larger datasets and more complex machine learning approach could further improve prediction. Here, starting with an investigation of the features driving depletion of guides targeting essential genes in CRISPRi screens using automated machine learning, we find that features associated with the targeted gene, such as expression, explain most of the variation in the data. Starting from this foundation, we develop a mixed-effect random forest that separates features affecting guide efficiency from effects due to the targeted gene while learning from multiple independent CRISPRi screens, allowing us to arrive at a predictive model of guide efficiency that we show improves on the state-of-the-art using a saturating depletion screen of purine biosynthesis genes during growth in minimal medium. Our mixed-effect machine learning approach provides a general strategy for learning CRISPRi guide efficiency when only indirect measurements are available.

Automated machine learning and feature engineering identifies gene-specific effects in CRISPRi depletion screens
We set out to devise design rules for CRISPRi in bacteria by combining machine learning with large experimental datasets. The largest available datasets come from genome-wide depletion screens which provide only indirect measurements of guide efficiency ( Figure   1A). We began our investigation into the features affecting guide depletion by applying automated machine learning (autoML) in the form of the Auto-Sklearn package [22] that wraps classification and regression models implemented in the Python Scikit-learn package [23] in a Bayesian optimization framework.
We first asked how well we could predict gRNA depletion log2 fold-changes (logFCs) for essential genes as defined by the Keio collection [24], and what features would be required for accurate prediction. As essential genes should have an infinite fitness cost upon complete silencing, we initially assumed differences in depletion would mainly depend on features describing the gRNAs. We leveraged a published E. coli CRISPRi essentiality screen using dCas9 performed in rich medium [21], which included 1,951 guides targeting 293 essential genes. To predict depletion in this dataset, we engineered a series of feature sets of increasing complexity ( Figure S1; Table S1) starting with the one-hot-encoded gRNA and PAM sequence including four bases upstream of the gRNA target sequence and three bases following the NGG PAM. This resulted in a poorly performing model with a median Spearman's ρ of ~0.21 in 10-fold cross-validation ( Figure 1B; Table S2, S3). We therefore iteratively added a set of additional features while monitoring changes in model performance. As targeting efficiency in bacteria has been suggested to depend on distance to the transcriptional start site [1,25], the set included absolute and relative distance to the start codon. We also included a suite of thermodynamic features describing gRNA:target interactions predicted using the ViennaRNA package [26]: minimum free energy of the folded gRNA, hybridization of two gRNAs, and hybridization of the targeted DNA and gRNA [27]. These additional feature sets resulted in only moderate improvement in Spearman correlation (ρ ~ 0.25) for our predictions.
Given that features describing the guide sequences themselves were inadequate to predict guide depletion, we developed a series of genomic and expression features describing each targeted gene to investigate if they could improve prediction of guide depletion ( Figure 1B; see also Feature engineering, Methods; Table S1). We collected publicly available RNA expression data over growth in minimal medium [28] and computed minimum and maximum expression values. We collected transcription unit (TU) information from RegulonDB [29] and calculated the distance from the target site to the start of the TU, the number of downstream genes in each TU, and the presence of other essential genes in the TU, and gene GC content. Incorporating these additional gene features led to a major improvement in prediction accuracy, with cross-validation performance jumping to a Spearman's ρ of ~0.66. This shows that autoML can rapidly produce predictive models of CRISPRi guide depletion, given a sufficiently rich feature set describing both guide and gene features.
To understand the contribution of these features to the prediction of gRNA depletion, we used SHapley Additive exPlanation values (SHAP values) computed with TreeExplainer [30] on the best performing random forest model produced by Auto-Sklearn ( Figure 1C; Table S4). Of all considered features, maximal RNA expression in minimal medium had the single largest average effect on depletion, making an average of an ~1.6 fold-difference to the predictions. Unexpectedly, high target gene expression tended to be associated with higher depletion. The number of downstream essential genes also had a strong effect on depletion predictions with an average ~1.3-fold difference, indicating the presence of polar effects in CRISPRi screens. The rest of the top seven features all described the targeted gene, including GC content, gene length, and the distance to the operon start, but each had smaller effects than expression. The most predictive effects that could be manipulated during guide design were associated with guide distance to the transcriptional start site, but on average these had fairly small effects (~1.07 fold) compared to features associated with the target gene. These findings show that predictions made by our depletion models are dominated by the effects of gene features that can not be modified in guide design. Comparison of Spearman correlation between actual and predicted guide depletion in 10-fold crossvalidation (CV) of the best model trained with Auto-Sklearn with different feature combinations, using data from [21]. (C) The ten most predictive features determined using TreeExplainer on the optimal random

Data fusion improves prediction of gRNA depletion
Given that several gene essentiality screens have now been performed in E. coli, we next asked whether expanding our training dataset could improve the accuracy of our predictions, as has previously been shown for eukaryotic CRISPR editing applications [19]. To this end, we collected data from two additional CRISPRi screens of E. coli in rich medium. First, we included data from an additional screen using the same gRNA library but with dCas9 expressed from a stronger promoter, which we refer to here as E18 Cui [31]. Second, we included data from a completely independent screen using a higher density library containing twice as many guides targeting essential genes (4,197, with 528 identical to gRNAs contained in Cui/Rousset), which we refer to as Wang [25]. We refer to the original data set as E75 Rousset. It is also worth noting that while the E18 Cui and E75 Rousset libraries were grown repeatedly to stationary phase, the Wang library was collected in log phase. The level of depletion in each dataset exhibited qualitative differences, with Wang showing a clearer bimodal separation between depleted and nondepleted guides ( Figure 1D) and a reasonable correlation of depletion between datasets (ρ~0.9 between E18 Cui and E75 Rousset; ρ~0.75/0.79 between Wang and Cui/Rousset; Figure S2).
To investigate the impact of fusing these datasets on model performance, we trained a series of models using Auto-Sklearn with each dataset individually or in combination and then tested them on sets of guides held out from each dataset as well as a mixed test set ( Figure 1E; Table S5). We additionally included a one-hot encoded dataset indicator as a predictor of potential batch effects. Unsurprisingly, models trained on single datasets tended to perform best on their cognate test set. Similarly, models trained on E18 Cui and E75 Rousset appeared to generalize better to each other than to the Wang dataset and vice versa. Combined training datasets produced models that generalized better across datasets (ρ ~ 0.58-0.62 tested on mixed data, vs ρ ~ 0.68) without degrading performance relative to models trained on individual datasets. In some cases, particularly for the Cui dataset, fused training sets actually improved performance on a test set drawn from a single dataset (ρ ~ 0.62 trained on Cui; ρ ~ 0.67 trained on fused data). In each case, the best performing model chosen by Auto-Sklearn was either a random forest regression or a gradient-boosted decision tree model. In support of this observation, we trained a suite of different regressors individually and found that while all models benefited from data fusion, the gains were particularly large for random forest and gradient boosted tree models ( Figure S3). These findings suggest that both increased generalizability and accuracy can be achieved by integrating multiple data sources for training tree-based models for CRISPRi depletion.

CRISPRi guide efficiency
Our exploration of the features most predictive of gRNA depletion in competitive screens highlighted that features describing the targeted gene often made much larger contributions to the prediction than features describing the guide sequence. This is problematic for predicting guide efficiency from depletion screens, as this large gene-togene variation in depletion must be removed to properly extract the contribution of guide efficiency.
To isolate guide efficiency from measurements of depletion, we developed a method to fit and remove gene effects using Mixed-Effect Random Forest (MERF) regression [32] (Figure 2A). The MERF model handles data with an underlying cluster structure by defining two separate models: a linear model that captures random effects associated with the cluster, and a random forest (or other complex model) that captures fixed effects associated with each individual measurement after removing cluster effects.
These models are then jointly optimized in an iterative process using the expectationmaximization algorithm. In our case, the cluster structure corresponds to collections of guides targeting the same gene. Random effects capture the effects of features associated with each gene (e.g. expression level) as well as dataset. Fixed effects are fit to the residual guide efficiency after removing the effects of the targeted gene, and correspond to features that could be manipulated in gRNA design (e.g. PAM and guide sequence, thermodynamic properties). The final fixed effect model can then be used to predict the relative efficiency of guides targeting a gene of interest independently of gene features.
Since the fixed effect model from the MERF removes gene-specific effects to estimate guide efficiency, guide-wise cross-validation is not possible, as the true guide efficiency is unknown. As an alternative to guide-wise cross-validation, we developed a gene-wise cross-validation scheme. We trained new models using 10-fold cross validation, this time holding out all guides targeting a set of held-out genes (see also cross-validation, Methods; Table S2), evaluating the Spearman correlation between predictions and measured depletion within each gene under the assumption that rank order should reflect guide efficiency within a gene. As a comparison, we retrained the previously published Pasteur model [20] on our training sets using the authors' publicly available source code. Since the Pasteur model was designed to be trained on a single dataset, we implemented a scaling normalization method to make depletion values comparable across datasets (see Data integration, Methods; Figure S4) based on prior work on data fusion for Cas9 genome editing in eukaryotes [19]. A comparable normalization factor is calculated automatically within the MERF model through clustering guides targeting the same gene ( Figure S4E).
As we had previously observed in our evaluation of predictions of guide-wise depletion, data fusion between multiple CRISPRi screens consistently improved performance across models for both the MERF and Pasteur model ( Figure 2B). The MERF trained on three datasets showed the best overall performance (ρ = 0.396 vs 0.366 for the retrained Pasteur model). We additionally investigated two deep learning models, but neither had improved performance over the MERF (see Supplementary Note), possibly due to the limited size of the training data. In sum, the MERF approach provides a straight-forward means of integrating datasets while isolating effects important for guide efficiency. Spearman correlations between predictions and measured logFC for held-out genes. Genes were held out in 10-fold cross-validation, and the reported median Spearman correlation was calculated across all heldout genes.

Model interpretation with explainable AI illustrates rational design rules for CRISPRi
To understand the features underlying model performance, we again examined SHAP values for our fixed-effect random forest model using TreeExplainer ( Figure 3A; Table  S8) [30]. The strongest average effects were seen for distances from the start codon. It has previously been suggested that there is a negative correlation between distance from the start codon and guide efficiency [1], though this has been called into question in subsequent work [31]. To investigate this in our model, we plotted the relationship between the distance to the start codon and SHAP value. The SHAP value plots indicated a strong enhancement of guide activity within 60 bases of the start codon, but only weak effects deeper in the coding sequence ( Figure S5AB&DE), supporting the absence of a linear relationship between distance and guide efficiency. There was a small but significant difference in the strength of the distance effect in the first 60 bases depending on whether a gene was the first in an operon ( Figure S5C&F), suggesting that this effect may in part be due to interference with RNA polymerase recruitment or the initial stages of polymerase extension. Accordingly, including an indicator of whether a guide targets the first gene in an operon in our fixed effect model improved cross-validation performance ( Figure S5G).
Other top features involved residues in the vicinity of the PAM sequence ( Figure   3A & B). For instance, the presence of a cytosine directly downstream of the PAM increased guide efficiency, while guanosine at the same position decreased efficiency as has previously been observed for Cas9-based genome editing in eukaryotes [11]. This may in part be a result of sliding on the PAM sequence leading to formation of disfavored DNA bulges that destabilize the Cas9 complex [33]. Cytosine was favored at the variable position of the NGG PAM, also previously reported for Cas9 editing [11,14]. In contrast, directly before the PAM sequence at position 20 of the guide, guanine and particularly adenine negatively impacted silencing efficiency, while cytosine and thymine increased efficiency -almost the exact inverse of previous reports for Cas9 efficiency in eukaryotic genome editing applications [11,14]. These effects within and around the PAM sequence appeared to interact with each other, as we saw several instances where combinations of features had weaker or stronger effects than would be predicted from the effects of the single features alone (Table S9).

A saturating CRISPRi screen targeting purine biosynthesis genes independently validates performance of tree-based models and data fusion
Our cross-validation results indicated that the MERF-derived fixed-effect random forest trained on multiple datasets outperformed other methods in predicting guide efficiency.
As independent validation, we initially explored low-throughput validation approaches, but the resulting data was too sparse to justify strong conclusions ( Figure S6). To resolve this, we performed a high-throughput screen targeting nine genes from the purine biosynthesis pathway of E. coli known to be essential in minimal medium, spread across seven independent transcriptional units ( Figure 4A). These genes were not included in our initial training set as they are not essential in rich medium, providing a truly independent test set. To avoid any bias in guide selection, we saturated all potential target sites in each gene, ending with a total of 750 gRNAs, including between 35 and 223 guides per gene. Duplicate samples were then collected at three time points during growth in M9 minimal medium, and gRNA depletion was measured with reference to input samples, normalized using a set of 50 gRNAs designed not to target any E. coli sequence.
Of these nine genes, purE and purK showed only small variation in guide efficiency ( Figure S7A), and so were excluded as test genes.
We first compared our MERF-derived fixed-effect random forest to the published Pasteur model and our retrained version of the Pasteur model by calculating Spearman correlation between predictions from each model and the measured logFC for each gene ( Figure 4B). As in our cross-validation, we saw large boosts in performance for models trained on fused datasets (ρ~0.478 for the original Pasteur vs. ρ~0.569 for our retrained Pasteur model), with a smaller further boost in correlation for the fixed-effect model from the MERF (ρ~0.589). We additionally tested several methods for predicting CRISPRi or CRISPR editing efficiency in eukaryotic organisms, but all exhibited poor performance on our purine screen dataset ( Figure S7D).
To further investigate differences in performance between the three methods, we devised two additional metrics based on potential use cases. For the first, we calculated how well each method enriches for efficient guides, defined as being in the top 20% of guides for each gene based on observed depletion, mimicking guide selection that might be done in the context of genome-wide screening. Our fixed-effect model led to 10 -15% increases in median enrichment of efficient guides relative to the original Pasteur model, and 4 -10% increases relative to our retrained version of the Pasteur model, depending on the selected time point (Figure 4C). For the second, we calculated the positive predictive value for each method when selecting the top three guides targeting each gene, mimicking guide selection that might be done when designing guides to selectively silence a particular gene of interest. For the fixed-effect model, there was a 79% chance of selecting a guide that led to depletion within two-fold of the most efficient guide for each gene (Figure 4D), compared to 67% and 60% for our retrained and the original Pasteur model, respectively. Given that the most efficient guides we observed across genes were depleted ~60 -1,000-fold ( Figure 4E; Figure S7B), a two-fold deviation still represents substantial depletion. Qualitatively similar differences in PPV between methods were seen using different thresholds for efficiency or when picking the top four or five predicted most efficient guides ( Figure S7C). In summation, we find that data fusion improves performance across model types in our completely independent validation screen, and that the fixed-effect random forest model from the depletion MERF further increases performance over the state-of-the-art on a variety of realistic tasks. In total, 750 gRNAs were designed. Grey boxes represent genes, black arrows transcriptional start sites.

Discussion
In this study, we developed a predictive model for CRISPRi guide efficiency using mixed-effect machine learning and integrated data from three gene essentiality screens in E. coli. Our approach was based on two primary insights: that effects from the targeted gene dominate over the effects of guide efficiency in depletion screens, and that fusing data across multiple screens can improve prediction accuracy and generalizability. We trained a mixed-effect random forest model (MERF) that uses a linear random-effect model to capture the confounding effects of the targeted gene and dataset, and a fixedeffect random forest that captures the residual effect of guide efficiency on depletion. We showed that this fixed-effect random forest model improves on the previous state-of-theart using both gene-wise cross-validation on our training data as well as a fully independent screen of guides targeting purine biosynthesis genes essential in minimal medium. These investigations provide a blueprint for developing similar predictive models in the absence of direct measurements of guide efficiency, both for other CRISPR-Cas systems [34] and technologies, as well as for CRISPRi in different bacteria where design rules may vary. We have made a web server for predicting CRISPRi guide efficiency using our MERF publicly available at: https://ciao.helmholtz-hiri.de.
Beyond developing a predictor for CRISPRi guide efficiency, our process of model development and validation provided several new insights into the behavior of CRISPRi screens. Strikingly, we found that gene expression was the single largest contributor to predictions of gene depletion as measured by SHAP values, where higher expression was associated with higher depletion. As the availability of transcriptomics data may be lacking for some organisms, we also tested the possibility of using the codon adaptive index (CAI) as a proxy, with promising results (Figure S7 EF&G), indicating that this observation is not an artifact of the expression dataset we used.
Similarly, our saturating purine validation screen revealed previously unobserved features of CRISPRi depletion screens. It has been suggested that CRISPRi guide efficiency is highest near the transcription start site and declines further into the coding sequence [1,25]. In contrast to this, we saw at least three distinct patterns of depletion across genes (Figure 4E & S7B): purC and purM showed the expected trend towards decreasing efficiency along the gene; the majority of genes had good and bad guides distributed across the length of the gene with no clear positional preference; and purE and purK showed very little variation in guide efficiency. These differences in the relationship between distance to the start codon and guide efficiency were surprising, as distance features were clearly important to our model predictions ( Figure 3A). We attempted to train a model excluding distance features, but this substantially degraded performance on predicting depletion in our high-throughput screen ( Figure S7EG&F).
Investigation of the effects of our distance feature on predictions indicate a sensitive region of ~60 bases from the start codon where silencing is more effective, possibly associated with transcription start sites or the early steps of transcriptional elongation ( Figure S5); additional work will be required to confirm this. In sum, our screen of guides targeting purine biosynthesis genes highlighted some unexpected features of CRISPRi while also independently validating the improved performance of our random forest model compared to the state-of-the-art.
Prediction of guide efficiency will become increasingly important with more complex applications of CRISPR technologies. In particular, the potential for multiplexing CRISPRi could be transformative when compared to established technologies. One example of this would be in screening for fitness interactions between genes. The current state-of-the-art is based on arrayed mating of single gene deletion libraries [35,36], which is both labor intensive and technically challenging, and becomes increasingly so when querying higher-order interactions [37]. A similar example is in metabolic engineering where multiplexed CRISPRi can be used to modulate biosynthetic pathways to optimize production of a particular metabolite for industrial applications [38]. The development of CRISPR array technologies that can co-express as many as 22 guides simultaneously [9,10] should accelerate the development of these approaches. However, large-scale, multiplex applications will require better tools for guide design to ensure robust results.
Individually screening guides for activity quickly becomes prohibitive when one considers applications that require hundreds or thousands of guides. The machine learning approach presented here provides a straight-forward solution to this problem.
While we focused here on applications of CRISPRi with dCas9 in E. coli, the techniques we have developed are in principle generic and could be extended to CRISPRi with any catalytically-dead nuclease in any bacterium of interest, or even to entirely different CRISPR technologies. For instance, we recently applied the same basic methodology to investigate the features underlying autoimmune activation of Cas13 targeting cellular RNA [34]. It is becoming increasingly clear that the performance of CRISPRi depends on both genetic background and the specific Cas protein used. For instance, Streptococcus pyogenes dCas9 expression has low silencing efficiency in some bacteria and can even be toxic [39], forcing the adoption of alternative Cas effector nucleases [40]. Alternative Cas effectors have large differences in their PAM preferences and the stringency of the PAM requirement [41]; presumably, alternative dCas proteins may also respond differently to the other gene and guide features described here. The approach outlined here, applying autoML and explainable AI to rapidly arrive at a description of the design rules underlying the efficiency of CRISPRi silencing, provides a means to rapidly characterize the behavior of new dCas proteins as genome-wide screening data becomes available.              illustrates feature importance for each guide prediction.

Supplementary Tables
Table S1 Description of the features used in model training Table S2 Description of train-test splits in the study Table S3 Model evaluation of feature engineering Table S4 SHAP values from model predicting the gRNA depletion Table S5 Model evaluation of data fusion Table S6 Model evaluation of data fusion with other model types Table S7 Model evaluation of segregating gene and guide effects Table S8 SHAP values from segregation models Table S9 SHAP interaction values from MERF    [16][17][18][19]42]. Considering this, we asked if deep learning models would also improve performance in predicting gRNA efficiency for CRISPRi in bacteria. As a representative architecture, we implemented a one-dimensional convolutional neural network (CNN), which runs a series of kernel filters across the sequence to extract local features. As a second model representative of current deep learning approaches, we reimplemented a state-of-the-art deep learning architecture based on the CGx architecture used by CRISPRon [19] for Cas9 genome editing in eukaryotes, only trained using our CRISPRi data and using our feature set. Both models were trained using scaled data as done for re-training the Pasteur model (see Data integration for retrained Pasteur and deep learning models, Methods).
For our custom CNN architecture, we used the convolutional layers to extract sequence features before concatenating them to the rest of our guide feature set ( Figure   S8A). This concatenated feature set was then fed through a fully connected 4-layer multilayer perceptron (MLP) for regression using MS values for guide efficiency. Both the custom CNN and CGx_CRISPRi models exhibited lower Spearman correlations as compared to our previously trained random forest models when tested on held-out gene sets ( Figure S8B; Table S7; CNN ρ=0.321, CGx_CRISPRi ρ=0.357, vs. MERF ρ=0.396).

As one major difference between our CGx_CRISPRi implementation and that of
CRISPRon is in the use of the △Gb parameter that integrates a complete thermodynamic model of Cas9 binding [43], we additionally tested substituting this for our simpler thermodynamic features in both CGx_CRIPSRi and our MERF model, leading to no clear improvement in prediction performance ( Figure S8BCD&E). These results show that conventional machine learning approaches can outperform deep learning architectures and suggest that data may currently be limiting for more complex machine learning approaches, though it remains possible that specialized architectures may improve on the approaches implemented here.

Training datasets
We collected the data from three previous CRISPRi genome-wide essentiality screens in E.coli K12 MG1655 [21,25,31]. The sequence, targeted gene, gene position, and fitness effect of each gRNA were retrieved from the supplementary information of each study.
Gene sequences and positions were updated to be consistent with the latest reference genome version (NCBI: NC_000913.3). We discarded gRNAs from the Wang data set previously removed as having insufficient read counts [25] or sequences from the Rousset and Cui datasets that differed from the reference sequence due to differences in the genome versions. 8099 gRNAs targeting the coding-strand within the coding regions of essential genes were extracted in total from all three datasets.

Feature engineering
A Python script (feature_engineering.py) was used to compute 137 sequence, thermodynamic, genomic, and transcriptomic features (Table S1) RegulonDB [44]. Transcriptomic data including gene expression levels across growth at ten different ODs were obtained from a previous study [28]. Minimal or maximal expression levels were calculated across the range of ODs until the growth phase when cells were collected in each CRISPRi screen: OD 1.4 for the Wang dataset, and all ODs for the Rousset and Cui datasets. The codon adaptation index (CAI) for each gene was calculated using CAIcal [45]. The resulting feature sets are available on GitHub.

Cross-validation for machine learning methods
To evaluate the models for depletion prediction, training and test sets were split guidewise based on unique gRNA sequences. 10 For MERF, predictions were performed using the fixed-effect model. The gRNAs used in each train-test split are listed in Table S2.

Segregation of guide and gene effects with MERF
We removed genes with less than 5 gRNAs in each dataset to stabilize estimates of median gRNA activity scores for a fair comparison between the MERF and models requiring scaling data integration (see below), resulting in 7400 gRNAs in total. This To train simplified models excluding transcriptomic measurements ( Figure S7E-G), the CAI value, gene length, gene GC content, and dataset were included for the randomeffect model. The output from the complete MERF model was in the range of -12 to +2, and output from the fixed-effect random forest was in the range of -7 to -2.5.
In all cases, the fixed effect was used for prediction (i.e. in cross-validation and validation with the purine screen), requiring only features associated with the guide sequence.

Data integration for retrained Pasteur and deep learning models
For the Pasteur and deep learning models, logFC values were scaled to integrate the datasets, adapted from a previously applied data fusion method [19]. First, the mean of logFCs of E75 Rousset and E18 Cui were calculated and used as the scaled logFC ( Figure S4B-D). Then linear regression was performed between the logFCs in Wang and scaled logFCs in E75 Rousset for 378 overlapping gRNAs. All of the logFCs from Wang were then scaled by the fitted slope and intercept values ( Figure S4B). The 378 overlapping gRNAs in Wang were excluded in the subsequent training. Activity scores were calculated by subtracting the scaled logFC of each gRNA from the median scaled logFC for each gene across all 3 datasets (Figure S4A). For cross-validation, scaling was performed within each test fold to avoid possible leakage of information between test and training sets.

Deep learning model implementation and training
Deep learning models were trained using pytorch (version 1.8.1) [47] and pytorchlightning (version 1.5.10). For our custom 1D CNN model, sequence features were processed using 1D convolutional layers and later concatenated with other guide features ( Figure S8A). Concatenated features were further processed with fully connected layers.
For CGx_CRISPRi, we implemented the CRISPRon architecture (CGx) modified to include our nine non-sequential guide features (distance features, thermodynamic features, etc.) concatenated to the processed sequential features. To test the effect of incorporating the deltaGB score [43], the four thermodynamic features were replaced with deltaGB, resulting in concatenating five non-sequential guide features to the processed sequential features. We trained the model using the Adam optimizer [51] with a learning rate of 0.001 and batch size of 32. We additionally tested a learning rate of 0.0001 and batch size of 500 as used in the original CRISPRon implementation [19], but saw only minor differences in performance (Table S7).

Model interpretation
Tree-based models, including depletion prediction models, and the fixed-effect model from MERF were interpreted using TreeExplainer from the python shap package (version 0.39.0) [30].

Strains and growth conditions
All strains, plasmids, and primers are listed in Supplementary Table S15, S16, and S17.

Validation of GFP silencing by flow cytometry in E. coli and S. Typhimurium
To investigate gene repression efficiency, 19 sgRNAs were selected to target the coding strand of a degfp reporter gene at different positions in E. coli BL21(DE3) ( Table S15).
Cells were initially transformed with three compatible plasmids encoding dCas9, a degfptargeting sgRNA, and a deGFP reporter (Table S17) control, and the difference between the positive and the negative control, followed by log transformation. The mean log fold repression across three replicates was compared to predicted values from the machine learning models (Table S10 and S12).
For experiments with S. Typhimurium, the procedure was similar, but cells were grown until an OD600 of ~0.8 before analysis on an Accuri C6 flow cytometer. To eliminate noncellular events, the forward scatter (cutoff of 10,000) was used and the mean green fluorescence value (FL1-H) across 30,000 events within a gate set for S. Typhimurium was used for data analysis as described above across four replicates (Table S10 and S12).
All possible 20 nt gRNAs with an NGG PAM, GC content between 30% and 85%, and without BbsI restriction sites were included in the library, resulting in 750 gRNAs ( Table   S13). The minimum and median Hamming distance of the 30-mer sequences between the 750 gRNAs and the training data from three essentiality screens were 7 and 21.
For the sgRNA library, plasmid DC512 served as a backbone, following a previously established protocol [9]. To generate a library with 800 sgRNAs (including 50 randomized non-targeting sgRNAs;  Subsequently, 5 mL of the culture was sampled and the library was extracted by miniprep (Nucleospin Plasmid, Macherey-Nagel) to obtain the initial sgRNA distribution. The calculated amount of culture to reach OD600 0.01 in 50 ml M9 minimal medium, was sampled and washed twice with M9 minimal medium to remove traces of the LB medium.
The culture was incubated at 37°C with shaking until it reached OD600 1, allowing ~6 replications. 5 ml of the culture was sampled at OD600 0.2 and OD600 0.6, and at OD600 1, and the library was extracted by miniprep. The experiment was performed in duplicate starting from two independent transformations of MG1655 with the plasmid library.

Library sequencing
The sequencing library was generated using the KAPA HiFi HotStart Library Amplification Kit for Illumina® platforms (Roche) and the primers listed in Supplementary Table S17.
The first PCR adds the first index. The second PCR adds the second index and flow cellbinding sequence. The amplicons of the first and second PCR reactions were purified using solid-phase reversible immobilization beads (AMPure XP, Beckman Coulter) following the manufacturer's instructions to remove excess primers and possible primer dimers. The sequencing library samples, with the required DNA concentrations ranging from 100 pg -200 ng in a total volume of 10 µL, were submitted to the HZI NGS sequencing facility (Braunschweig, Germany) for paired-end 2 × 50 bp deep sequencing with 800,000 reads per sample on a NovaSeq 6000 sequencer.

Sequencing data processing
Paired-end reads were merged using BBMerge (version 38.69) [52] with parameters "qtrim2=t, ecco, trimq=20, -Xmx1g". Merged reads with perfect matches were assigned to the gRNA library using a Python script. After filtering guides for at least 1 count per million in at least 4 samples, read counts of each gRNA were normalized by factors derived from non-targeting guides using the trimmed mean of M-values method in edgeR (version 3.28.0) [53]. An extra column was added to the design matrix to capture batch effects between the two replicate experiments. Differential abundance (logFC) of gRNAs between time points and the input library was estimated using edgeR, and a quasilikelihood F-test was used to test for significance after fitting in a generalized linear model.