A model for accurate quantification of CRISPR effects in pooled FACS screens

CRISPR screens are powerful tools to identify key genes that underlie biological processes. One important type of screen uses fluorescence-activated cell sorting (FACS) to sort perturbed cells into bins based on the expression level of marker genes, followed by guide RNA (gRNA) sequencing. Analysis of these data presents several statistical challenges, due to factors including the discrete nature of the bins and the typically small number of replicate experiments. To address these challenges, we developed a robust and powerful Bayesian random effects model and software package called Waterbear. Furthermore, we used Waterbear to explore how various experimental design parameters affect statistical power, to establish principled guidelines for future screens. Finally, we experimentally validated our experimental design model findings that, when using Waterbear for analysis, high power is maintained even at low cell coverage and a high multiplicity of infection. We anticipate that Waterbear will be of broad utility for analyzing FACS-based CRISPR screens.


The Waterbear model
The Waterbear model has a more general generative view, which we use for the simulations and call the "cell-level" view of the model. It is described in Supplementary Section 1.1. The model used for inference is referred to as the "gene-level" model, and a full description of that model is in Supplementary Section 1.2.

The cell-level generative model
The cell-level model can be seen in Algorithm 1. In this model, we assume the gene-level effect sizes, the guide composition (the relative frequency of each guide in the input population), the experiment-level multiplicity of infection (MOI), and the bin sizes are fixed. While this is not strictly necessary, it enables us to define the parameters for these arguments and thus iterate over them, while still keeping randomness in the remainder of the model. "Fixed" here is also relative to when the model is run: parameters are fixed at the start of a simulation, but in practice they are randomly sampled and passed in as arguments. Additionally, a plate model can be seen in Supplementary Figure 1.1.

Algorithm 1
The cell-level generative view of the Waterbear model.

Input:
• µ, a vector of effect sizes of length equal to the number of genes. Genes with no effect have value 0.
• φ, a vector of N dispersion values (one per sample).
• λ, the multiplicity of infection.
• guide composition, a unit simplex of length equal to the number of guides.

Output: a tensor of dimension N × number of guides × number of bins.

for n in N samples do
    for c in C_n cells do
        Draw the number of guides in this cell.
        Choose which guides.
        Given effect sizes µ, draw the reporter from the shifted distribution.
        Given the reporter, sort the cell into the correct bin.
        Update the true cell-guide counts.
    end for
    for g in G guides do
        Draw the observation as a noisy version of the truth, post sorting.
    end for
end for

S. Figure 1.1: Plate model for the cell-level model, which is used for simulation.
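To make the structure of Algorithm 1 concrete, the following is a minimal Python sketch of one sample, under illustrative assumptions not fixed by the algorithm itself: Poisson(λ) guide counts per cell, a standard-normal reporter shifted by the (assumed additive) sum of the cell's gene effects, fixed FACS gate boundaries on the reporter scale, and observation noise sketched as Poisson resampling of the true sorted counts. The function and argument names are ours, not Waterbear's.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cell_level(mu_gene, guide_comp, guide_to_gene, lam, bin_edges,
                        n_cells=10_000):
    """Illustrative sketch of Algorithm 1 for a single sample.

    mu_gene       : per-gene effect sizes (0 = no effect)
    guide_comp    : unit simplex over guides (input library composition)
    guide_to_gene : maps each guide index to its gene index
    lam           : multiplicity of infection (Poisson mean, an assumption)
    bin_edges     : interior FACS gate boundaries on the reporter scale
    """
    n_guides, n_bins = len(guide_comp), len(bin_edges) + 1
    counts = np.zeros((n_guides, n_bins))
    for _ in range(n_cells):
        k = rng.poisson(lam)                                 # number of guides in this cell
        if k == 0:
            continue                                         # uninfected cells carry no guides
        guides = rng.choice(n_guides, size=k, p=guide_comp)  # choose which guides
        shift = mu_gene[guide_to_gene[guides]].sum()         # additive effects (assumption)
        reporter = rng.normal(loc=shift, scale=1.0)          # draw from the shifted distribution
        b = np.searchsorted(bin_edges, reporter)             # sort the cell into its bin
        np.add.at(counts, (guides, b), 1)                    # update the true cell-guide counts
    # Observation: a noisy version of the truth, post sorting (sketched as Poisson)
    return rng.poisson(counts)

# Toy example: 2 genes (one with an effect), 4 guides, MOI 1, 4 equal-size bins
mu = np.array([0.0, 1.0])
g2g = np.array([0, 0, 1, 1])
obs = simulate_cell_level(mu, np.full(4, 0.25), g2g, 1.0,
                          np.array([-0.6745, 0.0, 0.6745]))
```

The loop is per-cell for clarity; a practical simulator would vectorize over cells.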

The gene-level inference model
We begin by describing the full generative model. This model is represented in the plate model in Supplementary Figure 1.2.
First, the experiment-level parameters: π ∼ Beta(10, 10). Importantly, these priors are fairly diffuse. Additionally, φ_MLE is the maximum likelihood estimate learned from a frequentist view of this model on the non-targeting guides only.
Next, we cover the effect parameters, where G(h) is a map from the guide index to the corresponding gene index. Finally, we cover the remaining nuisance parameters. Note that one can specify the model in terms of either the function q or t_n (see Methods in the main manuscript). Because of this duality, we can specify a prior in either space, and we choose to place it on q, where the prior is more natural. Here, α is the vector of estimated bin sizes (on the unit scale). Finally, we can write the observation accordingly. The posterior has no simple closed-form solution, and thus we turn to Markov chain Monte Carlo. The model is implemented using NIMBLE and can be found at the GitHub link. As implemented, it uses a Gibbs sampler where conditional conjugacy is possible and adaptive random-walk MCMC otherwise.

S. Figure 1.2: Plate model for the gene-level model used for inference.
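The duality between q and the cutpoints t_n can be illustrated with a small sketch. Assuming a standard-normal reporter scale, the interior cutpoints are the normal quantiles of the cumulative bin sizes α, and a gene with effect µ lands in bin k with probability Φ(t_k − µ) − Φ(t_{k−1} − µ). The function name below is ours; the exact parameterization Waterbear uses is described in the Methods.

```python
import numpy as np
from statistics import NormalDist

_std = NormalDist()  # standard normal

def bin_probabilities(mu, alpha):
    """P(a cell with reporter shift mu sorts into each bin).

    alpha: bin sizes on the unit scale (must sum to 1). The interior
    cutpoints t_k are the standard-normal quantiles of cumsum(alpha);
    a shift mu moves probability mass across the fixed cutpoints.
    """
    edges = [_std.inv_cdf(c) for c in np.cumsum(alpha)[:-1]]   # cutpoints t_k
    cdf = [0.0] + [_std.cdf(t - mu) for t in edges] + [1.0]
    return np.diff(cdf)   # Phi(t_k - mu) - Phi(t_{k-1} - mu), one entry per bin
```

With µ = 0 the bin probabilities recover α exactly, which is the sense in which specifying a prior on q (the bin sizes) is equivalent to specifying one on the cutpoints.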

Simulations
Here, we report additional simulations which were referenced, but not included in the main text for brevity. Additionally, we describe some of the modeling choices (e.g., the effect size generation).

Simulation effect sizes
Pooled CRISPR screens were performed as described (Methods, Pooled CRISPR screens) and effect sizes were learned from these data. An excess of cells from one donor was collected. After genomic DNA isolation, the genomic DNA was diluted to the equivalent of 50X, 100X, and 200X coverage. Sequencing libraries were made from each diluted sample as described for the other pooled CRISPR screens. We broke the effects into classes (easy, moderate, and difficult) based on whether MAGeCK called them significant across all dilution rates, some, or only the lowest dilution rate. Since MAGeCK effect sizes are a function of the bin sizes, we used Waterbear to learn the effect sizes in the standard normal space. Those effect sizes are plotted in Supplementary Figure 2.1a. We then smoothed the distribution using the kernel density estimator in R, which resulted in the distributions seen in Supplementary Figure 2.1b. Of note, we thresholded the distributions so they would form discrete classes and their interpretation would be simpler. For all simulations, a gene-level effect is drawn from these distributions. To create the equal-mixture simulations, we sampled classes uniformly at random, then sampled an effect size from the corresponding class. The final distributions are shown in Supplementary Figure 2.2, which shows that the marginal distributions of the observed and simulated experiments are comparable.
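The equal-mixture sampler amounts to a two-stage draw: pick a class uniformly at random, then draw an effect size from that class's smoothed distribution. A minimal sketch follows; the per-class normal parameters here are hypothetical stand-ins for the actual kernel density estimates fit to the dilution-experiment effect sizes, which are not reproduced in text.

```python
import random

# Hypothetical stand-ins for the smoothed class densities; the real ones are
# kernel density estimates learned from the dilution experiments.
CLASS_PARAMS = {
    "easy":      (1.5, 0.3),    # (mean, sd) of |effect| in standard-normal space
    "moderate":  (0.8, 0.2),
    "difficult": (0.35, 0.1),
}

def sample_effect(rng=random):
    """Equal-mixture draw: class uniformly at random, then an effect size
    from that class's distribution (sign chosen at random)."""
    cls = rng.choice(list(CLASS_PARAMS))
    mean, sd = CLASS_PARAMS[cls]
    magnitude = abs(rng.gauss(mean, sd))
    return rng.choice([-1, 1]) * magnitude

effects = [sample_effect() for _ in range(1000)]
```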

Simulations across additional experimental configurations
In the following simulations we vary:
• the proportion of gene effects (0.05, 0.10, 0.25),
• the number of cells (50e3, 500e3, 1e6),
• the total number of genes (500, 1,000, 5,000),
• the MOI (0.3, 1, 2, 5, 10),
while maintaining four targeting guides per gene assayed, three replicates, and 1,000 non-targeting guides as in the main text. The proportion of gene effects represents the total fraction of genes that have true non-zero effects. The number of cells is the number of observed cells across all bins (post FACS sorting). The number of genes is the total number of targeted genes assayed. Each simulation is performed ten times and each line represents an average.
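The full factorial design above can be enumerated directly. A small sketch of such a driver follows; `run_simulation` would stand in for one call of the cell-level generative model and is not defined here.

```python
from itertools import product

# The grid of experimental configurations described above.
GRID = {
    "prop_effects": (0.05, 0.10, 0.25),
    "n_cells": (50_000, 500_000, 1_000_000),
    "n_genes": (500, 1_000, 5_000),
    "moi": (0.3, 1, 2, 5, 10),
}
N_REPEATS = 10  # each configuration is simulated ten times and averaged

configs = [dict(zip(GRID, values)) for values in product(*GRID.values())]
# 3 * 3 * 3 * 5 = 135 configurations, each to be run N_REPEATS times
```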
We present the figures grouped by number of genes targeted, as we believe that is most insightful.The figures thus represent 500, 1,000, and 5,000 genes in Supplementary Figures 2.3, 2.4, and 2.5, respectively.
The following overall trends hold true for all figures:
• As the proportion of gene effects increases, sensitivity increases.
• As the MOI increases, sensitivity increases.
• As the number of cells increases, sensitivity increases.

S. Figure 2.1: Effect size distributions from real data used to inform simulated data.
However, these trends reach an inflection point at 5,000 genes, where very little sensitivity is achievable with 50,000 cells (Supplementary Figure 2.5). Even at 500,000 cells, sensitivity only becomes comparable to the other configurations around MOI 2.

Effect of Non-Targeting Guides
To test the robustness of inference to the number of non-targeting guides, we simulated under conditions similar to those in Figure 3 of the main text, varying only the number of non-targeting guides. In particular, we used 10, 100, and 1,000 non-targeting guides (the main paper uses 1,000). We also maintained the total number of targeting guides, which means the per-guide coverage is slightly higher for simulations with smaller numbers of non-targeting guides. Supplementary Figures 2.6 and 2.7 display the sensitivity and FDR, respectively. Note that there is little difference, if any, in performance as a function of the number of non-targeting guides.

Sensitivity plots including MAUDE
In the main text we do not include MAUDE in the sensitivity analysis (Figure 3) because it is badly miscalibrated and is thus expected to have spuriously high sensitivity. Indeed, we see higher sensitivity in Supplementary Figures 2.9a and 2.9b, but perhaps not as high as expected given an average FDR around 0.50.

S. Figure 2.3: 500-gene simulations which vary the MOI, proportion of gene effects, and number of cells. In the sub-panels, p refers to the proportion of effects, and nc refers to the number of cells.

S. Figure 2.9: Sensitivity results from Figure 3 (main text) including MAUDE.

Experimental validation results
In this section we display further results from the arrayed knockdown experiments [1]. We previously validated a number of regulators of IL2RA by individually knocking out each regulator and measuring the effect on IL2RA levels (Freimer et al. [1]). Here we use those data to validate Waterbear's sensitivity in identifying regulators from IL2RA CRISPR FACS screens under different experimental conditions, including varying MOI, coverage, and number of replicates.

Down sampling replicates
Here, our goal is to see how much the results are affected by reducing the number of replicates from three to two. Of particular interest are the extremes, from high-coverage, low-MOI (a conventional experiment) to low-coverage, high-MOI. These figures can be seen in Supplementary Figures 3.1-3.4. We note reasonable concordance between all four cases; however, (low coverage, low MOI) has the lowest concordance, while (high coverage, high MOI) has the highest.

Validation against arrayed knockouts
Here, we use the arrayed knockout data from [1] to benchmark all the tools against a positive control set (26 genes) and a much smaller null set (2 genes). These genes are listed in Supplementary Table 1. Waterbear is shown with all gene names (Supplementary Figure 3.5), along with MaGeCK (Supplementary Figure 3.6) and MAUDE (Supplementary Figure 3.7). In particular, both Waterbear and MAUDE show high sensitivity, whereas MaGeCK misses a large number of true positives, even some with modest effect sizes (e.g., JAK3, RELA). The validation set was read from [1]. Genes that were not included due to uncertain effects were: MYC, IKZF1, IKZF3, FOXO1, and SMARCB1.

Average number of collisions
Here, we derive a simple model to estimate the number of collisions. A collision occurs when two or more guides which have an effect on the phenotype enter the same cell. By definition, collisions are an upper bound on the expected fraction of epistatic events, since two guides may enter a cell and have independent effects on the phenotype. Our goal is to compute, for a given cell, the probability that the cell receives two or more guides with an effect. There are two free parameters in this calculation: (1) p, the proportion of guides that have an effect on the phenotype, and (2) λ, the multiplicity of infection. Let X be a random variable that counts the number of guides in a cell. We assume X ∼ Poisson(λ), where λ is the multiplicity of infection (MOI) of the experiment. If we assume that a proportion p of the total guides in our pool have an effect, then, selecting a guide at random, let G_i = 1 if guide i has an effect, and 0 otherwise.
Further, we assume P(G_i = 1) = p (a Bernoulli random variable). Finally, define Y = 1 if Σ_{i=1}^{X} G_i ≥ 2, and Y = 0 otherwise. By definition, Y is 1 if a collision happens, and 0 otherwise. Thus, by construction, we want to compute E[Y]. By Poisson thinning, the number of effective guides in a cell, S = Σ_{i=1}^{X} G_i, follows Poisson(pλ), so E[Y] = P(S ≥ 2) = 1 − e^{−pλ}(1 + pλ).

S. Figure 2.1: (a) The effect size distribution as inferred from the dilution experiments. (b) The effect size distribution after smoothing the density.

S. Figure 2.2: The ECDF of the complete effect size distribution after pooling easy, moderate, and difficult effects.
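As a sanity check on the closed form, the collision probability under the stated assumptions (Poisson(λ) guides per cell, each effective independently with probability p) can be compared against a small Monte Carlo simulation. This sketch is ours and is not part of the Waterbear package.

```python
import math
import numpy as np

def collision_prob(p, lam):
    """Closed form: effective guides per cell is Poisson(p * lam) by thinning,
    so P(collision) = P(>= 2 effective guides) = 1 - exp(-p*lam) * (1 + p*lam)."""
    return 1.0 - math.exp(-p * lam) * (1.0 + p * lam)

def collision_prob_mc(p, lam, n_cells=200_000, seed=0):
    """Monte Carlo: draw Poisson(lam) guides per cell, thin each Bernoulli(p)."""
    rng = np.random.default_rng(seed)
    x = rng.poisson(lam, size=n_cells)   # guides per cell
    s = rng.binomial(x, p)               # effective guides per cell
    return float(np.mean(s >= 2))        # fraction of cells with a collision
```

For example, with p = 0.1 and λ = 5 the closed form gives about 0.09, i.e. roughly one cell in eleven carries a collision even at a high MOI when effects are sparse.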
Average sensitivity results while varying coverage, with exactly one guide per cell.
Average sensitivity results while varying MOI.

S. Figure 3.1: The effect size distribution and significant hits at LFSR ≤ 0.10 when including only replicates 1 and 2 versus all three replicates with the high coverage, low MOI data.
S. Figure 3.2: The effect size distribution and significant hits at LFSR ≤ 0.10 when including only replicates 1 and 2 versus all three replicates with the high coverage, high MOI data.
S. Figure 3.3: The effect size distribution and significant hits at LFSR ≤ 0.10 when including only replicates 1 and 2 versus all three replicates with the low coverage, low MOI data.
S. Figure 3.4: The effect size distribution and significant hits at LFSR ≤ 0.10 when including only replicates 1 and 2 versus all three replicates with the low coverage, high MOI data.
S. Figure 3.5: Validation using low coverage, high MOI data for Waterbear.
S. Figure 3.6: Validation using low coverage, high MOI data for MaGeCK.