Weakly deleterious natural genetic variation greatly amplifies probability of resistance in multiplexed gene drive systems

Evolution of resistance is a major barrier to successful deployment of gene drive systems to suppress natural populations, which could greatly reduce the burden of many vector borne diseases. Multiplexed guide RNAs that require resistance mutations in all target cut sites is a promising anti-resistance strategy, since in principle resistance would only arise in unrealistically large populations. Using novel stochastic simulations that accurately model evolution at very large population sizes, we explore the probability of resistance due to three important mechanisms: 1) non-homologous end-joining mutations, 2) single nucleotide mutants arising de novo or, 3) single nucleotide polymorphisms pre-existing as standing variation. Our results explore the relative importance of these mechanisms and highlight a complexity of the mutation-selection-drift balance between haplotypes with complete resistance and those with an incomplete number of resistant alleles. We find this leads to a qualitatively new phenomenon where weakly deleterious naturally occurring variants greatly amplify the probability of multi-site resistance. This challenges the intuition that many target sites would guarantee prevention of resistance, where in the face of standing genetic variation, it can be probable even in not very large populations. This result has broad application to resistance arising in many multi-site evolutionary scenarios including multi-drug resistance to antibiotics, antivirals and cancer treatments, as well as the evolution of vaccine escape mutations in large populations.


Introduction 24
Suppression gene drive systems have the potential to be highly effective for population control of 25 many vectors of human disease, including malaria carrying mosquitoes, as well as for invasive species 26 control (1). With the recent discovery of the CRISPR-Cas9 gene editing system (2), such drives 27 can be realised by targeting specific sequences for insertion of a drive construct into a gene of 28 nonfunctional. We find that the probability of resistance has a universal sigmoidal behaviour, which is of the form where for small population sizes N ≪ N * resistance is very unlikely, whilst for sufficiently large 116 population sizes (N ≫ N * ) resistance arises with near 100% certainty. We use the simulations to 117 develop a heuristic theory to characterise how N * depends on the parameters β, ξ, σ, N and then 118 focus in particular on contours in the parameter space of these variables where the probability of 119 resistance is p = 0.05, or 5%. On the other hand for de novo SNPs N * takes the following similar functional form: but unlike for NHEJ mutants, the fitting constant γ m is not independent of m and we show in the 165 Supplementary material is of the form γ m = s b τ m , which arises when multi-fold resistance alleles arise 166 sequentially rather than concurrently; here, s b is an effective/aggregate beneficial selection coefficient 167 for the resistant haplotype with m functional resistant alleles and τ is the timescale over which these producing functional mutants at all m sites, the critical population size for de novo generation of 175 mutants is smaller, and it is easier for resistance to arise de novo comnpared to NHEJ generation of 176 mutants and the disparity increases as the number of gRNAs m increases.

177
For both NHEJ and de novo SNPs, we can summarise this information in Fig.2 and this has no effect on contours, which is intuitive as these mutants only become advantageous once 185 drive has nearly fixed, and their effective frequency dependent selection coefficient has only a weak 186 dependence on σ, the cost in the presence of wild type only.

187
Probability of resistance: preexisting and de novo SNP mutants only 188 We now allow the possibility that functional resistant (R) mutants may exist in a mutation-selection 189 balance before the introduction of drive, by running each replicate simulation for a period of time ; (4) represents the effective mutational flow from m − 1 mutants to m mutants, which is balanced by 207 Figure 2: Contours for probability of resistance p = 0.05 as a function of N and β for m = 1 (blue), 2 (red) or 3 (yellow) gRNAs. Different symbols correspond to interpolated values from simulations for different fitness costs for functional mutants: square symbols σ = 0.01, triangle symbols σ = 0.001, circle symbols σ = 0.0001, and the solid line is a plot of p = 1 − e −N/N * n for γ n = 0.2.
negative selection, where each m mutant has fitness cost mσ, before the introduction of drive. By 208 averaging the probability of fixation over this distribution we recover the scaling law for N * s stated 209 above. We do not attempt to calculate an exact theory, but use a fitting parameter γ to represent 210 the scaling between these heuristic considerations and an exact theory. 211 We first fit the curves of probability of resistance vs N for m = 1, which gives an approximate mutations is to very greatly reduce the effective population size needed before resistance arises for 220 more than m > 1 gRNAs. Importantly, unlike for de novo SNPs or NHEJ mutations, there is a 221 significant effect for changing selection, where the probability of resistance increases for decreasing 222 σ, because less harmful mutations will segregate at a higher frequency before release. Though 223 Hermisson and Pennings (17) showed this to be the case for a single site (m = 1) (6, 17), the effect, 224 as we see is only weak and logarithmic in σ; here however, for m > 1, N * ∼ σ m−1 , which represents 225 a significant amplification of more weakly selected standing variation.

226
Probability of resistance: NHEJ, de novo and preexisting SNPs combined 227 Finally, we combine all three mechanisms by which resistance can arise to assess their relative im-228 portance as a function of varying β, ξ and σ. The summary of these results are shown in Fig.4, for 229 contours of p = 0.05 for β vs N , for the case of a) all SNPs being functional (ξ = 1) and b) 1% 230 functional SNPs (ξ = 0.01). The same broad trends as observed in the simulations of the previous 231 sections are seen, where decreasing β increases the range of population sizes for which resistance 232 is improbable. However, in the presence of preexisting SNPs, we find that for sufficiently small β, 233 this effect plateaus and the 5% contours are constant with respect to N , as the dominant and more 234 rapid mechanism of resistance becomes SNPs from standing variation. We also find that increasing 235 ξ decreases the population size at which resistance arises at 5%, given by a uniform shift of these 236 curves to the left, as seen by comparing panel a) to b). The solid lines in each plot correspond to 237 assuming the probability of resistance from NHEJ mutations and preexisting (& de novo) mutations 238 are independent: The plots use the parameters γ n = 0.2 and γ s = 2, as derived from fitting in the previous sections,

242
where the different line thicknesses correspond to different values of σ; we see that asymptotically Information) and explicitly accounting for a fraction β of functional NHEJ mutants. 260 We can find an equivalent expression for de novo SNPs using Eqn.3, but since our results indicate 261 that standing variation will always be at least as important as de novo generation of SNPs, we 262 directly consider the constraint on m for standing variation. Using Eqn.1&4, we can in principle 263 calculate the minimum number of gRNAs m required to prevent resistance to a specified probability.

264
Eqn.4 is transcendental, but if we replace the weak m dependence in the logarithm (ln (1 + s b mσ ) → 265 ln (1 + s b σ )) then for σ ≫ ξµ, we find the approximate expression: where W k (x) is the Lambert W function, which are solutions of the equation we w = x.

267
When both mechanisms of resistance are possible, the effective critical population size N * is 268 a combination of N * n and N * s , as shown in Fig.4 Eqn.5 is a reasonable approximation, but it is 269 difficult to invert this expression to find how many gRNAs are needed to prevent resistance to a we can compare to whichever target population size we have in mind for an application of drive. 273 We can examine two different extremes: 1) β = 10 −4 , which corresponds to assuming functional 274 NHEJ mutants are quite rare, as we might expect given the expectation that NHEJ will tend to 275 produce significant genetic changes like multiple base-pair insertions and deletions and 2) β = 10 −2 ,

301
In particular, we have highlighted the role of the critical population size N * , as a useful summary 302 measure of the probability of resistance compared to some target natural population size. Multi-303 plexing of guide RNAs, such that resistance is required in all gRNAs, is a promising anti-resistance 304 strategy, as it aims reduce the individual rate of resistance sufficiently that N * is much greater than 305 any realistic population size. Our results have highlighted 5 key parameters determining N * and 306 the probability of resistance evolving for multiplexed drive: β, ξ, σ, m, and N . The expressions we 307 derive from simulations for N * in terms of these parameters inform on the relative importance of 308 non-homologous end-joining vs de novo single nucelotide mutations vs standing variation of SNPs.

309
Our key novel finding is a significant amplification of the role that weakly deleterious standing ge-310 netic variation plays in determining resistance in multi-site evolutionary systems, compared to de 311 novo mutation, such that resistance is probable even when the number of gRNAs is large and for 312 not very large population sizes. In other words, the intuition that many gRNAs are guaranteed to 313 prevent resistance can potentially fail if standing variation is not taken into account.

314
Researchers developing gene drive constructs for population suppression will have some level of 315 control over all of the key parameters identified above, but the last of these, by choice of target site. give an indication of how many gRNAs are required to achieve a probability of resistance less than 349 0.05, in each scenario, but including both mechanisms, we refer to Table.1. Given a target population such that each mutation contributes to partial reduction in cleavage efficiency; within our model 368 this would correspond to a different effective value of ξ. We also assume there is no recombination 369 between sites and related to this that resistance only arises due to target site effects; recombination 370 and off-target resistance has been explored using deterministic modelling (20, 21). We also have not 371 explored the role of varying a number of parameters like cleavage efficiency ϵ, the intrinsic growth rate 372 R m , or the fitness parameters of drive; some of these have been previously explored in the context 373 of single gRNAs (6, 7) and these works and ours in general indicate these will have a secondary role 374 compared to the population scaled rate of NHEJ and de novo mutation. For example, increasing 375 ϵ will tend to increase the rate that drive replaces wild type, and we may expect one consequence 376 is a quantitative reduction in the probability of resistance as there is less time to generate mutants 377 and establish before population elimination, but the qualitative results we have presented will not 378 change. However, there is evidence that the overall effective efficiency of suppression may diminish 379 when there are large numbers of gRNAs (22). There is also the possibility that homing events may be 380 more error-prone than normal DNA replication and could lead to the loss of function of one or more 381 gRNAs, which we leave to future modelling efforts, but could increase the probability of resistance 382 evolving depending on the rate of such events occurring.

383
On one hand, it is not clear that spatial structure will strongly affect these results, since the

477
The alleles are assumed to have no effect on male fitness, while fitness effects in females are as which gives a limit of roughly 4 × 10 9 .

526
An alternative approach is to use the multivariate Gaussian approximation to the multinomial 527 distribution: 528 p(n 1 , n 2 , ..., n K |x 1 , x 2 , ..., x K , N ) where n = (n 1 , n 2 , ..., n K ) T , x = (x 1 , x 2 , ..., x K ) T , are the vectors of the numbers drawn of K 529 alleles, and their expected frequency, respectively, and Σ is the scaled covariance matrix of the 530 multinomial distribution, where Σ ij = N x i (δ ij − x j ). However, this approximation is poor when 531 for any of the alleles N x k ∼ 1. In this limit, these rare alleles are well-approximated by a Poisson 532 distribution. The approach taken in this paper is therefore to partition the alleles into a rare category 533 R, if N x k ≤ 10, and non-rare if N x k > 10, where the former is drawn from independent Poisson distributions, while the latter from a multivariate Gaussian distribution conditioned on a smaller total 535 population size N ′ = N − k∈R n k : where the vectors n ′ and x ′ only take elements k / ∈ R, whose length is K ′ = K − |R|, and the . We can assume independent Poisson distributions for      Simulations with a single guide RNA (gRNA) or target site require specification of a 4 × 4 fitness 7 matrix for both males W m and females W f , whose elements w ij are the fitness of the genotype i/j, 8 where for example genotype 1/3 ≡ W/N. In the absence of mutation and drive the allele frequency 9 vector in next generation of females x t+1 and males y t+1 can be expressed succinctly in matrix 10 notion in terms of the frequency vectors in the current generation x t and y t as follows where the mean fitness of males and females is given by the quadratic forms and where the ⊙ operation is the element by element (Hadamard) multiplication operation, i.e.  and how de novo SNPs arise by mutation (right). On the right, wild type is the background colour, a white marker indicates the cleavage state, a long red bar represents drive inserted at the target site, a grey marker is an NHEJ event, which are functional (green) or non-functional (red).
Including mutations is straightforward using a mutation probability matrix M , whose elements 26 M ij represent the probability of generating allele i per generation from allele j, where this stochastic 27 matrix, which acts on probability/frequency column vectors must have the property i M ij = 1: We make the assumption that there are only two mutational paths W → R and W → N, which 29 happen with rate ξµ and (1 − ξ)µ, respectively and there are no backmutations to W (Fig.1. This   30 gives a mutation matrix: Finally, we add the effect of drive by following (1), where we assume drive acts during meiosis after 32 viability selection. Drive is only produced by heterozygotes W/D, whose frequency after selection where H t = (x 1 ) t (y 4 ) t + (x 4 ) t (y 1 ) t is the drive-heterozygote frequency (before viability selection 36 acts) and the vector κ is stoichiometry like vector describing the change in fractions of each alleles 37 produced as gametes from the W/D heterozygote relative to no drive: here R m is the absolute mean number of offspring from females, and α is a parameter controlling 49 density dependent growth, which we typically tune to give a carrying capacity K = (R m (w f +w m ) − 50 1)α = (R m − 1)α = N 0 , the initial population size we set for the population in the absence of drive.

51
In these simulations we take a typical value of R m = 6.

53
In general, notational complexity increases considerably when we consider 2-fold and higher orders 54 of multiplexed drive. We will only consider 2-fold and 3-fold multiplex in this paper, but we will set 55 out a general framework for any order of multiplex m. 3-fold multiplex, there are a total of n H = 11 haplotypes frequencies to be tracked. 66 We also need to decide on the ordering of the haplotypes in the frequency vectors x and y. The 67 convention we use is column ordering and its generalisation for higher dimensional tensors, which 68 gives the mapping of indices to haplotypes as shown in Table 1 and Table 2 69 The fitness matrices for 2-fold multiplex are 7 × 7, while for 3-fold they are 11 × 11. For males 70 their fitness matrices are filled with all ones. We do not explicitly write out the fitness matrix of  Table 2: Mapping between frequency vector index and haplotypes for 3-fold multiplex female genotypes for multiplex drive, but its construction follows the same rules as for 1-fold drive  The mutation matrix, for mutiplexed drive of degree m, M (m) , is also a n H × n H matrix, the j th 81 column representing the rate of mutations from the j th haplotype to each of the other haplotypes.
where haplotype k 1 k 2 maps to j and haplotype k ′ 1 k ′ 2 maps to i, using Table 1 and k 1 , k 2 , k ′ 1 , k ′ 2 can take values {1, 2, 3} corresponding to alleles {W, R, N}, respectively. The multinomial coefficient accounts for the redundancy in the number of ways haplotype k ′ 1 k ′ 2 can be obtained by mutation of genotype heterozygote frequency WW/DD H 1 = x 1 y 7 + x 7 y 1 WR/DD H 2 = x 2 y 7 + x 7 y 2 WN/DD H 4 = x 4 y 7 + x 7 y 4 Table 3: Drive heterozygotes for 2-fold multiplex and their frequencies. Time dependence is implicit.
Explicitly, evaluating this for m = 2 we arrive at the following mutation matrix For m = 3, equivalent considerations give the following mutation probability from the j th to i th haplotype where haplotype k 1 k 2 k 3 maps to j and haplotype k ′ 1 k ′ 2 k ′ 3 maps to i, using Table 2 and can take values {1, 2, 3} corresponding to alleles {W, R, N}, respectively.

92
For multiplex drive, we make the assumption that drive acts independently at each target site, shown shaded in red in the table, to calculate the total fraction of DD gametes. This is formalised 98 mathematically below and extended to any number of m of gNRAs.

99
To consider how multiplexed drive affects haplotype frequencies, we need to consider now that 100 there is more than just a single genotype that is affected by drive. All genotypes with drive on one 101 chromosome and at least a single W allele on the other can be converted to homozygous drive; for 102 m = 2 there are three which are enumerated in Table 3 and for m = 3 there are six enumerated in 103 Table 4.

104
Each of these heterozygotes produces different fractions of the n H possible haplotypes, depending 105 on the number of W alleles in each haplotype. We create an analogous matrix to M (Eqn.8), whose 106 j th column gives the fraction of gametes produced by the j th allele assuming it is in genotype with 107 D: (17) Figure 2: Diagram showing gametogenesis for m = 2 gRNAs with NHEJ for a WW/DD individual, where the model assumes the action of drive is independent on each target site. wild type is the background colour, a white marker indicates the cleavage state, a long red bar represents drive inserted at the target site, a grey marker is an NHEJ event, which are functional (green) or nonfunctional (red).
With this matrix the fraction of gametes that have the i th haplotype from drive heterozygote j 109 can be calculated for any value of m. For haplotypes i < n H (i.e. not including drive) this is given where n W is the number of W alleles in the drive heterozygote j, and n ′ W , n ′ R , n ′ N are the number 112 of W, R, N alleles in haplotype i, and as before the haplotypes k 1 k 2 and k ′ 1 k ′ 2 map to indices using 113 Table 1. For i = n H , the drive haplotype (i.e. DD or DDD), we need to sum over all haplotypes i 114 that produce at least a single D allele (which is effectively converted to, for example, DD or DDD ); 115 genotype heterozygote frequency WWW/DDD H 1 = x 1 y 11 + x 11 y 1 WWR/DDD H 2 = x 2 y 11 + x 11 y 2 WRR/DDD H 3 = x 3 y 11 + x 11 y 3 WWN/DDD H 5 = x 5 y 11 + x 11 y 5 WRN/DDD H 6 = x 6 y 11 + x 11 y 6 WNN/DDD H 8 = x 8 y 11 + x 11 y 8 where on the RHS the relation between i ∈ D j and k ′ ℓ , k ′ ℓ , n ′ W , n ′ R , n ′ N and n ′ D is implicit, and 117 straightforward to enumerate.

118
Together these equations allow us to write down expressions for the update equations for haplo-119 type frequencies between generations for m−fold multiplex 120 where H m is the set of indices of haplotypes that have at least a single W allele, which is a function NHEJ mutations is β = 10 −4 and that all de novo SNPs are functional (ξ = 1). We see in all cases that initially drive replaces the wild type in less than 10 generations, which causes the mean 132 population fitness to decrease and the population size itself to decrease; following this, if N > N * 133 we see that functional resistance mutants arise and then fix giving rise to population rescue, while 134 for N < N * , resistance mutants do not arise on the timescale at which the population is eliminated.

135
As we increase the degree of multiplexing m, the same story unfolds, except there are many more joining mutants arise at a rate proportional to ϵν, and a fraction of β of these we assume to be 157 functional (R), and so the rate of producing functional NHEJ mutants is proportional to ϵνβ. Hence, 158 of r t is the frequency of the R allele in the population in generation t, then the difference equation 159 describing the dynamics is given by where q t is the frequency of the D allele and to a good approximation q(t)(1 − q(t)) is the frequency 161 of the W/D heterozygotes, assuming all other alleles apart from the W and D are at much smaller 162 frequencies initially. Mutants generated earlier will generally have contribute most to the ultimate 163 probability of establishment and population rescue, and so we make the approximation that the 164 frequency of the heterozygotes is roughly q(t) and so which has solution r(t) = r(0) + ϵνβQ(t) = ϵνβQ(t), since we assume at t = 0 when drive 166 is introduced there are no mutants, and where Q(t) = t−1 t ′ =0 q(t ′ ). If we assume there is some 167 characteristic time over which mutants are generated, related to the time to fixation of drive τ , then 168 the number of copies of the mutant generated in this time is τ is therefore 2N r(τ ) = 2N ϵνβQ(τ ).

169
If each mutant has probability of fixation of π = 2s b , where s b is an effective selection coefficient 170 of the benefit the functional resistance allele has in the presence of drive, then given this number 171 of mutants, we calculate the probability of fixation by 1-probability that none of these mutants 172 establishes: where N * n = (4s b ϵνβQ(τ )) −1 , which gives us the result that we expect resistance to arise significantly, 186 r 2 (t + 1) = r 2 (t) + ϵνβr 1 (t)(1 − q(t)) + (ϵνβ) 2 q(t)(1 − q(t)) (26) 187 where r 1 is the frequency of mutants with a single functional resistance allele at either of the two 188 target sites and r 2 is the frequency of mutants that have a functional resistance allele at both target 189 sites. If we make the same assumption that initially the frequency of drive will be small, such that 190 q(1 − q) ≈ q, then the solution for the frequency of single-fold functional resistance mutants is 191 r 1 (t) = 2(1 − ϵ)ϵνβQ(t), and so we have a difference equation for only r 2 : The second term represents the generation of two-fold mutants from single-fold heterozygotes with 193 drive (WR/DD or RW/DD), whilst the third term represents generation directly from wild type-194 drive heterozygotes. We can see that if 1 − ϵ is small then later direct mechanism of generation 195 will be dominant, as long as the cumulative frequency of drive Q(t) is not significantly larger than 196 the current frequency q(t); as Q(t) is a cummulative frequency it will be of order 1/s D , where dominate. If this is the case then our difference equation for double functional mutants is 202 r 2 (t + 1) ≈ (ϵνβ) 2 q(t).
and so comparing to Eqn.22, we have the same difference equation but ϵνβ → (ϵνβ) 2 and so would 203 expect the probability of resistance to be of the form p = 1−e −N/N * n with N * n = (4s b (ϵνβ) 2 Q(τ )) −1 .
where we assume that initially q ≪ 1. The solution is simply r(t) = ξµt. Using the same simple 211 heuristic as for NHEJ, this then leads to the probability of fixation of mutants generated up to some 212 time τ , related to the timescale for fixation of drive, to be:
Now here the second term, representing the sequential generation of mutants, will dominate for any 228 sufficiently number of generations t > 1 and so ignoring the third term which represents concurrent 229 generation of mutants, the solution is r 2 (t) = 2(ξµ) 2 t(t + 1)/2 ∼ (ξµ) 2 t 2 for large t. This means on 230 a time scale τ the number of functional double-mutants generated will be 2N r 2 (τ ) ∼ 2N ×(ξµ) 2 t 2 = 231 2N (ξµ) 2 t 2 . Following the previous arguments, the probability of fixation will then be p ≈ 1−e −N/N * d 232 with N d = (4(ξµ) 2 s b τ 2 ) −1 . Again the arguments can be extended to m = 3 gRNAs and we find 233 N d = (4(ξµ) 2 s b τ 3 ) −1 , to leading order in τ . 234 We can fit the simulation data of the de novo probability of resistance vs N as ξ is varied, using which has shape parameter θ = 4N ξµ and rate α. The mean of this distribution is ⟨r⟩ = θ/α = 252 ξµ/σ, which is the classic mutation-selection balance frequency. Now the probability of fixation given 253 an initial frequency r 0 and beneficial selection coefficient where the last approximation assumes α b = 4N s b ≫ 1. The average probability of fixation assuming 255 a distribution ρ(r 0 ) is then simply calculated by averaging over the initial frequency r 0 , which gives 256 ⟨π⟩ = 1 − Z(α b + α, θ) Z(α, θ) , where Z is the normalisation constant of the gamma distribution with population scaled fitness f :  Fig. (m = 1 panel), which is larger than found from de novo simulations, and likely reflects 264 some mechanistic aspect not reflected in these simple heuristic considerations.

265
For m = 2, we could try to calculate exactly the joint steady-state distribution of the frequency of 266 all mutants, under the sequential mutation scheme WW → {WR, RW} → RR, but this is in general 267 an unsolved problem in population genetics, since there is a net flux or current of probability through 268 the states of the system (detailed balance is not obeyed) and so calculating this joint distribution is not straightforward. Here, instead we appeal to a simple heuristic, which reproduces the scaling 270 with σ seen in the simulation data; we assume the distribution of the double-mutant frequency r 2 271 follows the same gamma distribution, but that mutation to this allele is solely from single mutants 272 whose frequency is assumed fixed at it's equilibrium frequency ⟨r 1 ⟩ = 2ξµ/σ, where the factor 2 273 arises from the multiplicity 2 of paths to single mutants from the wild type; this is reasonable given 274 that sequential mutation rather than concurrent is dominant, as demonstrated for de novo mutation. 275 Given this and that the fitness cost of double mutants, before drive is introduced, is −2σ, the 276 distribution of the double functional resistance mutants is 277 ρ 2 (r 2 ) = α 2 where α 2 = 8N σ, and θ 2 = 4N ξµ⟨r 1 ⟩ = 8N (ξµ) 2 /σ. The average frequency of double func- ∼ σ 2 , which is consistent with the 283 power law scaling we see in the simulations (Fig.3b). In practice, we find that the following form

286
Note that there is an implicit assumption in all these calculations, that establishment must arise 287 before the population is removed, which will be true if ∼ 1/s b is much smaller than the timescale 288 over which the population is removed.