Fitness Estimation for Viral Variants in the Context of Cellular Coinfection

Animal models are frequently used to characterize the within-host dynamics of emerging zoonotic viruses. More recent studies have also deep-sequenced longitudinal viral samples originating from experimental challenges to gain a better understanding of how these viruses may evolve in vivo and between transmission events. These studies have often identified nucleotide variants that can replicate more efficiently within hosts and also transmit more effectively between hosts. Quantifying the degree to which a mutation impacts viral fitness within a host can improve identification of variants that are of particular epidemiological concern and our ability to anticipate viral adaptation at the population level. While methods have been developed to quantify the fitness effects of mutations using observed changes in allele frequencies over the course of a host’s infection, none of the existing methods account for the possibility of cellular coinfection. Here, we develop mathematical models to project variant allele frequency changes in the context of cellular coinfection and, further, integrate these models with statistical inference approaches to demonstrate how variant fitness can be estimated alongside cellular multiplicity of infection. We apply our approaches to empirical longitudinally sampled H5N1 sequence data from ferrets. Our results indicate that previous studies may have significantly underestimated the within-host fitness advantage of viral variants. These findings underscore the importance of considering the process of cellular coinfection when studying within-host viral evolutionary dynamics.


Introduction
H5N1 experimental challenge study performed using the ferret animal model. Our findings indicate that the fitness effect of this mutation is considerably higher than previously estimated and that cellular coinfection precipitously slowed down the rate of within-host influenza virus adaptation.

Several studies to date have used longitudinal allele frequency data to estimate the relative fitness of a mutant allele over a wild-type allele within an infected host or from passage studies [9,19,20]. None of these models, however, account for the impact that cellular coinfection can have on variant allele frequency changes over time. To accommodate cellular coinfection, we first start with an evolutionary model that projects allele frequencies from one viral generation to the next in the absence of coinfection:

$$q_m(t_{g+1}) = \frac{q_m(t_g)\, e^{\sigma_m}}{q_m(t_g)\, e^{\sigma_m} + (1 - q_m(t_g))\, e^{\sigma_w}} \qquad (1)$$

where $q_m(t_g)$ is the frequency of the variant (mutant) allele in viral generation $g$, $\sigma_m$ (with range $-\infty$ to $\infty$) is the selective advantage/disadvantage of the focal mutation, and $e^{\sigma_m}$ (with range $\geq 0$) is the relative fitness

haplotypes and further incorporates de novo mutation in its projection of allele frequencies. Here, we ignore de novo mutation over the course of infection and limit our analysis to two viral haplotypes: a wild-type viral genotype and a variant genotype carrying a mutant allele at a single locus. We adopt these simplifications to focus attention on the effect of cellular coinfection in within-host evolution.
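The coinfection-free projection in Eqn. 1 can be iterated numerically. The following is a minimal sketch, not the authors' implementation: the function names `next_frequency` and `project` are hypothetical, the wild-type parameter is assumed to be $\sigma_w = 0$ by default, and the value $\sigma_m = 0.5$ is purely illustrative.

```python
import math


def next_frequency(q_m, sigma_m, sigma_w=0.0):
    """Apply Eqn. 1 once: variant frequency in the next viral generation.

    Assumes no cellular coinfection. sigma_w = 0 is an illustrative default.
    """
    w_m = math.exp(sigma_m)  # fitness term of the variant
    w_w = math.exp(sigma_w)  # fitness term of the wild-type
    return q_m * w_m / (q_m * w_m + (1.0 - q_m) * w_w)


def project(q0, sigma_m, generations, sigma_w=0.0):
    """Iterate Eqn. 1 and return the trajectory [q(t_0), ..., q(t_G)]."""
    trajectory = [q0]
    for _ in range(generations):
        trajectory.append(next_frequency(trajectory[-1], sigma_m, sigma_w))
    return trajectory


# Illustrative run: start at 4.4% and project 9 generations forward.
trajectory = project(q0=0.044, sigma_m=0.5, generations=9)
```

Under Eqn. 1 the log-odds of the variant increase by $\sigma_m - \sigma_w$ per generation, which gives a quick sanity check on any implementation.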
To extend this initial model to allow for the effect of cellular coinfection, we first assume that viral

Under the assumption that viral protein products within cells have additive effects, the fitness of a viral genome present in a cell carrying $k$ variant viral genomes and $l$ wild-type viral genomes is given by:

$$F(k, l) = \frac{k}{k + l}\, e^{\sigma_m} + \frac{l}{k + l}\, e^{\sigma_w}$$

Note that this fitness does not depend on whether the focal genome is a variant viral genome or a wild-type viral genome, since all viral genomes within a cell share their protein products and thus have the same fitness.
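The additive within-cell fitness $F(k, l)$ is simple to evaluate directly. A minimal sketch follows; the function name `cell_fitness` and the default $\sigma_w = 0$ are assumptions made for illustration.

```python
import math


def cell_fitness(k, l, sigma_m, sigma_w=0.0):
    """Fitness F(k, l) shared by every genome in a cell that carries
    k variant and l wild-type genomes (additive protein effects)."""
    if k + l == 0:
        raise ValueError("F(k, l) is undefined for an uninfected cell")
    return (k * math.exp(sigma_m) + l * math.exp(sigma_w)) / (k + l)


# A singly infected cell recovers the individual-level fitness terms.
f_variant_alone = cell_fitness(1, 0, sigma_m=math.log(2.0))
f_wildtype_alone = cell_fitness(0, 1, sigma_m=math.log(2.0))
# A mixed cell averages them: one variant with three wild-type genomes.
f_mixed = cell_fitness(1, 3, sigma_m=math.log(2.0))
```

With $e^{\sigma_m} = 2$ and $e^{\sigma_w} = 1$, the mixed cell gives $(1 \cdot 2 + 3 \cdot 1)/4 = 1.25$, which illustrates how coinfection with wild-type genomes masks part of the variant's advantage.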

The realized mean fitness of a viral variant in the context of cellular coinfection is calculated by taking a fitness average of the viral variant across its cellular contexts:

Similarly, the realized mean fitness of the wild-type virus in the context of cellular coinfection is given by:

Examination of these equations indicates that the realized mean fitness of the viral variant and of the wild-type virus approach $e^{\sigma_m}$ and $e^{\sigma_w}$, respectively, as cellular MOI becomes small, as expected. As cellular MOI becomes large, the realized mean fitnesses of the variant and of the wild-type virus converge in value, as expected.

Variant allele frequency changes in the context of cellular coinfection can then be projected using a modified version of Eqn. 1, where realized mean fitnesses replace individual-level viral fitnesses:

below, we use $N$ rather than $N_e$ to denote the effective viral population size. With a variant allele frequency of $q_m(t_g)$ in generation $t_g$, the variant's effective population size is given by:

$$N_m = q_m(t_g)\, N$$

and the effective population size of the wild-type virus in generation $t_g$ is given by:

$$N_w = (1 - q_m(t_g))\, N$$

Defining the number of target cells as $C$, the mean cellular multiplicity of infection is given by $N/C$, the mean

given by the variant population size $N_m$. We similarly stochastically determine the distribution of wild-type viruses across target cells by using a multinomial distribution with the event probability of being in a cell given by $1/C$ (for all $C$ cells) and the number of trials given by the wild-type viral population size $N_w$. The mean fitness of a viral variant in the context of cellular coinfection can then be calculated in a manner similar to the one specified in Eqn. 4. With $F(k_i, l_i)$ as the fitness of a viral genome present in cell $i$ with $k_i$ variant viral genomes and $l_i$ wild-type viral genomes, the mean fitness of a viral variant is obtained by considering the stochastically-realized viral content in each cell:

Similarly, the mean fitness of the wild-type virus is given by:

We then use Eqn. 6 to project the frequency of the viral variant in the next generation. Calling this projected frequency $p_m(t_{g+1})$, we generate a stochastic realization of this frequency by letting the variant effective population size $N_m$ be drawn from a binomial distribution with $N$ trials and a probability of success of $p_m(t_{g+1})$. The realized frequency of the viral variant, $q_m(t_{g+1})$, in generation $t_{g+1}$ is then given by $N_m/N$.
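One generation of the stochastic model described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the names `cell_fitness` and `stochastic_step` are hypothetical, $N_m$ is rounded to an integer, each genome is placed uniformly at random in one of $C$ cells (equivalent to a multinomial draw with event probability $1/C$), and the mean fitness of each type is assumed to be the genome-weighted average of $F(k_i, l_i)$ over the cells that type occupies.

```python
import math
import random


def cell_fitness(k, l, sigma_m, sigma_w=0.0):
    """Additive within-cell fitness F(k, l)."""
    return (k * math.exp(sigma_m) + l * math.exp(sigma_w)) / (k + l)


def stochastic_step(q_m, N, C, sigma_m, sigma_w=0.0, rng=random):
    """Return a stochastic realization of the variant frequency one
    generation ahead, given N genomes spread over C target cells."""
    N_m = round(q_m * N)  # variant population size (rounding is an assumption)
    N_w = N - N_m         # wild-type population size
    if N_m == 0 or N_w == 0:
        return N_m / N    # variant lost or fixed: no further change

    # Multinomial placement: each genome lands in a uniformly chosen cell.
    k = [0] * C
    l = [0] * C
    for _ in range(N_m):
        k[rng.randrange(C)] += 1
    for _ in range(N_w):
        l[rng.randrange(C)] += 1

    # Assumed form of the mean fitnesses: genome-weighted averages over cells.
    f_m = sum(ki * cell_fitness(ki, li, sigma_m, sigma_w)
              for ki, li in zip(k, l) if ki > 0) / N_m
    f_w = sum(li * cell_fitness(ki, li, sigma_m, sigma_w)
              for ki, li in zip(k, l) if li > 0) / N_w

    # Projection with mean fitnesses in place of individual-level fitnesses.
    q = N_m / N
    p_next = q * f_m / (q * f_m + (1.0 - q) * f_w)

    # Binomial sampling of the next generation's variant population size.
    N_m_next = sum(rng.random() < p_next for _ in range(N))
    return N_m_next / N


rng = random.Random(1)
q_next = stochastic_step(q_m=0.1, N=1000, C=250, sigma_m=0.5, rng=rng)
```

Running the step at a fixed $N$ while varying $C$ changes the mean MOI $N/C$, so the same function can be used to compare how quickly the variant rises at low versus high levels of coinfection.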

Simulated data
We simulated the models described above to ascertain the effect of cellular coinfection on variant allele frequency changes at various levels of coinfection. We also simulated mock datasets and used them to test the statistical inference methods described in detail below. We simulated one mock dataset using the deterministic within-host evolution model, with observed variant allele frequencies including measurement noise. To implement measurement noise, we let the observed variant allele frequency in generation $t_g$, $q_m^o(t_g)$, be drawn from a beta distribution with shape parameter $\alpha = \nu q_m(t_g)$ and shape parameter $\beta = \nu(1 - q_m(t_g))$:

$$q_m^o(t_g) \sim \mathrm{Beta}\big(\nu\, q_m(t_g),\; \nu\,(1 - q_m(t_g))\big)$$

where $\nu$ quantifies the degree of measurement noise. The parameter $\nu$ is constrained to be positive, with higher values corresponding to less measurement noise. We simulated a second mock dataset using the stochastic within-host model, similarly assuming beta-distributed measurement noise.

loci, we thus decided to exclude day 5 from our analysis to be able to focus more specifically on estimating the fitness of G788A in the context of cellular coinfection.

and $N$ as given. Particle MCMC is a Bayesian inference approach that combines particle filtering with MCMC to estimate parameters of stochastic state-space models and to reconstruct unobserved state variables. This statistical inference method is increasingly used in the infectious disease modeling community [25, 26] but as of yet has not been applied to within-host viral models.
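The beta-distributed measurement noise used for the mock datasets can be sketched with the Python standard library. The function name `observe_frequency` is hypothetical, and the sketch assumes the true frequency lies strictly between 0 and 1 so that both shape parameters are positive.

```python
import random


def observe_frequency(q_true, nu, rng=random):
    """Draw a noisy observed variant frequency from a beta distribution
    with shape parameters alpha = nu * q_true and beta = nu * (1 - q_true).

    The draw has mean q_true; larger nu means less measurement noise.
    """
    alpha = nu * q_true
    beta = nu * (1.0 - q_true)
    return rng.betavariate(alpha, beta)


rng = random.Random(0)
# Illustrative comparison of a noisy (nu = 10) and a precise (nu = 1000) assay.
noisy = [observe_frequency(0.3, 10, rng) for _ in range(2000)]
precise = [observe_frequency(0.3, 1000, rng) for _ in range(2000)]
```

Because the beta distribution with these shape parameters has variance $q(1-q)/(\nu + 1)$, the spread of the `precise` draws should be far smaller than that of the `noisy` draws.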

For both the deterministic and stochastic within-host models, let $P(q_m^o(t_g))$ be the probability of observing a variant allele frequency of $q_m^o$ in generation $t_g$. This probability is given by the beta probability density function, with shape parameters $\nu q_m^{sim}(t_g)$ and $\nu(1 - q_m^{sim}(t_g))$, where $q_m^{sim}(t_g)$ is the model-simulated allele frequency in generation $t_g$. This simulated variant allele frequency depends on parameters $e^{\sigma_m}$, $M$, and $q_m(t_0)$, and for the stochastic model also $N$. For the deterministic model, the likelihood of the model is then given by:

$$\mathcal{L} = \prod_{g} P(q_m^o(t_g))$$

where $g$ indexes the generation times of all the measured variant allele frequency data points. For the stochastic model, $P(q_m^o(t_g))$ is used to calculate the particle weights in the pMCMC algorithm.
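The deterministic-model likelihood is a product of beta densities over the sampled generations, which is usually evaluated on the log scale. The sketch below assumes the model-simulated frequencies are supplied by the caller; the names `beta_logpdf` and `log_likelihood`, and the example frequencies, are hypothetical.

```python
import math


def beta_logpdf(x, a, b):
    """Log of the beta probability density with shape parameters a and b."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def log_likelihood(observed, simulated, nu):
    """Sum of log P(q_obs(t_g)) over all sampled generations g.

    observed and simulated map a generation index to a frequency in (0, 1).
    """
    total = 0.0
    for g, q_obs in observed.items():
        q_sim = simulated[g]
        total += beta_logpdf(q_obs, nu * q_sim, nu * (1.0 - q_sim))
    return total


# Illustrative frequencies at generations 3 and 9 (made-up values).
observed = {3: 0.20, 9: 0.60}
good_fit = {3: 0.21, 9: 0.58}
poor_fit = {3: 0.40, 9: 0.30}
ll_good = log_likelihood(observed, good_fit, nu=100)
ll_poor = log_likelihood(observed, poor_fit, nu=100)
```

A simulated trajectory that passes close to the data yields a higher log-likelihood than one that does not, which is the quantity an MCMC sampler would use in its acceptance step.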

Statistical inference code was implemented using Python 3.7.4 and Matlab R2020A and is available from https://github.com/koellelab/withinhost_fitnessInference.

We first aimed to determine if longitudinal allele frequency data could be used to infer variant fitness in the context of cellular coinfection under the assumption of deterministic within-host evolutionary dynamics. We therefore first generated a mock dataset by forward simulating the deterministic model and adding measurement noise (Fig. 2A). Prior to applying the MCMC methods described above to this mock dataset,

We now apply the same MCMC approaches to experimental data from an influenza A subtype H5N1 challenge study performed in ferrets. Figure 3A shows the frequencies of the G788A variant, which was present in the inoculum stock at a frequency of 4.40% and increased in frequency in all four of the experimentally infected ferrets. For the reasons provided above, we used only days 1 and 3 for estimation of variant fitness. We also used the measured stock frequency of 4.40% as the day 0 data point for all ferrets. In fitting our model to these data, we first converted days post inoculation to viral generations by assuming an 8-hour influenza virus generation time. Replicate samples for this experiment were not available, so we set the degree of measurement noise $\nu$ to 100, but consider the sensitivity of our results to this value (see below). We used an informative prior on the mean cellular MOI, specifically a lognormal prior with a mean of log(4) and a standard deviation of 0.4. We used this prior based on studies that indicate that 3-4 virions are generally required to yield progeny virus from an infected cell [11]. We ran the MCMC chain for 20,000 iterations (Figure S2). Posterior distributions for mean cellular MOI and variant fitness are shown in Figures 3B and C, respectively. The joint density plot of MOI and variant fitness (Fig. 3D) indicates that there is a positive correlation between these two parameters, consistent with our findings on simulated data (Fig. 2B). Posterior distributions for the initial frequencies of the variant in each ferret are shown in Figure S3.
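Two of the fitting choices above reduce to short calculations: the conversion of sampling days to viral generations and the evaluation of the MOI prior. The sketch below is illustrative only. The function names are hypothetical, and it assumes that log(4) and 0.4 are the mean and standard deviation of the logarithm of MOI, which is one common reading of a lognormal prior specification.

```python
import math

GENERATION_TIME_HOURS = 8.0  # assumed influenza virus generation time


def days_to_generations(days):
    """Convert days post inoculation to viral generations."""
    return days * 24.0 / GENERATION_TIME_HOURS


def lognormal_log_prior(moi, mu=math.log(4.0), sigma=0.4):
    """Log density of a lognormal prior on mean cellular MOI.

    mu and sigma are taken to be the mean and standard deviation of log(MOI).
    """
    if moi <= 0.0:
        return float("-inf")  # MOI must be positive
    z = (math.log(moi) - mu) / sigma
    return -math.log(moi * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


# Sampling days used for estimation (day 0 is the inoculum stock).
sample_generations = [days_to_generations(d) for d in (0, 1, 3)]
log_prior_at_4 = lognormal_log_prior(4.0)
```

With an 8-hour generation time, days 1 and 3 correspond to generations 3 and 9, so the fitted trajectory spans nine viral generations from the stock measurement.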

The results shown in Fig. 3B indicate that cellular MOI is relatively high, although the informative

In Figure 4, we further plot model simulations that assume no cellular coinfection. Specifically, we

Statistical inference with experimental H5N1 challenge study data
We now apply the same pMCMC approaches to H5N1 experimental data analyzed already using the determin-

impact on within-host viral dynamics than a variant with a smaller selective advantage. We might, for example, expect a variant with a higher selective advantage to result in higher peak viral loads and potentially longer durations of infection. This would impact both symptom development as well as onward transmission potential.

Our models, like all models, make some simplifying assumptions. First, we assume low viral diversity, with diversity comprising just one locus and two alleles (a wild-type and a variant allele). We have chosen to

genetic linkage between loci can be considered, and epistatic interactions between loci can also be inferred.

Our models, as presented here, however, could still be applied to higher diversity viral systems if recombination occurred freely between loci, as may be the case between influenza gene segments or some viruses with high recombination rates.

A second assumption present in the current formulation of our models is that viral fitness is additive:

low MOI), followed by a greater degree of phenotypic hiding later on in the infection (due to higher MOI).

To accommodate these changes in MOI, the structures of the within-host models presented here would not need to be significantly altered; MOI could simply be made into a time-varying parameter. For simplicity, we here instead decided to assume that MOI is fixed over the course of infection, in part because of the lack of empirical data to inform MOI at multiple time points over the course of an infection. A further argument against incorporating dynamic changes in MOI is that spatially-structured within-host viral dynamics, such as those characterized for influenza [32], may result in cellular MOIs that are more uniform over time than expected from a spatially unstructured setting.

Despite these limiting assumptions, a general takeaway from the evolutionary models presented here is that cellular coinfection will slow down the rate of viral adaptation within hosts. This is good news from the perspective of the host population, as this will also slow down viral adaptation at the population level.

This finding has clear implications for emerging zoonotic viruses that are adapting to a new host population.

Analogously, cellular coinfection will result in less effective purging of deleterious mutations. By making natural selection a weaker evolutionary force, cellular coinfection may thus be one reason why stochastic processes appear to dominate within-host viral dynamics and why selection does not seem to act efficiently over the course of an acute infection for viruses such as seasonal influenza [21,33]. A second takeaway is that variants whose fitness levels (relative to wild-type) have been quantified using models that do not include

The authors declare no conflict of interest.