Abstract
Influenza virus has a high mutation rate, and this low replicative fidelity contributes to its capacity for rapid evolution. Clonal sequencing and fluctuation tests have suggested that the mutation rate of influenza A virus is 7.1 × 10⁻⁶ to 4.5 × 10⁻⁵ substitutions per nucleotide per cell infection cycle and 2.7 × 10⁻⁶ to 3.0 × 10⁻⁵ substitutions per nucleotide per strand copied (s/n/r). However, sequencing assays are biased toward mutations with minimal impacts on viral fitness, and fluctuation tests typically investigate only a subset of the 12 mutational classes. We developed a fluctuation test based on reversion to fluorescence in a set of virally encoded mutant green fluorescent proteins. This method allowed us to measure the rates of selectively neutral mutations representative of all 12 mutational classes in the context of an unstructured RNA. We measured an overall mutation rate of 1.8 × 10⁻⁴ s/n/r for PR8 (H1N1) and 2.5 × 10⁻⁴ s/n/r for Hong Kong 2014 (H3N2). The replication mode was linear. The mutation rates of these divergent strains are significantly higher than previous estimates and suggest that each replicated genome will have an average of 2 to 3 mutations. The viral mutational spectrum is heavily biased toward A to G and U to C transitions, resulting in a transition to transversion bias of 2.7 and 3.6 for the two strains. These mutation rates were relatively constant over a range of physiological temperatures. Our high-resolution analysis of influenza virus mutation rates will enable more refined models of its molecular evolution.
Significance
The rapid evolution of influenza virus is a major problem in public health. A key factor driving this rapid evolution is the virus’ very high mutation rate. We developed a new method for measuring the rates of all 12 mutational classes in influenza virus, which eliminates some of the biases of existing assays. We find that the influenza virus mutation rate is much higher than previously reported and is consistent across two distinct strains and a range of temperatures. Our data suggest that influenza viruses replicate at their maximally tolerable mutation rates, highlighting both the virus’ evolutionary potential and its significant constraints.
Introduction
The rapid evolution of influenza virus has led to reduced vaccine efficacy, widespread drug resistance, and the annual emergence of novel strains. While complex ecological, environmental, and host demographic factors influence the evolutionary dynamics of influenza virus, the virus’ adaptability is driven in large part by its capacity to generate genetic diversity through mutation and reassortment (1). Like other RNA viruses, influenza virus replicates with extremely low fidelity. The influenza virus RNA-dependent RNA polymerase (RdRp) complex, which includes the viral proteins PB1, PB2, PA, and NP, lacks proofreading and repair activity (2). Its mutation rate has been reported to be approximately 10⁻⁵ to 10⁻⁶ mutations per nucleotide per cellular infection (3–8).
An accurate accounting of influenza virus’ mutation rate and mutational bias is essential for defining its evolutionary dynamics and for informing control efforts. The mutation rate will determine the probability that a mutation conferring drug resistance, antibody escape, or broadened host range will be generated within a given virus population. It will also define a virus’ sensitivity to drug-induced lethal mutagenesis, a broad-spectrum antiviral strategy that exploits the high mutation rate and low mutational tolerance of many RNA viruses (9, 10). We have shown that the antiviral activity of three different nucleoside analogues is due to increased viral mutation rates, and a new anti-influenza drug, favipiravir, has been found to act through a similar mechanism (11–13). As in other RNA viruses, the mutational bias of the influenza polymerase complex is largely undefined. While viral mutation rates are typically reported as a single measurement, each of the 12 mutational classes will have its own rate and determine the accessibility of various nucleotide and amino acid substitutions. Many RNA viruses appear to exhibit a pronounced bias toward transition mutations (3). Because transitions are more likely to be synonymous than transversions, this bias can impact models of molecular evolution and inferences of natural selection based on dN/dS ratios.
Mutations are typically reported as either frequencies or rates (3, 14). Mutation frequency is the number of mutations identified in a sample per nucleotide sequenced. Frequency measurements therefore quantify not only the rate at which a mutation is generated but also that mutation’s ability to persist in a population. In contrast, mutation rates measure how many mutations are made in a discrete unit of time (e.g. per infection cycle or strand copied) and are a better representation of polymerase error. Viral mutation rates have often been measured by Sanger sequencing of randomly selected clones obtained through plaque purification or limiting dilutions (6–8, 15–17). Mutation frequencies obtained in this manner can be converted to mutation rates by adjusting for the number of replication cycles prior to sampling (3). With these adjustments, sequencing-based estimates of influenza virus mutation rates range from 7.1 × 10⁻⁶ to 4.5 × 10⁻⁵ substitutions per nucleotide per cell infection cycle. While sequencing approaches can potentially measure the rate of all substitution classes, they lack precision and have poor power for detecting differences across strains or conditions. They are also biased towards sampling of genomes with higher fitness. Next generation sequencing platforms have increased the throughput and power of clonal sequencing, but in many cases, the impact of reverse transcription error in library preparation has not been thoroughly investigated.
A more direct way to measure mutation rates is to use a Luria-Delbrück fluctuation test (18–22). In this method, a large number of parallel cultures are infected with small inocula and assessed for a set of newly generated mutants exhibiting a scoreable phenotype after a period of exponential growth. Because the mutations are rare and random, they follow a Poisson distribution in each culture. Mutation rate estimates from a null class model are robust to the mode of replication, which may vary across viral species (20, 21). Using resistance to monoclonal antibodies as a scoreable phenotype, influenza’s mutation rate has been estimated to be 2.7 × 10⁻⁶ to 3.0 × 10⁻⁵ substitutions per nucleotide per strand copied (4, 5). While fluctuation tests are more precise than sequencing assays, most scoreable phenotypes sample just a few sites or mutational classes.
Here, we apply two new approaches for measuring the influenza virus mutation rate that overcome the drawbacks of sampling bias and low statistical power inherent to currently available methods. The first relies on measurements of nonsense mutation frequencies within a short segment of the influenza genome using PrimerID, an error-controlled next-generation sequencing approach (23, 24). Because nonsense mutations are lethal and generally not propagated, their frequencies approximate the mutation rate in the prior replication cycle (25). The second is a Luria-Delbrück fluctuation test that scores reversion to fluorescence in virally encoded green fluorescent protein (GFP) mutants (26). The GFP method enabled interrogation of all 12 substitution mutation classes independently and under distinct replication conditions.
Results
The vast majority of premature stop codons in RNA virus open reading frames are lethal and are therefore likely to have been generated during the previous replication cycle (e.g. (27)). Eighteen of the 61 sense codons are a single mutation away from a nonsense codon, and the frequency at which these nonsense mutational targets (NSMT) mutate to stop codons approximates the viral mutation rate. When combined with a highly accurate next generation sequencing approach, the NSMT method can provide rate estimates for eight mutational classes (25, 28, 29). We identified a 402 base fragment within the PA gene of A/Wisconsin/03/2007 H3N2 that contains a balanced distribution of 80 NSMT, and used the PrimerID method to sequence individual PA clones from an influenza population on the Illumina platform. PrimerID sequencing utilizes a library of barcoded reverse transcription primers to generate consensus sequences for each cDNA template, thereby controlling for the PCR or base-calling errors that plague many next generation sequencing datasets (23, 24). The PrimerID method does not control for errors introduced during reverse transcription (RT), and the mutation rates of reverse transcriptases are similar to those of viral RNA dependent RNA polymerases (RdRp) (3).
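The codon counts quoted above can be checked by brute force. The following is a minimal sketch that assumes the standard genetic code with stop codons UAA, UAG, and UGA; the function name `nonsense_targets` is illustrative and does not come from the authors' pipeline.

```python
from itertools import product

BASES = "ACGU"
STOPS = {"UAA", "UAG", "UGA"}  # assumption: standard genetic code


def nonsense_targets():
    """Map each sense codon that is one substitution away from a stop codon
    to the set of mutation classes (e.g. "C>U") that create the stop."""
    targets = {}
    for codon in ("".join(p) for p in product(BASES, repeat=3)):
        if codon in STOPS:
            continue  # only sense codons can be mutational targets
        for pos, old in enumerate(codon):
            for new in BASES:
                if new == old:
                    continue
                mutant = codon[:pos] + new + codon[pos + 1:]
                if mutant in STOPS:
                    targets.setdefault(codon, set()).add(f"{old}>{new}")
    return targets


targets = nonsense_targets()
classes = sorted(set().union(*targets.values()))
print(len(targets), "nonsense mutational target codons")
print(len(classes), "mutation classes:", ", ".join(classes))
```

Under these assumptions the enumeration finds 18 target codons and 8 mutation classes, in agreement with the text. The four classes it cannot reach (A>C, A>G, G>C, U>C) are those that would require a stop codon to contain a C or to gain a G at a position where the only single-step precursor is itself a stop codon.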
In an attempt to distinguish RT errors from mutations introduced by the influenza RdRp, we compared PrimerID-NSMT estimates of the mutation rate for influenza virus to a control, in which the segment 3 (PA) RNA was expressed from a plasmid pol I promoter in transfected cells (30). Mutations identified in the viral genome are derived from either the influenza RdRp or RT, and the control establishes the background error rate of the assay due to pol I transcription and RT (Figure 1A). We obtained over 449,000 aligned PA fragment consensus sequences for each sample, representing approximately 75% of starting RNA templates. The frequencies of nonsense mutations were similar for 5 of the 8 mutation classes (Figure 1B, Table S1). The frequencies of the other three mutation classes (A to U, C to A, and U to G) were only slightly higher in the samples derived from RNA replicated by the influenza RdRp than those that were not. The G to A mutation rate was highest in both samples (1.3 × 10⁻⁴ and 8.5 × 10⁻⁵ substitutions per nucleotide for cell-derived and viral-derived samples, respectively), and analyses of the RT mutational spectrum consistently show this to be the most common mutation, with rates of 1 × 10⁻⁴ substitutions per nucleotide (28, 31–33). These data demonstrate that the background error rate of reverse transcriptase during sample preparation is equal to or higher than the rate of mutations introduced by the influenza RdRp.
We also compared the frequency of nonsense mutations to the frequency of all observed mutations. Mutations in the transfected control were evenly distributed across the PA fragment and no more common than the subset of nonsense mutations (Figure 1C). In contrast, the frequency of mutations in the replicated viral RNA was higher than that of the subset of nonsense mutations. The accumulation of mutations to frequencies above those of nonsense mutations and of the background signal indicates the action of selection on newly generated mutations. Together, these data suggest that the high background error rate of reverse transcriptase and issues of selection bias may confound sequencing-based measurements of the basal mutation rate and mutational bias of riboviruses.
A fluorescence-based fluctuation test
We developed a Luria-Delbrück fluctuation test for influenza virus mutation rates that scores reversion to fluorescence in a set of 12 mutant green fluorescent proteins (GFP). The fluorescent chromophore of enhanced GFP contains three essential amino acids (T65, Y66, and G67) (34), and nonsynonymous substitution at any of these positions results in a GFP with either absent or altered fluorescent properties (35–37). We used a plasmid that contains GFP instead of hemagglutinin on influenza A virus segment 4 (ΔHA-GFP, (38)), to generate a set of 12 recombinant influenza A viruses that each express a mutant GFP protein (Table 1). Each of these mutant GFP proteins has a single nucleotide mutation that, with reversion to fluorescence, will interrogate a specific mutational class introduced by the viral RdRp during replication. These viruses were replicated in cells stably expressing the HA protein in trans.
Because our mutant GFP proteins are not fluorescent, we used anti-GFP antibody staining and immunofluorescence microscopy to verify GFP expression from each of the 12 mutant ΔHA-GFP viruses. In virally infected cultures, we occasionally identified rare cells expressing GFP that was fluorescent at the excitation and emission wavelengths consistent with reversion to fluorescence (Figure 2A). We used antibody staining to titrate the total number of viruses expressing GFP. The growth kinetics of mutant ΔHA-GFP A/Puerto Rico/8/1934 H1N1 viruses were slower than the parental PR8, but similar among the 12 mutants (Figure 2B). In all cases, titers of 1 × 10⁵ per milliliter were achieved by 22 hours at 37°C. This corresponds to 10⁴ viruses per well of a 96-well plate, which is the maximum that can be accurately measured by fluorescence microscopy. In subsequent experiments, we used antibody staining of infected cells to titrate the total number of viruses expressing GFP – the mutational target – since a subset of viruses will delete the GFP open reading frame during replication.
Fluctuation tests are most accurate when the marker is selectively neutral (20, 21). We measured the replicative fitness of viruses expressing the mutant ΔHA-GFP relative to those expressing the wild type ΔHA-GFP. We competed a subset of the mutant viruses against a wild type ΔHA-GFP virus containing a neutral PB1 sequence barcode and used RT-quantitative PCR to measure the frequency of the competitors over serial passage on MDCK-HA cells (27). Each of 6 mutant ΔHA-GFP viruses, which sample mutations in the 3 mutated amino acid positions, maintained stable frequencies over 4 passages. They were just as fit as the wild type ΔHA-GFP virus (Figure 2B, p>0.05, n=3 replicates, one way ANOVA), confirming that the scoreable phenotype and the mutations interrogated are selectively neutral.
The secondary structure of genomic RNA in positive sense viruses is known to influence mutation rates in a site specific manner (39–42). In influenza virus, the formation of stable RNA structures in the replication complex is limited by the binding activity of the viral nucleoprotein (2). We performed an in silico analysis of the ΔHA-GFP RNA to further exclude the possibility that reversion of mutations could be influenced by local RNA structure (Figure 2D). A sliding window analysis of the minimum free energy of RNA folding suggests that the introduced mutations are located within a region of high minimum free energy in the ΔHA-GFP RNA (43, 44). The rates at which these mutations revert to the wild type sequences are therefore likely to be representative of mutation rates across the influenza virus genome.
We used a Luria-Delbrück fluctuation test to convert reversion frequencies to viral mutation rates (18, 20). For each of the 12 mutant ΔHA-GFP, we inoculated parallel cultures of MDCK-HA cells and transferred replicated virus to MDCK cells in a 96-well imaging plate (Figure 3A). The replication time and transfer volume were empirically determined for each mutation class, drug, temperature, and virus tested. Because the ΔHA-GFP viruses do not express the HA protein, they only replicate in MDCK-HA cells, and there was no viral spread in the imaging plate. We used a null class model to calculate mutation rates based on the number of parallel cultures without a revertant (green fluorescence) and the degree of viral replication (anti-GFP antibody staining of the inocula and replicated virus). The mutation rates we report using this method are in the coding (+) sense of the RNA, rather than the genomic (−) sense.
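The null class calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: it assumes the textbook P0 form of the estimator, in which the number of reversion events per culture is Poisson distributed with mean m = μ(Nf − Ni), so that P0 = exp(−m). The function name, argument names, and the example counts are all hypothetical.

```python
import math


def null_class_rate(cultures_without_revertant, total_cultures,
                    n_initial, n_final):
    """P0 (null class) estimate of the mutation rate per strand copied.

    Assumes Poisson-distributed reversion events per culture:
        P0 = exp(-m),  m = rate * (n_final - n_initial)
    where n_initial and n_final are the numbers of GFP-expressing viruses
    (the mutational targets) in the inoculum and after replication.
    """
    p0 = cultures_without_revertant / total_cultures
    if not 0.0 < p0 < 1.0:
        raise ValueError("P0 must lie strictly between 0 and 1")
    if n_final <= n_initial:
        raise ValueError("n_final must exceed n_initial")
    m = -math.log(p0)  # expected number of reversion events per culture
    return m / (n_final - n_initial)


# Hypothetical example: 12 of 24 cultures show no green cells, and each
# culture grew from 400 to 10,400 GFP-expressing viruses.
rate = null_class_rate(12, 24, n_initial=400, n_final=10_400)
print(f"{rate:.2e} reversions per target per strand copied")
```

Because each mutant GFP reports on a single nucleotide and a single substitution, the value returned here would correspond to the rate for one mutational class at one site under the stated assumptions.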
We validated the specificity of our assay for specific mutational classes by performing a set of fluctuation tests in the presence of each of three different mutagenic nucleoside drugs. We and others have previously shown that ribavirin increases the frequency of C to U and G to A transitions, 5-azacytidine increases the frequency of C to G and G to C transversions, and 5-fluorouracil increases the frequency of all transitions (A to G, C to U, G to A, and U to C) in influenza virus (11, 12). Each of these mutagens increased the rates of only the expected mutation classes (Figure 3B-D). In some cases, the amount of replicated virus was sufficiently low and the mutations were sufficiently rare that we were not able to obtain replicate fluctuation tests in which the null class (P0) lay within the ideal range of 0.1-0.7 (19, 20). Here and elsewhere, these less precise mutation rate measurements are indicated with open symbols (Figures 3–5).
The mutation rates of influenza A virus
We used our GFP fluctuation test to measure the mutation rates of two evolutionarily divergent influenza viruses. Influenza A/Puerto Rico/8/1934 H1N1 (PR8) was the second influenza virus isolated and was extensively passaged in various cell culture environments prior to cloning (45). We cloned a circulating seasonal influenza virus, influenza A/Hong Kong/4801/2014 H3N2 (Hong Kong 2014), after limited passage in MDCK. The PR8 virus contained the ΔHA-GFP segment with seven PR8 genome segments. The Hong Kong 2014 virus contained the ΔHA-GFP segment, the segments coding for the polymerase complex – PB2, PB1, PA, and NP – from A/Hong Kong/4801/2014 H3N2, and the segments encoding NA, M, and NS from PR8. This chimera was necessary to obtain high titer stocks.
The mutation rates of the PR8 and Hong Kong 2014 viruses were higher than previously reported for influenza A virus and generally biased toward transitions (6–8). In both viruses, mutation rates were highest for the reciprocal transitions, A to G and U to C (Figure 4 and Table S2). The rates for the other two transitions (C to U and G to A) were approximately six fold lower and similar to the rates of the more common transversion mutations. We note that this G to A mutation rate is much lower than the rate estimated using the PrimerID-NSMT assay (see Figure 1). This discrepancy may reflect differences in mutational bias between the influenza RdRp and retroviral reverse transcriptases. The overall rate and spectrum of mutations for the PR8 and Hong Kong 2014 viruses are very similar, and given the base composition of each virus, we estimate that each replicated 13.5 kb genome contains, on average, 2 to 3 mutations. The transition to transversion ratio is 2.7 in PR8 and 3.6 in Hong Kong 2014.
We did identify differences between the two viruses in specific mutation classes. The rate of G to A mutations was two-fold higher in Hong Kong 2014 than in PR8 (7.2 × 10⁻⁵ vs. 3.1 × 10⁻⁵, p = 0.0018, multiple t-test with Holm-Sidak correction), and the Hong Kong 2014 virus also exhibited a marginally increased rate of G to U mutations that was not statistically significant (6.0 × 10⁻⁵ vs. 3.5 × 10⁻⁵, p = 0.083). For both viruses, the rates of mutations away from A are symmetrical to the reciprocal mutations away from U. Interestingly, mutations away from C were much less common than the reciprocal mutations away from G. In PR8, G nucleotides are 3.8 times more likely to mutate than C nucleotides. In the Hong Kong virus, this difference is 2.7-fold.
Because reversion mutations can be introduced during either plus or minus strand RNA synthesis, our fluctuation test estimates the mutation rate per strand copied. To determine whether influenza virus replicates through a linear or a binary mechanism, we calculated the total number of revertants per number of replicated viruses across our parallel cultures. This method estimates mutations per cellular infection cycle and allows for the possibility that cells with reversions that occur early in a replication cycle will produce multiple viruses that express fluorescent GFP. The ratio of the mutation rate per cellular infection cycle to the mutation rate per strand copied therefore approximates the number of strand copying events that occur prior to viral release (3). In linear, or stamping machine, replication, there is only one cycle of strand copying per cellular infection cycle and the ratio is two. In binary replication, multiple rounds of strand copying occur, leading to higher ratios. Assuming a single cellular replication cycle occurred during each experiment and using all 12 mutation classes, we determined the average ratio to be 1.2 and 1.1 for the PR8 and Hong Kong strains, respectively. These data suggest that influenza largely replicates through a linear mode.
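The ratio described in this paragraph can be written out as a short calculation. The sketch below is illustrative only: the function names are hypothetical, the per-culture counts are invented to show the arithmetic, and the per-strand rate is taken as an input rather than derived from data.

```python
def rate_per_cell_infection(revertants_per_culture, viruses_per_culture):
    """Total revertants divided by total replicated viruses, pooled across
    parallel cultures (mutations per target per cellular infection cycle)."""
    total_viruses = sum(viruses_per_culture)
    if total_viruses <= 0:
        raise ValueError("no replicated virus counted")
    return sum(revertants_per_culture) / total_viruses


def copying_ratio(rate_cell_infection, rate_strand_copied):
    """Ratio that the text interprets as the approximate number of strand
    copying events occurring before viral release."""
    if rate_strand_copied <= 0:
        raise ValueError("per-strand rate must be positive")
    return rate_cell_infection / rate_strand_copied


# Hypothetical counts for four parallel cultures of one mutant virus.
revertants = [0, 1, 0, 2]
viruses = [10_000, 10_000, 10_000, 10_000]

per_cell = rate_per_cell_infection(revertants, viruses)
ratio = copying_ratio(per_cell, rate_strand_copied=6.25e-5)
print(f"per cell infection: {per_cell:.2e}, ratio: {ratio:.2f}")
```

Unlike the null class estimate, the pooled count gives extra weight to cultures in which an early reversion produced several fluorescent progeny, which is why the two quantities can differ.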
Influenza virus mutation rates across physiologic temperatures
Biochemical studies of purified influenza virus RdRp suggest that replication temperature can affect enzyme fidelity (46). Influenza viruses replicate over a range of temperatures in nature from 32°C and 37°C across the respiratory tract of humans to 39°C in febrile illness to 41°C in birds (47–50). We used the PR8 virus encoding mutant ΔHA-GFP representing the 5 most frequent mutational classes to measure mutation rates at different temperatures. This virus replicated reasonably well in MDCK-HA cells at 32°C and 39°C, albeit to lower titers (see Figure 2C). The mutation rates for these 5 classes were generally stable over this 7 degree range of physiological temperatures (Figure 5). We were unable to measure mutation rates at temperatures higher than 39°C due to host cell intolerance.
Discussion
We developed two new methods to define the mutation rate and mutational bias of H1N1 and H3N2 influenza viruses. We found that the background error rate of reverse transcriptase may confound measurements of influenza virus mutation rates that are based on sequencing of RT-PCR amplified templates. We therefore developed a high-throughput GFP-based assay to estimate the mutation rates for all 12 substitution classes. This assay can be easily adapted to any virus that tolerates the addition of the GFP open reading frame. While PR8 (H1N1) and Hong Kong 2014 (H3N2) viruses varied in their mutation rates for individual classes, the overall mutation rate was consistent across these evolutionarily divergent influenza polymerases at a range of temperatures. These mutation rates are considerably higher than previously reported and, given the impact of mutational load, suggest that the virus is replicating at its maximally tolerable mutation rate.
Sequencing assays for viral mutation rates are plagued by ascertainment and sampling biases. The mutations that are detected in plaque-derived populations represent only the viable fraction and those identified in passaged supernatants are often heavily biased toward mutations with less deleterious fitness effects. While Sanjuan and colleagues have appropriately adjusted for typical viral mutational fitness effects in sequence-based estimates of mutation rates (3), these fitness effects may not be uniform across viruses or in the genes analyzed (27, 51). Next generation sequencing can minimize these biases by improving the detection of rarer, more deleterious mutations (28, 39, 40, 52). However, our data from PrimerID-controlled next generation sequencing suggest that reverse transcriptase error can be a significant confounder that needs to be considered in studies of RNA viruses. In our experiments, the high background RT error rate made it difficult to distinguish mutations introduced by the influenza polymerase complex from those generated during reverse transcription of the viral genomic RNA. The mutational bias of RT may also differ from that of viral RNA-dependent RNA polymerases. Guanine to adenine transitions are the most common mutation made by RT (28, 31–33) and are the ones found most frequently in our dataset as well as many others that rely on RT-PCR amplification for sequencing (e.g. (11, 12, 53, 54)). In contrast, our fluctuation test suggests that A to G and U to C mutations are the most common classes in influenza.
Fluctuation tests are sensitive for rare mutational events and avoid many of the issues with sequencing assays (18, 20–22). Our reversion to fluorescence assay has several additional advantages over ones that rely on phenotypic markers such as drug or antibody resistance (26). First, the marker was selectively neutral, as the mutant GFP and revertant wild type GFP viruses had equal fitness. Second, we were able to measure all 12 mutational classes in a format that allowed for sufficient replicates. Third, we were able to control the number of cellular infection cycles by expressing the HA protein in trans (38). Fourth, we used an anti-GFP antibody to measure the number of mutation targets directly. One shortcoming of all fluctuation tests is that genomic mutation rates are extrapolated from data at one specific site. While RNA structures are unlikely to play a major role in mutation rate variability in influenza virus (55), we cannot exclude that sequence context could modulate mutation rates across the genome.
We found that the mutation rates of the lab adapted PR8 H1N1 strain are similar to those of a recently circulating H3N2 strain, and both sets of measurements are considerably higher than those obtained in previous sequence-based studies. While these earlier works estimated rates between 7.1 × 10⁻⁶ and 4.5 × 10⁻⁵ mutations per nucleotide per cell infection, our composite mutation rates were 1.8 × 10⁻⁴ and 2.5 × 10⁻⁴ mutations per nucleotide per strand replicated for PR8 (H1N1) and Hong Kong/2014 (H3N2), respectively (3, 6–8). Consistent with the biases detailed above, our measurements are closer to those obtained for specific classes in antibody-based fluctuation tests (4, 5). These very high mutation rates mean that each replicated genome has, on average, 2 to 3 mutations. We have found that 28-31% of randomly selected mutations in influenza virus are lethal (27). Using a 70% probability that a given mutation results in a viable virus, the likelihood of any given genome being able to replicate is only 34 to 49%. We suggest that mutational load accounts for a sizable portion of the 90-99% of genomes in influenza populations that are non-infectious (e.g. (12, 13)). This mutation rate clearly places influenza close to a theoretical maximum rate, and we and others have shown that small increases in the virus’ mutation rate lead to considerable losses in genome infectivity (11–13).
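The 34 to 49% figure follows from a simple multiplicative calculation, sketched here under the assumption that mutations affect viability independently, so that a genome carrying n mutations is viable with probability 0.7 raised to the power n. The constant and function names are illustrative.

```python
P_VIABLE_PER_MUTATION = 0.70  # from the text: 70% chance a mutation is not lethal


def viable_fraction(n_mutations, p_viable=P_VIABLE_PER_MUTATION):
    """Probability that a genome with n independent mutations is viable."""
    return p_viable ** n_mutations


for n in (2, 3):
    percent = round(100 * viable_fraction(n), 1)
    print(f"{n} mutations per genome: {percent}% of genomes viable")
```

With 2 and 3 mutations per genome this gives 49.0% and 34.3%, which matches the range quoted in the text. The sketch treats the mutation count as fixed; a model in which the count per genome varies (for example, Poisson distributed) would give somewhat different values.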
The mode of replication for influenza virus is near linear as the ratios of mutation rates per cell infection to per strand replicated were 1.1 and 1.2 for the strains evaluated. This finding suggests that (−) sense genomic RNA introduced during an infection is copied to (+) sense replication intermediates just once before new (−) sense genomes are synthesized. The fact that our observed ratios are less than 2 – the theoretical minimum for a single stranded RNA virus – may be due to our method of measurement (3). We assessed only a single mutation site in the viral genome for each mutation class. We found that reciprocal mutation classes (e.g. G to U and C to A) do not always occur at the exact same rate. This mutational bias, combined with our measurements of mutation rates at a single nucleotide position, may invalidate this key assumption for determining replication mode and give a ratio less than 2. Additionally, since we allowed approximately one replication cycle and started with very low numbers of infecting virus genomes, our measurements would be strongly biased towards observing mutations generated during synthesis of the (−) sense genomic RNA rather than during synthesis of the (+) sense replication intermediates in a system with a linear replication mode. This linear or “stamping machine” replication mode may actually assist the virus in tolerating its high mutation rate because it allows fewer opportunities for mutations during each viral replication cycle (56–58).
Cellular replication environments are often hypothesized to influence RNA virus mutation rates, and yet these effects have rarely been documented (22, 33, 42, 59, 60). Here, we found no significant differences in rates of the five most common mutational classes across a seven degree range of physiologic temperatures. Nucleotide pools could potentially influence the observed mutational biases. Intracellular concentrations of nucleotide triphosphates are much higher than those of deoxynucleotides, and cellular pools are typically biased towards ATP and GTP, which have other metabolic functions (61, 62). While it is tempting to speculate that pool bias could lead to the observed asymmetry in mutations away from guanine, this is unlikely to be the case in MDCK cells. The concentrations of all four NTPs in MDCK cells are at least tenfold higher than the Km of the influenza polymerase for each (46, 62, 63). We cannot exclude the possibility that biases in pools could play a role in primary cells where NTPs may be more limiting. However, Combe and Sanjuan found that VSV mutation rates were similar across a range of primary and immortalized cell lines that were cultured under a range of conditions (22). It is also intriguing that A to G and U to C were the most common mutations, as these classes are characteristic of the host enzyme adenosine deaminase acting on RNA (ADAR) (64–66). As ADAR-editing occurs almost exclusively on double stranded RNA, it is not clear that it would contribute to the mutation rates measured on our presumably unstructured GFP messages.
We expect that our data will lead to improved models of influenza evolution. For example, our estimates of the virus’ transition to transversion bias can inform null models for inference of selection in protein coding genes. The availability of a complete nucleotide substitution matrix will also enable studies of selection on codon usage and dinucleotide content. The nucleotide frequencies of both PR8 (H1N1) and Hong Kong 2014 (H3N2) are far from what would be predicted by the 12 mutation rates. This suggests either that selection is maintaining the virus’ nucleotide content away from the mutational equilibrium or that the virus has not had sufficient time to achieve it. Finally, our measurements for the rate of each mutation class, coupled with recent studies on mutational fitness effects in influenza will also greatly improve our ability to construct more accurate phylogenies.
Methods
Viruses, plasmids, and cells
Madin-Darby canine kidney (MDCK) cells were provided by Arnold S. Monto (University of Michigan School of Public Health) and HEK 293T cells were provided by Raul Andino (University of California, San Francisco). Both cell lines were maintained in Dulbecco’s modified Eagle medium (Gibco 11965) supplemented with 10% fetal bovine serum (Gibco 10437) and 25 mM HEPES (Gibco 15630). Cells were maintained at 37°C and 5% CO2 in a humidified incubator except where indicated.
Influenza A/Puerto Rico/8/1934 H1N1 was obtained from the ATCC (VR-1469). The A/Hong Kong/4801/2014 H3N2 strain was obtained from the Centers for Disease Control and Prevention International Reagent Resource (FR-1483). The A/Wisconsin/03/2007 H3N2 strain was provided by Dr. Arnold S. Monto (University of Michigan School of Public Health). Molecular clones were derived from each of these isolates by reverse transcription polymerase chain reaction (RT-PCR) amplification and insertion of all eight genomic segments into the pHW2000 plasmid (30, 67).
Cells expressing the hemagglutinin (HA) protein of influenza A/Puerto Rico/8/1934 H1N1 (MDCK-HA cells) were generated by co-transfection of Madin-Darby canine kidney (MDCK) cells with pCABSD, which expresses a gene for Blasticidin S resistance, and pCAGGS-HA, which expresses the influenza A/Puerto Rico/8/1934 H1N1 HA (38). Pools of cells stably expressing HA were selected in growth media containing 5 μg/mL Blasticidin S. These pools were enriched for cells with high HA expression by staining with an anti-HA antibody (1:1000 dilution, Takara c179) and an Alexa 488-conjugated anti-mouse IgG (1:200 dilution, Life Technologies A11001) followed by fluorescence-activated cell sorting on a FACSAria II (BD Biosciences). Cells were sorted three times over the course of 5 passages and >99% of cells in the final population were positive for high level HA expression.
A pPOLI vector encoding eGFP with influenza genomic packaging sequences was kindly provided by Luis Martinez-Sobrido (University of Rochester). This construct, which we call ΔHA-GFP, expresses eGFP flanked by the 78 3’-terminal bases (33 noncoding, 45 coding) and 125 5’-terminal bases (80 coding, 45 noncoding) of segment 4 from influenza A/WSN/33 H1N1. It lacks the HA translation initiation codon (38). Twelve mutant ΔHA-GFP constructs (Table 2) were generated using the QuikChange II site-directed mutagenesis kit (Agilent Technologies 200523) with primers 5’-CTCGTGACCACCCTG<mutant sequence>GTGCAGTGCTTCAGC-3’ and 5’-GCTGAAGCACTGCAC<rev comp mutant sequence>CAGGGTGGTCACGAG-3’, where mutant sequence corresponds to the sequences in Table 1 and rev comp mutant sequence is the reverse complement of each.
A neutral genetic barcode was incorporated into the PB1 segment of A/Puerto Rico/8/1934 H1N1 in the pHW2000 vector by overlap extension PCR using inner primers 5’-GATCACAACTCATTTCCAACGGAAACGGAGGGTGAGAGACAAT-3’ and 5’-ATTGTCTCTCACCCTCCGTTTCCGTTGGAAATGAGTTGTGATC-3’, and outer primers containing BsmBI sites for cloning into the pHW2000 plasmid.
Recombinant viruses were rescued in 12-well plates after transfection of co-cultures of 2 × 10^5 293T cells and 1 × 10^5 MDCK cells with mixtures of pHW2000 plasmids encoding all 8 influenza genome segments (500 ng each) using 2 μL of TransIT-LT1 (Mirus 2300) per nanogram of DNA (30). Viruses expressing GFP were rescued in the same manner except that the pPOLI vector encoding ΔHA-GFP or its mutants and pCAGGS-HA were used in place of the pHW2000 plasmid encoding influenza HA, and MDCK-HA cells were used in place of MDCK cells.
PrimerID Sequencing
A custom R script (https://github.com/lauringlab/NGS_mutation_rate_assay) was used to identify the 402 base region in the A/Wisconsin/03/2007 H3N2 genome (positions 865 to 1266 of the PA gene) with the highest concentration of pre-nonsense codons. Total cellular RNA was isolated using Trizol (Life Technologies 15596) from 293T cells 48 hours after transfection with a plasmid expressing A/Wisconsin/03/2007 H3N2 segment 3 (PA). Virus RNA was isolated using Trizol from cell free supernatants of MDCK cells infected with A/Wisconsin/03/2007 H3N2 virus at a multiplicity of infection (MOI) of 0.5 for 24 hours. In both cases, the RNA was treated with DNase I (Roche 04716728001) to remove residual plasmid DNA. The copy number of segment 3 (PA) RNA in each sample was determined by reverse transcription with SuperScript III (Invitrogen 18080051) and primer 5’-AGCAAAAGCAGG-3’ followed by quantitative PCR on a 7500 Fast Real-Time PCR system (Applied Biosystems) with Power SYBR Green PCR Master Mix (Applied Biosystems 4367659) and primers 5’-TCTCCCATTTGTGTGGTTCA-3’ and 5’-TGTGCAGCAATGGACGATTT-3’. A plasmid encoding PA was used to generate a standard curve to relate cycle threshold to copy number. The absence of plasmid DNA containing the PA sequence was confirmed by lack of signal in qPCR of RNA that was not reverse transcribed.
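The region-selection step can be illustrated with a short sliding-window search. This is a hedged Python sketch of the idea, not the R script in the repository above: the function names are hypothetical, the sequence is assumed to be a positive-sense coding sequence written in the DNA alphabet and starting in frame, and a pre-nonsense codon is taken to be any sense codon that is one substitution away from a stop codon.

```python
STOPS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"

def is_pre_nonsense(codon):
    """True if a sense codon becomes a stop codon through one substitution."""
    if len(codon) != 3 or codon in STOPS or any(b not in BASES for b in codon):
        return False
    for pos in range(3):
        for base in BASES:
            if base != codon[pos] and codon[:pos] + base + codon[pos + 1:] in STOPS:
                return True
    return False

def densest_window(cds, window_codons=134):
    """Find the in-frame window with the most pre-nonsense codons.

    Returns (first_nt, last_nt, count) with 1-based nucleotide positions.
    The default of 134 codons corresponds to a 402-base window.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    flags = [is_pre_nonsense(c) for c in codons]
    if len(flags) < window_codons:
        raise ValueError("sequence is shorter than the window")
    count = sum(flags[:window_codons])
    best_count, best_start = count, 0
    for start in range(1, len(flags) - window_codons + 1):
        # Slide by one codon: add the codon entering, drop the codon leaving.
        count += flags[start + window_codons - 1] - flags[start - 1]
        if count > best_count:
            best_count, best_start = count, start
    first_nt = best_start * 3 + 1
    return first_nt, first_nt + window_codons * 3 - 1, best_count
```

Ties are resolved in favor of the window closest to the 5’ end of the sequence, which is an arbitrary choice made for this sketch.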
Sequencing libraries were prepared from 2 × 10^5 copies of segment 3 (PA) RNA using Accuscript high fidelity reverse transcriptase (Agilent Technologies 200820) and primer 5’-CCTACGGGAGGCAGCAGNNNNNNNNNNAATTCCTCCTGATGGATGCT-3’, which binds to bases 842 to 861 of the PA gene (positive strand numbering) and contains a degenerate N10 barcode sequence (1,048,576 unique sequences). Because the RNA copy number was just one-fifth of the total number of barcode sequences, it is unlikely that the same barcode would prime multiple complementary DNA (cDNA) molecules (25). Three separate reverse transcription reactions were performed for RNA harvested from both transfected and infected cells to increase the total number of RNA templates in the experiment. The resulting PrimerID barcoded cDNA was purified using Agencourt AMPure XP beads (Beckman Coulter A63881) to remove residual primers. The purified cDNA was amplified by PCR for 26 cycles (10 seconds at 98°C, 30 seconds at 69°C, and 30 seconds at 72°C) using Phusion high fidelity DNA polymerase (New England Biolabs M0530) and primers 5’-CAAGCAGAAGACGGCATACGAGAT<i7>AGTCAGTCAGTATGGGGCTACGTCCTCTCCAA-3’ and 5’-AATGATACGGCGACCACCGAGATCTACAC<i5>TATGGTAATTGGCCTACGGGAGGCAGCAG-3’, where i5 and i7 are 8 base Illumina indexing sequences. These primers contain the Illumina flow cell adapters at their 5’-ends. Unique index primers were used in the PCR for each of the three RT replicates. Products were gel purified using a GeneJET Gel extraction kit (Thermo Scientific K0691) and replicates were pooled with each product at 1.5 ng/μL. The two pooled sets (one for transfected cells and one for infected cells) were each sequenced on an Illumina MiSeq with 2 × 250 paired end reads, V2 chemistry, and the sequencing primers 5’-TATGGTAATTGGCCTACGGGAGGCAGCAG-3’, 5’-AGTCAGTCAGTATGGGGCTACGTCCTCTCCAA-3’, and 5’-TTGGAGAGGACGTAGCCCCATACTGACTGACT-3’.
Each pooled set, one derived from transfection and one from infection, made up half of the DNA input on a separate sequencing run with the remaining DNA being composed of bacterial genome libraries. This allowed for sufficient sequencing diversity at each base. We obtained over 15 million reads from each of the samples.
Consensus sequences that met empirically determined count cutoffs were generated for each PrimerID using Ruby scripts kindly provided by Ronald Swanstrom and colleagues (University of North Carolina). We obtained greater than 449,000 consensus sequences for each of the two samples, suggesting that nearly 75% of the original RNA templates (6 × 10^5 copies among 3 separate reactions) were sampled. Consensus sequences were aligned to the A/Wisconsin/03/2007 H3N2 PA sequence using Bowtie2 and analyzed using Samtools. A custom Python script was used to determine the base composition at each position (https://github.com/lauringlab/NGS_mutation_rate_assay) and the number of stop codons within each PrimerID consensus sequence. The mutation frequency for each of the eight mutational classes was determined by dividing the number of stop codons resulting from each class by the number of sites sequenced that could possibly mutate to a stop codon through that same class. Raw sequencing fastq files from this experiment are available at the Sequence Read Archive under BioProject accession number PRJNA347826.
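The per-class frequency calculation described above can be sketched as follows. This is an illustrative Python example rather than the script in the repository: the function names are hypothetical, sequences are assumed to be positive sense, in frame, gap free, and written in the DNA alphabet, and only stop codons that differ from the reference codon at a single position are counted.

```python
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}

def stop_routes(codon):
    """List (position, from_base, to_base) substitutions that turn a sense codon into a stop."""
    if codon in STOPS:
        return []
    routes = []
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos] and codon[:pos] + base + codon[pos + 1:] in STOPS:
                routes.append((pos, codon[pos], base))
    return routes

def nonsense_frequencies(reference, consensus_seqs):
    """Stop codons observed per class divided by the sites that could yield a stop via that class."""
    ref_codons = [reference[i:i + 3] for i in range(0, len(reference) - 2, 3)]
    # Opportunities per class in a single copy of the reference region.
    sites_per_copy = Counter()
    for codon in ref_codons:
        for _, old, new in stop_routes(codon):
            sites_per_copy[old + ">" + new] += 1
    observed = Counter()
    for seq in consensus_seqs:
        for idx, ref_codon in enumerate(ref_codons):
            codon = seq[3 * idx:3 * idx + 3]
            if codon in STOPS:
                diffs = [(ref_codon[p], codon[p]) for p in range(3) if ref_codon[p] != codon[p]]
                if len(diffs) == 1:  # ignore stops that need more than one substitution
                    observed[diffs[0][0] + ">" + diffs[0][1]] += 1
    n = len(consensus_seqs)
    return {cls: observed[cls] / (sites * n) for cls, sites in sites_per_copy.items()}
```

The denominator scales the number of eligible sites in the reference by the number of consensus sequences, so each class is normalized by its own opportunity to generate a stop codon.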
Competition assay
Equal quantities (TCID50) of selected mutant ΔHA-GFP viruses were mixed with wild type ΔHA-GFP viruses containing a neutral sequence barcode in the PB1 gene and used to infect 4 × 10^5 MDCK-HA cells in a 6-well plate at an MOI of 0.01. At 24 hours post infection, supernatants were harvested and infectious particles were titered by TCID50. The resulting virus was passaged three more times on MDCK-HA cells, maintaining an MOI of 0.01 at each passage. Each viral competition was performed in triplicate. Viral RNA was harvested from the initial mixture and passaged supernatants using a Purelink Pro 96 viral DNA/RNA kit (Invitrogen 12280). Complementary DNA was synthesized using Superscript III and random hexamers. Quantitative PCR was used to determine the relative amount of total PB1 (primers 5’-CAGAAAGGGGAAGATGGACA-3’ and 5’-GTCCACTCGTGTTTGCTGAA-3’), barcoded PB1 (primers 5’-ATTTCCAACGGAAACGGAGGG-3’ and 5’-AAACCCCCTTATTTGCATCC-3’), and non-barcoded PB1 (primers 5’-ATTTCCAACGGAAACGGAGGG-3’ and 5’-AAACCCCCTTATTTGCATCC-3’) in each sample. The relative amounts of barcoded and non-barcoded PB1 at each passage were normalized by subtracting the Ct value for the total PB1 primer set from the respective Ct values (e.g. ΔCt = Ct_competitor − Ct_total PB1). The normalized values at each passage were compared to the initial viral mixture to obtain a relative Ct (e.g. ΔΔCt = ΔCt_P1 − ΔCt_P0). The relative Ct was converted to the fold change in genome copies (Δratio = 2^(−ΔΔCt)). The slope of the difference between the log10 Δratio values of the two viruses as a function of passage number is equal to the log10 relative fitness of the non-barcoded virus ([log10 Δratio_non-barcoded − log10 Δratio_barcoded]/passage) (27).
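The Ct arithmetic above can be written out as a short calculation. The following Python sketch assumes one Ct value per primer set at each time point, ordered from the initial mixture through the final passage, and an ordinary least-squares slope; the function names are hypothetical and replicate handling is left out.

```python
import math

def log10_delta_ratio(ct_competitor, ct_total):
    """log10 fold change in genome copies relative to the initial mixture."""
    delta_ct = [c - t for c, t in zip(ct_competitor, ct_total)]  # ΔCt per time point
    # ΔΔCt is taken against the first entry (P0); Δratio = 2^(−ΔΔCt).
    return [math.log10(2.0 ** -(d - delta_ct[0])) for d in delta_ct]

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def relative_fitness(ct_nonbarcoded, ct_barcoded, ct_total):
    """Relative fitness of the non-barcoded virus per passage."""
    passages = list(range(len(ct_total)))  # 0 is the initial mixture
    non_barcoded = log10_delta_ratio(ct_nonbarcoded, ct_total)
    barcoded = log10_delta_ratio(ct_barcoded, ct_total)
    differences = [a - b for a, b in zip(non_barcoded, barcoded)]
    return 10 ** slope(passages, differences)
```

For example, a non-barcoded virus whose Ct rises by one cycle per passage while the barcoded and total PB1 signals stay constant is halving in relative abundance each passage, which corresponds to a relative fitness of 0.5.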
Growth curves
One hundred TCID50 of each mutant ΔHA-GFP virus (in 100 μL of media) were used to infect 1.2 × 10^4 MDCK-HA cells in a 96-well plate. At two-hour intervals between 14 and 26 hours post infection, supernatants from 4 wells were transferred to a black 96-well plate containing 1.5 × 10^4 MDCK cells and 50 μL of viral media. Virus equivalent to the initial inoculum was added to 4 wells so that the virus present at 0 hours post infection could be determined. At 14 hours after supernatant transfer, the cells were fixed, stained and imaged as described below.
RNA minimum free energy
The minimum free energy of the ΔHA-GFP RNA was determined using the RNA sliding window python script that is included with the CodonShuffle package (44) (https://github.com/lauringlab/CodonShuffle).
GFP-based Luria-Delbrück Fluctuation Test
Passage 1 (P1) stocks of ΔHA-GFP viruses were made by passing rescued virus once on MDCK-HA cells at an MOI of 0.01 for 48 hours. For each fluctuation test, 24 or more parallel cultures of MDCK-HA cells were infected with P1 influenza viruses encoding one of the twelve ΔHA-GFP mutants in viral media (Dulbecco’s modified Eagle medium (Gibco 11965) supplemented with 0.187% BSA, 25 mM HEPES, and 2 μg/mL TPCK treated trypsin (Worthington Biochemical 3740)). Depending on the mutation class, these infections were performed in either 96-well plates (1.2 × 10^4 cells infected with 400 TCID50 of virus in 100 μL), 48-well plates (3.6 × 10^4 cells infected with 1200 TCID50 of virus in 300 μL), or 24-well plates (7.2 × 10^4 cells infected with 2400 TCID50 of virus in 600 μL). At 17-30 hours post infection (depending on the mutation class, drug treatment, and assay temperature) supernatants were transferred to black 96-well plates (Perkin Elmer 6005182) containing 1.5 × 10^4 MDCK target cells and 50 μL of viral media. Supernatants from each well of 48-well and 24-well plates were transferred in 150 μL aliquots to 2 or 4 wells of the black 96-well plate, respectively. In addition to the supernatants derived from the parallel replication cultures, two to four wells were infected with the amount of virus used to seed these cultures (see N_i, below).
At 14 hours post-infection, cells were fixed with 2% formaldehyde for 20 minutes, rinsed with phosphate buffered saline (PBS), and permeabilized with 0.1% Triton X-100 for 8 minutes. Cells were then rinsed again with PBS, incubated at room temperature for one hour in PBS with 2% BSA and 0.1% Tween-20 (PBS-T), and stained with 1:5000 Hoechst (Life Technologies 33342) and 1:400 anti-GFP Alexa 647 conjugate (Life Technologies A31852) diluted in 2% BSA in PBS-T for 1 hour. Cells were washed three times with PBS-T, and the plates were sealed with black tape prior to removal of the final wash. Plates were imaged on an ImageXpress Microscope (Molecular Devices) using DAPI, Cy5, and FITC specific filter cubes with a 4x magnification lens. Four non-overlapping quadrants were imaged from each well to ensure that the entire surface area was captured. Cell nuclei and antibody-stained cells were counted using MetaXpress version 6 software (Molecular Devices). Cells expressing fluorescent GFP were manually counted from the collected images.
Mutation rates were calculated using the null-class model, μ(s/n/r) = −ln(P_0)/(N_f − N_i), where μ(s/n/r) is the mutation rate per strand replicated, P_0 is the proportion of cultures that do not contain a cell infected by a virus encoding fluorescent GFP, and N_f and N_i are the final and initial viral population sizes, as determined by staining with the anti-GFP antibody, which recognizes both fluorescent and non-fluorescent eGFP (20, 21). Cultures that contained a number of green cells greater than or equal to 0.8 (N_f/N_i) were removed from the calculation because they were likely to have contained a pre-existing fluorescent revertant in the inoculum. These events were extremely rare given the low titer inocula. The null-class model is most precise when P_0 is between 0.1 and 0.7 (20). Due to the rarity of certain mutation classes and the constraints of the maximum viral population size per culture and per well on the imaging plate, not all of our measurements fell within this range. Measurements where the P_0 was above 0.7 are indicated in the graphical representations of our data.
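The null-class calculation, including the exclusion of cultures that likely carried a pre-existing revertant, can be sketched in a few lines of Python. The function and argument names are hypothetical, and the sketch assumes a single N_f and N_i shared by all parallel cultures in a test.

```python
import math

def null_class_rate(green_counts, n_final, n_initial, jackpot_fraction=0.8):
    """Estimate the mutation rate from one fluctuation test.

    green_counts: number of GFP-fluorescent cells observed in each parallel culture.
    n_final, n_initial: final and initial viral population sizes (N_f and N_i).
    Returns (rate, p0) so that P_0 can be checked against the 0.1 to 0.7 range.
    """
    # Cultures at or above 0.8 * (N_f / N_i) green cells are excluded.
    threshold = jackpot_fraction * (n_final / n_initial)
    kept = [g for g in green_counts if g < threshold]
    if not kept:
        raise ValueError("no cultures remain after filtering")
    p0 = sum(1 for g in kept if g == 0) / len(kept)
    if p0 == 0:
        raise ValueError("P_0 is zero, so the null-class estimate is undefined")
    rate = -math.log(p0) / (n_final - n_initial)
    return rate, p0
```

With 24 retained cultures of which 12 contain no green cells, N_f = 10,000, and N_i = 400, P_0 is 0.5 and the estimate is ln(2)/9,600, or roughly 7.2 × 10^-5.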
Ribavirin (1-[(2R,3R,4S,5R)-3,4-dihydroxy-5-(hydroxymethyl)oxolan-2-yl]-1H-1,2,4-triazole-3-carboxamide) (Sigma-Aldrich R9644) was dissolved in PBS at 100 mM. 5-azacytidine (4-Amino-1-(β-D-ribofuranosyl)-1,3,5-triazin-2(1H)-one) (Sigma-Aldrich A2385) and 5-Fluorouracil (2,4-Dihydroxy-5-fluoropyrimidine) (Sigma-Aldrich F6627) were dissolved in dimethyl sulfoxide (DMSO) at 100 mM and 384 mM, respectively. For mutation rate measurements in the presence of drug, MDCK-HA cells were pretreated with viral media containing 2.5 μM ribavirin, 0.625 μM 5-azacytidine, or 15 μM 5-fluorouracil for three hours. Mutation rate assays were carried out according to the above protocol except that the viral media for the initial infections contained drugs at the indicated concentrations.
Mutation rate measurements at different temperatures were carried out as above, except that the initial replication was performed in incubators maintained at 32°C or 39°C. The imaging plates were maintained at 37°C for the 14 hours after the supernatant transfer.
Acknowledgments
This work was supported by a Clinician Scientist Development Award from the Doris Duke Charitable Foundation (CSDA 2013105) and R01 AI118886, both to ASL. MDP was supported by the Michigan Predoctoral Training Program in Genetics (T32GM007544). We thank Judy Opp and April Cockburn from the microbial sequencing core of the University of Michigan Host Microbiome Initiative for assistance with next generation sequencing, JT McCrone for assistance with sequence analysis, and Nick Santoro from the University of Michigan Center for Chemical Genomics for assistance with high content imaging and analysis. We thank Robert Woods for helpful discussion.