Abstract
Influenza A viruses (IAVs) are segmented single-stranded negative sense RNA viruses that constitute a major threat to human health. The IAV genome consists of eight RNA segments contained in separate viral ribonucleoprotein complexes (vRNPs) that are packaged together into a single virus particle1,2. While IAVs are generally considered to have an unstructured single-stranded genome, it has also been suggested that secondary RNA structures are required for selective packaging of the eight vRNPs into each virus particle3,4. Here, we employ high-throughput sequencing approaches to map both the intra and intersegment RNA interactions inside influenza virions. Our data demonstrate that a redundant network of RNA-RNA interactions is required for vRNP packaging and virus growth. Furthermore, the data demonstrate that IAVs have a much more structured genome than previously thought and the redundancy of RNA interactions between the different vRNPs explains how IAVs maintain the potential for reassortment between different strains, while also retaining packaging selectivity. Our study establishes a framework towards further work into IAV RNA structure and vRNP packaging, which will lead to better models for predicting the emergence of new pandemic influenza strains and will facilitate the development of antivirals specifically targeting genome assembly.
Influenza A viruses cause seasonal epidemics as well as occasional pandemics. The segmented nature of the IAV genome allows reassortment of viral genome segments between established human influenza viruses and influenza viruses harboured in the animal reservoir5. This can lead to emergence of novel influenza strains, against which there is little pre-existing immunity in the human population6. However, due to a lack of understanding of the molecular mechanisms governing the packaging of the eight genome segments into a single virion, it remains unclear which influenza virus strains have the potential to form reassortants. In virions, as well as in infected cells, the RNA genome segments are assembled into viral ribonucleoprotein (vRNP) complexes in which the termini of the viral RNA (vRNA) associate with the viral RNA-dependent RNA polymerase while the rest of the vRNA is bound by oligomeric nucleoprotein (NP)2,7. Although cryo-EM studies revealed the overall architecture and organisation of vRNPs, the resolution of currently available structures is not sufficiently high to provide information about the conformation of the vRNA8,9. It is thought that, through specific RNA-RNA interactions, exposed regions of vRNA in the vRNP mediate segment-specific interactions during virion assembly, ensuring that the correct set of eight vRNPs is selected3,4. However, the identities of these interacting regions as well as the overall structure of vRNA in vRNPs are currently unknown.
To better understand the vRNA structure in the context of vRNPs, we employed SHAPE-MaP (Selective 2’-Hydroxyl Acylation Analysed by Primer Extension and Mutational Profiling), which probes the conformational flexibility of each vRNA nucleotide both ex virio and in virio (Extended Data Fig. 1)10,11. For the ex virio experiments, the eight vRNA segments were individually transcribed from plasmid DNA using T7 RNA polymerase (ivtRNA) or “naked” vRNA was purified from deproteinated influenza A/WSN/1933 (H1N1) (WSN) particles (nkvRNA). For the in virio experiments, vRNA was probed in the context of vRNPs directly inside purified virions (Extended Data Fig. 1a).
Three biological replicates of the in virio SHAPE-MaP experiments resulted in highly reproducible SHAPE reactivity profiles suggesting that there is an underlying structure of vRNA in the context of vRNP (Extended Data Fig. 2a). We found that the eight different vRNPs in virio have unique vRNA conformations, as demonstrated by the different SHAPE reactivity profiles (Fig. 1a). Regions of extensive low SHAPE reactivities in virio (where the median-SHAPE value is below zero, Fig. 1a) indicate that the vRNA in the context of vRNPs is capable of accommodating secondary RNA structures with extensive base-pairing. However, comparison of in virio and ex virio SHAPE profiles shows a significant shift in the distribution of SHAPE reactivities in virio, consistent with an overall more open (less structured) RNA conformation (Fig. 1b). In addition, the vRNA forms fewer high-probability secondary RNA structures in virio, suggesting that the binding of NP remodels and partially melts secondary structures in vRNPs (Fig. 1c), in agreement with early studies using enzymatic and chemical probing of naked and NP-bound short RNAs12. Regions of high correlation between ex virio and in virio SHAPE profiles reveal regions of vRNA which are capable of local secondary structure formation even when NP is bound. Our data also recapitulate the RNA structures that have been identified previously using computational methods13,14 (Fig. 1d-e; Extended Data Fig. 2b; Extended Data Fig. 3-4).
The finding that secondary RNA structures are accommodated in vRNPs suggests that NP is not distributed on the vRNA uniformly, in agreement with a recent study of NP association with vRNA15. This raises the possibility that parts of the vRNA could be exposed and accessible to form intermolecular RNA-RNA interactions. Indeed, it has been previously proposed that the specific packaging of the eight different vRNPs into the budding virus is mediated by selective RNA-RNA interactions forming between the vRNPs16-21. We therefore proceeded to analyse the intermolecular RNA interactions occurring in virio using SPLASH (Sequencing of Psoralen Crosslinked, Ligated, and Selected Hybrids)22. SPLASH uses a reversible intercalating reagent, psoralen, which crosslinks base-paired RNAs and allows mapping and identification of crosslinked regions using high-throughput sequencing. We performed two biological replicates of SPLASH analysis using purified virions and focussed our analysis on the most prevalent RNA interactions found in both experiments (Extended Data Fig. 5). In total, 84% of interactions were observed in both replicates.
Our analysis shows that the distribution of intermolecular interaction sites varies between the eight different vRNA segments and that interaction sites are not restricted to certain regions, e.g. vRNA termini (Fig. 2a). Most segments can interact with multiple other segments and in some cases the same region can mediate interactions with multiple segments. While it is unlikely that the same region would interact with multiple other segments in the same virion, this finding suggests that certain loci in the vRNA are more likely to be involved in intermolecular RNA base-pairing than others. These data also suggest that there is a level of redundancy in the intermolecular interactions, allowing multiple RNA conformations to be packaged in virio. Notably, when we used an intermolecular RNA interaction algorithm to compute the intermolecular free energy of the RNA-RNA interaction sites identified by SPLASH, we saw a significant enrichment in low-energy (highly favourable) interactions compared to random permutations of the same interaction dataset (Fig. 2b-c). Furthermore, SHAPE-informed intramolecular RNA interaction prediction shows that the identified interactions are compatible with the in virio SHAPE reactivity profiles we have determined previously and indicates that the low free energy of the interactions is maintained (Extended Data Fig. 6).
Previous studies suggested that the eight vRNA segments are assembled in a hierarchal manner, with some segments being more critical than others1. Specifically, it was found that the NA and NS segments are the most easily exchangeable between viral strains, suggesting that they make the least contribution to the hierarchal assembly of the eight vRNPs23-25. In agreement, we observe that NA and NS segments form the fewest interactions with other segments. Surprisingly, we also identify very few interactions between the NA and HA segments (0.13% of the mapped reads), suggesting that influenza viruses maintain the greatest possible antigenic diversity by limiting the interactions between these two segments during genome packaging.
The sequence of the IAV genome undergoes changes as a result of the antigenic drift and shift5. Given that the intermolecular RNA interactions we have identified may be important for reassortment between the different IAV strains, we questioned whether the same interactions could occur in other IAV strains as well. We selected a set of IAV strains representing the pandemic strains of the last century (A/Brevig Mission/1/1918 (H1N1), A/Singapore/1/57 (H2N2), A/Hong Kong/1/68 (H3N2), and A/England/195/2009(H1N1)), and analyzed their potential to form intermolecular RNA-RNA interactions in the same regions we identified in WSN. Though not all of the interactions can form in the different strains, the regions corresponding to those we identified in WSN are more likely to be involved in the intermolecular base-pairing than permutated datasets (Fig. 3a), and a number of extensive interactions are maintained in the different viral strains (Fig. 3b).
To address the biological role of the identified in virio RNA-RNA interactions we used synonymous mutagenesis to disrupt RNA interactions while preserving the encoded amino acid sequence. We find that mutant viruses with decreased strength of intermolecular RNA-RNA interactions have significant differences in the ratios of the different segments packaged into the virions compared to both the wild type virus and to a control virus with mutations which do not interfere with identified intermolecular RNA-RNA interactions (Fig. 4a). In addition, weakening of intermolecular RNA-RNA interactions leads to the production of defective viral particles and changes in the kinetics of virus growth (Fig. 4b-c).
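The idea behind synonymous mutagenesis can be illustrated with a minimal sketch: swap a codon for another codon that encodes the same amino acid, so the RNA sequence changes while the protein does not. The function names below are hypothetical, the input is assumed to be an in-frame, positive-sense coding sequence written in the DNA alphabet, and the sketch does not reproduce the mutations actually introduced in this study (those are defined by the primers in Extended Data Table 1).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence (uppercase DNA alphabet)."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def synonymous_variants(cds):
    """Return every sequence that differs from `cds` by one synonymous codon."""
    variants = []
    usable = len(cds) - len(cds) % 3
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        for alt, aa in CODON_TABLE.items():
            if alt != codon and aa == CODON_TABLE[codon]:
                variants.append(cds[:i] + alt + cds[i + 3:])
    return variants


if __name__ == "__main__":
    for variant in synonymous_variants("ATGAAAGGG"):
        print(variant, translate(variant))
```

In practice, candidate variants would additionally be scored for how strongly they weaken the predicted intermolecular base-pairing; that scoring step is not shown here.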
Overall, we present the first global map of the IAV genome structure in virio. We show that the IAV genome contains both intramolecular and intermolecular RNA structures. Importantly, our study shows that in virio, IAV maintains an extensive set of RNA-mediated interactions between vRNPs, which is important for the packaging of the viral genome. Maintaining a redundant inter-vRNP interaction network to facilitate the selective packaging of the different genomic segments (vs. a limited set of interactions), could be a strategy to balance the need for selective packaging with the ability to allow reassortment to occur. Our analysis suggests that some, though not all, of the identified intermolecular interactions are present in evolutionarily distant IAVs. A redundant inter-vRNP interaction network could allow multiple pathways towards assembling the full set of eight vRNPs; having multiple pathways is potentially important for the emergence of novel influenza virus strains through reassortment of genome segments of two evolutionarily distant viruses.
This evolutionary flexibility of genome assembly is also apparent in the location of specific intermolecular RNA-RNA interaction regions. A number of these overlap with regions of the genome that are evolutionarily less constrained with respect to their protein-coding capacity. For example, highly prevalent interactions of the NA and NS segments fall into regions encoding the NA stalk and the linker between the N-terminal RNA-binding domain and the C-terminal effector domain of NS1, respectively (Extended Data Fig. 7). Furthermore, a prominent interaction hotspot in the PA segment, involved in interactions with multiple other segments, lies immediately downstream of the overlapping PA-X open reading frame (ORF) in a region that encodes the linker between the N-terminal endonuclease and C-terminal domain of PA. These observations suggest that the positioning of some of the intersegment RNA interaction sites may be constrained by the balance between the constantly drifting RNA sequences and maintenance of the encoded amino acids.
In virions, the vRNPs are organized in a “7+1” pattern with seven segments of different lengths surrounding a central segment17,26-28. This model has led to the suggestion that the central segment may act as a ‘master segment’ mediating the selection of the other segments. We note that our analysis shows that the PA segment is capable of forming multiple strong intermolecular interactions, with the 5’ end and the 1400-1500 nt region involved in multiple redundant interactions with many other segments.
We anticipate that further studies of IAV genome structure and RNA-RNA interaction networks will lead to an improved ability to predict the potential for reassortment between different IAV strains and consequently will facilitate the prediction of the emergence of new pandemic influenza strains. Furthermore, such studies may guide the design of new antivirals targeting the assembly of the eight vRNPs and blocking virus packaging.
Contributions
B.D. designed and performed the experiments, analysed the data and wrote the paper; E.B. made mutated viruses; E.F. supervised virus work and edited the manuscript; A.L. supervised SHAPE data analysis and RNA modelling; D.L.V.B. designed SPLASH experiments, analysed data, supervised sequencing work and edited the manuscript.
Author Information
Sequencing data have been deposited in the Sequence Read Archive (accessions SRP127020 and SRP126994) and the processed SHAPE reactivities and SPLASH data are available in SNRNASM format in Supplementary Table 1. The authors declare no competing financial interests. Correspondence and requests for materials should be addressed to E.F. (ervin.fodor{at}path.ox.ac.uk) and D.L.V.B. (david.bauer{at}path.ox.ac.uk).
Methods
Cell culture, virus growth and purification
Madin-Darby Bovine Kidney (MDBK) epithelial cells were grown in Minimum Essential Medium (MEM; Merck), supplemented with 2 mM L-glutamine and 10% fetal calf serum. Human embryonic kidney 293T (HEK 293T) cells were maintained in Dulbecco’s Modified Eagle Medium (DMEM; Merck). Viral stocks were produced by infecting MDBK cells with influenza A/WSN/33 (WSN) (H1N1) virus at an MOI of 0.001. Virus was harvested 2 days post infection. Virus stocks were purified by ultracentrifugation: first, the infected cell culture medium was clarified by centrifugation at 4000 rpm for 10 min at 4°C followed by centrifugation at 10,000 rpm for 15 min at 4°C. The virus was then purified by centrifugation through a 30% sucrose cushion at 25,000 rpm for 90 min at 4°C in a SW32 rotor (Beckman Coulter). The purified virus pellet was resuspended in a resuspension buffer (0.01 M Tris-HCl (pH 7.4), 0.1 M NaCl, 0.0001 M EDTA). Viruses containing synonymous mutations were produced using the 12-plasmid rescue system as described previously29. The primers used to generate mutations are provided in Extended Data Table 1. Virus growth curves were generated by infecting a 70% confluent MDBK cell layer with viral stocks at an MOI of 0.001. Supernatant from infected cells was collected at 24, 48 and 72 h post infection. The infectious virus titres were determined by plaque assay. The significance of the differences between mutated and wild type virus growth kinetics was assessed using ANOVA with Dunnett’s Multiple Comparison Test in GraphPad Prism software. Haemagglutination assays were carried out by serially diluting the supernatants from infected cells in phosphate-buffered saline (PBS) in a 96-well plate. An equal volume of 0.5% chicken blood was added to the serial dilutions and the plates were incubated at 4°C until haemagglutination was observed.
Selective 2’-hydroxyl acylation analysed by primer extension and mutational profiling (SHAPE-MaP)
1-methyl-7-nitroisatoic anhydride (1M7) was custom synthesised from 4-nitroisatoic anhydride as described previously30. For the ivtRNA experiments each vRNA segment was synthesised from a linear DNA template using the HiScribe™ T7 High Yield RNA Synthesis Kit (NEB). The products were checked for size and purity on a 3.5% PAGE-urea gel. nkvRNA samples were prepared by purifying the WSN particles over a sucrose cushion as described above. Purified viruses were treated with 250 μg/mL of Proteinase K (Roche) in PK buffer (10 mM Tris-HCl (pH 7.0), 100 mM NaCl, 1 mM EDTA, 0.5% SDS) for 40 min at 37°C. Before the modification, ivtRNA and nkvRNA samples were folded at 37°C for 30 min in folding buffer (100 mM Hepes-NaOH (pH 8.0), 100 mM NaCl, 10 mM MgCl2). 1M7 (dissolved in anhydrous DMSO (Merck)) was added to a final concentration of 10 mM to the folded RNA and the samples were incubated for 75 s at 37°C. The in virio modifications were performed by adding 1M7 directly to the purified virus stocks. The ability of SHAPE reagents to penetrate viral particles was initially tested as described previously31 by performing 32P-labelled primer extensions on RNA extracted from SHAPE reagent-treated viral stocks using an NA segment targeting primer (5’-AATTGGTTCCAAAGGAGACG-3’). In parallel to the 1M7-treated samples, control samples were treated with DMSO. RNA extracted from purified viral particles or denatured T7 RNA polymerase transcribed RNA was used for denatured controls (DC). To prepare DC samples the RNA was mixed in DC buffer (50 mM Hepes-NaOH (pH 8.0), 4 mM EDTA) with 55% formamide and incubated at 95°C for 1 min. 1M7 was then added to 10 μM and the samples were incubated at 95°C for an additional 1 min. N-methylisatoic anhydride (NMIA, Thermo Fisher) SHAPE reagent was also tested in virio. Experiments with NMIA were performed as described above for 1M7, except the purified virions were treated with NMIA for 45 min.
Sequencing library preparation was done as described previously11 following the randomer workflow. In brief, after 1M7 or control treatments, RNA was cleaned up using the RNA Clean & Concentrator™-5 kit (Zymo Research). The RNA was reverse transcribed using Random Primer Mix (NEB) with Superscript II in MaP buffer (50 mM Tris-HCl (pH 8.0), 75 mM KCl, 6 mM MnCl2, 10 mM DTT and 0.5 mM dNTPs). Nextera XT DNA Library Prep Kit (Illumina) was used to prepare the DNA libraries. Final PCR amplification products were size selected using Agencourt AMPure XP beads (Beckman Coulter) and quality assessed using the Agilent DNA 1000 kit on a Bioanalyser 2100 instrument (Agilent). The libraries were sequenced (2x150bp) on a HiSeq4000 instrument (Illumina).
Sequencing of psoralen crosslinked, ligated, and selected hybrids (SPLASH)
SPLASH samples were prepared as published previously22,32 with some modifications. Purified virus stocks were incubated with 200 μM of EZ-Link™ Psoralen-PEG3-Biotin (Thermo Fisher) and 0.01% digitonin (Merck) for 5 min at 37°C. The viruses were spread on a 6-well dish, covered with a glass plate, placed on ice, and irradiated for 45 min using a UVP Ultra Violet Product™ Handheld UV Lamp (Fisher). Cross-linked virus stock was treated with Proteinase K (Merck) and the viral RNA (vRNA) was extracted using TRIzol (Invitrogen). An aliquot of extracted vRNA was used to detect biotin incorporation using a chemiluminescent nucleic acid detection module kit (Thermo Fisher) on Hybond-N+ nylon membrane (GE Healthcare Life Science). The rest of the extracted vRNA was fragmented using the NEBNext® Magnesium RNA Fragmentation Module (NEB), and size selected for fragments below 200 nt using RNA Clean & Concentrator™-5 (Zymo Research). The samples were enriched for biotinylated vRNA using Dynabeads MyOne Streptavidin C1 beads (Life Technologies) and on-bead proximity ligation and psoralen crosslink reversal were carried out as published previously. Sequencing libraries were prepared using adaptor ligation as described22,32 for the first SPLASH experiment, and using the commercial SMARTer smRNA-Seq Kit (Clontech) for the second SPLASH experiment. Final size selection was done by running the PCR-amplified sequencing libraries on a 6% PAGE gel (Thermo Fisher) in TBE and selecting for 200-300 bp DNA. Libraries were sequenced either 1x or 2x150 bp on a NextSeq 500 instrument (Illumina).
Processing of SHAPE-MaP sequencing reads
The sequencing reads were trimmed to remove adaptors using Skewer33. The SHAPE reactivity profiles were generated using the published ShapeMapper pipeline11, which aligns the reads to the reference genome using Bowtie 2 and calculates mutation rates at each nucleotide position. The mutation rates are then converted to the SHAPE reactivity values defined as: $R = (\mathrm{mutr}_{\mathrm{1M7}} - \mathrm{mutr}_{\mathrm{DMSO}}) / \mathrm{mutr}_{\mathrm{DC}}$, where $\mathrm{mutr}_{\mathrm{1M7}}$ is the nucleotide mutation rate in the 1M7-treated sample, $\mathrm{mutr}_{\mathrm{DMSO}}$ is the mutation rate in the DMSO-treated sample and $\mathrm{mutr}_{\mathrm{DC}}$ is the mutation rate in the denatured control. All SHAPE reactivities are normalised to an approximate 0-2 scale by dividing the SHAPE reactivity values by the average reactivity of the 10% most highly reactive nucleotides after excluding outliers (defined as nucleotides with reactivity values that are greater than 1.5x the interquartile range).
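The reactivity calculation and normalisation can be sketched as follows. This is a minimal illustration, not the ShapeMapper implementation: it assumes that reactivity is the 1M7 mutation rate minus the DMSO mutation rate, divided by the denatured-control mutation rate, and it reads the outlier rule literally (values above 1.5x the interquartile range are excluded before averaging the top 10%). The function names and the handling of undefined positions are assumptions made for illustration.

```python
import statistics


def raw_reactivity(mut_1m7, mut_dmso, mut_dc):
    """Per-nucleotide (1M7 - DMSO) / DC; None where the denominator is zero."""
    out = []
    for modified, untreated, denatured in zip(mut_1m7, mut_dmso, mut_dc):
        if denatured > 0:
            out.append((modified - untreated) / denatured)
        else:
            out.append(None)  # reactivity undefined at this position
    return out


def normalise(reactivities, top_fraction=0.10, iqr_factor=1.5):
    """Scale reactivities by the mean of the top 10% of non-outlier values."""
    values = [r for r in reactivities if r is not None]
    q1, _, q3 = statistics.quantiles(values, n=4)
    cutoff = iqr_factor * (q3 - q1)  # literal reading of the outlier rule
    kept = sorted((v for v in values if v <= cutoff), reverse=True)
    n_top = max(1, round(len(kept) * top_fraction))
    scale = sum(kept[:n_top]) / n_top
    return [None if r is None else r / scale for r in reactivities]


if __name__ == "__main__":
    raw = raw_reactivity([0.03, 0.02], [0.01, 0.01], [0.04, 0.0])
    print(raw)  # second position has no denatured-control signal
```

Positions without a usable denatured-control rate are carried through as `None` so that nucleotide numbering stays aligned with the segment sequence.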
Processing of SPLASH sequencing reads
The sequencing reads were first deduplicated using clumpify.sh (BBMap package; https://sourceforge.net/projects/bbmap/) and adaptors were trimmed using Cutadapt34. STAR35 was used to align the reads to the WSN viral reference genome. Only the chimeric reads in which at least 30 nucleotides aligned to the reference segments were used in further processing (STAR parameter --chimSegmentMin 30). CIGAR strings in each read alignment were processed to find the read start and end coordinates. The reads aligning to the same partner segments and overlapping positions in the first and second SPLASH experiment were combined. Overlapping reads between the same partner segments in the final read set were merged and expanded to cover the total read window. The start and end coordinates for all interaction sites were defined as the 5’ and 3’ terminal positions of the expanded read site. The set of interactions was visualised using Circos36. The final chimeric read set is provided in Supplementary Table 2.
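The coordinate extraction and merging steps can be sketched as below. This is an illustrative reconstruction rather than the code used in the study: it assumes 1-based inclusive coordinates, that each chimeric read has already been reduced to a tuple of two arms with the segment pair in a consistent order, and that two reads are merged when both arms overlap. The function names and the tuple layout are hypothetical.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_end(start, cigar):
    """1-based inclusive end coordinate of an alignment starting at `start`."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return start + span - 1


def overlaps(a_start, a_end, b_start, b_end):
    return a_start <= b_end and b_start <= a_end


def merge_windows(chimeras):
    """Merge chimeras (segA, startA, endA, segB, startB, endB) per segment pair.

    Returns {(segA, segB): [(startA, endA, startB, endB, read_count), ...]}.
    """
    by_pair = defaultdict(list)
    for seg_a, s_a, e_a, seg_b, s_b, e_b in chimeras:
        by_pair[(seg_a, seg_b)].append([s_a, e_a, s_b, e_b, 1])

    merged = {}
    for pair, windows in by_pair.items():
        changed = True
        while changed:  # repeat until no two windows can be merged
            changed = False
            result = []
            for w in sorted(windows):
                for r in result:
                    if overlaps(w[0], w[1], r[0], r[1]) and overlaps(w[2], w[3], r[2], r[3]):
                        # expand the window to cover both reads on each arm
                        r[0], r[1] = min(r[0], w[0]), max(r[1], w[1])
                        r[2], r[3] = min(r[2], w[2]), max(r[3], w[3])
                        r[4] += w[4]
                        changed = True
                        break
                else:
                    result.append(list(w))
            windows = result
        merged[pair] = [tuple(w) for w in windows]
    return merged
```

Each merged window keeps a read count, which gives a simple measure of how prevalent an interaction is relative to others between the same pair of segments.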
RNA structure predictions
The IntaRNA (v2.0.4) algorithm37 with the minimum seed requirement of 4 bp was used to predict the ability of RNA-RNA interactions to occur in the regions identified during the SPLASH analysis. Permutated data sets were generated by randomly shuffling the specific interaction partners identified by SPLASH and assessing the interaction ΔG energies using IntaRNA. The significance of the difference between the probability distributions of the ΔG energies associated with the SPLASH-identified intermolecular RNA interactions versus the permutated datasets was calculated using the Wilcoxon rank-sum test in R. The IntaRNA structure predictions were then used to trim the interaction regions to the nucleotides involved in the base-pairing. For SHAPE-informed RNA-RNA interaction predictions RNAcofold from the ViennaRNA package (v2.4.1) was used38. For intramolecular RNA structure predictions the RNAStructure package (v6.0) was employed39 using the Fold and Partition commands to predict secondary RNA structures and partition functions for each segment, respectively. SHAPE reactivities were included as pseudoenergy restraints. A 50 nt sliding median window correlation analysis between the ex virio and in virio SHAPE reactivity profiles was used to determine the extent of SHAPE correlation between the T7-transcribed RNA and vRNP-associated RNA. We found that no correlation existed beyond 150 nt, and therefore set the maximum pairing distance constraint for structure and partition function predictions to 150 nt. For the intramolecular structure predictions we set the nucleotides within the promoter region to be single stranded.
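The permutation control can be sketched as follows. This is one possible reading of "randomly shuffling the specific interaction partners", in which the first arm of every interaction is kept fixed and the second arms are reassigned at random. The `energy_fn` argument is a hypothetical placeholder for whatever routine returns a ΔG for a pair of sequences (in this study, a call to IntaRNA); the number of permutations and the seed are illustrative defaults, not values taken from the study.

```python
import random


def permute_partners(interactions, rng):
    """Shuffle the second arm among interactions, keeping the first arms fixed."""
    firsts = [a for a, _ in interactions]
    seconds = [b for _, b in interactions]
    rng.shuffle(seconds)
    return list(zip(firsts, seconds))


def permuted_energies(interactions, energy_fn, n_permutations=100, seed=0):
    """Collect energies of shuffled partner pairs across many permutations.

    `energy_fn(seq_a, seq_b)` is a placeholder for the real ΔG calculation.
    """
    rng = random.Random(seed)  # seeded so that the control is reproducible
    energies = []
    for _ in range(n_permutations):
        for seq_a, seq_b in permute_partners(interactions, rng):
            energies.append(energy_fn(seq_a, seq_b))
    return energies
```

The resulting distribution of shuffled-pair energies would then be compared with the energies of the observed pairs, for example with a rank-sum test.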
RT-qPCR
vRNA was extracted from rescued virus stocks using the Direct-zol™ RNA MiniPrep kit (Zymo Research), including a DNase treatment step. vRNA was reverse transcribed using SuperScript™ III (Thermo Fisher), per the manufacturer’s instructions, using an equimolar ratio of the universal vRNA primers (Extended Data Table 2) and primer extension at 37°C for 1 h. qPCR was performed using the Brilliant III Ultra-Fast Probe High ROX QPCR Master Mix (Agilent) with segment-specific primers and probes (Extended Data Table 2).
Acknowledgements
We thank J. Kenyon for helpful discussions and sharing protocols and J. Robertson for making 1M7 reagent. This work was supported by a Wellcome Trust studentship [105399/Z/14/Z] (to B.D.), a Medical Research Council programme grant [MR/K000241/1] (to E.F.), and an EPA Cephalosporin Junior Research Fellowship (to D.L.V.B.). This work was also supported by the U.S. National Institutes of Health [grant numbers HL111527, GM101237 and HG008133].