Comprehensive characterisation of molecular host-pathogen interactions in influenza A virus-infected human macrophages

Macrophages in the lung detect and respond to influenza A virus (IAV), determining the nature of the immune response. Using terminal depth 5’-RNA sequencing (CAGE) we quantify transcriptional activity of both host and pathogen over a 24-hour timecourse of IAV infection in primary human monocyte-derived macrophages (MDM). We use a systems approach to describe the transcriptional landscape of the host response to IAV contrasted with bacterial lipopolysaccharide treated MDMs, observing a failure of IAV-treated MDMs to induce feedback inhibitors of inflammation. Systematic comparison of host RNA sequences incorporated into viral mRNA (“snatched”) against a complete survey of background RNA in the host cell enables an unbiased quantification of over-represented features of snatched host RNAs. We detect preferential snatching of RNAs associated with snRNA transcription and demonstrate that cap-snatching avoids transcripts encoding host ribosomal proteins, which are required by IAV for replication. Graphical Abstract (A) Overview of bioinformatics pipeline. (B) Host gene expression reveals that human macrophages exposed to IAV exhibit sustained production of key inflammatory mediators and failure to induce expression of feedback inhibitors of inflammation. (C) Unbiased comparison with total background RNA expression demonstrates that IAV cap-snatching has a strong preference for, and aversion to, different groups of host transcripts.

162 Filled-in area shows standard deviation between donors.
163 164 intracellular transport (Table S 2  We observed that the response of MDMs to viral infection was immediate. IL1B was rapidly and 174 strongly induced by IAV at 0 hours (effectively 1 hour post virus addition) and peaked at 2 hours 175 together with the detectable induction of host response genes observed above. Relative 211 expression of segment 8 (NS1/NS2) was highest at 2 hours. The late structural segment 212 transcripts (4, 6, 7, encoding HA, NA and M1/M2 respectively) peaked at 7 hours, towards the end 213 of the expected 6-8 hour viral life cycle. By 24 hours, the pattern was less defined, which may be a 214 consequence of mRNA decay or potential reinfection by the virus. In addition, reads plausibly 215 corresponding to the known mRNA3 splice variant transcript from segment 7 which utilises a 216 splice donor site at the 3'-boundary of the conserved promoter sequence (Lamb, Lai and Choppin, 217 1981) were also seen (Figure S 3 D). Similar sequences that potentially represent alternative splice 218 variant mRNAs were observed, most abundantly from segments 5 and 6 (Figure S 3 D, Table S 5). It 219 is unknown if these can be translated. 220

Characterisation of host leader sequences incorporated into viral capped RNA 221
We identified 4,575,918 unique leader sequences, heterogeneous in both sequence and length, 222 snatched from the host and incorporated into viral mRNA. Of these leader sequences, 18.8% 223 (859,789) appeared more than once and 1.5% (69,443) appeared ten times or more across all 224 samples. This 1.5% of the most frequently occurring accounted for 53.6% of the total number of 225 snatched leaders. Thus, at least two populations of snatched sequences exist: those that were 226 heavily snatched as they occur multiple times, and those that were seemingly randomly snatched 227 as they appear only once. Most (74%) of the leader sequences were between 10 and 14 228 nucleotides long (Figure 2  an 'AG' preference at the 5' end of the leader (Gu et al., 2015). 260 The unsnatched 10mers also showed an enrichment of two distinct motifs, CTAG and 261    approach costs statistical power, but is necessary to avoid any bias that might be introduced into 287 the identification based on a quantitative measure, such as abundance. We repeated the Fisher's 288 Exact analysis for all 10mers assigned to a gene name to provide summary data for that gene. The genes. This is consistent with previous observations that snRNAs are snatched frequently and 302 shows that this may represent a true preference for these RNAs. In view of the preferential 303 snatching of multiple snRNAs, we considered whether specific classes of capped host RNAs might 304 be targeted. Of the RNA types we considered, only snRNAs were strongly preferentially snatched 305 This sequencing method also allows the observation of histone mRNA which enabled us to 307 observe that 10mers corresponding to histone mRNAs were also significantly over-represented. In 308 addition, we observed that many host mRNAs encoding spliceosome-and transcription-309 analysis performed by querying various pathway/gene ontology datasets (listed in Table S 7).   Pathway enrichment also allowed us to look for pathways that were avoided by the cap-snatching 342 mechanism. We identified pathways associated with translation and ribosome formation as were identified, these were not independent: these associations were largely driven by presence 345 of a group of transcripts encoding the same set of ribosomal proteins (Table S 7). These data 346 show that IAV avoids snatching caps from ribosomal mRNA transcripts. Interestingly, not all 347 mRNAs encoding ribosomal subunits were avoided. We compared our results to a recent study 348 reporting the effect of targeted knockdown of specific ribosomal subunit mRNAs in the context of 349

Discussion
This comprehensive analysis of host and viral transcripts reveals key features of host-pathogen 359 interaction at a molecular level. We demonstrate that IAV cap-snatching has a strong preference 360 for host transcripts associated with splicing and transcription, and avoids host ribosomal subunit 361 transcripts. By comparing against a canonical innate immune stimulus, LPS, we systematically 362 characterise the host response of human macrophages to IAV exposure. 363

Characteristics of MDM response to IAV 364
MDMs can be infected with IAV and produce both viral protein (NP) and infectious virus. 365 This initial permissiveness may be related to the fact that IFITM3, the protein product of which The type III interferons were highly specific to IAV-treated MDMs. These were recently shown to 437 mediate a key mechanism preventing viral spread to the lower respiratory tract in mice 438 snatched sequences and the background distribution. Since CAGE reads sequences directly from 460 the 5' end, we can be confident that we have quantified the background pool of potential leader 461 sequences that were available to be snatched. By limiting our analysis to sequences of a specific 462 length (10mers), we eliminate bias that may occur due to differential mapping or identification of 463 sequences of different lengths. 464 Our timecourse design allows us to mitigate another potential source of bias. An unknown period which may not be recognised by the IAV polymerase . The minor 486 spliceosome splices <1% of introns in the human genome and its activity -and hence functional 487 expression of these splice variants -is regulated by RNU6ATAC and increased by signalling 488 through the p38MAP kinase pathway (Younis et al., 2013). The viral NS1 protein is known to 489 inhibit the formation of RNU12/RNU6ATAC complexes (Wang and Krug, 1998). Our results 490 suggest that IAV may have evolved more than one mechanism to suppress gene expression 491 through the minor spliceosome pathway. 492

RNA that codes for ribosomal subunits is avoided by IAV cap-snatching 493
Our comparison of LPS and IAV-treated cells shows that genes encoding ribosomal subunits are contribute to pulmonary inflammation that is a feature of severe IAV pathology in a clinical setting 530 (Teijaro, 2014). Finally, many genes encoding surface receptors and signalling molecules required infection, which may increase susceptibility to secondary bacterial infections that produce 533 morbidity and mortality in IAV infected patients (Morris, Cleary and Clarke, 2017). 534 HeliScope CAGE at terminal depth allows for an unbiased observation of both IAV segment mRNA 535 and host-derived leader sequences in cap-snatching. Our comprehensive analysis of leader 536 sequences identified motifs that the IAV polymerase may favour and two others that it apparently 537 avoids during the snatching process. We discovered strongly preferential cap-snatching of host 538 sequences associated with splicing, with evidence of the avoidance of key cellular components 539 required for viral replication. These results hint at a mechanism of host evasion through which IAV 540 may downregulate RNA processing machinery through cap-snatching while specifically evading 541 altering translational machinery specifically required for the replication of the virus. 542

Ethics, cell culture, virus propagation and infections 544
Cells were isolated from fresh blood of volunteer donors under ethical approval from Lothian 545 Research Ethics Committee (11/AL/0168). Primary CD14 + human monocytes were isolated from 546 whole blood as described previously (Irvine et al., 2009) from 4 human donors. Monocytes were 547 plated for 7 days in RPMI-1640 supplemented with 10% (vol/vol) FBS, 2 mM glutamine, 100 U/ml 548 penicillin, 100 μg/ml streptomycin (Sigma Co.), and 10 4 U/ml (100 ng/ml) recombinant human 549 colony-stimulating factor 1 (rhCSF1; a gift from Chiron, Emeryville, CA, USA) for differentiation into 550 macrophages. Cells were maintained at 37°C with 5% CO2. A/Udorn/72 (H3N2) was generated as 551 described previously (Hoeve et al., 2012). Differentiated macrophages were infected on day 8. 552 Cells were washed in serum free media after which they were infected at MOI 5 in a volume of 553 200μl infection media. Cells were incubated for 1 hour at 37°C then washed three times with 554 serum-free media and incubated in RPMI-1640 supplemented with 1μg/ml TPCK-trypsin, 0.7% 555 hour after addition of the virus), 2 hours, 7 hours and 24 hours. Uninfected samples were also 558 collected at 0 and 24 hours. LPS treatments were carried out as described previously (Baillie et al., 559 2017). Briefly, cells were treated with 10ng/ml bacterial lipopolysaccharide (LPS) from salmonella 560 Minnesota R595 and harvested at time points from 15 minutes to 48 hours after treatment. Only 561 time points with corresponding IAV treated time points were used in this analysis. 562

CAGE 563
RNA was extracted using the Qiagen miRNeasy mini kit (217004). RNA quality was assessed and 564 CAGE was performed as described previously (Takahashi et al., 2012) as part of the FANTOM5 565 project. Virus genome information is available in Table S 9. 566

Data Analysis 567
Computational analysis was performed using custom Python scripts and as described previously 568 forward for analysis. We used the glmFit function to fit the models and glmLRT to perform testing 590 between the LPS and IAV treated samples. Benjamini-Hochberg correction was applied to p-591 values. A significance threshold of FDR < 0.05 was used. 592

Identification and analysis of IAV mRNA 593
Capped IAV RNAs were identified by the conserved 11 base promoter sequence expected to be in 594 all viral mRNA ('GCAAAAGCAGG'), as described in the text. Sequences that contained the 595 promoter were classified as capped viral mRNA and aligned to the Udorn genome (Table S 9) using 596 custom Python scripts. 597

Unbiased analysis of leader sequence preference 598
The first ten nucleotides of each CAGE tag (10mers) that reached the abundance threshold in our 599 dataset were extracted and this set of unique 10mers were used in subsequent analysis. The 600 abundance threshold was set to 1,000 occurrences across all samples. To determine the 10mer 601 sequences that were over-and under-represented in the snatched population based on 602 background abundance, the number of times a 10mer was associated with the IAV promoter was 603 counted ("snatched") along with the number of times the 10mer occurred without the promoter 10mer was snatched was compared to the number of times it occurred unsnatched at the 607 previous time point by Fisher's Exact test. 608

Analysis of leader motifs using convolutional neural networks 609
A sub-set of 10mers that reached the following threshold: 0.3 < (OR) > 3, -log(FDR < 10) were 610 brought forward for analysis of motif preference using convolutional neural networks. We 611 optimised an existing network (Budach and Marsico, 2018) for our use by using altering the 612 parameters to find suitable setting by using the grid search to explore various kernel lengths (2, 3, 613 4, and 5) and drop rate (0, 0.1, and 0.5); for other parameters, we used the default settings of 614 Pysster (kernel number: 20, convolutional layer number: 2) apart from learning rate at 0.0001 and 615 patience, stopping at 100. Since our analysis was restricted to 10mers, we did not use the pooling 616 method. We randomly selected the training set and validation set in the proportion of 60% and 617 30% independently. The purpose of this experiment was to explore the existing data, not to make 618 predictions, so we reused the training data to explore the result. Optimisation experiments 619 demonstrated that a kernel length of 4 gave us relatively high, and relatively consistent, precision 620 and recall. We maximised the area under the receiver operator characteristic (ROC) and the area 621 under the precision recall curve. Motifs were considered if they reached a score of at least 50% 622 the maximum score for that time point. 623

Assignment of transcript identity to 10mer sequences 624
CAGE tags were mapped to the human reference genome (hg19) as described (Forrest et al., 625 2014). To identify the possible transcription start site from which a 10mer arose, we extracted 626 every possible chromosomal location for a 10mer that met the abundance threshold of 1000 627 across all samples from the original alignment BAMfiles created as part of the Fantom5 project. the coordinates of the assigned transcription start site and exact matches only were used to assign 634 promoter identity. 635 In order to avoid any effect of abundance that may bias transcript identification, for 10mers with 636 more than one possible promoter identity, a site was chosen at random from the list of possible 637 sites. Promoter names were converted to HGNC format. To determine over-and under-638 representation of promoters and genes, all 10mers that were assigned to that promoter or gene 639 name were counted and the Fisher's Exact test was performed. Benjamini-Hochberg FDRs were 640 calculated using the scipy.stats v 0.18.1 statsmodels.stats.multitest.mutlipletests function with 641 method = 'fdr_bh'. Significance was determined by an FDR < 0.05. RNA type was assigned to gene 642 names using reference data downloaded from Biomart (http://www.ensembl.org). Only named 643 transcripts were assigned an RNA type. 644 In order to determine if a gene was significantly snatched compared to its abundance at the 645 previous time point, we compared snatched at t to unsnatched at t-1 for the collated values of all 646 10mers that were assigned that gene name in each sample separately. This was performed for 647 2hrs versus 0hrs, 7hrs versus 2hrs, and 24hrs versus 7hrs. A gene was declared 'True' if the 648 combined p-value for that gene was significant in 2 out 4 donors at that time point. 649

Pathway and Gene Set Enrichment Analysis 650
All named genes that appeared significant were included in this analysis. Gene names were 651 Component 2018 were used. Genes were ranked by -log10(p-value), and log10(OR). Benjamini-662 Hochberg correction was applied to p-values. 663 Primary human monocyte derived macrophages were differentiated, as described above, on glass 666 coverslips. Cells were infected as described. At 0, 2, 7, and 24 hours post infection cells were fixed 667 for 20 min in 4% formaldehyde in PBS. After permeabilization with 0.2% Triton X-100 in PBS for 5 668 min at room temperature, cells were incubated with mouse monoclonal influenza A NP AA5H 669 (BioRad) at 1:500. After 1 hour cells were washed three times with PBS and incubated with goat 670 anti-mouse Alexa Fluor 488 at 1:1000 (ThermoFisher). After 1 hour cells were washed three times 671 with PBS and incubated in DAPI (ThermoFisher) for ten minutes after which they were washed 672 three times with PBS and mounted on slides using VECTASHIELDâ Antifade Mounting Medium. 673 Cells were viewed on a Leica fluorescence upright microscope and imaged using a Hamamatsu 674 Orca-ER low light mono camera. Scale bars were added using ImageJ. 675

Cell viability and Virus Titration 676
Cell viability was measured using Cell Titre Gloâ at 0, 2, 7, and 24 hours post infection. Virus that did not align proximal to the IAV promoter sequence in the Udorn genome were extracted. 688 These novel 'promoter proximal' sequences were hypothesised to be derived from putative 5'UTR 689 sequences internal to a segment arising from mRNA from splice variants. These sequences were 690 aligned throughout the Udorn genome using custom Python scripts. The abundance of each 691 sequence was divided by the number of locations in the Udorn genome it could map to. The 692 weighted abundances at each position were then summed and graphed. Segment 7 mRNA3 was 693 used as a proof of principle. 694

Determination of Coefficient of Variation Across 4 Donors 695
The expression profile of CTSS across all samples was extracted from the Fantom5 data in TPM 696 (tags per million). The coefficient of variation for each gene was calculated for each of the 6 697 treatments across the 4 donors (SD:Mean * 100). Gene names were assigned to CTSS by Bedtools 698 overlap with a window of +/-5 bases using the hg19 annotation from Fantom5. Unnamed genes 699 were removed from the final list. Results were separated by treatment. 700 701 Spicing has been observed in segments 7 and 8 of IAV. In particular, in segment 7 the splice donor 704 site for the mRNA3/M3 transcript is found at the end of the promoter sequence (Lamb, Lai and 705 Choppin, 1981) . Over 400,000 reads contained the IAV promoter sequence and a leader 706 sequence, but did not originate from the genome sequence proximal to the promoter in any of 707 the 8 segments. The leader and promoter sequences were removed and the sequences aligned 708 throughout the Udorn genome. In order to quantify RNA expression at these loci, we summed the 709       Analysis was restricted to the dominant promoters (p1) and used averaged data from the 4 859 donors. A Pearson correlation coefficient threshold of 0.94 and an MCL of 1.7 was used. 860  (Kuleshov et al., 2016) . 865

Table S 4: Differential expression analysis of LPS-and IAV-treated MDMs. 866
Analysis was performed using edgeR. All transcripts with a -log2(Fold Difference) >= 1 are shown. 867 All results shown reach the threshold FDR < 0.5. 870 Table S  The coefficient of variation for named genes. Expression at CTSS were compared across donors 874 (n=4) and annotated using publicly available Fantom5 annotation.