ABSTRACT
Virus and host factors contribute to cell-to-cell variation in viral infections and determine the outcome of the overall infection. However, the extent of the variability at the single cell level and how it impacts virus-host interactions at a systems level are not well understood. To characterize the dynamics of viral transcription and host responses, we used single-cell RNA sequencing to quantify at multiple time points the host and viral transcriptomes of human A549 cells and primary bronchial epithelial cells infected with influenza A virus. We observed substantial variability of viral transcription between cells, including the accumulation of defective viral genomes (DVGs) that impact viral replication. We show a correlation between DVGs and viral-induced variation of the host transcriptional program and an association between differential induction of innate immune response genes and attenuated viral transcription in subpopulations of cells. These observations at the single cell level improve our understanding of the complex virus-host interplay during influenza infection.
IMPORTANCE Defective influenza virus particles generated during viral replication carry incomplete viral genomes and can interfere with the replication of competent viruses. These defective genomes are thought to modulate disease severity and pathogenicity of the influenza infection. Different defective viral genomes also introduce another source of variation across a heterogeneous cell population. Evaluating the impact of defective virus genomes on host cell responses cannot be fully resolved at the population level, requiring single cell transcriptional profiling. Here we characterized virus and host transcriptomes in individual influenza-infected cells, including that of defective viruses that arise during influenza A virus infection. We established an association between defective virus transcription and host responses and validated interfering and immunostimulatory functions of identified dominant defective virus genome species in vitro. This study demonstrates the intricate effects of defective viral genomes on host transcriptional responses and highlights the importance of capturing host-virus interactions at the single-cell level.
INTRODUCTION
The productivity of viral replication at the cell population level is determined by cell-to-cell variation in viral infection [1]. The genetically diverse nature of RNA viruses and the heterogeneity of host cell states contribute to this inter-cell variability, which can impact therapeutic applications [2, 3]. Although previous studies of cell-to-cell variation during viral infection have mainly centered on non-segmented viruses such as poliovirus [2, 4], vesicular stomatitis virus (VSV) [5, 6], or dengue (DENV) and zika (ZIKV) viruses [7], heterogeneity across cells may be further complicated for viruses with a segmented genome, such as influenza A virus (IAV). IAV-infected cells display substantial cell-to-cell variation as it pertains to relative abundance of different viral genome segments [1, 8], transcripts [9], and their encoded proteins [10], which can result in non-productive infection in a large fraction of cells.
Defective interfering (DI) particles readily generated during successive high multiplicity of infection (MOI) in cell culture passages [11–14] and observed in natural infections [15, 16] are likely by-products of the IAV replication process and have a significant effect on productive infection. DI viruses have incomplete viral genomes with large internal deletions and they possess the ability to interfere with the replication of infectious viruses. Because DIs diminish the productivity of infectious progenies and could impact disease outcome, they are of great interest for therapeutic and prophylactic purposes (reviewed in [17]). A specific influenza DI (i.e., DI244) was shown to effectively provide prophylactic and therapeutic protection against IAV infection in mice [18]. One of the ways influenza DIs are thought to modulate viral infections is through interaction with a cytosolic pathogen-recognition receptor (PRR), RIG-I, essential for interferon (IFN) induction [19]. Besides enhanced IFN induction during infection with DI-rich influenza virus populations observed in vitro and in vivo, it has been assumed that DI virus could also compete with standard virus for cellular resources (reviewed in [17, 20]).
While diverse DI viruses can arise during IAV infection [21], the emergence and accumulation of distinct DIs, as well as other defective virus genomes (DVGs), has not been characterized at a single cell resolution, although the diversity of DIs present could be contributing to the observed cell-to-cell variation in host transcription.
We probed viral and host transcriptomes simultaneously in the same cells using single-cell RNA-seq to monitor host-virus interactions in cultured cells over the course of the infection. This data established a temporal association between the level of viral transcription and the alteration of the host transcriptome, and characterized the diversity and accumulation of DVG transcripts.
RESULTS
Cell-to-cell variation in virus gene expression
To determine how both the viral and host cell transcriptional programs relate to each other over the course of an influenza infection, we infected two cell types—the adenocarcinomic human alveolar basal epithelial A549 cell line and human primary bronchial epithelial cells HBEpC—at high multiplicity of Infection (MOI; 5) with A/Puerto Rico/8/34(H1N1) (PR8) and performed single cell and bulk RNA-seq expression analyses. A high MOI infection ensures that virtually all the cells can be rapidly infected, promotes the accumulation of DVGs, and consequently enables the characterization of both host response and DVG diversity. We first determined the percentage of reads that uniquely aligned to viral genes from the total number of mapped reads to obtain the relative abundance of virus transcripts within cells at each time point. Similar to what has been observed at early stages of infection during a low MOI infection of IAV [9], the relative abundance of virus transcripts was heterogeneous across cells from both cell types, with 0 to 70% of the total reads in each cell being derived from virus transcripts and the relative abundance of these transcripts increasing over time (Supplementary Fig. 1a). The same trend was also seen when analyzing segment-specific virus transcripts within individual cells over the course of the infection (Supplementary Fig. 1b).
Heterogeneity of defective virus segment expression across the cell population
Since DVGs are known to accumulate in cell culture and can serve as templates for transcription, we characterized their abundance and diversity by examining gap-spanning reads in the sequencing data. To quantify the relative abundance of DVGs identified, we calculated the frequency of each unique DVG transcript type (determined by 5’ / 3’ gap coordinates) across single cells and the ratio of these to non-gap-spanning transcripts derived from the corresponding viral segments (i.e., DVG/FL ratio). Although DVG transcripts could be detected in all viral segments, those originating from the polymerase segments (i.e., PB2, PB1, and PA) occurred with the highest abundance and frequency in both cell types over the course of the infection (Supplementary Fig. 2), consistent with observations made in other studies [15, 22]. We thus focused our analyses on the detection of DVG transcripts derived from those segments. We collected reads with large internal deletions spanning ≥1000 nucleotides (nt) (Fig. 1a) and identified the junction coordinates of these gap-spanning reads. We observed a diverse pool of DVG transcripts, including some shared with the viral segments (vRNA) from the PR8 stock (Supplementary Fig. 3-5). The sizes for the majority of these DVG transcripts are estimated to be between 300nt and 1000nt. To evaluate the accumulation of DVG transcripts over the course of the infection, we compared changes in the ratio of DVG transcripts to full-length (FL) transcripts derived from a given polymerase segment (DVG/FL) across time points. The DVG/FL ratios increased significantly between the early (6hpi) and late (24hpi) stages of the infection (p < 2.2 x 10-16) and displayed a high level of heterogeneity across single cells from both cell types (Supplementary Fig. 6).
Interestingly, the junction sites aggregated in specific genomic regions, as they systematically occurred in a high percentage of cells. We identified two predominant types of DVG transcripts corresponding to PB2 and PB1 defective segments (Fig. 1b-c and Supplementary Fig. 7-8). These same DVG transcripts were present in the stock, increased in prevalence over the course of the infection, and were conserved in different cell types and at different MOIs (Supplementary Fig. 9-10), indicating likely carryover from the stock virus rather than de novo formation in each cell type. For PA, one particular DVG type that was present in the stock was found in A549 and MDCK cells at different MOIs, but not in HBEpC cells. Its prevalence was also much lower than for PB2 and PB1 DVGs (Supplementary Fig. 11-12). The dominant DVG PB2 and PB1 transcripts derived from corresponding defective segments in both the stock and the infected cells showed stable relative abundance across individual cells over the course of the infection, suggesting the persistence of the defective viral segments.
To determine if any of the inferred defective polymerase segments could lead to the generation of defective interfering particles (DIs), we tested their interference potential experimentally at the bulk level. We generated clonal PB8-DI162 and PR8-DI222 viruses carrying the deletion sites in the PB2 segment identified as those in two dominant PB2-DVG species, and then co-infected each of the DI viruses with PR8 wild-type (WT) virus in A549 cells. Both PR8-DI162 and PR8-DI222 viruses inhibited the productivity of WT virus as well as PR8-DI244, known to be an effective DI [18] (Fig. 1d). To confirm that inhibition is specific to the DIs and not due to competition between co-infecting strains, we also tested a full-length PR8 virus with a frame-shifting point mutation in the PB2 gene. Co-infection with this mutant virus did not affect the productivity of WT virus (Fig. 1d). The inhibitory effect of PR8-DI222 virus could still be detected at a DI/WT ratio of 1:1 (Fig. 1d). This confirmed the interfering ability of the two types of defective PB2 segments identified in the single cell data. The interference test was not intended to mimic a single-cell scenario given the difficulties in recreating single-cell infection conditions at a bulk level.
Cell-to-cell variation in concurrent virus-host transcriptional changes over the course of the infection
Viral infection can trigger massive changes in the host transcriptional program, including the activation of the interferon (IFN) response. Since the induction of IFNs is also subject to cell-to-cell variation [23], we first evaluated the expression frequency of type I (i.e., IFN beta) and III (i.e., IFN lambda) IFNs to determine the extent of the variation during infection. While these were significantly differentially expressed over the course of the infection at the population level, as measured by bulk RNA-seq (Fig. 2a), the expression of IFNs was only detected in less than 3% of cells until the late stage of infection, although this proportion increased over time (Fig. 2b and Fig. 2c). The same expression profile was observed for a number of IFN-stimulated genes (ISGs), such as RSAD2, CXCL10, GBP4, GBP5, IDO1, and CH25H in A549 cells, and IFI44L, CMPK2, IFIT1, BST2, OASL, and XAF1 in HBEpC (Fig. 2). In silico pooling of the single cell data to mimic the population level measurement resembles the bulk RNA-seq data, thus excluding the possibility of substantial technical limitation of single-cell RNA-seq (Fig. 2a).
To evaluate cell-to-cell variation in the host response manifested by the alteration in the host transcriptome following IAV infection, we performed un-supervised cell clustering solely using the host transcriptional data. We then annotated single cells in each subpopulation with information about the relative abundance of viral transcripts and the DVG/FL ratio. We observed a correlative pattern between the distribution of the relative abundance of virus transcripts and the host transcriptional landscape across cells within a population, as cells with a high abundance of virus transcripts formed a separate cluster by 12hpi in both cell types and this population structure, marked by a drastic difference in the level of viral transcripts between clusters, was retained at 24hpi (Fig. 3a and Fig. 3b). However, cell-cycle status, which contributes to the heterogeneity of the cell population by 12hpi and the formation of clusters in distinct cell-cycle stages (Supplementary Fig. 13a and Supplementary Fig. 13b) with a similar abundance of virus transcripts, does not seem to have a significant impact on population structure in A549 cells at 24hpi. Consistent with previous reports on G0/G1 cell-cycle arrest [24, 25], at 24hpi we observed a drastic increase in the proportion of cells in the G0/G1 phase as compared to earlier time-points (Supplementary Fig. 13c). Interestingly, this same pattern was not observed in HBEpC cells at the same time-point; in these cells the host and viral transcriptome landscape at 24hpi is similar to that in A549 cells at 12hpi, suggesting a temporal difference in host response and viral replication between cell types.
To characterize host genes driving the changes associated with viral transcription, we assessed the over-representation of Gene Ontology (GO) terms associated with differentially expressed genes in each subpopulation of A549 and HBEpC cells compared to the rest of the cells at 24hpi. In two subpopulations of A549 cells with a high relative abundance of virus transcripts (clusters 0 and 1 in Fig. 4a), there is an enrichment of genes involved in viral transcription, RNA processing, translation, SRP-dependent co-translational protein targeting to the membrane, and mitochondrial electron transport (Fig. 4a). These GO terms are also associated with genes highly expressed in HBEpC cells with the highest abundance of virus transcripts at 24hpi (cluster 3 in Fig. 4b; p < 2.2 x 10-16 compared to the other clusters). In contrast, genes involved in antiviral responses, such as the type I IFN signaling pathway and the negative regulation of viral genome replication, are highly expressed in the other two subpopulations of A549 cells with a similar or severely reduced relative abundance of virus transcripts (clusters 3 and 2, respectively in Fig. 4a). However, some antiviral genes are differentially induced in cells with different levels of viral transcription. For example, type I and III IFNs, as well as a subset of ISGs, are highly expressed in cluster 3, and another subset of ISGs are highly expressed in cluster 2. In HBEpC cells, the antiviral responses, such as the type I IFN signaling pathway, are primarily observed in a cluster that has a low abundance of virus transcripts (cluster 2 in Fig. 4b) and is mostly comprised of cells in the G0/G1 phase (Supplementary Fig. 13b at 24hpi), as seen for A549 cells at 12hpi (Supplementary Fig. 13a and Supplementary Fig. 14).
To further understand how viral and host transcriptional activities are coordinated in clusters of cells that are subjected to different levels of viral stimuli, we performed gene co-expression network analyses using multiscale embedded gene co-expression network analysis (MEGENA) [26] and subsequent correlation analyses to directly characterize virus-host interactions. This was done for each cluster individually for A549 and HBEpC cells collected at 24hpi to identify modules of co-expressed genes representing coherent functional pathways. The relevance of a module to viral infection was evaluated by correlating the first principal component of a module with the relative abundance of virus transcripts. Consistent with the processes identified in the GO term enrichment approach using differentially expressed genes identified within a given cluster of cells (Fig. 4), those related to the host cell cycle and viral life cycle, including transcription, translation, RNA processing, metabolic process, and protein localization to membrane, were enriched in the modules that were correlated with viral transcription and identified in the clusters of A549 cells with a high abundance of virus transcripts (Supplementary Fig. 15). Conversely, the host innate immune response (e.g., response to type I IFN) was associated with viral transcription-correlated modules in cells with the lowest abundance of virus transcripts (cluster 2 of A549 cells at 24hpi in Fig. 5a). For example, module M100 negatively correlated with relative abundance of virus transcripts (ρ = −0.262, p = 2.9 x 10-5) in cluster 2 of A549 cells has genes—including ISG20, RNF213, IFI35, STAT1, RBCK1, IFI27, OAS2, OAS3, XAF1, STAT2, IFIT5—that are involved in the innate antiviral response and that were significantly highly expressed, as compared to their expression in other cells of the same population (Fig. 5b). In HBEpC cells, viral transcription-correlated modules, such as M227, enriched for type I IFN signaling genes (e.g., ISG15, MX1, MX2, IFI35, IFI27, IFIT1, IFITM3, OAS1, OASL, ISG20.) and positively correlated with relative abundance of virus transcripts (ρ = 0.103, p = 0.021; Fig. 5c and 5d), were detected in one cluster of cells with a low abundance of virus transcripts (cluster 2 of HBEpC cells at 24hpi in Fig. 5c). Modules correlated with viral transcription in the other two clusters of HBEpC cells with a low abundance of virus transcripts (clusters 0 and 1 in Supplementary Fig. 16) were enriched for genes associated with the host cell cycle, consistent with observations in A549 cells at 12hpi (Supplementary Fig. 14), while the viral transcription correlated modules in HBEpC cells with a high abundance of virus transcripts (cluster 3 in Supplementary Fig. 16) were enriched for genes involved in the ER-associated ubiquitin dependent protein catabolic process, small molecule metabolic process, vesicle mediated transport, etc. Overall, by identifying modules of genes that have a direct correlation with virus transcript abundance, the gene co-expression network analysis further demonstrates a complex association between virus transcript abundance and host responses, consistent with the pattern revealed by GO term analysis of highly expressed genes in individual clusters, and thus provides additional evidence of complex virus-host interactions.
Differential expression of host genes linked to defective viral genomes
Since DVGs are known to play an important role in IFN induction and inhibition of viral replication (reviewed in [20]), we determined whether the accumulation of DVGs was associated with attenuated viral transcription and induction of innate immune antiviral response genes. We observed a subtle but consistent inverse trend between the relative abundance of virus transcripts and the PB2 and PB1 DVG/FL ratios at 12hpi in A549 cells and at 24hpi in HBEpC cells (Fig. 6a-d). The cluster of cells with the highest abundance of virus transcripts (cluster 1 of A549 cells at 12hpi in Fig. 6a and cluster 3 of HBEpC cells at 24hpi in Fig. 6b) has the lowest level of PB2 and PB1 DVG/FL ratios (Fig. 6c-d; p < 2.2 x 10-16 comparing PB2 and PB1 DVG/FL ratios in cluster 1 of A549 cells or cluster 3 of HBEpC cells against the other clusters). Similarly, in A549 cells at 24hpi, we see the lowest abundance of virus transcripts in cells (cluster 2 in Fig. 6e) that have a higher level of PB2 DVG/FL ratios (cluster 2 in Fig. 6f; p < 2.2 x 10-16 compared to clusters 0 and 1) and the highest level of PB1 DVG/FL ratios (p = 1.943 x 10-9 comparing cluster 2 against the other clusters). Moreover, innate immune response genes are highly expressed in cells with either an elevated DVG/FL ratio for both DVG PB2 and PB1 transcripts and a low abundance of virus transcripts (cluster 2 of A549 cells at 24 hpi in Fig. 6e-f), or an elevated DVG/FL ratio for PB2 transcripts and a relatively high abundance of virus transcripts (cluster 3 in Fig. 6e-f; p = 4.036 x 10-11 compared to clusters 0 and 1 in Fig. 6f). It suggests that both DVG and FL transcripts could play a role in stimulating the innate immune response and that the enrichment of DVG transcripts in the total virus transcript pool is associated with strong immunostimulation, independent of viral transcription levels. Notably, the abundance of two dominant and ubiquitous PB2-DVG transcripts in the total virus transcript pool varies between cells and across clusters. The DVG PB2 transcripts carrying the same deletion sites as for PR8-DI222 were highly abundant in the cluster of cells with the lowest abundance of virus transcripts (cluster 2 of A549 cells at 24 hpi in Fig. 6g; p < 2.2 x 10-16 compared to the other clusters), while the other DVG PB2 transcripts corresponding to PR8-DI162 were highly abundant in both cluster 2 and cluster 3 (Fig. 6g; p = 2.067 x 10-10 and p = 0.001241, respectively, compared to clusters 0 and 1), suggesting a potential difference in the induction of innate immune response genes by different DVG species. We did not detect a difference in the relative abundance of defective PA transcripts among cells with different levels of viral transcription (Supplementary Fig. 17).
Given the enrichment of PB2 DVGs and the over-expression of ISGs in cluster 2 of A549 cells at 24hpi compared to the rest of the cells in the same population, we experimentally validated the immunostimulatory impact of PB2 DVGs on the host cells. We infected A549 cells with each of our 3 PR8-DIs (PR8-DI162, -DI222, or -DI244) or with WT virus, and quantified the expression level of the influenza virus M gene and of ISGs that were shown to be significantly highly expressed in cluster 2 of A549 cells, including IFI35, IFI27, and MX1, the type I (IFN beta) and III (IFN lambda) IFNs that were shown to be significantly highly expressed in cluster 3 of A549 cells (Supplementary Fig. 18a and 18b). Due to their limited ability to replicate without a helper virus, DI viruses cultured on their own exhibit a significantly lower level of viral transcription compared to WT virus (Supplementary Fig. 18c). Despite the lower replication of DIs, there was a higher induction of ISGs in DI-infected cells compared to cells infected with WT virus, while there was a comparable or marginally higher induction of IFN beta (IFN-ß) and IFN lambda (IFN-λ) in the WT-infected cells (Supplementary Fig. 18c). Together, these data demonstrate that at the late stage of infection PB2 DVGs have a strong immunostimulatory effect on the host cells manifested by a significant induction of ISGs even at very low levels of viral transcription and independent of type I and III IFN expression levels.
DISCUSSION
The goal of this study was to quantitatively and qualitatively characterize viral and host factors that contribute to cell-to-cell transcriptional variation observed during IAV infection. We identified DVGs as potentially important factors in the temporal variation of the host cell transcriptome. The substantial cell-to-cell variation in viral replication and host response that we detected highlight the potential of single-cell virology to provide novel insight into complex virus-host interactions.
The inherent stochasticity of molecular processes, especially during the initial stages of infection, can contribute to the cell-to-cell variation in viral replication [1]. Evidence from studies examining low MOI infections with different virus strains and cell lines suggests that this intrinsic stochasticity can impact viral replication and result in the loss of viral genome segments, transcripts, or proteins [1, 8–10]. Similarly, at different time points of the high MOI infection, we observed cell-to-cell variation in viral transcription, including that in segment-specific transcription. Although the mechanisms leading to this observation are unclear based on our data, it may be explained by the following possibilities as discussed in [10], including (i) the absence of one or more viral genome segments in infecting virions, (ii) a deficiency in intracellular trafficking of incoming viral ribonucleoprotein complexes (vRNPs), (iii) the random degradation of vRNAs prior to transcription, or (iv) mutations resulting in decreased stability of viral mRNAs.
Another source of variability in viral replication stems from the emergence and accumulation of various vRNA species synthesized by the error-prone viral polymerase. Indeed, viral genetic diversity, including amino acid mutations in the NS1 and PB1 proteins, the absence of the NS gene, and an internal deletion in the PB1 gene, contributes to the cell-to-cell variation in the innate immune response as shown in a recent report on a low MOI infection [27]. Although the 3’ sequencing strategy we used in our study limits the identification of mutations in the full-length segments, our experimental setup provides a unique opportunity to evaluate the diversity of DVGs by characterizing the DVG transcripts, since a high MOI infection promotes the accumulation of DVGs. Late stages of infection also emphasize the impact of accumulated DVGs on the host response. We identified a diverse pool of DVG transcripts derived from the three polymerase segments, including several dominant species. Notably, the fact that all the dominant DVG PB2 and PB1 transcripts identified in the infected cells were also present in the virus stock, and that the dominant DVG PB1 transcripts were detected in an increasing proportion of cells over the course of infection, suggests potential cellular transmission of DI viruses carrying these DVGs. However, we cannot exclude the possibility that the regions where the hot spots were located are most prone to polymerase skipping and that DVG enrichment over the course of the infection resulted in their abundance above a detection threshold in an increasing number of cells.
The accumulation of DVGs can have consequences on viral replication by directly competing with the full-length genomes and modulating host innate immune responses. The immune-modulatory effect of DVGs has been attributed to efficient activation of the IFN induction cascade and antiviral immunity, a mechanism well-known for the non-segmented negative-sense RNA viruses, such as the murine parainfluenza virus Sendai (SeV) (reviewed in [20]). Consistent with previous reports, we detected enrichment of highly expressed genes involved in the IFN response in one subpopulation of cells with significantly attenuated viral transcription. In this cluster of cells we also observed that DVG transcripts accumulated to significantly higher levels compared to FL transcripts. The co-expression network analysis provides an opportunity to identify modules of co-expressed host genes and directly correlate them with the level of viral transcription revealing groups of innate response genes that are potentially differentially expressed under the effect of viral transcription. The observation was further validated in our DI- or WT-infection assay, in which we detected a significantly higher expression of some ISGs in DI-infected cells independent of the level of viral transcription and that of IFNs at the late stage of the infection. The IFN mRNA dose-independent ISG expression in DI-infected cells suggests a potential novel mechanism of DVG immunostimulation that is independent of IFN induction. Expression of a subset of ISGs can be induced in the absence of IFN signaling, as shown in other studies [28, 29]. It is likely to be at least partially attributed to similar and overlapping consensus DNA-binding motifs for the interferon regulatory factor (IRF) family, most notably IRF3 and IRF7 that are key regulators of type I IFN production, and for IFN-stimulated gene factor 3 (ISGF3) that promotes the expression of many ISGs in response to IFN signaling [30–33]. For example, a IRF3 ChIP-seq experiment with adenovirus-infected human primary bronchial/tracheal epithelial cells demonstrated IRF3 binding to the promoter region of induced ISGs, including MX1 and ISG15, which have the IFN-stimulated response element (ISRE) bound by IRFs and ISGF3 in the promoter regions [34]. Further investigation of ISG induction in the absence of IFN signaling during DI infection will elucidate this potential mechanism of DI immunostimulation. In addition, given the fact that some ISGs could be highly expressed in cells with a high level of viral transcription or a high abundance of DVGs suggests that other viral factors, besides the accumulation of DVGs, may also play an important role in triggering the innate immune response. Nevertheless, unlike SeV DVGs that form “copy-back” structures due to complementary termini generated by “copying back” the authentic 5’ terminus at the 3’ terminus [35, 36], IAV DVGs share identical termini with the full-length genomes. In the copy-back SeV DVGs, a stretch of dsRNA adjacent to a 5’-triphosphate serves as an effective RIG-I ligand [20]; however, the mechanism of IAV DVGs underlying a more effective immunostimulation compared to the full-length genomes remains elusive [20]. A proposed hypothesis is that the unencapsidated replication products of small DVGs can potentially activate RIG-I if they can reach the cytoplasm [20], as observed in vitro and in vivo where short influenza RNA templates can be replicated in the absence of nucleoprotein (NP) [37]. In addition, the interfering and immune-modulatory abilities of DVGs could also be attributed to DVG-encoded proteins that directly interact with mitochondrial antiviral-signaling (MAVS) and act independently of RIG-I in IFN induction, as previously reported [38].
Our approach enabled us to characterize and quantify the abundance of diverse DVG species that accumulated in individual cells. It also allowed us to associate the enrichment of certain DVG species with the attenuation of viral transcription and the induction of innate immune responses at single-cell resolution. For example, the accumulation of PB2-DVGs, especially including one type of dominant PB2-DVG species (PR8-DI222), in one subpopulation of cells was associated with significantly attenuated viral transcription and a strong host innate immune response manifested by high expression of some ISGs, although we cannot rule out the effect of other non-dominant defective PB2-DVGs, DVGs derived from other polymerase segments, or certain mutations in stimulating host responses. Diverse DVG species can play different roles in shaping the virus-host interaction. However, further investigation is necessary to elucidate the dynamics and impact of DVG species distribution during pathogenesis, enhancing the understanding of the relationship between DVG dynamics and infection outcome. Our characterization of DVGs and corresponding host responses reveals complex virus-host interactions and an underappreciated single-cell level of immune-modulation by DVGs. The complex gene expression pattern underlying the host innate immune response demonstrates the intricate effects of various viral factors, including—but probably not limited to—the accumulation of DVGs, on shaping infection outcome at the single-cell level.
MATERIALS AND METHODS
Cells and virus
Human lung adenocarcinoma epithelial A549 cells (ATCC, Virginia, USA) were maintained in Kaighn’s modified Ham’s F-12 medium (F-12K) supplemented with 10% fetal bovine serum (FBS). Primary human bronchial epithelial cells (HBEpC) (PromoCell, Heidelberg, Germany) were maintained in PromoCell airway epithelial cell growth media with SupplementMix (PromoCell). Madin-Darby canine kidney (MDCK) cells were maintained in minimum essential medium (MEM) supplemented with 5% FBS. Human embryonic kidney epithelial 293T cells (ATCC, Virginia, USA) were maintained in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% FBS. PB2 protein-expressing modified MDCK (AX4/PB2) [39] cells were maintained in MEM supplemented with 5% Newborn Calf Serum (NBCS), 2 µg/ml puromycin and 1 µg/ml blasticidin. 293T and AX4/PB2 cells were co-cultured with DMEM supplemented with 10% FBS. Viral strain A/Puerto Rico/8/34 (H1N1) was plaque purified and propagated in MDCK cells. Viral titers were determined by plaque assay in MDCK cells and sequences confirmed by Illumina MiSeq sequencing.
Infection assays
Subconfluent monolayers of A549 cells in T-25 flasks were washed with A549 infection media (F-12K supplemented with 0.15% bovine serum albumin (BSA) fraction V and 1% antibiotic-antimycotic) and infected with influenza virus at a multiplicity of infection (MOI) of 5 in 0.5 mL pre-cooled A549 infection media. The flasks were incubated at 4°C for 30 minutes with agitation every 5-10 minutes followed by addition of 4.5 mL pre-warmed A549 infection media and transferred to a 37°C, 5% CO2 incubator for further incubation. HBEpC cells were infected similarly except the HBEpC infection media was PromoCell airway epithelial cell growth media. The inoculum was back-titrated to confirm the desired MOI was used. The virus-infected cells were collected at 6, 12, and 24 hours post infection (hpi) and the mock-infected cells were collected at 12 hpi. There was no substantial cell death observed at any time point. Cells were extensively washed before re-suspension in PBS containing 0.04% BSA. A proportion of cells from one of the two duplicate flasks per time point were subject to 10X Genomics Chromium single-cell library preparation. The remaining cells in the two flasks were used for conventional bulk RNA-seq library preparation. Notably, due to a failure in the single-cell library preparation for the PR8-infected HBEpC cells collected at 12hpi, we replaced this sample by repeating the same infection assay with HBEpC cells.
Sample preparation, library construction, and sequencing
10X Genomics single cell library preparation with Chromium 3’ v2 chemistry was performed following the manufacturer’s protocol. A range of 3,900-7,500 cells were used as input into each single cell preparation, with a median of ∼3,353 cells (range 2,075-5,254) obtained following sequencing, as described below. Sequencing was performed on the HiSeq 2500 in HighOutput mode (v2) with one library per lane following the manufacturer’s recommended sequencing configuration (i.e., paired-end read 1: 27 bp, read 2: 99 bp, and i7 index: 8 bp). An average of 59,200 reads per cell were obtained. For bulk mRNA sequencing, total RNA from two replicate flasks per time point was extracted using RNeasy Mini Kit (Qiagen, Hilden, Germany) with on-column DNA digestion using RNase-free DNase (Qiagen). The conventional bulk RNA-seq library preparation was performed using NEBNext Ultra II RNA library prep kit for Illumina following poly(A) mRNA enrichment. Libraries were multiplexed and sequenced on an Illumina NextSeq 500 in HighOutput 2×75 bp mode (v3). Viral genomic RNA (vRNA) of the virus stocks used in the infection assay was amplified using multi-segment reverse-transcription PCR (M-RTPCR) [40]. The M-RTPCR product was subject to Illumina library preparation with Nextera DNA library prep kit and sequenced on the HiSeq 2500 in Rapid 2×150 bp mode (v2).
Upstream computational analyses of single-cell RNA-seq data
The Cell Ranger (v2.1.0) single cell software suite (10X Genomics) was applied to the single-cell RNA-seq data to perform the alignment to the concatenated human (hg19) and influenza A virus (A/Puerto Rico/8/34) reference with STAR (v2.5.3a) [41] and gene counting with the processed UMIs for each cell. The UMI counts for the viral reads with the CIGAR string containing a gap (i.e., “MNM”) and different UMI sequences comparing to the ungapped reads were added into the expression matrix later, since they were not considered during gene counting. Given the fact that we performed 3’-end sequencing, transcripts derived from alternative splicing of M and NS segments, encoding M1/M2 and NS1/NEP respectively, could not be distinguished in our analyses, as those transcribed from the same genes share the identical 3’-end.
To identify and exclude low quality cells in the single cell data from the downstream analyses, the following metrics were calculated for each sample and were employed to filter out cells that failed to meet these criteria: 1) remove cells with fewer than 2,000 detected genes; 2) remove cells with an alignment rate less than the mean minus 3 standard deviations; 3) remove cells with a number of reads after log10 transformation not within 3 standard deviations below or above the mean; 4) remove cells with the number of UMI after log10 transformation not within 3 standard deviations below or above the mean; 5) remove cells with the percentage of reads aligned to mitochondrial genes not within 3 standard deviations below or above the mean; 6) remove mock-infected cells with 1 viral read detected. Although we filtered cells based on the size of the total mRNA pool (i.e., total reads) as one of the criteria described above, we also checked the distribution of the size of the host mRNA pool (i.e., host reads) for individual cells. As reported in recent virus-infected single cell studies [9, 42], while virus-induced host shutoff of gene expression [43] is likely to be mainly mediated by host mRNA degradation, it remains unclear what effect viral transcription has on the size of the total mRNA pools. In our study, we observed that virtually all the cells with substantially smaller or larger host mRNA pools–where the number of host reads after log10 transformation was not within 3 standard deviations below or above the mean–were already being excluded when applying the 6 criteria described above, thus we decided to filter cells based on these. Approximately 120-250 cells were eliminated from each sample, with a median of 164 cells per sample. We further removed genes that were detected in fewer than three cells. After initial cell and gene quality control, the majority of samples had approximately 3,000-3,500 cells left for downstream analysis. The expression levels of host genes were normalized based on the size of the host mRNA pool.
Cell clustering with Seurat
An unsupervised cell clustering to identify subtle changes in the population structure after infection was performed on each time point data separately following the procedures of the Seurat package (v2.1.0) [44] using the normalized, scaled, and centered host gene expression matrix, without considering the relative abundance of virus transcripts within individual cells. Briefly, the highly variable genes with average expression < 4 and dispersion > 1 were used as input for the PCA. The statistically significant and biological meaningful PCs determined by the built-in jackstraw and elbow analyses and manually exploration were retained for visualization by t-distributed stochastic neighbor (t-SNE) and subsequent clustering by a shared nearest neighbor (SNN) graph-based approach. The legitimacy of the initially identified clusters was validated using the “ValidateClusters” function in Seurat, which built a support vector machine (SVM) classifier with significant PCs and then applied the accuracy cutoffs of 0.9 and the minimal connectivity threshold of 0.001. To identify markers that are differentially expressed among clusters, the “FindMarkers” function in Seurat was used with the different test options, including “bimod” [45], “poisson”, “negbinom”, and “MAST” (v1.4.1) [46]. Only the genes that showed at least 0.25-fold difference on the log-scale between two groups and were expressed in at least 25% of the cells in either group were tested for differential expression. Significantly differentially expressed genes for each cluster (i.e., the markers) were identified by applying the adjusted p-value cut-off of 0.05. Markers identified by all four methods were retained. GO term over-representation analysis of the up-regulated markers was performed with the online service DAVID [47, 48] by applying the Benjamini p-value cut-off of 0.05. To determine the cell cycle stage associated with individual cells, cell-cycle scoring and assignment were performed with Seurat using the “CellCycleScoring” function based on the expression of canonical markers [49].
Computational analyses of bulk RNA-seq and virus stock sequencing data
The raw bulk RNA-seq and virus stock sequencing data was first trimmed with trimmomatic (v0.36) [50] to remove the adaptors and trim off low quality bases. Reads with a minimal length of 36 bases in the trimmed bulk RNA-seq dataset were aligned to the concatenated human (hg19) and influenza A/Puerto Rico/8/34 (H1N1) reference with STAR (v2.5.3a) [41] with the default parameters, and counted with featureCounts [51] in the Subread package (v1.5.1) [52], while reads in the virus stock sequencing dataset were aligned to the influenza A/Puerto Rico/8/34 (H1N1) reference with STAR (v2.5.3a) [41]. Differential expression analysis was performed with DESeq2 (v1.18.1) [53] and edgeR (v3.20.5) [54, 55] using the bulk RNA-seq and the merged single-cell RNA-seq data. Host genes with the adjusted p-value < 0.05 were identified as significantly differentially expressed at each time point.
Deletion junction identification, filtering, and quantification
Reads that aligned to both ends of a viral segment, with each aligned portion comprised of at least 10 nucleotides in length, were collected. Following UMI de-duplication, reads with junction coordinates within a 10-nucleotide window were grouped together. We excluded from downstream analyses reads with junction coordinates that occurred fewer than 10 times in each sample. To compare quantitatively gap-spanning reads for each segment across cells and samples, the number of gap-spanning reads was normalized to the total number of non-gap-spanning reads aligned to a 100nt region centering the coverage peak at the 3’ end. For the polymerase segments, gap-spanning reads were also filtered by the size of deletions (i.e. ≥ 1000 nucleotides). The 1000 nucleotides threshold for the polymerase segments was chosen based on the distribution of the size of the deletions (Supplementary Fig. 19), as the majority of the deletions are larger than 1000 nucleotides. As cells with low infection (especially those harvested at the early stage of infection or inoculated at a low MOI) typically have poor coverage for the viral segments of interest, the DVG/FL ratio in those cells calculated as described above is typically inflated and may even fail to be calculated because all the viral reads corresponding to a given segment are gap-spanning or there are no viral reads. To mitigate this effect, we overwrote those values to 0 in the datasets, including the dataset collected from HBEpC cells at 6hpi.
Generation of PR8-derived DI162, DI222, DI244, and PB2del virus
To generate the PR8 PB2-DI162, -DI222, and -DI244 reverse genetics plasmids, gBlocks Gene Fragments (Integrated DNA Technologies, California, USA) were ordered using the corresponding defective PB2 genomic sequences identified in this study (PR8-DI162 and PR8-DI222) or previously reported (DI244) [18]. To generate the PR8 PB2del plasmid, a point mutation at nucleotide position 265 was introduced into the PR8 PB2 genomic sequence using the mutagenesis primer PR8-PB2-stopdel-F (5’-TTCATTTACTCCATTAAGTTTGTCCTTGCTC-3’). The reverse genetics plasmids were cloned into the pBZ61A18 reverse genetics vector as previously described [56]. To rescue the PR8 DI viruses, each of the sequence-confirmed PB2-DI or PB2del plasmids was co-transfected into 293T-AX4/PB2 co-cultured cells with the 7 reverse genetics plasmids for PR8 PB1, PA, HA, NP, NA, M, and NS and a PB2 protein-expression plasmid using Lipofectamine™ 3000 Transfection Reagent (Invitrogen, California, USA). The supernatant was collected on day 2 post-infection and passaged twice in the AX4/PB2 cell line, which expresses the PB2 protein in trans [39]. Viral titers were determined by TCID50 assay in AX4/PB2 cells.
Interference test for PR8-DI162 and PR8-DI222 viruses and validation of their immunostimulatory effects
To test the interfering ability of PR8-DI162 and PR8-DI222 viruses that carry the deletions identified in two dominant PB2-DVG species from the single cell dataset, their inhibitory effects on wild type PR8 virus replication was quantified in cultured cells at different DI/WT ratios. Subconfluent monolayers of A549 cells in 12-well plates were washed with infection media (MEM supplemented with 0.15% BSA fraction V, 1% antibiotic-antimycotic and 1 µg/mL TPCK-treated trypsin) and each well was co-infected with one type of defective virus at a MOI of 5 and the PR8 virus either at a MOI of 0.005 for a DI/WT ratio of 1000:1 or at a MOI of 5 for a DI/WT ratio of 1:1 in 1 mL infection media. For a DI/WT ratio of 1000:1 co-infection, plates were transferred to a 37°C, 5% CO2 incubator and supernatants were collected at 2 hours post infection, and 1, 2, and 3 days post infection. For a DI/WT ratio of 1:1 co-infection, infection was similar to that used for single-cell RNA-seq, including incubating at 4°C for 30 minutes with agitation every 5-10 minutes after infection followed by adding pre-warmed A549 infection media and transferred to a 37°C, 5% CO2 incubator for further incubation. Supernatants were collected at 2, 6, 12, 24 hours post infection and titrated as described above. Wild type PR8 yield from collected supernatants was determined by TCID50 assay using MDCK cells, which do not support replication of DI viruses due to the lack of functional PB2 proteins.
To determine if the DI viruses could stimulate a higher expression of some ISGs identified in the network analysis, subconfluent monolayers of A549 cells in 12-well plates were infected with each DI virus (PR8-DI162, -DI222, or -DI244) or WT virus at a MOI of 10 in triplicate. Total RNA from cells collected at 24hpi was extracted using RNeasy Mini Kit (Qiagen, Hilden, Germany). 100ng RNA was subsequently used as template for reverse transcription using SuperScript IV system (Invitrogen, California, USA) with Oligo d(T)20 primer. qPCR was performed using PowerUp SYBR Green (Applied Biosystems, Massachusetts, USA) in triplicate with the following primers: IFI35: Hs.PT.58.38490206 (IDT, Iowa, USA); IFI27: IFI27fw (5’-GCCTCTGGCTCTGCCGTAGTT-3’) and IFI27rev (5’-ATGGAGGACGAGGCGATTCC-3’) [57]; MX1: MX1fw (5’-CTTTCCAGTCCAGCTCGGCA-3’) and MX1rev (5’-AGCTGCTGGCCGTACGTCTG-3’) [57]; IFNB: IFNBfw (5’-TCTGGCACAACAGGTAGTAGGC-3’) and IFNBrev (5’-GAGAAGCACAACAGGAGAGCAA-3’) [58]; IFNL: IFNLfw (5’-GCCCCCAAAAAGGAGTCCG-3’) and IFNLrev (5’-AGGTTCCCATCGGCCACATA-3’) [58]; β-actin (ACTB): ACTBfw (5’-GAACGGTGAAGGTGACAG-3’) and ACTBrev (5’-TTTAGGATGGCAAGGGACT-3’) [59]; FLU: mCDC-FluA-F-PR8 (5’-GACCAATCCTGTCACCTCTGA-3’) and mCDC-FluA-R (5’-AGGGCATTTTGGACAAAGCGTCTA-3’). For all the targets, qPCR parameters according to manufacturer’s recommendation were: 95°C for 10 min and then 45 cycles of 95°C for 15 s, 57°C for 15 s, and 60°C for 60 s. Fold change of target gene expression was calculated using the 2−ΔΔCT method normalized to ACTB.
Gene co-expression network analysis
Multiscale Embedded Gene Co-Expression Network Analysis (MEGENA) [26] was performed to identify host modules of highly co-expressed genes in influenza infection. The MEGENA workflow comprises 4 major steps: 1) Fast Planar Filtered Network construction (FPFNC), 2) Multiscale Clustering Analysis (MCA), 3) Multiscale Hub Analysis (MHA), 4) and Cluster-Trait Association Analysis (CTA). A cutoff of 0.05 after perturbation-based FDR calculation was used. The differentially expressed genes are identified between virus-infected cells of a particular cluster and the rest of the cells in the same population determined by a t-test. Only the genes that showed at least 0.25-fold difference on the log-scale between two groups, and were expressed in at least 25% of the cells in either group, were tested for differential expression. Significantly differentially expressed genes were identified by applying the adjusted p-value cut-off of 0.05.
Identification of enriched GO terms, key regulators in the host module, relative abundance of virus transcripts associated with host modules
To functionally annotate gene signatures and gene modules identified in this study, enrichment analysis was performed of the established gene ontology (GO) categories. The hub genes in each subnetwork were identified using the adopted Fisher’s inverse Chi-square approach in MEGENA; Bonferroni-corrected p-values smaller than 0.05 were set as the threshold to identify significant hubs.
The relative abundance of virus transcripts and the DVGs associated with the host modules were identified using Spearman correlation between the first principal component of the gene expression in the corresponding module and the relative abundance of viral transcripts. Significantly associated traits were identified using the Benjamini-Hochberg FDR-corrected p-value 0.05 as the cutoff.
STATISTICAL ANALYSIS
The statistical significance of the changes in the relative abundance of viral and DVG transcripts between 3 or more groups of cells was first determined by the one- or two-tailed Kruskal-Wallis rank sum test, followed by the one-tailed Wilcoxon rank sum test to calculate the pairwise comparisons. The statistical significance of expression fold changes in the qPCR validation assay was determined using the two-tailed Student’s t-test. A p-value of ≤ 0.05 was considered statistically significant.
CODE AVAILABILITY
The code used to generate all the results is available on Github (https://github.com/GhedinLab/Single-Cell-IAV-infection-in-monolayer).
DATA AVAILABILITY
Sequencing data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) repository with the accession codes GSE118773 (currently private record).
AUTHOR CONTRIBUTIONS
All authors read and approved the manuscript. E.G. conceived and designed the experiments, supervised research, and wrote the manuscript. B. Zhou conceived and designed the experiments, supervised research, performed the infection assays, and wrote the manuscript. C.W. performed the infection assays and bulk RNA-seq library preparation, analyzed the data, and wrote the manuscript. C.V.F. performed the gene co-expression network reconstruction and module-based functional enrichment analysis and wrote the manuscript. T.C. performed the defective virus generation, the interference and validation assays, and wrote the manuscript. A.G. performed the sequence library preparation and contributed to the analyses. M.W. and B. Zhang contributed to data analyses. W.H., M.S., and R.S. performed the single-cell library preparation and contributed to the analyses.
ACKNOWLEDGEMENTS
We thank members of the Ghedin laboratory for feedback and discussion. We thank Tara Rock, Nicholas Rouillard, Olivia Micci-Smith, and Mohammed Khalfan of the Genomics Core Facility at the Center for Genomics and Systems Biology, New York University for sequencing. We thank Dr. Yoshihiro Kawaoka for providing the AX4/PB2 cells and Dr. Peter Palese and Dr. Adolfo Garcia-Sastre for providing the PR8 RGs plasmids. We thank Dr. Meike Dittmann for her feedback on IFNs and ISGs. This work was supported by NIAID/NIH U01 AI111598. This work was also supported in part through the NYU IT High Performance Computing resources, services, and staff expertise.