Zinc finger antiviral protein (ZAP) activity in mammalian and avian hosts in CpG and UpA-mediated restriction of RNA viruses and investigation of ZAP-mediated shaping of host transcriptome compositions

The ability of zinc finger antiviral protein (ZAP) to recognise and respond to RNA virus sequences with elevated frequencies of CpG dinucleotides has been proposed as a functional part of the vertebrate innate immune antiviral response. It has been further proposed that ZAP activity shapes the compositions of cytoplasmic mRNA sequences to avoid self-recognition, particularly mRNAs for interferons (IFNs) and IFN-stimulated genes (ISGs) that are highly expressed when ZAP is upregulated during the antiviral state. We investigated ZAP functional activity in different species of mammals and birds, and the potential downstream effects of differences in CpG and UpA dinucleotide representation in host transcriptomes and in the RNA viruses that infect them. Cell lines from different bird orders showed variability in restriction of influenza A virus (IAV) mutants and echovirus 7 (E7) replicons with elevated CpG frequencies, and none restricted UpA-high mutants, in marked contrast to mammalian cell lines. Given this variability, we compared CpG and UpA representation in the coding regions of ISGs and IFNs with that of the total cellular transcriptome to determine whether differences in ZAP activity shaped the dinucleotide compositions of highly expressed genes during the antiviral state. While type I IFN genes often showed profound suppression of CpG and UpA frequencies, there was no over-suppression of CpGs or UpAs in ISGs in any species, irrespective of underlying ZAP activity. Similarly, mammalian and avian RNA virus genome sequences were compositionally equivalent, as were IAV serotypes recovered from ducks, chickens and humans. Overall, we found no evidence that host variability in ZAP function affects the compositions of antiviral genes.


INTRODUCTION
Cellular innate immune responses are typically targeted towards pathogen-associated molecular patterns (PAMPs) that enable cellular recognition pathways to differentiate infecting agents from cellular components. Typical PAMPs displayed by viruses that are recognised in vertebrate cells include cytoplasmic double-stranded RNA or DNA sequences, non-methylated CpG dinucleotides in viral DNA and abnormally terminated (uncapped and non-polyadenylated) RNA sequences (1, 2).
Alternatively, viruses may be recognised through conserved virus capsid structures, such as the nucleocapsids of retroviruses targeted by TRIM proteins, or through complexes formed during virus budding, which are targeted by tetherin (3)(4)(5)(6). Recently, selective binding of RNA sequences enriched for CpG dinucleotides by zinc finger antiviral protein (ZAP) was described (7), and this form of compositionally abnormal RNA represents a potential PAMP. ZAP binding triggers antiviral mechanisms that potently restrict replication of RNA viruses and retroviruses with elevated frequencies of CpG in their genomes (8)(9)(10)(11)(12)(13). ZAP-dependent restriction of viruses with elevated frequencies of UpA has been similarly characterised (11,13) and may be mediated through overlapping pathways. While the structural basis of ZAP-mediated restriction of CpG-high sequences has been partially characterised, there appear to be multiple downstream mechanisms that restrict virus replication, including a dependence on TRIM25 (7,13-16), activation of the nuclease KHNYN (8), a potentially non-canonical activation of oligoadenylate synthetase and its downstream RNase L RNA degradation pathway (11), or effects on translation initiation from the bound RNA template (17)(18)(19).

The ability of ZAP to identify and restrict replication of viruses with high CpG and UpA frequencies is contingent on the pervasive suppression of both dinucleotides in vertebrate cellular mRNA sequences, which thereby remain below the ZAP "radar". Frequencies of CpG in vertebrate mRNAs range from 20% to 80% of those expected from the frequencies of their component bases (20); suppression is particularly pronounced in mammals and birds and in sequences with a low G+C content (21). Members of the Togaviridae, such as Sindbis virus, show much higher CpG frequencies than other RNA viruses of similar G+C content, and this has been shown to enable ZAP-mediated restriction of wild-type virus in mammalian cells (19,30,31). We have proposed that incomplete suppression of CpG may represent an "adaptive compromise" for these and other groups of vector-borne viruses, such as flaviviruses (32).
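As a worked illustration of "expected" here: the expected frequency of a dinucleotide XpY is the product of the frequencies of X and Y, so representation is commonly reported as an observed/expected (O/E) ratio, and the 20%-80% range above corresponds to CpG O/E values of roughly 0.2 to 0.8. The sketch below assumes this simple mononucleotide-based definition; the function name `dinucleotide_oe` is hypothetical, and it does not implement the coding-corrected representations also used in this study.

```python
from collections import Counter


def dinucleotide_oe(seq: str, dinuc: str) -> float:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for dinucleotide XY.

    Hypothetical helper: RNA (U) is converted to DNA (T) so that UpA is
    counted as TA. Returns NaN if either component base is absent.
    """
    seq = seq.upper().replace("U", "T")
    dinuc = dinuc.upper().replace("U", "T")
    if len(seq) < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of >= 2 bases and a 2-base motif")
    n = len(seq)
    base_counts = Counter(seq)
    # Overlapping dinucleotides: there are n - 1 positions in a sequence of n bases.
    pair_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    observed = pair_count / (n - 1)
    expected = (base_counts[dinuc[0]] / n) * (base_counts[dinuc[1]] / n)
    return observed / expected if expected > 0 else float("nan")


if __name__ == "__main__":
    example = "AUGCGAUUACGCUAGC"  # arbitrary toy sequence, not from this study
    print("CpG O/E:", round(dinucleotide_oe(example, "CG"), 3))
    print("UpA O/E:", round(dinucleotide_oe(example, "UA"), 3))
```

Values below 1 indicate under-representation relative to base composition; because the expectation already accounts for C and G frequencies, the ratio separates dinucleotide suppression from overall G+C content.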

While the replication of high CpG mutants of Zika virus was restricted in mammalian cell culture, they replicated to higher titres in insect cell culture and showed greater systemic spread and excretion in saliva in experimentally infected mosquitoes (32). Flaviviruses that replicate only in insects (insect-specific flaviviruses; ISFs) show no equivalent CpG suppression (33), mirroring the absence of methylation and CpG suppression in insects and many other arthropod groups. This supports the idea that dual host specificity places conflicting evolutionary pressures on genome composition, a problem that may be resolved differently in different virus groups. Alphaviruses appear to have adopted a CpG-high genome to facilitate replication in mosquitoes, while vector-borne flaviviruses may be better optimised for replication in vertebrate hosts. The remarkable observation that the apparent restriction of ISF host range to insect cells may be overcome in mammalian cells by knocking out ZAP expression (34) underlines the involvement of this pathway in determining host range. A specific role of ZAP in preventing transmission of arthropod viruses that do not suppress CpG to mammals and birds remains a speculative possibility; if so, it would be a protective mechanism that vector-borne viruses have been able to breach, albeit at some evolutionary cost.

It has previously been proposed that, in addition to controlling RNA virus replication, ZAP and allied antiviral restriction pathways may play roles in host gene regulation (35). For example, the greater-than-expected suppression of CpG frequencies in human type I interferon (IFN) genes (encoding IFN-α and IFN-β) was proposed as a mechanism to enhance expression of these critical antiviral proteins (35).
The IFN-inducibility of the long isoform of ZAP (36,37)

Cloning of IAV mutants. The WSN system was developed by Hoffmann and colleagues (43) Table S3 (Suppl. Data).
Detection of positive selection in ZAP genes. Avian ZAP (encoded by the ZC3HAV1 gene) sequences were obtained from Ensembl (Table S4; Suppl. Data). Sequences were aligned using Muscle v. 3.8.425 (45) and trimmed to remove the highly diverse region between the fourth and fifth zinc-finger motifs.
To test for positive selection across sites in the alignment, maximum likelihood analysis of the ratio of nonsynonymous to synonymous nucleotide substitution rates (dN/dS; ω) was performed with the codeml program of the PAML v. 4.9 package (46,47). Various site models were fitted to the multiple alignments: M1a (neutral model; two site classes: 0 < ω₀ < 1 and ω₁ = 1); M2a (positive selection; three site classes: 0 < ω₀ < 1, ω₁ = 1, and ω₂ > 1); M7 (neutral model; values of ω fitted to a beta distribution, with ω > 1 disallowed); M8 (positive selection; similar to M7 but with an additional codon class of ω > 1); and M8a (neutral model; similar to M8 but with the additional codon class fixed at ω = 1). Likelihood ratio tests were performed on pairs of models to assess whether models allowing positively selected codons gave a significantly better fit to the data than neutral models (model comparisons were M1a vs. M2a, M7 vs. M8, and M8a vs. M8). In situations where the null hypothesis of neutral codon evolution could be rejected (P < 0.05), the posterior probability of codons under selection in M2a and M8 was inferred using the Bayes empirical Bayes (BEB) algorithm (48). In addition, to test for positive selection in different branches of the phylogeny, the 'free-ratios' model (which fits one ω for every branch in the tree) was implemented in PAML (49).
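The likelihood ratio tests above compare twice the difference in log-likelihood (2ΔlnL) between nested models with a χ² distribution whose degrees of freedom equal the difference in free parameters: 2 for M1a vs. M2a and for M7 vs. M8, and 1 for M8a vs. M8. This is a minimal sketch, assuming the lnL values have already been read from codeml output; the log-likelihoods and function names are hypothetical. Some analyses instead use a 50:50 mixture of a point mass at 0 and χ²₁ for M8a vs. M8, which halves the P value relative to the plain χ²₁ used here.

```python
import math

# Degrees of freedom for each nested (null, alternative) site-model comparison.
DF = {("M1a", "M2a"): 2, ("M7", "M8"): 2, ("M8a", "M8"): 1}


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, using closed forms for df = 1 and 2 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or 2")


def lrt(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (2 * delta lnL, P) for a nested model comparison."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, not values from this study.
    lnl = {"M1a": -10512.4, "M2a": -10505.1, "M7": -10498.9,
           "M8": -10490.2, "M8a": -10494.6}
    for (null, alt), df in DF.items():
        stat, p = lrt(lnl[null], lnl[alt], df)
        verdict = "neutral rejected" if p < 0.05 else "neutral not rejected"
        print(f"{null} vs {alt}: 2dlnL = {stat:.2f}, P = {p:.3g} ({verdict})")
```

Only when a comparison rejects the neutral model would the BEB posterior probabilities for individual codons be examined.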

To calculate the nucleotide diversity in the avian ZAP sequences used for PAML analysis, the alignment was analysed using the PopGenome package in R (50). A sliding window of 100 bp and an increment of 1 bp was used to calculate nucleotide diversity across the alignment, which is expressed as the average pairwise number of variant sites per 100 bp window.  (11).
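The windowed diversity measure described above can be illustrated outside R. This Python sketch is not the PopGenome implementation: it assumes an alignment of equal-length strings, skips gap characters ('-') pairwise, and reports for each window the mean number of pairwise differences across all sequence pairs, with the 100 bp window and 1 bp step stated above as defaults. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_differences(a: str, b: str) -> int:
    """Count aligned sites where two sequences differ, skipping gapped sites."""
    return sum(1 for x, y in zip(a, b) if x != y and x != "-" and y != "-")


def sliding_window_diversity(alignment: list[str], window: int = 100, step: int = 1):
    """Mean pairwise differences per window; returns (1-based start, value) pairs."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(range(len(alignment)), 2))
    results = []
    for start in range(0, length - window + 1, step):
        segments = [s[start:start + window] for s in alignment]
        total = sum(pairwise_differences(segments[i], segments[j]) for i, j in pairs)
        results.append((start + 1, total / len(pairs)))
    return results


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "ACGTTCGTAC", "ACG-ACGAAC"]  # hypothetical aligned sequences
    for start, value in sliding_window_diversity(toy, window=4, step=2):
        print(start, round(value, 3))
```

With a 1 bp step, adjacent windows share 99 of their 100 sites, so the resulting profile is smooth and highlights broad regions of high or low diversity rather than single variable sites.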
The CpG-H replicon showed a 10-fold reduction in replication compared to WT in the control human A549 cell line, as previously observed (42).

To verify the apparent difference in restriction of the E7 replicon mutants between avian species in a

Furthermore, one positively selected site, 527V, was found within the NAD+ binding site in the PARP domain (Fig. 5A).
To determine which lineage(s) have undergone positive selection in the avian phylogeny, the same alignment was analysed using the 'free-ratios' test in PAML, which assigns an ω value to each branch in the tree. The most striking burst of positive selection was found at the base of the Galliformes, consistent with a change of ZAP function in this lineage (Fig. 5B). Weaker instances of positive selection were found in the Neoaves, but purifying selection predominated elsewhere in the tree. As adaptive evolution is frequently associated with genes involved in host-virus arms races (39,52), the findings are

We obtained similar data for avian ISGs identified as homologues of the ISG mRNA sequences in the three human ISG datasets (Table S8; Suppl. Data). G+C contents of chicken ISGs similarly spanned the range of host mRNA sequences (Fig. 8), with no significant differences in CpG or UpA representation in these or in ISG homologues in ducks (Table 1A), apart from a slightly elevated frequency of UpA dinucleotides in duck-derived ISGs. In a separate analysis, the 50 most upregulated ISGs in chicken (listed in ref. 38) similarly showed no significant difference in CpG representation from chicken mRNAs, either by composition comparison or by regression. Both CpG and UpA representations were comparable with those of the host transcriptome, and with each other, in regression analysis against G+C content (Fig. 7, Table 1B; Table S6, Suppl. Data). Interferon-regulated genes (IRGs) showed significantly higher CpG and UpA frequencies, but also a higher G+C content, and there was no significant difference in their distributions by linear regression comparisons.
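Comparisons "by regression" against G+C content can be approached in several ways. One simple illustration, which is an assumption and not necessarily the test used here, fits an ordinary least-squares line of CpG (or UpA) O/E ratio against G+C content for the host transcriptome and then asks whether a gene subset such as ISGs lies systematically above or below that line; a mean residual near zero indicates no over-suppression beyond what G+C content predicts. The helper names and toy values are hypothetical.

```python
from statistics import mean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_residual(x: list[float], y: list[float], slope: float, intercept: float) -> float:
    """Average vertical distance of subset points from the reference line."""
    return mean(yi - (slope * xi + intercept) for xi, yi in zip(x, y))


if __name__ == "__main__":
    # Hypothetical (G+C fraction, CpG O/E) values, not data from this study.
    host_gc = [0.38, 0.42, 0.47, 0.51, 0.55, 0.60, 0.64]
    host_cpg = [0.18, 0.21, 0.26, 0.29, 0.33, 0.38, 0.42]
    isg_gc = [0.44, 0.50, 0.58]
    isg_cpg = [0.23, 0.28, 0.36]
    slope, intercept = fit_line(host_gc, host_cpg)
    print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
    print(f"mean ISG residual = {mean_residual(isg_gc, isg_cpg, slope, intercept):+.3f}")
```

Working with residuals from the transcriptome line, rather than raw O/E means, avoids mistaking a G+C difference (as seen for the IRGs) for a difference in dinucleotide suppression.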

In contrast to cellular mRNAs and ISGs, IFN genes (listed in Table S9; Suppl. Data) often showed marked differences in dinucleotide frequencies from the host transcriptome. Formal analysis of these differences was complicated by the restricted range of G+C contents of IFN genes compared with that of the host transcriptome.
Accordingly, frequency comparisons were performed within the G+C content range of the IFN subset being compared, for example between 40.2% and 50% for the comparison of human IFN-α genes with host sequences (Fig. 7; Table 2). In summary, human and other mammalian IFN-α genes showed highly suppressed frequencies of both CpG and UpA compared with cellular genes of similar G+C content (Figs. 6, 7; Table 2). IFN-β genes showed a lower degree of CpG suppression, while IFN-γ gene CpG frequencies were higher than those of the corresponding mRNAs. Both IFN-β and IFN-γ genes showed marked suppression of UpA frequencies. In contrast to mammalian interferons, avian IFN-α and IFN-β genes (listed in Table S10; Suppl. Data) showed elevated CpG representations compared with chicken (Fig. 8, Table 2) and duck (data not shown) mRNA transcriptomes, although there was a moderate suppression of CpG dinucleotides in IFN-γ genes. In marked contrast, all type I and II IFN genes showed significantly suppressed frequencies of UpA compared with the avian cellular transcriptome (Fig. 8; Table 2).

CpG and UpA representation in avian and
To investigate this, we collected datasets of RNA viruses from the ICTV virus metadata resource (VMR) representative of each species of positive- and negative-strand RNA viruses and retroviruses annotated for mammalian or avian hosts, but excluding dual-host viruses such as arboviruses (Table S11; Suppl. Data).
Genomes were split into component genes and compositional analyses were performed on their coding sequences; sequences shorter than 450 bases were excluded from analysis. Overall, a total of 1929 genes from 554 mammalian RNA virus genomes and 319 genes from 74 avian viruses, drawn from 21 different virus families, were compared (Tables S11, S12; Suppl. Data). In general, distributions of CpG and UpA frequencies in RNA viruses infecting mammals and birds overlapped fully with each other and with the dinucleotide compositions of cellular genes (Fig. 9). Although the mean G+C content of RNA viruses was lower than that of human or avian mRNA sequences, suppression of CpG and UpA dinucleotides showed a similar relationship with G+C content to that observed in cellular mRNAs.
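The gene-level filtering described above (coding sequences shorter than 450 bases excluded, retained genes and genomes counted per host class) can be sketched as follows. The record layout, a (host class, genome identifier, coding sequence) tuple, and the function names are assumptions for illustration; G+C content is kept for each retained gene so that it can later be plotted against dinucleotide representation.

```python
from collections import defaultdict

MIN_CDS_LENGTH = 450  # coding sequences shorter than this are excluded


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_and_tally(records, min_length: int = MIN_CDS_LENGTH):
    """Return retained (host, genome, G+C) rows and per-host (genes, genomes) counts."""
    kept = []
    genomes = defaultdict(set)
    genes = defaultdict(int)
    for host, genome_id, cds in records:
        if len(cds) < min_length:
            continue  # too short for reliable dinucleotide estimates
        kept.append((host, genome_id, gc_content(cds)))
        genes[host] += 1
        genomes[host].add(genome_id)
    counts = {host: (genes[host], len(genomes[host])) for host in genes}
    return kept, counts


if __name__ == "__main__":
    # Hypothetical records, not sequences from the VMR dataset.
    records = [
        ("mammal", "virus_A", "AUGC" * 150),
        ("mammal", "virus_A", "AUGC" * 50),   # 200 bases: excluded
        ("avian", "virus_B", "GGCA" * 200),
    ]
    kept, counts = filter_and_tally(records)
    print(counts)
```

Counting genomes through a set of identifiers means a genome is only tallied if at least one of its genes passes the length filter.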
Correlation coefficients and trajectories of G+C / dinucleotide relationships for mammalian and avian RNA viruses were minimally different from each other and comparable to those of avian (Fig. 9B) and mammalian (Fig. 9A)

Similar findings were obtained using coding-corrected dinucleotide representations (Table S15B; Suppl. Data).
Overall, this analysis did not show convincing differences in dinucleotide compositions between mammalian and avian RNA viruses, nor between those infecting avian hosts with demonstrated differences in ZAP activity.

Host-associated differences in ZAP function. We documented substantial variability in the abilities of cell lines derived from different avian species to preferentially restrict the replication of high CpG mutants of IAV and the E7 replicon. Duck-derived cells restricted high CpG IAV and E7 as potently as the range of mammalian cell lines tested (Fig. 2); the pigeon cell line showed marginal restriction of the high CpG E7 replicon; and all cell lines derived from chicken, other galliforms and zebra finch displayed none. It should be noted, however, that the varying permissivity of the different cell lines for E7 replication may originate from any number of cellular mechanisms, and this required us to normalise replication to that of the WT replicon (y-axis; Fig. 2).
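The normalisation to the WT replicon described above amounts to dividing each mutant's replication by that of the WT replicon in the same cell line, so that differences in overall permissivity cancel out; a ratio of 0.1 corresponds to the 10-fold reduction seen for the CpG-H replicon in A549 cells. A minimal sketch with hypothetical readings (the cell-line values are invented for illustration and the readout units are left unspecified):

```python
import math


def normalised_replication(mutant: float, wild_type: float) -> float:
    """Mutant replication as a fraction of WT replicon replication in the same cells."""
    if mutant < 0 or wild_type <= 0:
        raise ValueError("readings must be non-negative and WT must be positive")
    return mutant / wild_type


def log10_change(mutant: float, wild_type: float) -> float:
    """log10 of the mutant/WT ratio; -1.0 is a 10-fold reduction (mutant must be > 0)."""
    return math.log10(normalised_replication(mutant, wild_type))


if __name__ == "__main__":
    # Hypothetical (CpG-H, WT) readings per cell line, not data from this study.
    readings = {"cell line 1": (2.0e4, 2.0e5), "cell line 2": (1.5e5, 1.6e5)}
    for line, (cpg_h, wt) in readings.items():
        print(f"{line}: CpG-H/WT = {normalised_replication(cpg_h, wt):.2f}, "
              f"log10 change = {log10_change(cpg_h, wt):+.2f}")
```

Because the ratio is computed within each cell line, a line that supports little replication of either replicon can still show a ratio near 1, which is why non-selective restriction of WT and mutant alike is invisible to this readout.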
This format contrasts with a previous investigation of the activity of avian-derived ZAPs, in which the replication of CpG-modified mutants of HIV-1 was compared in mammalian HEK293T ZAP knockout cells co-transfected with human/test-species ZAP chimaeras and a cognate TRIM25 co-factor required for activity (40). Using this human cell transfection system, duck and eagle ZAP/TRIM25-transfected cells mediated marked and selective suppression of CpG-high mutants of HIV-1, consistent with the restriction in CCL-141 (duck) cells observed in the current study. However, while chicken ZAP/TRIM25-transfected cells also showed some antiviral activity, restriction was non-selective and similarly active against both CpG-H and WT variants of HIV-1. As our assay readout assigned restriction by normalisation to the replication of the WT replicon, such non-selective activity in, for example, the DF-1 chicken cell line would have been undetectable. However, if we equate CpG-high/WT replication ratios (Fig. 2) with ZAP/TRIM25 selectivity (40), the avian species-associated differences we found were comparable to those identified in the heterologous expression system; specifically, chicken, turkey and zebra finch showed little CpG selectivity in virus inhibition, in contrast to duck- and eagle-derived ZAP/TRIM25.

Additional evidence for host differences in dinucleotide-mediated virus recognition was provided by the absence of restriction of UpA-high E7 replicons (Fig. 2B) and compositionally modified IAV (Fig. 3B) in the duck cell line CCL-141 that was otherwise capable of potent restriction of CpG-high mutants.
This contrasts with the substantial restriction of the UpA-H double mutant of E7, comparable to that of the CpG-H mutant, observed in all mammalian cells (Fig. 2A). As the recognition mechanism of UpA-high mutants remains undetermined, and may indeed be mediated through a non-ZAP PRR (12), these findings do not necessarily implicate an altered substrate specificity of avian ZAPs. Measurement of the restriction of UpA-high mutants of E7 using the avian ZAP/TRIM25 transfection method would be informative in this respect.

genome composition further demonstrated no substantial differences between RNA viruses infecting mammals and those infecting birds, findings that contrast with previous conclusions (40,60). Of direct relevance to the transmission of IAV between bird species, we found no evidence for systematic compositional differences between IAV strains isolated from ducks and chickens, even though we might hypothesise that the absence of ZAP-mediated discrimination of CpG-high sequences (40) in chicken cells might have enabled variants with less CpG-suppressed genomes to evolve.

More striking is the general similarity in CpG and UpA representations between mammalian and avian influenza strains. It was previously shown that the zoonotic transmission of H1N1 from an inferred avian host into humans was associated with a sustained, decades-long reduction in CpG dinucleotide frequencies as the H1N1 strain spread and became established in humans following its emergence in the 1918 Spanish flu pandemic (55). This was accounted for by the mechanistically unproven supposition that human cells were more restrictive for CpG than avian cells (56), which in retrospect might be interpreted as indicative of more active ZAP function (7,40). It was further proposed that recently transmitted IAV strains may be more pathogenic in humans because of their greater CpG frequencies and consequent activation of responses that lead to immunopathology (35,56). While superficially attractive as an explanation for the pathogenicity of recently transmitted flu strains, such as those with H5 and H7 HA segments (61), the whole framework might be challenged on several grounds. Firstly, on an extended comparison, we found no systematic compositional difference between IAV strains recovered from humans and from either species of bird, using linear regression that accommodates the effects of G+C content on CpG composition (Fig. 11; Table S13, Suppl. Data). Most human strains included in the comparison were of the H1N1 and H3N2 serotypes, long established in human populations, while the duck and chicken strains were primarily avian serotypes, also with likely long-term host associations. The compositional similarities therefore did not arise simply from possible recent cross-species transmission, as might be the case for H5N1. The second ground for questioning the hypothesis is the previously discussed demonstration of an equally potent ZAP-mediated restriction of high CpG viruses in duck cells (Fig. 2B) and evidence for similar restriction of CpG mutants of HIV-1 by the duck-derived ZAP construct (40). As ducks form the principal reservoir of IAV in nature (62), it is somewhat problematic to base a hypothesis on the assumption that IAV is less compositionally restricted in avian hosts than in humans.

Influence of ZAP on host transcriptome composition. The ability of ZAP to recognise and restrict RNA viruses appears to rely on its ability to detect clustered CpG dinucleotides (7). ZAP-mediated selection may be the mechanism underlying the compositional selection against CpG and UpA, and its dependence on G+C content, previously inferred from mutational modelling of mammalian transcriptomes (25). Further indirect evidence for a potential shaping effect of ZAP on host gene composition is provided by analysis of interferon genes. As previously described for human IFN-α genes (35,63), human and all other mammalian type I IFN genes investigated showed substantially greater CpG suppression than other mRNA sequences (Fig. 7), but a similar dependence on G+C content. We also observed much greater suppression of UpA in IFN genes than in other cellular mRNAs.

Long before the discovery of ZAP-mediated recognition of CpG dinucleotides, Greenbaum et al. proposed that this extreme suppression may enable greater expression of IFNs, perhaps at the expense of other cellular genes not engaged in antiviral defence (35,56).

(Table S4A; Suppl. Data) and following trimming to remove a region of no homology between the fourth and fifth zinc-finger motifs, a sliding-window analysis of sequence diversity was

Tables S10 and S11 (Suppl. Data).

FIGURE 11 COMPARISON OF CpG AND UpA COMPOSITIONS OF MAMMALIAN AND AVIAN IAV STRAINS
Distributions of CpG and UpA representations plotted against G+C content for IAV strains derived from different hosts; source sequences and serotype totals listed in Tables S13, S14 (Suppl. Data).