Pervasive Adaptation of Hepatitis C Virus to Interferon Lambda Polymorphism Across Multiple Genotypes

Genetic polymorphism in the interferon lambda (IFN-λ) region is associated with spontaneous clearance of hepatitis C virus (HCV) infection and with response to interferon-based antiviral treatment. Here, we evaluate the associations between IFN-λ polymorphism and HCV variation through a genome-to-genome analysis in 8,729 patients from diverse ancestral backgrounds infected with various HCV genotypes. We searched for associations between rs12979860 genotype, a tag for IFN-λ haplotypes, and amino acid variants in the NS3, NS4A, NS5A and NS5B HCV proteins. We report multiple associations between host and pathogen variants in the full cohort as well as in subgroups defined by viral genotype and human ancestry. We also assess the combined impact of human and HCV variation on pre-treatment viral load. By demonstrating that IFN-λ genetic variation leaves a large footprint in the viral genome, this study provides strong evidence of pervasive viral adaptation to host innate immune pressure during chronic HCV infection.


Introduction
Infection with hepatitis C virus (HCV), a positive-strand RNA virus of the Flaviviridae family, represents a major health problem, with an estimated 71 million chronically infected patients worldwide 1 . In the absence of treatment, 15-30% of individuals with chronic HCV infection develop serious complications including cirrhosis, hepatocellular carcinoma and liver failure [2][3][4][5] .
Seven major genotypes of HCV have been described, further divided into several subtypes 6,7 . Moreover, within each infected individual, multiple distinct HCV variants co-exist as quasispecies 8 . Inter-host and intra-host HCV evolution is shaped by multiple forces, including human immune pressure 9 . To investigate the complex interactions between host and pathogen at the level of genetic variation, we proposed a genome-to-genome approach that allows the joint analysis of host and pathogen genomic data 10 . Using an unbiased association study framework, a genome-to-genome analysis aims to identify the escape mutations that accumulate in the pathogen genome in response to host genetic variants.
Ansari et al. 11 used this approach to analyze a cohort of individuals of white ancestry predominantly infected with genotype 3a HCV; they identified associations between viral variants and human polymorphisms in the interferon lambda (IFN-λ) and HLA regions, demonstrating an impact of both innate and acquired immunity on HCV sequence variation during chronic infection.
The IFN-λ association is of particular interest considering the known impact of this polymorphic region on spontaneous clearance of HCV and on response to interferon-based treatment [12][13][14][15] . The rs12979860 variant, located 3 kb upstream of the IL28B gene (encoding IFN-λ3), showed the strongest correlation with treatment-induced clearance of infection in the first report 12 . More recent studies have shown that rs12979860 tags a dinucleotide insertion/deletion polymorphism, IFNL4 rs368234815 16 , which controls generation of the IFN-λ4 protein and is thus the most likely functional variant in the region 28 . The two variants (rs12979860 and rs368234815) are in strong linkage disequilibrium, which explains the differences in HCV outcomes originally associated with rs12979860.
Here, we aim to characterize the importance of the innate immune response in modulating chronic HCV infection by describing the footprint of IFN-λ variation in the viral proteome.
Using samples from a heterogeneous cohort of 8,729 HCV-infected individuals, we genotyped the single nucleotide polymorphism (SNP) rs12979860, a known marker of IFN-λ haplotypes, and obtained partial sequences of the HCV genome (NS3, NS4A, NS5A and NS5B genes). We tested for associations between rs12979860, HCV amino acid variants and pre-treatment viral load. We show that IFN-λ polymorphism has a pervasive impact on HCV, by describing multiple associations between host and pathogen variants in the full cohort and in subgroups defined by viral genotype or human ancestry. We also present an association analysis of human and viral variants with HCV viral load, which allows for a better understanding of the connections between genomic variation, biological mechanisms and clinical outcomes.

Host and pathogen data
We obtained paired human and viral genetic data for 8,729 HCV-infected patients participating in various clinical trials of anti-HCV drugs. The samples were heterogeneous in terms of self-reported ancestry (85% Caucasians, 13% Asians and 2% Africans) and HCV genotype, with a majority of HCV genotypes 1a, 2a and 3a (Table 1). On the host side, we genotyped the SNP rs12979860, which reliably tags the known IFN-λ polymorphism in Europeans and Asians 14 . On the pathogen side, we performed deep sequencing of the coding regions of the non-structural proteins NS3, NS4A, NS5A and NS5B 16 .

Genome-to-genome analyses
We observed highly significant associations between rs12979860 and multiple HCV amino acid variants throughout the HCV proteome (Figure 1). With a Bonferroni-corrected significance threshold of 4.7 × 10⁻⁶, 97 significant associations were observed under the additive model (Supplementary Table 1) and 111 under the recessive model (Supplementary Table 2).
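The Bonferroni correction used above can be illustrated with a minimal sketch. The family-wise error rate of 0.05 is an assumption (the text reports only the resulting threshold), and the function names are hypothetical.

```python
# Minimal sketch of a Bonferroni correction.
# ASSUMPTION: a family-wise error rate (alpha) of 0.05; the text reports
# only the corrected threshold of 4.7e-6, not the alpha or the test count.
ALPHA = 0.05


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(threshold: float, alpha: float = ALPHA) -> int:
    """Approximate number of tests implied by a reported Bonferroni threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return round(alpha / threshold)


def significant_indices(p_values, n_tests):
    """Indices of the p-values that pass the Bonferroni-corrected threshold."""
    cutoff = bonferroni_threshold(n_tests)
    return [i for i, p in enumerate(p_values) if p < cutoff]
```

Under the assumed alpha of 0.05, a threshold of 4.7 × 10⁻⁶ would correspond to roughly 10,600 tested viral amino acid variants; this figure is derived from the assumption and is not stated in the text.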
Presence of proline at position 156 of NS5B showed the strongest association with
To test for potential heterogeneity of the association signals, we ran sensitivity analyses in various subsets of the study population. First, we performed separate association studies for each HCV genotype group, restricting the analysis to the following genotypes, present in at least 100 participants: 1a, 1b, 2a, 2b, 3a and 4a. The numbers of significant associations (p < 4.7 × 10⁻⁶) under additive and recessive models, per genotype, are shown in Figure 2A.
To further understand these associations, we performed a residual regression analysis. We searched for associations between the HCV amino acid variants and viral load residuals, obtained after regressing viral load on rs12979860. The objective of this analysis was to identify amino acids associated with changes in viral load that cannot be entirely explained by IFN-λ variation. We detected 107 significant associations in total from this analysis (Figure 3B). Interestingly, 22 amino acid variants that were associated with IFN-λ variation were also significantly associated with both viral load and viral load residuals (Supplementary

Discussion
We used an integrated genome-to-genome approach to explore the impact of human genetic variation in the IFN-λ region on the HCV proteome during chronic infection. Our results reveal a strong footprint of innate immune pressure on the non-structural regions of the HCV genome and provide strong evidence for pervasive HCV adaptation to escape human immunity. We also performed sub-analyses in different sample groups, which showed a consistent impact of IFN-λ variation on HCV across genotypes and ancestry categories. Finally, we report viral amino acids significantly associated with both IFN-λ variations and viral load, indicating that some of the HCV clinical and biological outcomes can be explained through host-pathogen interactions.
Our analysis detected multiple associations in all tested proteins, including NS5A. This protein is required for HCV RNA replication and virus assembly and has been shown to associate with interferon signaling and hepatocarcinogenesis 18 . Previous studies have also shown strong associations between variants in the interferon sensitivity-determining region of NS5A and viral load as well as response to IFN-based therapy 19,20 . Some of the strongest associations that we observed were in and around this highly variable interferon sensitivity-determining region of NS5A, suggesting a possible role of these variants in determining the response to IFN-based antiviral treatment. We also detected a strong association for the NS5A variant Y93H. This supports previous findings of associations between the Y93H mutation, IFN-λ genotype and HCV viral load 17 .
Around 32% of study participants harbored the "favorable" rs12979860 CC genotype, which associates with higher rates of spontaneous viral clearance and of successful response to IFN-based anti-HCV treatment. To measure the impact of the homozygous CC genotype on viral amino acid variants, we also performed association analyses using a recessive model.
We detected a higher number of significant associations with the recessive model than with the additive model: patients harboring the CC genotype showed a higher prevalence of amino acid changes, suggesting a stronger immune selective pressure on the virus in homozygous individuals.
This is the first comprehensive analysis of IFN-λ-driven HCV adaptation across different viral genotypes and ancestry groups. In addition to identifying genotype- or ancestry-specific associations, we observed sites of interaction that were consistent across HCV genotypes and ethnicities; for example, the NS5A variant Y93H. These results indicate that IFN-λ-driven viral adaptation is a part of evolution across HCV genotypes.
In an attempt to delineate the biological impact of these associations, we evaluated the associations between HCV amino acid variants and viral load. We were able to detect a subset of amino acids that associated with both IFN-λ variation and HCV viral load, supporting the clinical relevance of host and pathogen interactions. Furthermore, we performed a similar analysis with residual viral load, i.e. the fraction of the viral load variance that is not explained by IFN-λ variation. We detected a group of viral amino acid variants that associated with SNP variation as well as residual viral load, indicating a stronger role of host-pathogen interactions in explaining the variation in HCV viral load.
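The notion of residual viral load can be made concrete with a minimal sketch of a two-stage analysis: viral load is first regressed on the rs12979860 genotype code, and the residuals are then compared between carriers and non-carriers of a viral amino acid variant. This is an illustration under stated assumptions, not the analysis code of the study: the function names are hypothetical, covariates are omitted, and the use of log10-transformed viral load is assumed.

```python
# Minimal sketch of a residual analysis (pure Python, no covariates).
# ASSUMPTIONS: viral load is log10-transformed; genotype is coded numerically
# (e.g. 0/1/2); all names below are hypothetical.


def ols_residuals(x, y):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def mean_residual_difference(residuals, carries_variant):
    """Mean residual in carriers minus mean residual in non-carriers."""
    carriers = [r for r, c in zip(residuals, carries_variant) if c]
    others = [r for r, c in zip(residuals, carries_variant) if not c]
    if not carriers or not others:
        raise ValueError("both groups must be non-empty")
    return sum(carriers) / len(carriers) - sum(others) / len(others)


# Toy data (invented for illustration only).
genotype_code = [0, 0, 1, 1, 2, 2]
log_viral_load = [6.0, 6.2, 6.4, 6.6, 6.8, 7.0]
residuals = ols_residuals(genotype_code, log_viral_load)
```

A non-zero difference in mean residuals between carriers and non-carriers would indicate a viral variant effect on viral load that the host genotype alone does not capture; a full analysis would test this difference formally and adjust for covariates.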
Interestingly, only 25% of the host-driven HCV amino acid variants were found to be associated with viral load, indicating that a genome-to-genome analysis can reveal correlations that would go unnoticed in association studies that use more downstream laboratory measurements or clinical outcomes as phenotypes.
IFN-λ polymorphism is the strongest human genetic predictor of spontaneous HCV clearance and response to IFN-based therapy. By integrating IFN-λ and HCV amino acid variation in a joint analysis, we here contribute to a better understanding of the genomic mechanisms involved in inter-individual differences in HCV disease outcomes.

Contigs were generated from paired-end FASTQ files using VICUNA 21 and merged to create a de novo assembly sequence. All paired-end reads were merged using PEAR 22 .

Host genotyping
Human genotype was determined by means of PCR amplification and sequencing of the rs12979860 single-nucleotide polymorphism, which reliably tags the known IFN-λ polymorphism. Possible genotypes were CC, CT or TT.

Association analyses
We used logistic regression to search for associations between rs12979860 and the binary pathogen variants, under both additive and recessive models (CC genotype coded as 1), and including sex, country of origin, self-reported ethnicity, cirrhosis status and prior treatment experience as covariates. To account for viral phylogeny, the first five phylogenetic principal components 24 , calculated per HCV gene to account for recombination, were also added as covariates. We used MUSCLE 25 to align the pathogen sequences, RAxML 26 to obtain the phylogenetic trees and R 27 for all other analyses.
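The two genotype codings and the logistic regression step can be sketched as follows. This is a simplified illustration in Python rather than the R code used in the study. It makes several assumptions: the additive model is taken to count copies of the C allele (the text specifies only the recessive coding), covariates and phylogenetic principal components are omitted, and all function names are hypothetical.

```python
import math

# Minimal sketch: genotype coding plus a one-predictor logistic regression
# fitted by Newton-Raphson, with a Wald test on the genotype coefficient.
# ASSUMPTIONS: additive coding = number of C alleles; no covariates or
# phylogenetic principal components; hypothetical function names.


def code_genotype(genotype: str, model: str) -> int:
    """Encode an rs12979860 genotype (CC, CT or TT) as a numeric predictor."""
    g = "".join(sorted(genotype.upper()))  # normalises "TC" to "CT"
    if g not in {"CC", "CT", "TT"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    if model == "additive":
        return g.count("C")  # 0, 1 or 2 copies of the C allele (assumed)
    if model == "recessive":
        return 1 if g == "CC" else 0  # CC coded as 1, as stated in the text
    raise ValueError(f"unknown model: {model!r}")


def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x; return (b0, b1, se_b1, wald_p)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Fisher information matrix (2 x 2, symmetric).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("singular information matrix")
        # Newton step: inverse information matrix times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)  # from the information at the last iterate
    z = b1 / se_b1
    wald_p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return b0, b1, se_b1, wald_p


# Toy data (invented): host genotypes and presence (1) or absence (0)
# of a given viral amino acid variant.
genotypes = ["CC", "CC", "CC", "CC", "CT", "TC", "TT", "TT"]
variant_present = [1, 1, 1, 0, 1, 0, 0, 0]
x_recessive = [code_genotype(g, "recessive") for g in genotypes]
b0, b1, se_b1, wald_p = fit_logistic(x_recessive, variant_present)
```

With a single binary predictor, the fitted coefficient equals the log odds ratio of carrying the viral variant in CC versus non-CC individuals. In a full genome-to-genome scan this model would be refitted for every viral amino acid variant, with the covariates listed above included, and each p-value compared against the multiple-testing threshold.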

Funding
This study was supported by Gilead Sciences as well as by the Swiss National Science Foundation (grant PP00P3_157529 to JF).

Figure 1: Genome-to-genome analysis results
Manhattan plot for associations between human SNP rs12979860 and HCV amino acid variants. The dotted line shows the Bonferroni-corrected significance threshold.