Abstract
Endemic Burkitt lymphoma (eBL), the most prevalent pediatric cancer in sub-Saharan Africa, is associated with malaria and Epstein-Barr virus (EBV). To better understand the role of EBV in eBL, we improved viral DNA enrichment methods and generated a total of 98 new EBV genomes from both eBL cases (N=58) and healthy controls (N=40) residing in the same geographic region in Kenya. Comparing cases and controls, we found that EBV type 1 was significantly associated with eBL, with 74.5% of patients (41/55) versus 47.5% of healthy children (19/40) carrying type 1 (OR=3.24, 95% CI=1.36 - 7.71, P=0.007). Controlling for EBV type, we also performed a genome-wide association study identifying 6 nonsynonymous variants in the genes EBNA1, EBNA2, BcLF1, and BARF1 that were enriched in eBL patients. Additionally, we observed that viruses isolated from plasma of eBL patients were identical to their tumor counterparts, consistent with circulating viral DNA originating from the tumor. We also detected three intertypic recombinants carrying type 1 EBNA2 and type 2 EBNA3 regions as well as one novel genome with a 20 kb deletion resulting in the loss of multiple lytic and virion genes. Comparing EBV types, genes show differential variation rates, with type 1 appearing more divergent, while type 2 demonstrates novel substructure. Overall, our findings address the complexities of EBV population structure and provide new insight into viral variation, which has the potential to influence eBL oncogenesis.
Key Points
EBV type 1 is more prevalent in eBL patients compared to the geographically matched healthy control group.
Genome-wide association analysis between cases and controls identifies 6 eBL-associated nonsynonymous variants in EBNA1, EBNA2, BcLF1, and BARF1 genes.
Analysis of population structure reveals that EBV type 2 exists as two genomic subgroups.
Introduction
EBV infects more than 90% of the world’s population and typically persists as a chronic asymptomatic infection.1 While most individuals endure a lifelong infection with minimal effect, EBV is associated with ~1% of all human malignancies worldwide. EBV was first isolated from a tumor of endemic Burkitt lymphoma (eBL), which is the most prevalent pediatric cancer in sub-Saharan Africa.2 Repeated Plasmodium falciparum infections during childhood appear to drive this increased incidence.3 Malaria causes polyclonal B-cell expansion and increased expression of activation-induced cytidine deaminase (AID), with AID-dependent DNA damage leading to the hallmark translocation that places the MYC gene under control of the constitutively active immunoglobulin enhancer.4–6 How EBV potentiates eBL is incompletely understood; however, the clonal presence of this virus in almost every eBL tumor suggests a necessary role.
EBV strains are categorized into two types based on the high degree of divergence in the EBNA2 and EBNA3 genes.7–9 This long-standing evolutionary division is also present in orthologous primate viruses,10 yet remains unexplained. EBV type 1 has been extensively studied11, 12 because it causes acute infectious mononucleosis and other diseases in the developed world, whereas type 2 virus studies have not kept pace, since infected individuals are less frequent and found primarily in sub-Saharan Africa. While several recent studies have reported both types of EBV circulating in western countries,13, 14 the African context provides a better opportunity to examine viral variation because type 1 and type 2 are found in both eBL patients and healthy individuals.8, 15, 16 Viral variation has been shown to impact differential transformation and growth, and capacity to block apoptosis or immune recognition.7, 17, 18 However, studies focusing on only certain genomic regions/proteins potentially miss disease associations of other loci.19, 20 Although new studies have been conducted,21, 22 genome-wide examinations in case-control studies are few and often do not type the virus.
To address this shortfall, whole genome sequencing of EBV is now attainable from tumor, blood, or saliva using targeted viral DNA capture methods.23–28 However, studying EBV from the blood of healthy individuals remains challenging due to low viral abundance relative to human DNA (1-10 EBV copies/ng blood DNA). In addition, EBV’s GC-rich genome is inefficiently amplified using conventional library preparation methods. Here, we present improved methods for EBV genome enrichment that allow us to sequence virus directly from eBL patients and healthy children. Leveraging these samples, we sought to define the viral population structure and characterize viral subtypes collected from children hailing from the same region of western Kenya. Additionally, we performed the first genome-wide association study to identify viral variants that correlate with eBL pathogenesis.
Materials and Methods
Ethical approval and sample collection
For this study, we recruited children between 2 and 14 years of age with suspected eBL who were undergoing initial diagnosis between 2009 and 2012 at Jaramogi Oginga Odinga Teaching and Referral Hospital (JOOTRH; Kisumu), which is a regional referral hospital for pediatric cancer in western Kenya.29 We obtained written informed consent from children’s parents or legal guardians to enroll them in this study. Ethical approval was obtained from the Institutional Review Board at the University of Massachusetts Medical School and the Scientific and Ethical Review Unit at the Kenya Medical Research Institute. Primary tumor biopsies were collected using fine needle aspirates (FNA) and transferred into RNAlater at the bedside, prior to induction of chemotherapy. In addition, peripheral blood samples were collected and fractionated by centrifugation into plasma and cell pellets prior to freezing. All samples were stored at −80°C prior to nucleic acid extraction.
Improved enrichment of GC-rich EBV in low abundance samples
We used Allprep DNA/RNA/Protein mini kit (Qiagen) for DNA isolations from FNAs and QIAamp DNA Kit for blood and plasma. We developed an improved multi-step amplification and enrichment process for the GC-rich EBV genome, particularly in samples with low viral copies. We used EBV-specific whole genome amplification (sWGA) to provide sufficient material and targeted enrichment with hybridization probes after the library preparation. For this, we designed 3’-protected oligos following the instructions from Leichty et al.30 (detailed in Supplemental Methods). For low viral load samples, we added a multiplex long-range PCR amplification (mlrPCR) step, comprising two sets of non-overlapping EBV-specific primers31 tiling across the genome. We improved the amplification yield for low copy EBV input (Supplemental Table 1) by optimizing buffers and reaction conditions (Supplemental Figure 1A and 1B).
Sequencing library preparation and hybrid capture enrichment
Illumina sequencing library preparation steps consisted of DNA shearing, blunt-end repair (Quick Blunting kit, NEB), 3’-adenylation (Klenow Fragment 3’ to 5’ exo-, NEB), and ligation of indexed sequencing adaptors (Quick Ligation kit, NEB). We PCR amplified libraries to a final concentration with 10 cycles using KAPA HiFi HotStart ReadyMix and quantified them using a Bioanalyzer. We then pooled sample libraries, balancing them according to their EBV content, and proceeded to target enrichment hybridization using custom EBV-specific biotinylated RNA probes (MyBaits, Arbor Biosciences). We sequenced the libraries using Illumina sequencing instruments with various read lengths ranging from 75bp to 150bp.
Sequence preprocessing and de novo genome assembly
We checked the sequence quality using FastQC (v0.10.1) after trimming residual adapter and low quality bases (<20) using cutadapt (v1.7.1)32 and prinseq (v0.20.4),33 respectively. After removing reads that mapped to the human genome (hg38), we de novo assembled the remaining reads into contigs with VelvetOptimiser (v2.2.5)34 using a kmer search ranging from 21 to 149 to maximize N50. We then ordered and oriented the contigs guided by the reference using ABACAS, extended with read support using IMAGE,35 and merged the overlapping contigs to form larger scaffolds (using in-house scripts). By aligning reads back to scaffolds, we assessed contig quality requiring support from ≥5 unique reads. We created a final genome by demarcating repetitive and missing regions due to low coverage with sequential ambiguous “N” nucleotides. We excluded minor variants (<5% of reads) in final assemblies. Deposited genomes can be accessed from GenBank (accession #) and raw reads can be downloaded from SRA (SRA accession #).
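The N50 criterion used to choose among k-mer sizes can be illustrated with a minimal sketch. The function names and the input layout (a mapping from k to a list of contig lengths) are hypothetical and chosen only for illustration; this is not the VelvetOptimiser implementation.

```python
def n50(contig_lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def best_kmer(assemblies):
    """Pick the k-mer size whose assembly has the highest N50.

    `assemblies` maps k -> list of contig lengths (hypothetical input;
    in practice each list would come from a separate assembler run).
    """
    return max(assemblies, key=lambda k: n50(assemblies[k]))
```

For example, `n50([100, 200, 300, 400])` returns 300, because the two longest contigs (400 + 300) already cover at least half of the 1,000 assembled bases.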
Diversity and variant association analysis
We used Mafft (v7.215)36 for multiple sequence alignment (msa) of genomes, and constructed phylogenetic neighbor-joining trees with Jukes-Cantor substitution model using MEGA (v6.0).37 We determined variant sites relative to consensus using snp-sites (v2.3.2)38 then projected variant loci on EBV type 1 reference. For principal coordinate analysis (PCoA), we used dartR (v1.0.5).39 We calculated dN/dS rates per gene using SNAP (v2.1.1) after excluding frameshift insertions and ambiguous bases.40 For variant association analysis, we used ‘v-assoc’ function from PSEQ/PLINK. To control for multiple testing, we calculated empirical p-values with one million permutations (pseq proj v-assoc --phenotype eBL --fix-null --perm 1000000) with EBV type stratification which permutes within types (--strata EBVtype).
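The idea behind type-stratified permutation can be sketched as follows. This is a simplified illustration under stated assumptions, not the PSEQ/PLINK implementation: the test statistic (deviation of the number of case carriers from its within-stratum expectation) and all function and argument names are assumptions made for the example.

```python
import random


def stratified_permutation_p(carrier, is_case, stratum, n_perm=10000, seed=0):
    """Empirical p-value for a variant-phenotype association, shuffling
    case/control labels only within each stratum (e.g. EBV type)."""
    rng = random.Random(seed)
    n = len(carrier)

    # Group sample indices by stratum.
    groups = {}
    for i, s in enumerate(stratum):
        groups.setdefault(s, []).append(i)

    # Expected number of case carriers if labels are random within strata.
    expected = 0.0
    for idx in groups.values():
        n_cases = sum(1 for i in idx if is_case[i])
        n_carriers = sum(1 for i in idx if carrier[i])
        expected += n_cases * n_carriers / len(idx)

    def statistic(labels):
        observed = sum(1 for i in range(n) if labels[i] and carrier[i])
        return abs(observed - expected)

    observed_stat = statistic(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        # Shuffle phenotype labels separately inside every stratum.
        for idx in groups.values():
            shuffled = [labels[i] for i in idx]
            rng.shuffle(shuffled)
            for i, lab in zip(idx, shuffled):
                labels[i] = lab
        if statistic(labels) >= observed_stat - 1e-12:
            hits += 1
    # Add-one correction so the estimate can never be exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because labels never move between strata, a variant that simply tracks viral type (carried by every genome of one type and none of the other) yields an empirical p-value of 1 under this scheme, which is the intended effect of stratification.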
Results
Study participant characteristics
The objective of this study was to examine EBV genetic variation in a region of western Kenya with a high incidence of eBL29 and determine if any variants are associated with eBL pathogenesis. We leveraged specimens from eBL patients and healthy children residing in the same geographic area (Figure 1A).29 We sequenced the virus isolated from 58 eBL cases and 40 healthy Kenyan children, as controls. Patients, aged between 1 and 13 years, were predominantly male (74%), consistent with the sex ratio of eBL (Table 1).29 Healthy controls had similar levels of malaria exposure based on previous epidemiologic studies.41 Control samples ranged in age from 1 to 6 years. This difference in age was necessary due to the finding that younger, healthy yet malaria-exposed children have higher average viral loads compared to older children who have developed immune control over this chronic viral infection.42
Sequencing and assembly quality
EBV is a large GC-rich double-stranded DNA virus with a 172 kb genome, of which ~20% is repetitive sequence. For the majority of eBL patients, we prepared sequencing libraries directly from tumor DNA followed by hybrid capture enrichment. For low copy viral samples, such as eBL plasma and healthy control blood, we designed and implemented additional viral whole genome amplification and enrichment prior to library preparation and sequencing (Figure 1A; Supplemental Figure 1). We generated a study set of 114 genomes including replicates from cell lines and primary clinical samples, representing 98 cases and controls. In addition, we sequenced 20 technical replicates for quality control purposes such as estimation of re-sequencing error or sWGA bias, and sensitivity of detection of mixed infections. The baseline re-sequencing error rate was limited to ~1.1×10⁻⁵ bases when our assemblies were compared with high-quality known strain genomes43 (Supplemental Table 2). The mean error rate was ~2.1×10⁻⁵ bases for sWGA with GenomiPhi, while it was ~1.1×10⁻⁴ bases when we used the more sensitive mlrPCR-sWGA (Methods). We obtained an average of ~5 million reads, resulting in an average depth of coverage of 9,688 across assemblies (Supplemental Table 3). De novo sequence assembly created large scaffolds covering non-repetitive regions for all but three low-coverage isolates, yielding a median genome length of 137,887 bp (range 47,534 bp to 146,920 bp). We determined the type of each isolate by calculating the nucleotide distance to both reference types in addition to read mapping rates against type-specific regions. Despite our ability to experimentally detect mixed types at levels as low as 10% (Supplemental Figure 2A), we found no evidence of mixed infections in our cases and controls.
Also, to ensure that our sample inclusion was unbiased when selecting healthy individuals with high enough viremias to sequence, we compared the viral loads and found no significant difference between type 1 and 2 (P=0.126, Supplemental Figure 2B).
Equivalence of tumor and plasma viral DNA in eBL cases
The viral genomes from eBL cases included virus reconstructed from plasma and tumor samples. We confirmed that viral DNA in the plasma was representative of the virus in the tumor cells by sequencing plasma-tumor pairs from 6 eBL patients (Figure 1B). Accounting for the sequencing errors, the pairs appeared to be identical. Besides these plasma-tumor pairs, we further confirmed identical EBV types with additional pairs from 8 separate patients using type-specific PCRs. Overall, these findings demonstrate that viral DNA isolated from plasma represents the tumor virus.
Structural variation and intertypic recombinants
First, we looked for large deletions within our viral genomes, but did not detect any of the previously described deletions in EBNAs, even though we were able to detect, as positive controls, EBNA3C deletion in Raji and the EBNA2 deletion in Daudi cell lines. However, in one sample we did detect a novel 20kb deletion, spanning from 100 kb to 120 kb in the genome (Figure 1C), which contains lytic phase genes BBRF1/2, BBLF1/3, BGLF1/2/3/4/5, and BDLF2/3/4. Interestingly, none of the latent genes were affected by this deletion.
Next, we interrogated our isolates by comparing the pairwise similarities of each genome against EBV type 1 and type 2 references. By traversing through the genome with a window, we were able to delineate regions that were more similar to one type over the other (Figure 1D). As expected, Jijoye, a type 2 strain, displayed less similarity against type 1 reference around its EBNA2 and EBNA3 genes, the most divergent region between types, while Namalwa as a type 1 strain shows the same pattern of dissimilarity against type 2 reference around the same regions. Interestingly, we found three patient-derived genomes, eBL-Tumor-0012, eBL-Tumor-0033, and eBL-Plasma-0049, with mixed similarity trends. Similar to a previously detected recombinant strain (LN827563.2_sLCL-1.18),43 all of the intertypic isolates carried type 1 EBNA2 and type 2 EBNA3 genes. Although not significant (P=0.268), these new intertypic hybrids were all isolated from eBL patients while we did not detect any in healthy controls.
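The windowed comparison against the two references can be sketched as below. The window and step sizes are placeholders (they are not stated in the text), the function names are hypothetical, and the sketch assumes the query and both references have already been placed in the same multiple sequence alignment so that positions correspond.

```python
def window_identity(query, reference, window=1000, step=500):
    """Fraction of matching bases per window between two aligned sequences.

    Columns with a gap or ambiguous base in either sequence are skipped.
    Returns a list of (window_start, identity or None).
    """
    if len(query) != len(reference):
        raise ValueError("sequences must come from the same alignment")
    valid = set("ACGT")
    results = []
    for start in range(0, max(len(query) - window, 0) + 1, step):
        matches = compared = 0
        pairs = zip(query[start:start + window], reference[start:start + window])
        for q, r in pairs:
            q, r = q.upper(), r.upper()
            if q in valid and r in valid:
                compared += 1
                matches += q == r
        results.append((start, matches / compared if compared else None))
    return results


def closer_type(query, ref_type1, ref_type2, window=1000, step=500):
    """Label each window by the reference type it resembles more."""
    calls = []
    ids1 = window_identity(query, ref_type1, window, step)
    ids2 = window_identity(query, ref_type2, window, step)
    for (start, id1), (_, id2) in zip(ids1, ids2):
        if id1 is None or id2 is None or id1 == id2:
            calls.append((start, "undetermined"))
        else:
            calls.append((start, "type1" if id1 > id2 else "type2"))
    return calls
```

An intertypic genome would then appear as a run of windows labeled `type1` followed by a run labeled `type2` (or the reverse), rather than a single label along the whole genome.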
Genomic population structure is driven by type differences with distinct substructure in type 2 viruses
Our samples present a unique opportunity to study population structure of EBV types and their co-evolution within a geographically defined region. As expected, the major bifurcation within the phylogenetic tree based on the entire genome occurs between type 1 and type 2 viruses (Figure 2A). Viruses from eBL patients as well as healthy controls appeared to be intermixed almost randomly within the type 1 branch. Interestingly, within type 2 genomes 8 eBL-associated isolates formed a sub-cluster. The hybrid genomes clustered with type 2s, which is consistent with type 2 EBNA3s representing a greater amount of sequence than type 1 EBNA2 region.
We further explored viral population structure with principal coordinate analysis (PCoA) of variation across the genome. While the first three components cumulatively explain 57.2% of the total variance, the first component, which alone accounted for 43.9% of the variance, separates genomes based on type 1 and type 2 (Figure 2B, upper plot). Similar to the phylogenetic tree, intertypic genomes were positioned more closely to type 2s. Interestingly, the second and predominantly third components separate type 2 viruses into two distinct clusters, group A and B (Figure 2B, lower plot). These clusters were reflected, although not as distinctly, in the structure of the tree as well. The PCoA loading values, which account for 37.1% of the variance between the type 2 groups, are predominantly driven by correlated variation spanning 70kb upstream of EBNA3C (Supplemental Figure 3A and B). Together these findings suggest that there are two EBV type 2 strains circulating within this population. We also examined viral variation from the perspective of LMP1. Interestingly, the vast majority of viruses were grouped into Alaskan and Mediterranean strains (Supplemental Figure 4). For all available LMP1 type 2 sequences, group A and group B correlated with Mediterranean and Alaskan, respectively.
EBV type 2 has less diversity compared with type 1
We further explored the pattern and nature of genomic variation across the genome comparing and contrasting EBV type 1 and type 2. Examining the pairwise divergence of coding genes for all viral genomes, we found that the divergence was the highest in the type-specific EBNA genes (EBNA2 and EBNA3s), in particular, with EBNA2 showing the greatest divergence (d=0.1313 ± 2.3×10⁻³) (Figure 2C, upper panel). Investigating each type separately, the diversity within types was low for EBNA2 and EBNA3Cs, consistent with type 1 and 2 being separated by many fixed differences (Figure 2C, middle panel). In both types, intra-type divergence was greatest for EBNA1 and LMP1. Most remarkable was the fact that type 2 generally showed lower levels of divergence across the genome (0.0047 ± 3.7×10⁻³ and 0.0025 ± 2.7×10⁻³ for type 1 and type 2, respectively). Overall, these measures suggest that EBV gene evolutionary rates differ by types.
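As an illustration of how a pairwise divergence value can be obtained, the sketch below computes the proportion of differing sites between two aligned sequences and applies the Jukes-Cantor correction, the substitution model named in Methods for tree construction. Whether the divergence values reported above include this correction is not stated, so its use here is an assumption; the function names are likewise hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns with gaps or ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    diffs = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

The correction inflates the raw proportion slightly to account for multiple substitutions at the same site, so the corrected distance is always at least as large as the observed proportion of differences.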
To explore signatures of evolutionary selection, we examined the dN/dS ratios within coding sequences (Figure 2C, lower panel). Overall most genes showed signals of purifying selection, as indicated by ω < 1.0, except LMP1, BARF0, and BKRF2 (only type 2). Interestingly, with dN/dS measures, EBNA2, BSLF1, BSLF2, and BLLF2 genes had relatively higher rates in type 2 compared to type 1 suggestive of differential evolutionary pressure. Overall, the magnitude of average nonsynonymous and synonymous changes per gene, normalized by gene length, reflect the high-level diversity accumulated in certain genes (Supplemental Figure 5). Latency-associated genes generally have the highest non-synonymous variant rates, but they also have the highest synonymous rates consistent with longstanding divergence (Figure 2D). Other functional categories, including lytic genes, have relatively low levels of nonsynonymous mutations suggesting stronger purifying selection.
Global context of Kenyan viruses
To more broadly contextualize our viral population from western Kenya, we examined the phylogeny of the Kenyan viruses along with other publicly available genomes from across the world (Supplemental Table 4). Among all isolates, the most polymorphic genomic regions appeared to be around EBNA2 and EBNA3 genes (Supplemental Figure 6A). The phylogenetic tree shows that the major types, type 1 and type 2, are the main demarcation point regardless of the source or geographic location. The three intertypic genomes from our sample set neatly cluster with the previously isolated intertypic hybrid, sLCL-1.18 (Supplemental Figure 6B). Type 1 genomes from our study were split into two groups, with one forming a sub-branch only with Kenyan type 1, including Mutu, Daudi, and several Kenyan LCLs. The second group was interspersed with other African (Ghana, Nigeria, North Africa) and non-African isolates. In addition, a few of our genomes from healthy carriers clustered with a group of mainly Australian isolates; however, none of them clustered with the South Asian group. Our Kenyan EBV type 2s generally intermixed with other type 2 genomes.
Viral Genomic Variants and Associations with eBL
After excluding the intertypic hybrids, we compared type frequencies of EBV genomes isolated from eBL patients and healthy controls. We observed a significant difference in frequencies with 74.5% of eBLs carrying type 1 while only 25.5% carried type 2 infections. In contrast, 47.5% vs. 52.5% of type 1 and type 2, respectively were found in healthy controls. EBV type 1 was associated with eBL (OR=3.24, 95% CI=1.36 - 7.71, P = 0.007, Fisher’s exact) (Figure 3A), independent of age and gender (all P>0.05, Supplemental Figure 7). We then expanded the association analysis to all 2198 non-synonymous single nucleotide variations across the entire genome (Figure 3B). We did an initial association test for each nonsynonymous variant and detected 133 significant associations (Supplemental Table 5 & Methods). The vast majority of these variants were located within the type1-type2 region given the highly correlated nature of this region. We then stratified by type to detect variation independent of viral type. This yielded 6 variants solely associated with eBL (Table 2, Supplemental Table 5). Variant 37668T>C represents a serine residue change to a proline at the C-terminus of EBNA2 (S485P) which is carried by 24/54 eBL cases; while this variant was present in only 2/36 healthy controls. Two variants in EBNA1 at 95773A>T and 95778T>G (N38Y and H39Q, respectively) were both observed in 3/57 eBL isolates while their corresponding frequencies were 11/36 and 12/37 among healthy controls.
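The odds ratio for type 1 carriage can be worked through from the counts given above (41 of 55 eBL genomes and 19 of 40 control genomes typed as type 1). The sketch below uses a Wald interval on the log odds ratio; the text does not state which interval method was used, so that choice is an assumption, although it gives the reported values when rounded to two decimals. The function name is hypothetical, and the Fisher's exact P value is not recomputed here.

```python
import math
from statistics import NormalDist


def odds_ratio_wald(a, b, c, d, level=0.95):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: cases with exposure      b: cases without exposure
    c: controls with exposure   d: controls without exposure
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# eBL: 41 type 1, 14 type 2; controls: 19 type 1, 21 type 2.
or_, lo, hi = odds_ratio_wald(41, 14, 19, 21)
print(f"OR={or_:.2f}, 95% CI={lo:.2f} - {hi:.2f}")
```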
Nucleotide variants in non-coding and promoter regions can affect regulation of viral gene expression and activity within the host cell. BZLF1 is a regulator of lytic reactivation, and its promoter is classified as prototype Zp-P (B95-8) or Zp-V3 (M81 strain).44 We determined variants at seven positions in the upstream promoter region of BZLF1 (Supplemental Table 6). Interestingly, all of the Kenyan viruses carried C at both positions −525 and −274 (as in Zp-P) regardless of promoter type. We also found that −532 and −524 are variable in our isolates, although these two positions are invariant in both promoter types. Our results show that only 12.5% (5/40) of type 1 promoter sequences in the eBL group fully resembled Zp-V3, as opposed to 22% (2/9) in healthy genomes, while all of the type 2 genomes, without exception, carried the Zp-V3 type promoter regardless of disease status.
Discussion
In this study, we investigated genomic diversity of EBV by sampling virus from children in western Kenya where eBL incidence is high.41 Our improved methods allowed us to sequence asymptomatically infected healthy controls with relatively low peripheral blood viral loads, and thereby examine the virus in the population at large.42 We performed the first association study comparing viral genomes from eBL patients and geographically matched controls, without the need for viral propagation in LCLs; thus showing that type 1 EBV, as well as potentially several non-type specific variants, are associated with eBL. Furthermore, as the first study that characterized significant numbers of EBV type 2, we were able to compare and contrast both types and explore the viral population, thus discovering novel differences including population substructure in EBV type 2.
Our sequencing data demonstrated that EBV from plasma is representative of the tumor virus in eBL patients. This is consistent with the premise that peripheral EBV DNA originates from apoptotic tumor cells, given that cell-free EBV DNA in eBL patients is mostly unprotected against DNase,45 as opposed to being encapsidated during lytic reactivation, and that plasma EBV levels are associated with tumor burden and stage.46 These findings support the use of plasma viremia as a surrogate biomarker and the development of plasma-based prognostic tests with predictive models that could be used during clinical trials.46 The lack of mixed infections observed in our healthy controls could be due to the limit of detection in blood compared to virus isolated from saliva.14 Further studies are needed to understand the coevolution and dynamics of both EBV types.
In addition, we detected three intertypic recombinant EBV genomes solely found within our eBL patients; similar to those previously described in other cancers.47 It is unclear whether the intertypic genomes represent a common event with subsequent mutation and recombination or multiple independent events. If the latter is true, it supports more frequent mixed-type infections given that both parents have to be present in the same cell.48–50 It is interesting that all four intertypics observed to date carry the same type EBNA2/EBNA3 combinations with the type 2 genes being so closely related (Supplemental Figure 8). Thus, if multiple events have generated these viruses, it suggests that certain strains may have a greater proclivity to recombine. Further studies will be needed to better define the intertypic population, their origins and their association with disease.
Importantly, we were able to explore EBV population genetics and compare and contrast type 1 and type 2 because of their co-prevalence in Africa. As is well described, the major differentiation in terms of genetic variability was the variation correlated with type 1 and type 2 viruses. These viral types showed distinct population characteristics, with type 1 harboring greater diversity, especially in functionally important latent genes. Combined with the observed nucleotide diversity, latency genes appear to have long-standing divergence that has accumulated significant synonymous changes (as opposed to recent sweeps on nonsynonymous changes that would erase synonymous variants). Global phylogenetic analysis emphasizes this diversity by providing two main subgroups for type 1 genomes in our sequencing set. One group represents core local Kenyan viruses while the second group is a mixture of viruses from across the globe, with the exception of South Asian viruses that group apart. While previously sequenced type 2 viruses intermingle with western Kenya isolates, the majority of these originated from East Africa with only a few from West Africa. Interestingly, intermingling is also true for type 2 as we observed two distinct groups. This is more apparent in the PCoA, where type 2 viruses form two clusters. In the PCoA, the loading values are determined by a broad stretch of the genome from the end of EBNA3C to LMP1, where Mediterranean and Alaskan designations correlate. It remains to be determined whether this substructure might be due to the introduction of previously geographically isolated viruses or distinct evolutionary trajectories within the population. Further study with broader sampling is needed to understand its significance, but our findings suggest that there may be significant epistasis potentially including LMP1.
By sequencing virus directly from healthy controls, we were able to address the question of relative tumorigenicity between type 1 and 2. We tested the long-standing hypothesis that type 1 virus is more strongly associated with eBL, in contrast to type 2. Our work was able to more definitively answer this question as we were not reliant on LCLs from healthy controls where type 1 bias in transformation might explain the lack of previous associations. We earlier demonstrated, by mutational profiling of EBV positive and negative eBL tumors, that the virus, especially type 1, might mitigate the necessity of certain driver mutations in the host genome.16 In addition, our genome-wide results controlling for viral type substantiate investigations of non-type associated variation that could also impart oncogenic risk, as we found suggestive trends for several nonsynonymous variants as well. Only a small subset of type 1 viruses from eBL patients carried the BZLF1 promoter variant, which leads to a gain of function,44 while all type 2 viruses carried this variant, suggesting that this promoter might be beneficial for type 2 but is unlikely to be a driver of oncogenesis.
Overall, this population-based study provides the groundwork to unravel the complexities of EBV genome structure and insight into viral variation that influences oncogenesis. Genomic and mutational analysis of BL tumors identified key differences based on viral content suggesting new avenues for the development of prognostic molecular biomarkers and the potential for antiviral therapeutic interventions.
Authorship Contributions
Contribution: Y.K., C.I.O., and O.A. designed and performed experiments; Y.K. and C.I.O. analyzed and interpreted results; Y.K. made the figures; Y.K., J.A.B., and A.M.M. designed the research and wrote the paper; C.I.O., J.A.O., J.M.O., and A.M.M. organized clinical sample acquisition.
Disclosure of Conflicts of Interest
The authors declare no competing financial interests. The current affiliation for Yasin Kaymaz is FAS Informatics and Scientific Applications, Harvard University, Cambridge, MA.
Acknowledgements
This work was supported by the US National Institutes of Health, National Cancer Institute R01 CA134051, R01 CA189806 (A.M.M., J.A.B., C.I.O., Y.K.) and The Thrasher Research Fund 02833-7 (A.M.M.), UMCCTS Pilot Project Program U1 LTR000161-04 (Y.K., J.A.B., and A.M.M.), Turkish Ministry of National Education Graduate Study Abroad Program (Y.K.). We would like to thank the Kenyan children and their families who participated in this study, Patrick Marsh for helping with EBV genotyping assays, and Mercedeh Movassagh for sharing genotyping primers. This publication was approved by the Director of KEMRI.
Footnotes
* Shared last authorship