Abstract
Background To eliminate trachoma as a public health problem, the WHO recommends the SAFE strategy. As part of the SAFE strategy in the Amhara Region, Ethiopia, the Trachoma Control Program distributed over 124 million doses of antibiotic between 2007 and 2015. Despite these interventions, trachoma remained hyperendemic in many districts and a considerable level of Chlamydia trachomatis (Ct) infection was evident.
Methods We utilised residual material from Abbott m2000 Ct diagnostic tests to sequence 99 ocular Ct samples from Amhara and investigated the role of Ct genomic variation in the continued transmission of Ct following 5 years of SAFE.
Findings Sequences were typical of ocular Ct, at the whole-genome level and in tissue tropism-associated genes. There was no evidence of macrolide-resistance in this Ct population. Polymorphism in a region around the ompA gene was associated with village-level TF prevalence. Additionally, greater ompA diversity at the district-level was associated with increased Ct infection prevalence.
Interpretation We found no evidence for Ct genomic variation contributing to continued transmission of Ct after treatment, adding to previous evidence that azithromycin does not drive acquisition of macrolide resistance alleles in Ct. Increased Ct infection in villages and in districts with more ompA variants requires longitudinal investigation to understand what impact this may have on treatment success and host immunity.
Funding European Commission; Neglected Tropical Disease Support Center; International Trachoma Initiative
Introduction
Trachoma is a blinding disease caused by the intracellular bacterium Chlamydia trachomatis (Ct). To eliminate trachoma as a public health problem, the World Health Organization (WHO) recommends the SAFE (Surgery, Antibiotics, Facial cleanliness, and Environmental improvement) strategy.1 As part of this strategy, annual mass drug administration (MDA) with azithromycin is delivered to individuals aged ≥6 months whilst topical tetracycline eye ointment is administered to pregnant women and children aged <6 months. The number of recommended years of SAFE interventions is based on the prevalence of trachoma in a district, determined by population based surveys.2 For districts considered hyperendemic for trachoma, defined as a trachomatous inflammation-follicular (TF) prevalence of ≥ 30% among children aged 1 to 9 years, 5 to 7 years of SAFE are recommended followed by further population based surveys to determine the impact of the interventions.
As part of the SAFE strategy in Amhara National Regional State, Ethiopia, the Trachoma Control Program distributed over 124 million doses of antibiotic between 2007 and 2015.3 Both administrative and self-reported coverage have demonstrated treatment coverage in the region to be close to or above the WHO recommended minimum threshold of 80%.3–5 During this time, the program also provided village-based and school-based health education and assisted in the construction of latrines as part of the F and E components of SAFE.3 Despite an average of 5 years of these interventions, trachoma remained hyperendemic in many districts, and a considerable level of Ct infection was evident throughout the region.3,6
Historically, Ct molecular epidemiology predominantly focused on ompA,7 the gene encoding the major outer membrane protein. More recently a number of multilocus sequence typing schemes with and without ompA have been used.8–10 Since 2010, there has been a rapid expansion of Ct whole-genome sequencing (WGS), due to the ability to sequence directly from clinical samples.11,12 Despite more than seven-hundred Ct genomes being publicly available,13–16 few studies have evaluated the role of genome-level variation in Ct transmission and clinical outcomes of Ct infection. Recent publications have begun to address these questions in Ct populations from trachoma-endemic settings16,17 and have shown higher diversity than expected, compared with sequencing of cultured isolates. WGS additionally allows monitoring of emergence of antimicrobial resistance alleles in Ct,17–19 which is of critical importance as MDA with azithromycin is key for trachoma control, and is also under consideration as an intervention for childhood mortality,20,21 neonatal sepsis,22 and bacterial skin diseases.23
The Trachoma Control Program in Amhara has conducted multiple studies to better understand the epidemiology of trachoma in communities that have received approximately 5 years of annual MDA with azithromycin, yet still have significant levels of Ct infection and trachomatous disease. The resolution of WGS allows for a greater understanding of Ct transmission patterns, presence of putative virulence determinants, and identification of antimicrobial resistance alleles. Therefore, this study sequenced 99 ocular Ct samples from Amhara to investigate the role of genomic variation in the continued transmission of Ct. We further explored the relationship between Ct genomic variation, ocular Ct infection prevalence, and trachoma clinical sign prevalence at the village and district level.
Methods
Ethics statement
Survey methods were approved by the Emory University Institutional Review Board (protocol 079-2006) as well as the Amhara Regional Health Bureau. Due to the high illiteracy rate among the population, approval was obtained for oral consent or assent. Further permission for sample transfer and genomic sequencing of Ct samples for this project was provided by the Amhara Regional Health Bureau.
Study design and population
Between 2007 and 2010 the SAFE strategy was scaled to reach all districts in Amhara and interventions were subsequently administered for approximately 5 years. Sampling methodology for these district-level surveys has been published previously.3,24 Briefly, a multi-stage cluster randomized methodology was used, whereby clusters (villages) were selected using a population proportional to estimated size method, and within a cluster, a modified segmentation approach was used to randomly select segments of 30 to 40 households.3
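The cluster selection step can be illustrated with a minimal sketch of systematic sampling with probability proportional to estimated size. The cited survey protocol is not reproduced here, so the systematic variant, the function name pps_systematic_sample, and the example village sizes are assumptions made for illustration only.

```python
import random
from itertools import accumulate


def pps_systematic_sample(village_sizes, n_clusters, rng=None):
    """Select clusters with probability proportional to estimated size.

    village_sizes: dict mapping village name -> estimated population.
    Returns a list of selected village names. A village larger than the
    sampling interval can be selected more than once.
    """
    rng = rng or random.Random()
    names = list(village_sizes)
    cumulative = list(accumulate(village_sizes[v] for v in names))
    total = cumulative[-1]
    interval = total / n_clusters
    # Random start in [0, interval); later points are evenly spaced.
    start = rng.random() * interval
    selected = []
    idx = 0
    for k in range(n_clusters):
        point = start + k * interval
        # Advance to the village whose cumulative range contains the point.
        while idx < len(names) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(names[idx])
    return selected


# Hypothetical district with three villages of very different sizes.
sizes = {"A": 100, "B": 300, "C": 600}
chosen = pps_systematic_sample(sizes, n_clusters=2, rng=random.Random(1))
```

Larger villages cover a longer stretch of the cumulative population line, so they are more likely to contain a sampling point. This is the defining property of the approach.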
After enumerating all residents, present and consented residents were examined for trachoma based on the WHO simplified methodology.25 Trained and certified trachoma graders used ×2·5 magnification and adequate light.3,26 Every other cluster was chosen for swab collection prior to surveying a district, and during the house-to-house survey, the first 25 children aged 1 to 5 years with parental consent were swabbed for the presence of infection. If more than one child aged 1 to 5 years lived in a household, one child was randomly chosen by survey software.
Sample collection and processing
Gloved graders swabbed the upper tarsal conjunctiva three times with a polyester-tipped swab, rotating 120 degrees along the swab’s axis each time to collect a sufficient epithelial specimen.6 Samples were transferred to the Amhara Public Health Institute (APHI) where they were stored at −20 °C. Conjunctival swabs from each district were randomized and five samples were combined into each pool. Pools were processed with the RealTime (Abbott Molecular Inc., Des Plaines, IL, USA) polymerase chain reaction assay on the Abbott m2000 system at the APHI laboratory.6 All individual samples from the identified positive pools from four zones (administrative groupings of districts), North Gondar, South Gondar, East Gojam, and Waghemra, were processed again to provide individual-level infection data.27 Samples from these zones were prioritized owing to the persistent high trachoma prevalence. For positive individual samples, the m2000 generated delta cycle threshold result was converted to Ct elementary body (EB) equivalent concentration based on a previously described calibration curve of known EB concentrations on the RealTime Assay.27,28
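The pooling and retesting logic described above can be sketched as follows. This is an illustrative outline only: the function names make_pools and samples_to_retest are hypothetical, pool results are supplied by hand rather than by an assay, and the conversion of delta cycle threshold to EB concentration is omitted because the calibration curve is not given here.

```python
import random


def make_pools(sample_ids, pool_size=5, rng=None):
    """Randomize samples within a district and group them into pools."""
    rng = rng or random.Random()
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    # The final pool may hold fewer than pool_size samples.
    return [shuffled[i:i + pool_size]
            for i in range(0, len(shuffled), pool_size)]


def samples_to_retest(pools, pool_is_positive):
    """Return every sample that belongs to a pool which tested positive."""
    return [sample
            for pool, positive in zip(pools, pool_is_positive)
            if positive
            for sample in pool]


# Twelve hypothetical swabs; pools 1 and 3 are assumed to test positive.
pools = make_pools(range(12), rng=random.Random(0))
retest = samples_to_retest(pools, [True, False, True])
```

Only members of positive pools are processed individually, which is what keeps the number of assays low when most pools are negative.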
Once Ct load was known for the positive individual samples, a total of 240 with the highest load were chosen for this project. Samples with sufficient Ct load, likely to obtain high quality full genome sequence data based on our previous studies, were re-extracted as described below.16,17,29
Ct detection and sequencing preparation
DNA was extracted from 800 μl residual material per sample from Abbott m2000 diagnostic tests using QIAamp mini DNA kit (Qiagen). Samples were quantified using plasmid and genome targets by quantitative PCR as previously described.30 Samples with ≥ 10 genome copies per μl of DNA were considered for WGS.
Sequencing, processing, and analysis of Ct
Sequencing was performed as previously described16 except we utilised the SureSelectXT Low Input kit. Processing and analysis of sequenced reads was performed as previously described.17 Briefly, raw reads were trimmed and filtered using Trimmomatic.31 Filtered reads were aligned to a reference genome (A/Har-13) with Bowtie2,32 and variant calls were identified with SAMtools/BCFtools.33 Multiple genome and plasmid alignments were generated using progressiveMauve,34 and multiple gene alignments were generated using MUSCLE.35 Phylogenies were computed using RAxML,36 and predicted regions of recombination were masked using Gubbins37 before visualisation in R. Domain structure of tarP and truncation of trpA were characterised as previously described.16 ABRicate (https://github.com/tseemann/abricate) and the ResFinder database were used to test for the presence of antimicrobial resistance genes in the reference-assembled genomes and de novo assembled reads for each sample.
Inter-population genome-wide association analysis
Genome-wide association analysis was performed to identify polymorphisms specific to this population of ocular Ct sequences through comparison of 99 Amharan Ct genomes to 213 previously sequenced samples from trachoma-endemic communities. Heterozygous calls and positions with greater than 25% missing data were removed. Polymorphisms were considered conserved in Amhara if the major allele frequency was greater than 0.8 and rare in the global ocular population if the same allele was at a frequency of less than 0.2. The final analysis included 116 single nucleotide polymorphisms (SNPs). A logistic regression was performed with each Amhara-specific site as the independent variable and origin of the sequence as the dependent variable (reference level: global; comparator level: Amharan). P-values were considered for significance after Bonferroni correction.
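The site-selection rule described above can be sketched in a few lines. This is a simplified illustration, not the analysis code used in the study: heterozygous and missing calls are both assumed to be encoded as None, the missing-data filter is assumed to apply across the combined sample set, and the significance level of 0.05 passed to the Bonferroni helper is an assumption, since no value is stated.

```python
from collections import Counter


def missing_fraction(calls):
    """Fraction of calls that are missing (encoded as None)."""
    return sum(c is None for c in calls) / len(calls)


def allele_frequency(calls, allele):
    """Frequency of an allele among non-missing calls, or None if no data."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return None
    return observed.count(allele) / len(observed)


def is_population_specific(amhara_calls, global_calls,
                           conserved=0.8, rare=0.2, max_missing=0.25):
    """True if the Amharan major allele is conserved locally and rare globally."""
    if missing_fraction(list(amhara_calls) + list(global_calls)) > max_missing:
        return False
    observed = [c for c in amhara_calls if c is not None]
    if not observed:
        return False
    major, count = Counter(observed).most_common(1)[0]
    amhara_freq = count / len(observed)
    global_freq = allele_frequency(global_calls, major)
    if global_freq is None:
        return False
    return amhara_freq > conserved and global_freq < rare


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical site: T dominates in Amhara and is uncommon elsewhere.
site_is_specific = is_population_specific(["T"] * 9 + ["C"], ["C"] * 9 + ["T"])
threshold = bonferroni_threshold(0.05, 116)  # alpha of 0.05 is an assumption
```

With 116 retained sites, a Bonferroni correction divides the chosen significance level by 116, so each site must pass a much stricter per-test threshold.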
Intra-population genome-wide association analyses
Genome-wide association analyses were performed to identify Ct polymorphisms associated with village-level clinical data. Heterozygous base calls and positions with a minor allele frequency of less than 25% or greater than 25% missing data were removed. The final analysis included 681 SNPs across the 99 sequences. A linear regression was performed with each SNP as the independent variable and log10 transformed village-level Ct infection, TF, or TI prevalence as the dependent variable. District was included as a random effect and all analyses were adjusted for age and gender. P-values were considered significant after Bonferroni correction. Additionally, a sliding-window approach was used to identify polymorphic regions of the genome. Windows of 10 kilobases were evaluated for polymorphic sites, with a step size of 1 kilobase. There was a median of four polymorphic sites per window (IQR 2–7). The final analysis included 907 polymorphic regions across the 99 sequences. A linear regression was performed with each polymorphic region collapsed into a pseudo-haplotype per sequence as the independent variable, including district as a random effect and adjusted for age and gender. This model was compared to a model including only the covariates and random effects by F-test. P-values were considered significant after Bonferroni correction.
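The sliding-window step and the collapse of each region into a pseudo-haplotype can be sketched as below. This is a minimal illustration under stated assumptions: coordinates are treated as linear and zero-based, windows are half-open, and the function names are hypothetical. The regression and F-test steps are not shown.

```python
def sliding_windows(genome_length, window=10_000, step=1_000):
    """Yield half-open (start, end) windows along a linear coordinate system."""
    last_start = max(genome_length - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, start + window


def polymorphic_regions(snp_positions, genome_length,
                        window=10_000, step=1_000):
    """Map each window containing at least one SNP to the SNPs inside it."""
    positions = sorted(snp_positions)
    regions = {}
    for start, end in sliding_windows(genome_length, window, step):
        inside = [p for p in positions if start <= p < end]
        if inside:
            regions[(start, end)] = inside
    return regions


def pseudo_haplotype(genotype, positions):
    """Concatenate one sequence's alleles at the given positions into a label."""
    return "".join(genotype[p] for p in positions)


# Hypothetical 30 kb stretch with three polymorphic sites.
regions = polymorphic_regions([500, 10_500, 25_000], genome_length=30_000)
label = pseudo_haplotype({500: "A", 10_500: "G"}, [500, 10_500])
```

Because the step (1 kilobase) is smaller than the window (10 kilobases), neighbouring windows overlap and a single SNP contributes to several regions. Overlapping regions are therefore not independent tests, which makes a Bonferroni correction across regions conservative.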
Inference of ompA sequences
Complete sequences of ompA were obtained from whole-genome sequence data using the reference-based assembly method described above with one change. Each sample was assembled against four reference sequences (A/Har-13, B/Jali-20, C/TW-3, and D/UW-3) and the assembly with the highest coverage was used for downstream analyses. Serovar of ompA was assigned using maximum blastn homology against all published Ct sequences. Genotypes of ompA were manually determined using SeaView.38 Diversity of ompA genotypes was calculated as Simpson’s D using vegan in R.
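The diversity calculation can be illustrated with a short sketch. The study used the vegan package in R. The version below is a Python stand-in that assumes the Gini-Simpson form, 1 minus the sum of squared type proportions, which is what vegan reports for its "simpson" index. Which variant of Simpson's index was actually used is not stated above, so treat this as an assumption. The type labels are taken from the ompA types named in the Results, but the counts are invented.

```python
from collections import Counter


def simpson_diversity(types):
    """Gini-Simpson index: 1 - sum(p_i ** 2) over observed ompA types."""
    counts = Counter(types)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one typed sequence is required")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical districts: one with a single type, one with two equal types.
single_type = simpson_diversity(["A1", "A1", "A1", "A1"])
two_types = simpson_diversity(["A1", "A1", "B3", "B3"])
```

A district in which every sequence shares one ompA type scores 0. The score rises towards 1 as more types circulate at similar frequencies.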
Results
Ocular swabs previously confirmed as positive for Ct DNA using the Abbott m2000 system were selected for this study (n = 240). One-hundred and thirty-five samples had sufficiently high concentration of Ct DNA after re-extraction to be considered for WGS (≥ 10 genome copies per μl). Of these, 99 were randomly selected for sequencing to match the complete dataset on age, gender, and zone of collection. The sequenced and complete samples were comparable (Table 1), except that, as expected, sequenced samples had a higher median load of infection.
The Amharan Ct genomes formed two closely related subclades within the classical T2 ocular clade (Figure 1). The two subclades were predominantly separated by ompA genotype, with 52 serovar A (SvA) and 47 serovar B (SvB) genomes. Focusing on genomes from ocular infections (Figure 2), the SvA Amharan genomes branch together independent from any previously sequenced Ct. The SvB Amharan genomes were split across two branches. One branch was most closely related to A/Har-13 originally isolated from Saudi Arabia. The second, smaller branch was most closely related to Ba/Apache-2 Ct from the United States of America (USA) as well as recently sequenced ocular Ct from Solomon Islands.
Several Ct genes and genomic regions are hypothesised to be indicative of tissue preference/tropism, with polymorphisms distinct to ocular, urogenital and LGV sequences. All Amharan Ct genomes had tarP domain structure typical of ocular sequences.39 Similarly, all Amharan genomes had inactivating mutations in trpA, leading to a nonfunctional tryptophan synthase.40 Polymorphic membrane proteins (pmp) B, C, F, G, H and I clustered phylogenetically with ocular isolates (SI Figure 1).41 There was minimal polymorphism in the Ct plasmid within the Amharan genomes and they were closely related to previously sequenced ocular isolates (SI Figure 2). There was no evidence for the presence of macrolide resistance alleles in the assembled genome sequences or de novo assembled sequence reads.
Amharan Ct genomes were compared to 213 previously sequenced samples from trachoma-endemic communities to identify genomic markers specific to Amhara.13–17,29 Of 36,805 polymorphic sites (Figure 3a), 116 were conserved in Amhara (frequency ≥ 0.8) and rare in the global ocular population (frequency ≤ 0.2). These were dispersed throughout the genome (Figure 3b). Fourteen genes harboured two such sites and five genes contained three sites, all of which have previously been identified as polymorphic in distinct populations of Ct genomes (Figure 3c).
A genome-wide association study was performed to identify polymorphism within the Amharan Ct genomes related to village-level prevalence of Ct infection. The final analysis included 681 single nucleotide polymorphisms (SNPs) in 99 genomes. No SNPs were associated with village-level prevalence of infection (SI Figure 3a). A secondary sliding-window approach was utilised to identify polymorphic regions of the genome associated with infection prevalence. The final analysis included 907 polymorphic regions in 99 genomes. No polymorphic regions were associated with village-level prevalence of infection (SI Figure 3b).
No SNPs were associated with village-level prevalence of TF (Figure 4a). However, eight polymorphic regions from positions 774,000 to 791,000 were associated with village-level prevalence of TF (Figure 4b). SNPs in these regions were focused in CTA0743/pbpB (harbouring 29 SNPs), CTA0747/sufD (10 SNPs) and CTA0742/ompA (7 SNPs). All SNPs in sufD were synonymous, while 8/29 and 3/7 SNPs in pbpB and ompA were non-synonymous.
No SNPs or polymorphic regions were associated with village-level prevalence of TI (SI Figure 4).
As ompA variation was important in Ct phylogeny and heterogeneity in TF profiles in this population, we further investigated the geographical distribution of ompA serovars and their relationship to levels of Ct infection and TF. Serovars A (SvA) and B (SvB) of ompA were distributed across all studied zones (Figure 5a). Village-level Ct infection, TF and TI prevalence were not associated with ompA serovar (p = 0·860, 0·382 and 0·177 respectively). We identified nine ompA types in this population (Table 2). Six were serovar A (SvA), defined by nine non-synonymous polymorphic sites. Three were serovar B (SvB), defined by two non-synonymous polymorphic sites. Four of nine types were present in all zones of this study (A1, A3, A5 and B3), four were exclusive to East Gojam (A2, A4, A6 and B1) and one was found in both East Gojam and North Gondar (B2) (SI Figure 5). Types A1 (n = 5) and B1 (n = 6) had a nucleotide predicted amino acid change in the surface-exposed, variable domain 1 (VD1), A2 (n = 2) in VD2 and A4 (n = 1) in VD4.
Most villages (55/61) had only one ompA type in this study; therefore, we evaluated ompA diversity at the district level using Simpson’s D. We used previously published district-level Ct infection, TF and TI prevalence estimates6,26 (Table 3). District-level Ct infection and TI prevalence were significantly higher with increasing ompA diversity, and a similar trend was found for TF prevalence. In a multivariate model, only Ct infection prevalence was associated with increasing ompA diversity.
Discussion
This study sequenced Ct from ocular samples collected from districts in Amhara, Ethiopia which had received approximately 5 years of the SAFE strategy, including annual azithromycin MDA, as part of trachoma control efforts. We found sequences were typical of ocular Ct, at both the whole-genome level and in tropism-associated genes, yet phylogenetically distinct from most previously sequenced Ct genomes. There was no evidence of macrolide-resistance alleles in this ocular Ct population. Greater ompA diversity at the district-level was associated with increased Ct infection prevalence. A continued commitment to the implementation of the full SAFE strategy with consideration of enhanced MDA accompanied by further longitudinal investigation is warranted in Amhara.
Almost 900 million doses of azithromycin have been distributed by trachoma control programmes since 1999,42 and in Amhara alone 15 million doses are administered every year.3 Mass distribution of azithromycin is likely to become more common as more evidence emerges of off-target effects such as reducing infectious diseases,23,43–45 diarrheal diseases46 and childhood mortality.22,47–50 There is concern about the impact of these programmes on development of antimicrobial resistance in Ct and other bacteria.19,51 This is particularly true in situations where community-wide treatment with azithromycin has been unable to eliminate trachoma as a public health problem within expected timelines.46,52 It has been shown that treating whole communities with azithromycin can increase nasopharyngeal carriage of macrolide-resistant Staphylococcus53 and Streptococcus54,55 species and alters the faecal microbiome,56–59 with studies reporting an increase in macrolide-resistant Escherichia coli.60,61 This study, in agreement with previous work,17–19 found no evidence of macrolide-resistance alleles in this Ct population. This result is encouraging; however, it does not rule out macrolide-resistance as a potential problem in these communities. Carriage of macrolide-resistant pathogens in the gut and nasopharynx may be impacted by continued antibiotic treatment. Additionally, presence of additional species of Chlamydia62,63 and non-chlamydial bacteria64–69 in the ocular niche have been associated with clinical signs of trachoma, therefore resistance in other bacteria may be important.
No Ct genomes in this study had acquired azithromycin resistance alleles, however, there may be other genomic factors which support Ct transmission after treatment. To answer this question, we compared the Amharan Ct genomes with previously sequenced Ct to find polymorphism specific to this population that could explain continued transmission. The few SNPs identified as conserved in Amhara and rare in the global ocular Ct population were dispersed across the genome in known polymorphic genes, rather than being overrepresented in genes related to Ct survival. The typical nature of this Ct population was further supported by phylogenetic clustering with other ocular Ct sequences, presence of a non-functional tryptophan synthase operon and tropism-associated polymorphism in tarP and the polymorphic membrane proteins. Similarly to recent studies from distinct trachoma-endemic communities,13–17,29 the Ct sequences in this population formed two closely related subclades within the ocular clade, primarily separated by ompA serovar. Evidence of phylogenetic clustering by country of collection and the similarity to Ct sequences collected over 50 years prior to this study suggests diversification in ocular Ct is slow and geography-related, rather than being driven by treatment-derived selection pressure. A surprising finding in this study was that a subgroup of SvB Ct from Amhara were most closely related to a historical genome from USA (Ba/Apache-2) and recently collected genomes from Solomon Islands.29 It is possible the origin of these genomes is unique within this population; however, it is more likely that this is further evidence of the slow diversification of Ct. In support of this, ompA SvB sequences were significantly less diverse than SvA in this study. Furthermore, all major branches of ocular Ct phylogeny studied here included samples collected decades apart from geographically disparate sites.
We identified several polymorphic regions associated with village-level TF prevalence; a similar trend was found with village-level Ct and TI prevalence. The polymorphisms were most frequently found in ompA, pbpB and sufD, all of which are known to be polymorphic in ocular Ct.8,70–72 The ompA gene encodes the major outer membrane protein, which constitutes approximately 2/3 of the surface of Ct, is the primary target of host immune responses and is believed to function as a porin.73 The functions of pbpB and sufD in Ct are currently unknown;74 however, bacterial homologues of these genes function in peptidoglycan synthesis and response to oxidative stress,75 respectively. It is plausible that genes hypothesised to be involved in immune evasion and response to stress could have an impact on Ct survival and response to treatment.
We found approximately equal representation of SvA (52.5%) and SvB (47.5%) in this sample of the Amhara population. Both serovars were present in all four zones and were not individually associated with village-level Ct infection, TF, or TI prevalence. However, Ct infection prevalence, and to a lesser extent TF and TI prevalence, were increased in districts with greater ompA diversity. Our data agree with those from a Nepalese study that found increased ompA diversity in villages to be associated with higher trachoma prevalence.76 In contrast, a more recent study from Ethiopia found no association between ompA diversity and Ct infection levels.77 It is known that individuals develop serovar-specific immunity to Ct;78–80 therefore, it is plausible that in villages with more than one serovar in circulation, individuals are more likely to be exposed to a serovar they do not have protective immunity against. However, presence of one or more ompA variants should not impact treatment success.
A potential limitation of this study was bias towards samples with the highest Ct load. It is possible that identified relationships between ompA variation and Ct infection prevalence might have been different if lower load infections were included, particularly at the village level, as the majority of villages (34/61) were represented by one sequence. Additionally, we have not sequenced residual material from Abbott m2000 specimens previously; therefore, it is possible that long-term storage in this format and multiple freeze-thaw cycles may have impacted DNA quantity and/or quality. However, obtaining high quality genomes from all sequenced samples, with as few as 500 Ct genomes as starting material, suggests quality was not an issue.
Despite approximately five years of azithromycin MDA, we found no evidence for Ct genomic variation contributing to the continued transmission of Ct after multiple rounds of treatment, adding to previous evidence that azithromycin MDA does not drive acquisition of macrolide resistance alleles in Ct. This study demonstrates feasibility of WGS of low-load, residual material and highlights the added value of collecting ocular swabs as part of routine trachoma surveys. Collection and long-term storage of these samples has helped alleviate concerns of azithromycin resistance in Amharan Ct, while offering important insights into the relationship between ompA variation and Ct infection levels. Future longitudinal investigation will be needed to understand what impact, if any, ompA diversity may have on treatment success in this setting.
Contributors
HP, RLB, EKC, MJH and SDN contributed to study design. HP, CAW, AC, ES, MZ, ZT, EKC, and SDN contributed to data collection. HP, AWN, EKC, MJH and SDN contributed to data analysis. All authors interpreted the findings, contributed to writing the manuscript, and approved the final version for publication.
Declaration of interests
We declare no competing interests.
Availability of data and materials
All sequence data are available from the European Bioinformatics Institute (EBI) short read archive (PRJEB38668).
Funding
This work received financial support from the Coalition for Operational Research on Neglected Tropical Diseases (COR NTD), which is funded at The Task Force for Global Health primarily by the Bill & Melinda Gates Foundation, by the United Kingdom Department for International Development, and by the United States Agency for International Development through its Neglected Tropical Diseases Program. Additional financial support was received from the International Trachoma Initiative. HP and MH were funded by the EU Horizon 2020 grant agreement ID: 733373.
Acknowledgments
The authors would like to acknowledge the study participants and field team in Amhara, Ethiopia. The authors also acknowledge the infrastructure support provided by the UCL/UCLH Biomedical Research Centre funded Pathogen Genomics Unit. We would also like to thank Abbott for its donation of the m2000 RealTime molecular diagnostics system and consumables.