Abstract
Anopheles minimus is an important malaria vector throughout its wide geographic range across Southeast Asia. Genome sequencing could provide important insights into the unique malaria transmission dynamics in this region, where many vector species feed and rest outdoors. We describe results from a study using Illumina deep whole-genome sequencing of 302 wild-caught An. minimus collected from three Cambodian provinces over several years (2010, 2014-2016) and seasons to examine the level of population structure and genetic diversity within this species. These specimens cluster into four distinct populations of An. minimus s.s., with two populations overlapping geographically. We describe the underlying genetic diversity and divergence of these populations and investigated the genetic variation in genes known to be involved in insecticide resistance. We found strong signals of selection within these An. minimus populations, most of which were present in the two Northeastern Cambodian populations and differ from those previously described in African malaria vectors. Cambodia is the focus of the emergence and spread of drug-resistant malaria parasites, so understanding the underlying genetic diversity and resilience of the vectors of these parasites is key to implementing effective malaria control and elimination strategies. These data are publicly available as part of the MalariaGEN Vector Observatory, an open access resource of genome sequence data.
Introduction
An. minimus is a small mosquito that plays a major role in sustaining malaria transmission across the Greater Mekong Subregion (GMS). This species is commonly associated with human-altered habitats, able to breed in rice paddies and forest-fringe habitats. The GMS is becoming increasingly fragmented as deforestation and agriculture create patchwork landscapes (1–3). Historically, malaria transmission in this region has been dominated by the major vector species An. dirus, a forest-dwelling, highly competent vector and human-attracted mosquito. More recently, land use change has resulted in the decline of An. dirus populations and has shed light on the role of other vectors in malaria transmission, species like An. minimus, that are able to breed in human-associated habitats and quickly adapt to changing environments (4–6). Vectors that occupy different seasonal and ecological niches can also sustain malaria transmission and change the dynamics of seasonal transmission patterns. While An. dirus is considered the major vector in the greater Mekong Subregion (GMS), there are many other species that have been shown to be susceptible to malaria parasites that play a role in local transmission (2, 7–9).
To further complicate malaria transmission in this region, successful lineages of multi-drug resistant malaria parasites have continued to emerge from the area along the Thai-Cambodian border (10–13) and spread through the GMS. Lab infection experiments have shown that Anopheles minimus is able to be infected by and transmit the parasites in Cambodia that have been shown to have slow clearance in response to the Artemisinin combined therapy (ACT) drug treatment (14) and these parasites are co-circulating in areas where An. minimus is active and abundant. All five human malarias are transmitted in Cambodia (15), with Plasmodium vivax infections about as prevalent as those from Plasmodium falciparum (16). An. minimus is known to transmit P. falciparum and P. vivax throughout its wide geographic range, which spans from China down into the Malay archipelago, and from western India to Vietnam (17).
Many Anopheles species in Southeast Asia are members of cryptic species complexes, groups of isomorphic species that are molecularly and often behaviourally distinct. In other Anopheles cryptic species groups like the Gambiae complex, some closely related cryptic species have been shown to exchange genetic material from almost every region of the genome, described as having “porous species boundaries” (18). The level of gene flow between what is typically thought of as isolated species has important implications for the spread of genes relevant to malaria transmission, such as insecticide resistance alleles, or genes that confer susceptibility or resistance to malaria parasites. Population genomic studies of An. gambiae s.l. across the African continent have shown that geographically distinct populations have similar signatures of selection in response to decades of insecticide-based vector control (19), sharing common haplotypes that confer insecticide resistance over thousands of kilometers. Anopheles minimus is a member of such a species complex with at least three recognized cryptic species: An. minimus s.s. (An. minimus A), An. harrisoni (An. minimus C), and An. yaeyamaensis (An. minimus E) (20, 21) and is also in the Funestus group, somewhat closely related to An. funestus, an important African malaria vector. Only An. minimus s.s. has been reported from collections in Cambodia (5), and is considered to have greater indoor and human biting preferences than other members of this species complex. We currently know very little about the population structure of An. minimus and other important vectors occurring across Southeast Asia.
Cambodia is transitioning from focusing on malaria control to malaria pre-elimination, with a goal of eradicating malaria from the country within the next five years. Drug resistant parasites, prevalent P. vivax infections, and outdoor malaria transmission are all factors that complicate the goal of malaria elimination in this country. Despite outdoor biting behaviors of many local vector species, Insecticide-treated nets are still the primary vector control method implemented in the GMS, and do have some protective efficacy against P. falciparum infection (22). Investigations into baseline rates of insecticide resistance have shown that An. minimus from different regions have varying levels susceptibility to DDT and pyrethroid insecticides, with some evidence of resistance in populations in Cambodia, Thailand, Laos, India, and Vietnam (23–25). Additionally, there is very limited regulation of the use of insecticides for public health and agriculture in Cambodia (26).
In the context of the spread of parasites that threaten our best and last-line defence against malaria, we need to know what the population history of the vectors of these parasites look like across this same space. In this study we seek to describe the population structure, diversity, and divergence of populations of An. minimus that were active during the rise and spread of drugresistant parasites in Cambodia. We hope to contribute to a better understanding of malaria transmission in Southeast Asia.
Results
Population sampling and sequencing
We generated whole genome sequence data from 302 wild-caught individual An. minimus female mosquitoes collected from five different field sites in Cambodia using the Illumina HiSeq 2000 platform with 150bp reads with a target coverage of 30X for each. These collections were done in 2010 in Thmar Da, in Eastern Cambodia, a longitudinal monthly collection over 2014 in two sites in each of Pursat, Preah Vihear, and Ratanakiri provinces, and quarterly collections over 2016 in one site each in Pursat and Preah Vihear province, Cambodia.
Variant discovery
The methods for sequencing and variant calling closely follow those of the Anopheles gambiae 1000 Genomes project phase 2 (Ag1000G) (27). Sequence reads were aligned to the An. minimus reference genome AminM1 (28). The median genome-wide coverage was 35X. We discovered 38,000,285 segregating single nucleotide polymorphisms (SNPs) that passed all of our quality control filters. 13.4% of these SNPs were mutiallelic, with 4,807,355 triallelic and 286,459 quadriallelic SNPs. We restricted our analysis to the largest 40 contigs, which cover 96.6% of the AminM1 reference genome, as many smaller sized contigs can confound diversity and divergence calculations. We found that 75.4% of sites within these largest contigs pass our site filters and thus were accessible to SNP calling.
Population structure
A principal component analysis (PCA) over biallelic SNPs distributed over the genome of 302 individual field-collected mosquitoes showed that there is clear population structure of An. minimus in Cambodia. Samples collected from 5 sites in three provinces split into three distinct populations (Figure 1). One population includes all samples from the western collection site Thmar Da and the northern collection sites in Preah Vihear province, with two further populations with samples from Ratanakiri province in the northeast. These populations split primarily along the first and second principal components. This was a surprising finding because this population structure did not correlate to the geographic sampling of these mosquitoes. Individuals collected from the western and northern sites cluster tighly together despite being hundreds of kilometers apart.
To further explore this structure, we performed the same PCA over contigs from different regions of the genome, PCA over the largest X-chromosomal contig revealed that there is more structure within this quickly evolving sex chromosome, with a splitting of the western and northern samples, indicating four distinct populations of An. minimus in Cambodia (Figure 1). The populations defined by these PCA clusters are designated in this study as TD from Thmar Da, in Western Cambodia (n=41), on the Thai-Cambodian border, PV from the Northern province Preah Vihear (n=156), and the two distinct populations collected in Ratanankiri province in the Northeast, each including individuals collected over multiple years and at both collection sites, these are designated as populations RK1 (n=58) and RK2 (n=28).
To examine population differentiation, we computed differences in allele frequencies between each population using Pairwise Fst. Pairwise Fst between all 4 populations over the largest contig, KB663610, representing 16% percent of the An. minimus genome, (Figure 2) shows that differentiation was relatively low between populations of TD and PV with an average pairwise Fst of 0.003, while the difference between RK2 and the other three populations is ten-fold higher, around 0.03. Pairwise Fst estimates comparing these populations over other large An. minimus contigs indicates a similar level of differentiation, with average pairwise Fst values over 0.03 (Supplementary Table 3). The two sympatric populations from the Ratanakiri collection sites are as differentiated from eachother as they are from the northern and western cluster.
This level of differentiation of RK2, even from the RK1 population, might indicate an emerging cryptic species within An. minimus A or a newly diverging clade. RK1 and RK2 are sympatric populations, both being collected in the same two sites in Northeastern Cambodia. The differences seen here between RK1 and RK2 populations is consistent with cryptic taxa in other anopheline groups. For example, in the An. gambiae complex, the level of differentiation between recently diverged sibling species An. coluzzii and An. gambiae in West Africa is approximately 0.03 (19).
Population diversity and variation
To characterize population diversity among these populations, nucleotide diversity (π), Watterson’s Theta (θW), and Tajima’s D statistics were calculated over 4-fold degenerate sites on autosomal contigs larger than 2 megabases with 100 bootstrap replicates over samples. These 17 contigs represent 80% of the Anopheles minimus genome (Figure 2). The populations were downsampled for these calculations to have same sizes equal to that of the smallest population RK2 (n=28).
There are small but significant differences in the magnitude of the genetic diversity summary statistics between these four different populations. In particular, there were notable differences between the putatively cryptic taxa RK1 and RK2, two populations that were collected in the same sites. RK1 had higher levels of nucleotide diversity and lower levels of Tajima’s D than RK2. These differences are consistent with different population size history between these sympatric groups. In particular, higher levels of genetic diversity indicate larger effective population sizes of TD and PV. Lower values of Tajima’s D suggest stronger population growth in RK1.
These calculations indicate that RK2 has a significantly reduced nucleotide diversity and Watterson’s Theta compared to the other three populations. This may indicate a smaller population size and recent bottleneck of the RK2 population in Cambodia. All four An. minimus populations have a negative Tajima’s D, indicating an excess of rare variants, particularly in RK1. This suggests recent population expansions in all populations.
Signals of evolutionary selection
We used Fst to scan across the Anopheles minimus genome to look for regions of the genome with increased differentiation. When we scanned the genome using pairwise Fst, there were no apparent long signals of differentiation that might indicate a large inversion or other structural variant, known to be major drivers of adaptive evolution in other Anopheles groups. To further investigate potential large regions of increased differentiation, we performed scans of nucleotide diversity (π), Watterson’s Theta (θW), and Tajima’s D over the largest 14 contigs (representing 80% of the An. minimus genome). As with the Fst scans, there were not large regions of higher differentiation in any of the populations that might indicate major structural variants or inversions (Supplementary Figures 1-3).
Whole genome sequencing allowed us to identify pointed signals occurring across the entire genome using scans of average pairwise Fst. Isolated points of high differentiation were compared over single contigs with average pairwise Fst calculated over 1000 SNP windows and plotted over whole contigs. The strongest signals, indicated by the highest Fst value at the peak of a strong signal of differentiation, were ranked and compared. The five top signals in each of six comparisons between the four populations are listed in Table 1. These isolated points of high differentiation are one indication of a signal of evolutionary selection. The most differentiated regions by Fst occurred when comparing the RK2 population to the other three populations, with the highest selection peaks with pairwise Fst over 0.4. RK2 also had more distinct signals of selection when compared to the other populations than RK1. Since these signals of differentiation were highly localized, we could look to known gene annotations and gene predictions across the AminM1 reference genome to see which genes were close to the peaks of these signals. We have noted candidate genes of interest that were very close to the strongest Fst signal peaks and also had known or predicted gene functions (Table 1).
There is almost no indication of selection when comparing the Thmar Da population with Preah Vihear, with all but one signal with Fst values below 0.05. The one strong signal between TD and PV (Fst = 0.125) is near a Carbohydrate sulfotransferase, which is involved in detoxification processes. Comparing TD to RK1 and RK2 reveals multiple strong signals of selection, some that are present in both Northeastern populations as well as many unique RK2-specific signals (Figure 3, Supplementary figure 4).
Many of the strongest signals identified in this study may be associated with insecticide pressure on these An. minimus populations. The strongest selection signals in every population comparison were close to genes that are involved in detoxification, signal transduction, adaptations to oxidative stress, or have been functionally validated to have mutations that confer resistance to insecticides (Table 1). Some signals of interest include a strongly supported signal of selection in both RK1 and RK2 populations at 7.5Mb on the contig KB664266, which is in the same location as a cluster of glutathione-S-transferases, including GSTe2, which has been shown to be involved in the metabolism of DDT and pyrethroids, mutations in which mediate metabolic insecticide resistance (29). The signal with the highest pairwise Fst peak on the same contig KB664266, at 6Mb is an RK2 specific signal and close to an Ecdysteroid UDP-glucosyltransferase gene, which has been shown to confer pyrethroid insecticide resistance in An. stephensi (30).
Another notable signal is between the RK1 and RK2 populations on the contig KB663610, a Peptidase S1 domain-containing protein AMIN002286, which has been shown to be involved in response to parasite pathogens in insects (31). The signals of selection observed in this study are mostly distinct from the main selection signals seen in An. gambiae complex mosquitoes (19), the primary vectors of Plasmodium falciparum in Africa.
Insecticide resistance
We report here variants at known insecticide resistance associated alleles for each of the four An. minimus populations. Variants occurring at a frequency of more than 2% in at least one of the four populations are reported in the known insecticide-resistance associated genes Ace1, Rdl, KDR, and GSTe2 (Supplementary Table 2). GSTe2 mutants are present in multiple populations, at a low rate, and there are a few individuals in Thmar Da and Preah Vihear with the Rdl resistance mutation, which is known to confer resistance to cyclodiene insecticides, despite evidence from other studies that species in this region lack this resistance mutation (32). We did not investigate copy number variation, which is one mechanism by which GSTe2 confers insecticide resistance. These SNP variants indicate variation throughout these insecticideresistance associated genes and though most of these population do not currently have high rates of validated insecticide resistance associated mutations, this underlying variation provides potential for structural and transcriptional events resulting in greater levels of insecticide resistance in An. minimus populations.
Discussion/Conclusions
This is the first study investigating the population genomics of An. minimus. We examined the whole genomes of 302 wild-collected female mosquitoes from five different sites in Cambodia over several years. This important vector species appears to be highly structured within Cambodia, with a division of West, North, and North-eastern populations. This structure cannot be fully accounted for by geography. The RK1 and RK2 populations that are as differentiated from each other as An. coluzzii is from An. gambiae s.s., were collected in the same sites and at the same timepoints. The other populations PV and TD are separated by a few hundred kilometres with no major geographic barrier between these regions of Cambodia. The sympatric RK1 and RK2 populations are as distinct from one another as they are from the geographically distinct populations PV and TD. The RK2 population likely represents an emerging cryptic species or diverging clade within An. minimus s.s.
All four populations appear to be under selective pressure and may have undergone recent population expansions. These diverging populations of An. minimus each have unique signals of evolutionary selection. These signals are particularly prominent in the most differentiated population, RK2. Many of the signals identified in this study indicate a strong evolutionary response to insecticidal pressure in the GMS, this pressure is more likely due to heavy and longterm agricultural pesticide use rather than malaria vector control such as indoor residual spray or long-lasting insecticidal nets. Multiple mechanisms of insecticide resistance have been identified in other insects and in other disease vector mosquitoes in Cambodia (23, 24, 33). The widespread use of pyrethroid, organophosphate, and carbamate insecticides in agriculture exacerbates the development of insecticide-resistance in populations of many different species across the GMS (23).
This study shows that populations of An. minimus in Cambodia are evolving, and this may be in the context of strong and sustained pressure from agricultural insecticide use. Each population in this study has unique signals of evolutionary selection associated with multiple detoxification pathways and may also be under different types of insecticidal pressure than mosquitoes in other parts of the world. For example, since An. minimus is a species that is able to exploit rice paddies and flooded agricultural fields as larval habitats, individual mosquitoes may experience insecticidal pressure as larvae, which might impact their susceptibility to insecticides as adults. The presence of these signals reveals a complicated landscape of potentially multiple insecticideresistant vectors that may not be responsive to insecticides implemented as part of a malaria control campaign.
Current vector control tools in Cambodia and most of the GMS include the widespread distribution of long-lasting insecticidal nets (LLINs), and topical repellents. New control tools are needed in this region and in the context of outdoor, residual malaria transmission. The population structure within this single Southeast Asian vector species in a small country like Cambodia may indicate barriers for implementing vector control methods centered around genetic modification or sterile insect release. If populations of vectors are diverging with limited gene flow, despite being in the same place, genetic control based on any assumption of a single population of vectors would be unlikely to penetrate vector populations in this region.
This study expands the Anopheles species represented in whole genome population data. This resource of An. minimus whole genome data will be a starting point to explore Anopheles genome variation in important transmission contexts such as Southeast Asia. This study is the foundation for further delving into the population structure and history of An. minimus, which acts as an important malaria vector across all of Southeast Asia.
Examining the population structure in a single species over its range and through time will help us to better understand the potential risk associated with multiple insecticide-resistant populations of vectors and how to best approach managing and monitoring insecticide resistance. An. minimus in Cambodia includes an adaptive, resilient, and diverse set of populations that can thrive in human-created habitats and potentially contribute to epidemics of drug-resistant malaria as we work toward malaria elimination. It is crucial that we appreciate the evolutionary history and ecology of this important and widespread vector.
Materials and Methods
Collection methods
An. minimus female mosquitoes were selected from several previous field collections in Cambodia with the goal of including a subsample of An. minimus specimens from different field sites and timepoints for population genomic analysis. These anophelines were collected using CDC light traps, human landing collections, cow-baited tents, and barrier fences from seven different locations in Cambodia, from 2009 to 2014. Five of these sites had enough An. minimus individuals to include in population genomic analyses.
Collection locations included Thmar Da, in Western Cambodia on the Thai/Cambodian border (Thmar Da) and in Pursat Province (Angkrong and Veal villages), in the North in Preah Vihear province (Chean Mok pagoda and Preah Kleang village), and in the Northeast Ratanakiri (Sayas and Chamkar Mai villages)(34, 35). Multiple Anopheles species were collected in each of these studies, including the An. minimus A specimens that have been included in this study. GPS coordinates for each collection site are available in the sample metadata (Supplementary Table 1, 5).
DNA extraction and identification
Specimens were morphologically identified immediately after collection using keys from (36, 37). These were stored individually in 1.5ml tubes with silica gel desiccant. DNA was extracted from the head and thorax of collected mosquitoes using either a CTAB extraction method or in Nextec plates. Individual mosquitoes were molecularly identified using Sanger sequencing of the rDNA ITS2 region using primers ITS2A and ITS2B from (38) and compared to voucher reference rDNA ITS2 sequences in NCBI.
Whole genome Sequencing
Individual samples were sequenced at the Wellcome Trust Sanger Institute using whole-genome sequencing on the Illumina HiSeq 2000 platform. Paired-end multiplex libraries were prepared as per the manufacturer’s specifications, with DNA fragmented using Covaris Adaptive Focused Acoustics. Multiplexes comprised 12 tagged individual mosquitoes. Three lanes of sequencing were used for each multiplex to control for variation between sequencing runs. Sequencing and cluster generation were done using the manufacturer’s protocol for paired-end reads of 150bps with insert sizes from 100-200 bp. The target coverage for each individual was 30X. To include some samples that had DNA concentration that was too low to run through the standard Illumina pipeline, we whole genome sequenced a set of specimens using a low-input DNA sequencing pipeline. In our sequence and population QC, these samples did not appear to be of lower quality or influence the clustering of whole genome sequences in our PCA analysis.
Sequence alignment
Illumina whole genome sequence reads were aligned to the Anopheles reference genome AminM1 using BWA version 0.7.15.(39, 40) using bwa mem. A BAM file was constructed for each individual using merged alignments from multiple lanes. Duplicate reads were marked using Picard version 2.6.0 (41).
Coverage
For each sample, depth of coverage was computed at all genome positions. Samples were excluded if median coverage across all chromosomes was less than 10×, or if less than 50% of the reference genome was covered by at least 1×.
Population outliers and anomaly detection
We used principal component analysis (PCA) to identify and exclude individual samples that were population outliers. SNPs were down-sampled to use 100,000 segregating non-singleton sites from autosomal contigs. PCA was computed using scikit-allel version 1.2.0. We iteratively identified and excluded any individual samples that were outliers along a single principal component. We then identified and excluded any individual samples or small sample groups that clustered together with other samples in a way that was not plausible given metadata regarding their collection location.
Quantification of population diversity and divergence
Diversity statistics Pairwise Fst, nucleotide diversity, Tajima’s D, and Watterson’s Theta were calculated in windows across the largest 40 contigs on the An. minimus AminM1 genome using scikit-allel version 1.3.3. and custom code in Python.
Nucleotide diversity (π), Watterson’s Theta (θW), and Tajima’s D statistics were calculated over 4-fold degenerate sites on autosomal contigs larger than 2,000,000 bases. These 17 contigs represent 80% of the Anopheles minimus genome. These statistics were calculated as an average over sliding windows with 100 bootstrap replicates over samples. Populations were downsampled to the size of the smallest population RK2 (n=28) for these calculations.
Quantification of signals of selection
We used pairwise Fst in 1000 SNP windows to scan across the Anopheles minimus genome to look for any signals of increased differentiation, which might indicate evolutionary selection. This scan was performed for contigs larger than 2 Megabases, these contigs represent 86% of the An. minimus genome. Signals that were represented by at least 5 1000 SNP windows above two times the genome-wide average were recorded and ranked by the highest Fst value at the peak of each selection signal.
Genes potentially associated with these signals were identified by going to the location of the peak of each signal on the An. minimus genome (AminM1) using the Genome Browser tool in VEuPathDB Vectorbase (28) and walking in both directions from the location signal to see which annotated genes were present in that location. Gene IDs and gene functional predictions were recorded.
Variants in known insecticide-resistance associated genes
We report in this study variants at known insecticide resistance associated alleles. Variants occurring at a frequency of more than 2% in at least one of the four populations are reported in the known insecticide-resistance associated genes Ace1, Rdl, KDR, and Gste2 (Supplementary Table 2).
Data accessibility
These genome data are available at https://www.malariagen.net/resource/35. These data are hosted in Google Cloud. For more information about downloading data, please see the data download guide. For more information about accessing data in the cloud, please see the cloud access guide.
Author contributions
BSTL, AM, and DK conceived of the study. BSTL designed and directed the field collections. BSTL, KO, CS, MS, RS, and SN carried out the field collections in multiple sites in Cambodia with KO managing entomological data collection and SS managing personnel and site coordination. BSTL and ND performed molecular analysis for molecular species identification and plasmodium detection. ND, ED, KR, and SG were involved in preparing samples and sample data for whole genome sequencing. BSTL, NH, and AM carried out the population genomic analyses. BSTL prepared the manuscript. All authors have approved the manuscript.
Conflict of interest
The authors declare no conflicts of interest.
Acknowledgements
Thank you to our collaborators at the National Center for Parasitology, Entomology, and Malaria Control (CNM) in Phnom Penh, Cambodia, the National Institutes of Health NIAID LMVR, and the MalariaGEN team for their support of this work.