Abstract
Relatively few genome-wide population studies of field-acquired insecticide resistance have been carried out on agricultural pests. We utilized the navel orangeworm (Amyelois transitella), the main insect pest of almond orchards in California, to study the short- and long-term effects of heavy insecticide usage on the population genomic landscape of insects. We re-sequenced the genomes of three contemporary A. transitella natural populations differing in resistance status and characterized their population genetics parameters. We detected an exceptionally large selective sweep present in all populations. This sweep has virtually no polymorphisms and extends up to 1.3 Mb (spanning 43 genes) in the resistant population. We analyzed the possible causes of this unusually strong population genetic signature and found genes in the sweep that are associated with DDT and pyrethroid resistance. Moreover, we found that the sequence along the sweep is nearly identical in the genome assembled from a strain founded in 1966, suggesting that the underpinning for insecticide resistance may have been laid a half-century ago when the California Central Valley experienced massive use of DDT, or that DDT residues in the soil may still be at play as a selective pressure. Our findings show the effects of insecticide applications on the genomes of agricultural pests. We show that insecticide resistance in this species has probably evolved as a stacking of selective pressures that started decades ago and that effectively reduced variation in a region that includes several genes that confer resistance to insecticides that share a mechanism of action.
Background
Since the rapid post-World War II expansion of the use of synthetic organic insecticides, more than one thousand individual cases of resistance involving more than 500 arthropod species have been documented [1, 2]. Many of the underlying genetic mechanisms of metabolic resistance and target site insensitivity have been associated with individual genes [3, 4]. Sufficiently intense pesticide selection, however, can result in the acquisition of multiple resistance mechanisms within target pest populations, mechanisms that are associated with multiple genes and larger genomic regions. Studies of selection for insecticide resistance at the genomic scale have focused mainly on drosophilid model species [5–7] or medically important mosquito species [8–10]. Relatively few genome-wide population studies of field-acquired insecticide resistance have been carried out on agricultural pests, a surprising omission given the frequency and economic consequences of resistance evolution in insects.
The primary insect pest of California almonds (Prunus dulcis), a >$7 billion industry, is the navel orangeworm (Amyelois transitella, Lepidoptera: Pyralidae). In addition to almonds, the highly polyphagous A. transitella also causes significant losses in other nut and fruit tree crops, including pistachios (Pistacia vera), walnuts (Juglans spp.), and figs (Ficus carica). Nut crops are vulnerable to oviposition by A. transitella adults and damage by larvae when kernels are exposed at hull split. Beyond this direct damage, the navel orangeworm is a facultative mutualist of the fungal pathogen Aspergillus flavus and, by serving as a vector, can increase the likelihood of contamination by aflatoxins, reducing crop quality and marketability [11]. Adding to its potential for causing economic losses is the fact that, in California, A. transitella is multivoltine, with up to four generations per year depending on weather.
Insecticides have been used intensively in almond orchards to protect them from the navel orangeworm, and these are generally applied at hull split to reduce feeding damage, and up to three times over the course of the growing season. Although insecticides approved for use include representatives from five structural classes, pyrethroids have been widely used due to their low cost and high efficacy. Prior to 2013, insecticide resistance was not known to occur in navel orangeworm populations; in 2013, however, ten-fold resistance to bifenthrin was documented in populations in Kern County [12].
Taking advantage of the recent sequencing of the A. transitella genome (http://i5k.github.io/), we used a population genomics approach to locate regions of the genome under insecticide selection. We evaluated nucleotide diversity and genomic differentiation across windows of the genome in three modern populations of this species. Additionally, we characterized the population genomic parameters in families of genes whose members are usually involved in detoxification processes. In the process, we detected an extremely large selective sweep across all the sequenced populations. We analyzed the mutations in genes along the sweep and compared them with the reference genome, finding that the reference genome, from a laboratory strain founded in 1966, has an almost identical sequence along ~0.6 Mb of the sweep. We reviewed patterns of pesticide use in the Central Valley to understand the origins of this selection signature.
Results
Genome re-sequencing and population genomics parameters
One hundred individuals from each of three navel orangeworm populations from the Central Valley were sampled and sequenced in pools. Two populations were collected in Madera County: one from almond orchards (ALM), and one from figs (FIG). Both populations displayed field susceptibility to bifenthrin at the time of collection. A third population was sampled from Kern County almond orchards where resistance to bifenthrin was first reported (R347). Sequencing produced over 700 million 150-nt paired reads with an average Phred quality score higher than 30 throughout each read, resulting in roughly 82X coverage of the 409-Mb A. transitella genome for each of the strains sequenced (Supplementary Material, Table S1).
The reads were mapped to our A. transitella reference genome, which was sequenced from a laboratory strain (SPIRL-1966). After mapping, filtering, and subsampling reads, the total number of called SNPs for the ALM population was 10,592,326, for FIG was 10,303,426 and for R347 was 9,704,346. The nucleotide diversity, measured as Tajima’s π across non-overlapping 5kb-long windows for each of the populations, was on average 0.0179, 0.0175, and 0.0171, for ALM, FIG, and R347, respectively. An unusually large region displaying substantially reduced genetic diversity across the three populations was readily detected by screening Tajima’s π values that were calculated with 5kb non-overlapping windows across the genome (Table 1). We narrowed down this region of low nucleotide diversity to the region starting around 2.92 Mb in Scaffold NW_013535362.1 and extending up to the base at approximately 3.8 Mb in the ALM and FIG populations, and up to the base at approximately 4.32 Mb in the R347 line. The region of low nucleotide diversity that spans the three populations encompasses 38 genes. In the resistant line, the total selective sweep extends to up to 43 genes (Figure 1, Panel A). An examination of the read alignments to the reference strain SPIRL-1966 shows that the nucleotide sequence of the reference is nearly identical to that of the three other populations along the hard sweep region. In addition, the reference genome sequence is also nearly identical to that of R347 in the regions flanking the sweep (Supplementary Material, Figure S1).
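The windowed diversity scan described above can be illustrated with a minimal sketch. It assumes per-site allele counts are already available and uses the textbook unbiased heterozygosity estimator; it is not the pool-corrected estimator implemented in Popoolation, and the function names `site_pi` and `windowed_pi` are hypothetical.

```python
from collections import defaultdict


def site_pi(allele_counts):
    """Unbiased heterozygosity at one site from allele counts.

    Assumption: counts are treated as a simple random sample of alleles;
    pool-specific corrections (as in Popoolation) are not applied.
    """
    n = sum(allele_counts)
    if n < 2:
        return 0.0
    sum_sq = sum((c / n) ** 2 for c in allele_counts)
    return (n / (n - 1)) * (1.0 - sum_sq)


def windowed_pi(sites, window=5000):
    """Average per-site pi in non-overlapping windows.

    sites: iterable of (position, allele_counts) for every covered
    position (1-based), including invariant positions.
    Returns {window_start: mean pi over covered sites in that window}.
    """
    totals = defaultdict(float)
    covered = defaultdict(int)
    for pos, counts in sites:
        start = ((pos - 1) // window) * window + 1
        totals[start] += site_pi(counts)
        covered[start] += 1
    return {start: totals[start] / covered[start] for start in totals}
```

In such a scan, a sweep would appear as a run of consecutive windows whose mean π is far below the genome-wide average (about 0.017 to 0.018 in these populations).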
Selective sweeps are the result of beneficial mutations that rapidly became fixed in the population and carried along neighboring variants, resulting in a local loss of heterozygosity; in a complete hard sweep, heterozygosity reaches zero at or very close to the causative mutation [13, 14]. To further narrow down the causative mutation that could have generated the observed pattern, the Tajima’s π metric was re-calculated on a gene-by-gene basis (where the length of the window to account for variable nucleotides is equal to the length of the gene) on the predicted gene models across the entire NW_013535362 scaffold. Of the 190 annotated gene models, 170 had enough sequencing coverage to calculate nucleotide diversity at the set cut-offs. In the ALM population, nucleotide diversity reached zero (π = 0) at the protein kish-A gene (XM_013328236.1), with no SNPs detected along this coding sequence, followed by the voltage-gated sodium channel “para” (XM_013328250.1), with a single SNP. In the FIG population, the lowest π value (0.0001) was also found in para, followed by the gene encoding the cytochrome P450 CYP6B56 (XM_013328369.1). In the R347 strain, π was zero at the gene coding for CYP6B56, followed by the gene coding for “protein Ariadne”, a putative RING-type transcription factor (Table 2). The coverage and number of SNPs for each of the genes in the scaffold are detailed in Supplementary Material, Table S2.
We then manually screened the few nucleotide variations along the region where all three populations have near zero π values. Only two genes had non-silent mutations in all or in a portion of the reads covering the position across the three populations. One of these was the para gene (which also showed Tajima’s π values equal to zero), with the mutation L934F. All three populations carried the mutation in 100% of the sequenced reads covering the position, in contrast with the reference genome (Figure 2, panel A). This mutation corresponds to the known kdr (‘knock down resistance’) mutation that confers target site resistance to DDT and pyrethroids [15] in multiple insect species [16–20]. Although the reference genome shows nearly identical nucleotide sequence along para, it does not have the kdr mutation. In the absence of population level data, we confirmed that the reference strain did not carry the kdr mutation by re-sequencing the region flanking this mutation on ten SPIRL-1966 individuals collected in 2012 and preserved in our laboratory (Supplementary Material, Figure S2). A second gene encoding a Krüppel-like-9 factor (KL-9) had non-silent mutations segregating in 82% and 91% of the sequenced reads from ALM and FIG, respectively, and in only 9% of the reads in the resistant R347. KL-9 overlaps the region where all three populations have low diversity and the region where only the R347 strain has reduced π values (Figure 1A, Figure 2B, and Supplementary Material, Table S3).
We then calculated nucleotide diversity in the mRNA sequences of members of four detoxification gene families whose members might be involved in insecticide resistance (cytochrome P450s, glutathione-S-transferases (GSTs), ABC transporters, and carboxylesterases (COEs)) across the whole genome. The calculation of Tajima’s π in this case used the gene length as window size. In addition to the three P450 genes located in the selective sweep, 57 other P450 genes had enough read coverage to calculate nucleotide diversity according to our set cut-offs. The R347 population showed a low π (i.e., less than half the median of the three populations in the gene) in CYP6AW1 relative to ALM and FIG and relative to the rest of the P450s in the set, suggesting possible local positive selection on this gene in the resistant line (Supplementary Material, Figure S3 A). Among the ABC transporters, 29 of 56 had enough read coverage in at least two of the populations to calculate π, but none of them had a π below half of the median of the three populations; the lowest π values were in the gene coding for Atra_ABC-C6 (XM_013339562.1). Among the GSTs, 18 had enough SNP coverage to calculate π, but none was below the median of the three populations, and Atra_GSTω1 (XM_013335304.1) had the lowest π across all GSTs in the three populations (Supplementary Material, Figure S3 B). Forty-two of 66 COEs had enough SNP coverage; a clade B α-esterase (XM_013334043.1) had the lowest nucleotide diversity in the three populations, and no COE had significantly lower π when compared across populations (Supplementary Material, Figure S3 C).
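The "less than half the median of the three populations" criterion can be sketched as follows. The function name `flag_low_diversity` and the π values in the usage note are hypothetical and purely illustrative; they are not taken from the supplementary tables.

```python
from statistics import median


def flag_low_diversity(gene_pi, fraction=0.5):
    """Flag populations whose per-gene pi is below fraction * median.

    gene_pi: {gene_name: {population: pi}}
    Returns {gene_name: [populations below the cut-off]} for genes
    with at least one flagged population.
    """
    flagged = {}
    for gene, by_pop in gene_pi.items():
        cutoff = fraction * median(by_pop.values())
        low = sorted(pop for pop, pi in by_pop.items() if pi < cutoff)
        if low:
            flagged[gene] = low
    return flagged
```

For example, with made-up values `{"geneA": {"ALM": 0.020, "FIG": 0.018, "R347": 0.004}}`, the median is 0.018, the cut-off is 0.009, and only R347 is flagged for that gene.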
Genetic differentiation measured as FST between populations showed that ALM and FIG are probably indistinguishable as separate strains (FST = −0.004; negative FST values are effectively zero). The FST between ALM and R347 was 0.0289, and between FIG and R347 it was 0.0290. These values reflect the geographic origins of each population (i.e., Madera County, where ALM and FIG were sampled, vs. Kern County, the source for the R347 strain). Genes controlling traits that differ between populations might present large differences in allele frequencies, thereby generating higher FST values compared with the average FST across the genome [21]. We scanned for outlier loci showing SNPs with large differences in allele frequency by calculating FST on a by-SNP basis utilizing two different methods (see Materials and Methods section). The SNPs with the top 1% FST values were concentrated in the regions flanking the selective sweep (Figure 1B and Supplementary Material, Figure S4). These high FST values were all found in pairwise comparisons with the resistant genotype (i.e., ALM vs. R347 or FIG vs. R347).
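To illustrate how a per-SNP FST can come out slightly negative when two populations have nearly identical allele frequencies, the sketch below uses Hudson's estimator with a sample-size correction. This is a swapped-in, simplified estimator chosen for clarity, not the calculation performed by Popoolation2 or Poolfstat, and it treats the inputs as allele frequencies from `n` sampled chromosomes rather than pooled read counts.

```python
def hudson_fst(p1, n1, p2, n2):
    """Hudson's FST for one biallelic SNP.

    p1, p2: frequency of the same allele in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population (> 1).
    Returns None when the SNP is fixed for the same allele in both.
    """
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n1 - 1)
        - p2 * (1 - p2) / (n2 - 1)
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        return None
    return numerator / denominator
```

With identical frequencies in both samples the correction terms make the estimate slightly negative, which is why small negative values are read as zero differentiation; with alternative alleles fixed in each population the estimate is 1.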
The FST method may not account for unexpected population structure and requires that each sequenced individual is assigned to a single population, an assumption that might not be valid. To verify our findings by overcoming the limitations of FST, we utilized an approach based on principal component analysis (PCA) on the calculated major allele frequencies [22], under the assumption that SNPs that are highly correlated with underlying population structure are candidates for local adaptation. We screened the top 20 differentiated SNPs (FDR-corrected p-value < 0.07, K = 2). Of those, only 2 SNPs were differentiated between the resistant and susceptible populations: one in scaffold NW_013535509.1, in an intergenic region near the ecdysteroid kinase gene cluster, and a second in the NW_013535362.1 scaffold, just downstream of the selective sweep at position ~4.1 Mb (Supplementary Material, Table S4, and Figure 3). To assess whether the high FST outliers found in scaffold NW_013535362.1 are responsible for the overall genetic differentiation between populations, we re-calculated the overall pairwise FST values after removing the region between position 2.95 Mb and 3.93 Mb in NW_013535362.1. The resulting FST values were almost identical to the ones obtained with the full genomes, indicating that, other than the region surrounding the selective sweep, these populations are likely differentiated as a consequence of local (i.e., geographically restricted) selection, including population bottlenecks, genetic drift, or migration.
Given the size and the strength of the signature of selection that we identified, we hypothesized that genes other than para could be involved in insecticide resistance. Of the genes in the sweep, the chain of three cytochrome P450 genes (CYP6B54, CYP6B55 and CYP6B56) are strong candidates, given that genes in this sub-family have been implicated in pyrethroid resistance in other species [23–25]. Tandem clusters of P450s are likely recent duplications that have persisted due to their adaptive value. We compared syntenic regions of A. transitella and Bombyx mori (the domestic silkworm) and found only one B. mori P450 (CYP6B29) adjacent to para. In the Old World bollworm Helicoverpa armigera, however, there is also a chain of three CYP6B-subfamily genes adjacent to para, the same P450s that were previously associated with pyrethroid resistance in this species [23, 24] (Figure 4). Available RNA-seq data (NCBI accession: PRJNA548705) show that transcripts for CYP6B54, CYP6B55, and CYP6B56 are highly constitutively expressed in samples from both ALM and R347 populations (i.e., up to tenfold higher than the average counts per million in the region); in bifenthrin-treated individuals from the R347 resistant population, CYP6B56 transcript levels are upregulated up to 1.6-fold relative to non-treated individuals (Supplementary Material, Table S5 and Figure 1C).
Bioassays and analysis of historical data on insecticide usage
To determine the functional significance of the identified genomic signatures, we compared levels of resistance to bifenthrin and DDT in ALM and R347 with CPQ, a strain that lacks the kdr mutation and is derived from the original SPIRL-1966 line. The CPQ strain was the most susceptible to bifenthrin, with an LC50 value of 0.38 ppm; in contrast, the R347 strain displayed the highest level of resistance, with an LC50 value of 24.27 ppm, confirming field observations of insecticide failure. The LC50 of ALM was 7.45 ppm, comparable to that of FIG reported by Bagchi et al. [26]. There was no difference in LC50 values for DDT between the ALM and the R347 populations, both displaying relatively high resistance to DDT (LC50 = 259.85 and 310.33 ppm, respectively) compared to CPQ (LC50 = 25.32 ppm) (Table 3). The data show a direct correlation between the presence of kdr and DDT resistance but not between kdr and bifenthrin resistance.
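The LC50 values reported above can be expressed as resistance ratios relative to the susceptible CPQ strain. The short sketch below does that arithmetic; the helper name `resistance_ratios` is hypothetical, and the ALM bifenthrin value is assumed to be in ppm like the others.

```python
# LC50 values (ppm) as reported in the text; ALM bifenthrin unit assumed to be ppm.
lc50_ppm = {
    "bifenthrin": {"CPQ": 0.38, "ALM": 7.45, "R347": 24.27},
    "DDT": {"CPQ": 25.32, "ALM": 259.85, "R347": 310.33},
}


def resistance_ratios(lc50, reference="CPQ"):
    """Return LC50 of each strain divided by the LC50 of the reference strain."""
    return {
        strain: value / lc50[reference]
        for strain, value in lc50.items()
        if strain != reference
    }


for insecticide, values in lc50_ppm.items():
    for strain, ratio in resistance_ratios(values).items():
        print(f"{insecticide}: {strain} is {ratio:.1f}-fold relative to CPQ")
```

By this arithmetic, bifenthrin resistance is roughly 20-fold in ALM and 64-fold in R347, whereas DDT resistance is roughly 10-fold in ALM and 12-fold in R347, which is the contrast the paragraph draws between the two insecticides.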
To gauge the level of selective pressure exerted by bifenthrin usage in the sampled populations, we reviewed historical patterns of pesticide usage using data from the California Department of Pesticide Regulation (CDPR) annual reports from 1990 to 2017. Statewide, the use of bifenthrin increased 7.3-fold by pounds applied and 4.5-fold by treated acres from 2007 to 2013 (Supplementary Material, Figure S5). During this period, Kern County applications in almonds increased 3.5-fold by pounds of bifenthrin applied and 4.4-fold by treated acres, while Madera County applications increased 10.1-fold in pounds applied and 5.2-fold by treated acres. Applications of bifenthrin per pound, however, were higher in Kern County throughout this period, except in 2012 (Figure 5). From 2009 to 2013, the number of registered products containing bifenthrin that were applied in almond orchards increased from 1 to 13 (Supplementary Material, Table S6). By 2017, 19 products containing bifenthrin were used to treat almond orchards.
Discussion
A classic hallmark of positive selection is a reduction in nucleotide diversity in genome regions flanking genes that have been the targets of such selection [27]. This shift in allele frequencies, or “selective sweep” [28], is dependent on recombination rates in the genomic region as well as on the timing and strength of the underlying selective pressure. Classic selective sweeps or “hard sweeps” (where heterozygosity reaches near-zero values) are rare and are not reproducible in experimental evolution studies, where for the most part soft sweeps and/or multi-locus resistance are obtained [14, 29]. Examples of classic selective sweeps in insects are few. Hudson et al. [30, 31], for example, detected a ~50-kb selective sweep surrounding the SOD locus in Drosophila melanogaster. Using high-density QTL markers, Lattorff et al. [32] detected a locus in the honeybee Apis mellifera where the allele frequency reached near fixation (allele frequency = 0.97) in a period of seven years under selection with the ectoparasitic mite Varroa destructor.
Within the context of pesticide selection, only a handful of studies have identified sweeps associated with resistance and, in all cases to date, the sweeps were relatively small (< 200 kb). Schlenke and Begun [33] reported a ~100-kb selective sweep associated with a Doc transposable element inserted in Cyp6g1 in Drosophila simulans, although the sweep was only marginally associated with DDT resistance. Soft, incomplete selective sweeps were found in the α-esterase gene cluster that contains the polymorphic LcαE7 gene encoding forms of the protein conferring organophosphate insecticide resistance in the Australian sheep blow fly Lucilia cuprina [34]. A region containing the quantitative trait loci responsible for pyrethroid resistance in the malaria vector mosquito Anopheles funestus displayed characteristics of a selective sweep, which was then narrowed to the CYP6P9A and CYP6P9B clusters [35]. Song et al. [36] analyzed nine Z chromosome-linked loci in different populations of the Old World bollworm Helicoverpa armigera and detected a region possibly indicative of a selective sweep surrounding the Cyp303a1 locus.
In the navel orangeworm, a crop pest regularly subjected to repeated, widespread, heavy use of a single insecticide, we have confirmed the existence of a large (~1.3 Mb) region showing characteristics of a selective sweep in populations that show varying levels of resistance to bifenthrin. The sweep region contains 38 genes in the two susceptible lines sequenced and extends up to 43 genes in the resistant line. In this region, there are virtually no polymorphisms across 0.5 Mb (spanning 22 genes) among the three sequenced populations. The sweep then extends to both sides in the scaffold with increased – but still very low – polymorphism in ALM and FIG to the right of the hard sweep and increased polymorphism in the three populations to the left. In addition, the regions flanking both sides of the selective sweep showed very high FST values between the resistant R347 and either the ALM or FIG populations. Possibly, the genetic basis of the phenotypic differences (i.e. resistance to bifenthrin), lies in part or in whole in this flanking region.
To evaluate the possible target or targets of selection that caused the observed patterns, we re-measured nucleotide diversity across each gene coding sequence and analyzed the amino acid sequences to verify the presence of non-silent mutations. The flanking region to the left of the hard sweep, where the sweep is better characterized as a soft sweep in all three populations, includes 15 genes, including cytochrome c oxidase, a cyclin-dependent kinase, a phosphomevalonate kinase and Krüppel-like-9 transcription factor, among others. Of these, only cytochrome c oxidase has been directly associated with pyrethroid resistance in an insect [37]. The Krüppel-like-9 transcription factor, however, has been implicated in regulation of P450 expression in mammals [38], and it showed segregating non-silent mutations that differ between the resistant and both susceptible strains. To the right of the sweep, where only R347 retains low nucleotide diversity, in addition to a gene encoding a small conductance calcium-activated potassium channel (SkCa2), there are two uncharacterized proteins and an extensin-like protein-coding gene. Although calcium and chloride ion channels have recently been found to be affected by pyrethroids in mammals [39], to date there is no evidence that insect SkCa2 channels are targeted by pyrethroids.
The first gene in the hard-sweep region is para (paralytic), with the mutation that confers partial resistance to pyrethroids (kdr) in all three sequenced populations. The kdr locus has long been associated with resistance both to DDT and to other neurotoxic insecticides, including pyrethrins and pyrethroids [40, 41], in many insects; kdr maps to a single recessive point mutation, resulting in an amino acid substitution from L to F in the S6 transmembrane segment of domain II of the para protein, which is a voltage-gated sodium channel, the main target of pyrethroids and DDT [42]. The kdr mutation is present in ALM, FIG, and R347 strains and absent in the reference genome and in the CPQ strain. Our bioassay results showed that resistance to DDT is highly correlated with the presence of the kdr mutation (Table 3). However, differences in resistance to bifenthrin do not completely correlate with the kdr mutation in the sequenced populations, indicating that additive factors are involved in resistance outcomes, including other beneficial mutations that might be responsible for the large selective sweep.
The para gene is followed by a cluster of three cytochrome P450 genes (CYP6B54, CYP6B55 and CYP6B56), where CYP6B56 has a π value of zero in R347. Although the precise substrate specificity of these P450s has not been defined, they belong to a CYP subfamily associated in other lepidopterans with pyrethroid metabolism [43] and pyrethroid resistance [25, 44]. Assays with piperonyl butoxide, a P450 synergist, have implicated these enzymes in pyrethroid detoxification in A. transitella [12]. Because there are no mutations in the coding sequences of these P450s in any of the sequenced strains, if their function has been the target of selection, the causal variant may be located in a regulatory region. The fact that these three P450s are present as a tandem cluster indicates that they originated by duplication, an additional indication of their adaptive value.
The insecticidal properties of DDT were discovered in 1939. DDT was heavily used by the military for vector control during World War II. Its release to civilian populations in 1945 led to widespread adoption for use against agricultural pests until 1972, when virtually all federal registrations for its use in the USA were cancelled by the Environmental Protection Agency. Despite its curtailed use, DDT and its derivatives remain in the environment, persisting in the soil as contaminants and moving from there across trophic levels and geographic regions [45]. The A. transitella SPIRL-1966 line was established in the USDA laboratory facility in the San Joaquin Valley in 1966, during a period when the Central Valley area of California was subjected via crop-dusting to pesticides estimated at the time as amounting to “thousands of tons of DDT alone” [46].
Based on that information, and absent any information about recombination rates in A. transitella, we suggest that the large hard sweep documented in contemporary field populations of this insect is likely attributable to a stacking of selective forces that prevailed at the time of the founding of the SPIRL-1966 colony and that persisted under continuous selection until a new reinforcing selective pressure arrived in the form of pyrethroids. The fact that the genomic region of low variability in contemporary populations is nearly identical to the sequence in the SPIRL-1966 reference genome suggests that this reference strain had probably undergone a population bottleneck, possibly by selection with DDT, effectively reducing the number of alleles that were then maintained in the laboratory. Although SPIRL-1966 does not carry the kdr mutation, our findings are consistent with a scenario whereby it may have lost the mutation after generations of relaxed selection under laboratory conditions [47].
High levels of resistance to pyrethroids in Kern County could have emerged as a result of a “perfect storm” of contributing factors: a newly cheap and widely available insecticide being used over an extensive area for a multivoltine pest requiring multiple applications for control in a high-value crop. Reports of selective sweeps of the size we have identified in this study are vanishingly rare. Tian et al. [48] reported a 1.1 Mb sweep in domesticated Zea mays, with similar evidence of multiple selection targets; however, the domestication of maize likely resulted from artificial selection exerted over thousands of years [49], a significantly longer period of time than the time required for DDT and bifenthrin resistance to appear in A. transitella in Kern County. Within the context of pesticide selection, only a handful of studies have identified sweeps associated with resistance and, in most cases to date, the sweeps were relatively small (< 200 kb) [33, 35, 36]. In 2017, Kamdem et al. [10] compared urban and rural populations of mosquitoes in the Anopheles gambiae complex in Cameroon, where DDT is still used for malaria control, and identified a selective sweep containing circa 80 genes, among which was the kdr locus.
Conclusions
Using a population genomics approach, we identified an unusually large selective sweep that spans several populations of A. transitella. We conclude that this sweep is the result of previously selected regions that compounded with the recent heavy application of pyrethroids in the Central Valley of California. These findings provide evidence that pest control decisions are affecting the genomic landscape of insects, possibly leading to a stacking of resistance genes that could potentiate the increase in resistance and survival. Our study exemplifies the ability of humans to alter and accelerate the pace of evolutionary change in target species [50]. Driven by short-term economic gain, decisions on pesticide usage are increasing the environmental and economic costs of food production, leading to long-term damage.
Materials and Methods
Genome re-sequencing
The ALM and FIG strains of A. transitella were collected from orchards in Madera County as larvae (JPS, USDA-ARS, Parlier, CA) in 2016. Larvae from the pyrethroid-resistant strain R347 were collected from almond orchards in Kern County and sent to us by Brad Higbee (Trece) in 2016. All three populations were maintained in an incubator at the University of Illinois at Urbana-Champaign at temperatures of 28 ± 4°C and a photoperiod of 16:8 (L:D) h, reared until adulthood and separated by sex before freezing at −80°C. Genomic DNA was extracted from the heads of 100 adult moths (equal sex ratios). Insects were ground in liquid nitrogen, lysed overnight with SDS and Proteinase K, treated with RNase A, and centrifuged in a high-salt solution to precipitate proteins. The DNA was precipitated with ethanol, re-suspended in 10 mM Tris pH 8, quantified with a Qubit fluorometer (Thermo Fisher Scientific, USA) and checked for degradation on an agarose gel. Subsequently, 2.5 μg of male head DNA were combined with 2.5 μg of female head DNA into a single tube for each of the three strains. Shotgun genomic libraries were prepared with the Hyper Library construction kit (Kapa Biosystems, Wilmington, MA) from equimolar-pooled DNA samples from each of the three populations. Libraries were quantitated with qPCR and paired-end sequenced for 150-base reads on one lane of the Illumina HiSeq 4000 (Illumina, San Diego, CA). Library construction and sequencing were carried out at the W.M. Keck Center of the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign.
Alignment, SNP calling and calculation of population genomic parameters
Reads were trimmed for residual adapters and quality using trimmomatic v. 0.32 [51]. The trimmed reads were aligned to the A. transitella reference genome (NCBI accession ASM118610v1) using bwa mem v. 0.7.12-r1039 in paired-end mode [52]. The resulting SAM files were sorted, and optical- and PCR-derived sequencing duplicates were marked and removed using Picard v. 1.48 (Broad Institute, http://broadinstitute.github.io/picard/). Low-quality alignments (mapping quality score < 20), improper pairs and un-mapped mate reads were removed using SAMtools v.1.7 [53]. A pileup file was created for each separate library using the mpileup command from SAMtools. Indels were identified and separated from the main pileup files. The three pileup libraries were then sub-sampled for uniform SNP coverage of reads using the ‘identify-genomic-indel-regions.pl’ and ‘subsample-pileup.pl’ scripts from the Popoolation v.1.2.2 package utilizing the max-coverage = 100 parameter [54]. The subsampling was done to prevent biases in population genomic metrics affected by sequencing errors and copy number variation that could skew coverage in affected regions. Tajima’s π was calculated for each of the strains (libraries) using 5 kb-long windows and 5 kb-long steps using the ‘Variance-sliding.pl’ script from Popoolation. In a separate pipeline, a multiple-pileup file was created that incorporated the three trimmed and quality-filtered mapped libraries; indels were removed as described and libraries were synchronized using ‘mpileup2sync.jar’ from the Popoolation2 software (v.1.201) [55]. The three-population pileup file was subsampled for uniform coverage and pairwise FST values were calculated using the ‘fst-sliding.pl’ script from Popoolation2 on a per-SNP basis.
For pooled DNA sequencing, FST values can be biased if the sample is too small or if non-equimolar amounts of DNA were used for the individuals from each population. Bias can also be introduced during the PCR amplification step before sequencing [56]. Even though the FST calculation in Popoolation2 implements a bias correction [55], the estimates are still deemed biased according to Hivert et al. [57]. For that reason, to validate our FST estimates, the synchronized file with data from the three populations obtained using ‘mpileup2sync.jar’ from the Popoolation2 software (see above section) was converted to a pooldata object for the “Poolfstat” v. 1.0.0 R package (https://cran.r-project.org/web/packages/poolfstat). Both methods gave similar results; because Popoolation2 does not report overall FST between populations, we report results from both packages. ‘Pcadapt’ v. 1.1 [50] was used to detect outlier SNPs based on Principal Component Analysis (PCA). In PCAdapt, z-scores were calculated from the original set of SNPs using K = 2 principal components. Outliers were then identified from the vector of z-scores using Mahalanobis distances. The distances were transformed into P-values assuming a chi-square distribution with K degrees of freedom [22].
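The distance-to-P-value step can be sketched in a few lines of Python for the K = 2 case. This is an illustration under simplifying assumptions, not the pcadapt code: it uses the plain sample mean and covariance, whereas the package may use different (for example robust) estimators, and the function names and z-scores are hypothetical. For 2 degrees of freedom the chi-square survival function has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy  # determinant of the 2x2 covariance matrix
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^-1 d, written out with the closed-form 2x2 inverse
        out.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Hypothetical z-scores (one pair per SNP, one value per principal component).
z_scores = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (5, 5)]
d2 = mahalanobis_sq_2d(z_scores)
p_values = [chi2_sf_2df(d) for d in d2]
```

In this toy set, the last SNP lies far from the others along both components and therefore receives the smallest P-value.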
Sanger sequencing the para locus in laboratory strains
For SPIRL-1966, we obtained genetic material from frozen whole-body fifth-instar larvae, as the strain is no longer available as a laboratory colony. DNA was extracted using an E.Z.N.A.® insect DNA kit (Omega Bio-tek, Norcross, GA) according to the manufacturer’s instructions. For the CPQ strain, we utilized existing midgut cDNA sampled from ten different individuals; the available CPQ cDNA had previously been prepared from total RNA derived from midguts of fifth-instar larvae fed on semi-synthetic diet using a Protoscript II kit (NEB, Ipswich, MA). PCR was carried out on both strains using primers designed to flank the region of the kdr mutation in the para gene (Forward 5’- ACCAAGGTGGAACTTCACAGAT -3’; Reverse 5’- AGCAATTTCAAGAAGTCAGCAACA -3’). PCR amplicons were sequenced, and the sequences were aligned to the reference to verify the presence or absence of the mutation.
Insecticide bioassays
To establish the median-lethal concentrations (LC50) for bifenthrin and DDT in the sequenced strains, we used feeding assays with semi-synthetic artificial diet [58]. Bifenthrin (Chem Service Inc., West Chester, PA), or DDT (Sigma-Aldrich Co., St. Louis, MO) was stirred into the diet at different concentrations for each strain and poured into separate 1-oz (28 ml) cups to set. Treatments and concentrations were: bifenthrin in methanol – ALM: 2 ppm, 5 ppm, 10 ppm, 12 ppm, 15 ppm, 24 ppm; DDT – ALM: 50 ppm, 100 ppm, 200 ppm, 300 ppm, 400 ppm; bifenthrin – R347: 8 ppm, 16 ppm, 24 ppm, 48 ppm, 75 ppm; DDT – R347: 50 ppm, 100 ppm, 200 ppm, 300 ppm, 400 ppm; DDT – CPQ: 10 ppm, 20 ppm, 35 ppm, 50 ppm, 75 ppm, 100 ppm. Four neonates were transferred with a soft brush into each plastic cup containing bifenthrin or methanol as the solvent control. Twenty larvae from each strain were exposed to their respective bifenthrin or DDT concentrations and each assay was replicated three times per concentration.
Neonate mortality on diets was assessed after 48 h and scored according to a movement response after being touched by a soft brush. Probit analysis (SPSS version 22, SPSS Inc., Chicago, IL) was applied to identify median-lethal concentrations (LC50). Differences between populations were considered significant if their respective 95% confidence intervals in the Probit analysis did not overlap. We were unable to perform these assays with the FIG strain because we did not establish a colony from the population used for sequencing; however, for the purposes of comparison we cited the bifenthrin LC50 established in our laboratory by Bagchi et al. [26].
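The logic of the LC50 estimate and of the confidence-interval comparison can be sketched in Python as follows. This is a simplified stand-in, not the SPSS procedure: it fits an unweighted least-squares line to probit-transformed mortality against log10 concentration, rather than the maximum-likelihood probit fit, and it does not compute confidence intervals. The function names are hypothetical, and the mortality values are synthetic, generated from a known curve rather than taken from the bioassays.

```python
import math
from statistics import NormalDist


def probit_lc50(concentrations, mortality):
    """Estimate LC50 from a line fitted to probit(mortality) vs log10(concentration).

    mortality: proportions strictly between 0 and 1.
    Returns (lc50, slope).
    """
    nd = NormalDist()
    xs = [math.log10(c) for c in concentrations]
    ys = [nd.inv_cdf(p) for p in mortality]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    # probit = 0 corresponds to 50% mortality
    return 10 ** (-intercept / slope), slope


def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# Synthetic mortalities from a probit curve with LC50 = 10 ppm and slope 2.
doses = [2, 5, 10, 12, 15, 24]
mortality = [NormalDist().cdf(2.0 * (math.log10(d) - 1.0)) for d in doses]
lc50, slope = probit_lc50(doses, mortality)
```

Two strains would be called different under the criterion above when `intervals_overlap` returns False for their 95% confidence intervals.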
Obtaining pesticide application data
Records of pyrethroid use were accessed through the California Department of Pesticide Regulation (CDPR) pesticide use annual reports for 1990-2016. Total bifenthrin use in almond orchards was analyzed for Kern County, Madera County, and statewide based on number of applications, pounds of active ingredient, and acres treated from 2006-2016. We also examined records of all pyrethroids applied in almonds from 2000-2016 and compared bifenthrin use relative to all registered pyrethroids by pounds of active ingredient and acres treated.
Data Availability
Data were deposited in the NCBI sequence read archive (SRA), under accession numbers:
ALM: PRJNA544523, SAMN11842179, SRX5891168, SRR911709;
FIG: PRJNA544523, SAMN11842181, SRX5891170, SRR9117089;
R347: PRJNA544523, SAMN11842180, SRX5891169, SRR9117090
Acknowledgements
We thank Dr. Mathew Hudson of the University of Illinois at Urbana-Champaign for helpful discussions and suggestions. We also thank Jeffrey Haas for helping with computing resources. This work was funded by the Almond Board of California (ABC grant# ABC15.ENT01).
Mention of trade names and commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. The US Department of Agriculture is an equal opportunity provider and employer.