Abstract
Organisms vary in their dispersal abilities, and these differences can have important biological consequences, such as impacting the likelihood of hybridization events. However, there is still much to learn about the factors influencing hybridization, and specifically how dispersal ability affects the opportunities for hybridization. Here, using the ecological replicate system of dove wing and body lice (Insecta: Phthiraptera), we show that species with higher dispersal abilities exhibited increased genomic signatures of introgression. Specifically, we found a higher proportion of introgressed genomic reads and more reticulated phylogenetic networks in wing lice, the louse group with higher dispersal abilities. Our results are consistent with the hypothesis that differences in dispersal ability might drive the extent of introgression through hybridization.
Introduction
Dispersal is the permanent movement of organisms away from their place of origin. It is a fundamental process in biology with significant implications at multiple scales of organization (Barton 1992; Clobert et al. 2001; Nathan 2001; Matthysen 2012), including the reproduction of individuals, the composition of populations and communities, and the geographical distribution of species (Clobert et al. 2001, 2012).
Organisms differ in their dispersal abilities, and these differences have an impact on their biology, such as on the distributional range of a species or gene flow between populations (Bohonak 1999). For example, organisms with lower dispersal abilities tend to have smaller distributional ranges and populations that are genetically more structured (Bohonak 1999; Dawson et al. 2014; DiBlasi et al. 2018).
Dispersal ability might also affect the opportunities for hybridization between species because the rates at which individuals encounter different species are likely to be higher in organisms with higher dispersal capabilities. Indeed, recent evidence supports this prediction by demonstrating that range expansion is associated with the extent of introgression (Currat et al. 2008; Nussberger et al. 2018). Similarly, dispersal differences explain more than 30% of the variation in the width of hybrid zones across animals (McEntee et al. 2018). However, overall there is still much to learn about the factors influencing hybridization (Randler 2006; Arnold 2015; Taylor and Larson 2019), and, in particular, the influence of dispersal ability on the rate of hybridization remains understudied.
Testing for the effect of dispersal on hybridization should ideally hold constant most factors other than dispersal. The ecological replicate system of wing and body lice (Insecta: Phthiraptera) of pigeons and doves (Aves: Columbidae) has proven to be an ideal system for comparing the impact of dispersal differences on other aspects of biology, such as population structure and codivergence (Clayton and Johnson 2003; Johnson and Clayton 2004; Clayton et al. 2015; DiBlasi et al. 2018; Sweet and Johnson 2018). Specifically, this is an excellent system in which to assess the effect of differences in dispersal capabilities on levels of introgression because both of these lineages of feather lice: 1) significantly differ in their dispersal ability (Harbison et al. 2008, 2009; Bartlow et al. 2016), 2) co-occur across the diversity of pigeons and doves, and 3) have the same basic life history and diet (Clayton et al. 2015; Sweet and Johnson 2015, 2018). Both wing and body lice disperse vertically between parents and offspring in the nest. However, wing lice can also attach to and hitchhike on hippoboscid flies to disperse “phoretically” between host individuals or host species (Harbison et al. 2008, 2009; Bartlow et al. 2016). Indeed, this hitch-hiking dispersal mechanism profoundly influences their degree of population structure and cophylogenetic history (Clayton and Johnson 2003; Sweet et al. 2017b; DiBlasi et al. 2018; Sweet and Johnson 2018). In addition, wing lice have a higher rate of host-switching (Clayton and Johnson 2003; Clayton et al. 2015; Sweet et al. 2017b) (i.e., successful colonization of new host species) and of straggling (Whiteman et al. 2004) (i.e., dispersal to new host species without reproduction on that new host).
To compare differences in the extent of introgression between wing and body lice, we used whole-genome data from 71 louse individuals belonging to five species of wing lice (Columbicola) and seven species of body lice (Physconelloides) that occur across the same suite of host species and have highly comparable patterns of diversification (Sweet and Johnson 2015, 2018). Specifically, both lineages within these two groups of lice that are the focus of this study originated on the common ancestor of Metriopelia doves (11.3-14.9 mya) and have a correlated pattern of codiversification within the same group of hosts (including a shared cospeciation event which occurred within the Metriopelia genus 5.2-7.4 mya; Sweet and Johnson 2015, 2018). We predicted that wing lice, which have higher dispersal abilities and thus higher odds of encountering individuals of a different louse species on the same host, should show more extensive evidence of introgression (Fig. 1).
Materials and Methods
Data
We analyzed Illumina whole genome sequence data (150 or 160bp paired-end reads) from 71 louse individuals belonging to five and seven taxa of Columbicola and Physconelloides, respectively (Supplementary Table S2) hosted by the monophyletic clade of small New World ground doves. This paper’s taxonomic classification of lice is based on Sweet and Johnson (2018) species delimitation analyses. In particular, they found most Columbicola OTUs matched known species, and some Physconelloides OTUs were yet to be formally described as species (and are named here following Sweet & Johnson, 2018). All raw sequence data used were available from previous studies (Boyd et al. 2017; Sweet et al. 2017a; Sweet and Johnson 2018) and represent all described New World ground-dove wing and body louse species (sampled from most host species in this group) including lice samples across multiple biogeographic areas within species of hosts (Sweet and Johnson 2018; Supplementary Table S2). Illumina genome sequence data pre-processing included several steps (Sweet and Johnson 2018). First, we discarded duplicate read pairs using the fastqSplitDups script (https://github.com/McIntyre-Lab/mcscript and https://github.com/McIntyre-Lab/mclib). We then eliminated the Illumina sequencing adapters with Fastx_clipper v0.014 from the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit). Also, we removed the first 5 nt from the 5’ ends of reads using Fastx_trimmer v0.014 and trimmed bases from the 3’ ends of reads until reaching a base with a phred score ≥28 (which is equivalent to a base call accuracy higher than 99.8 %) using Fastq_quality_trimmer v0.014. Finally, we removed any reads less than 75 nt and analyzed the cleaned libraries with Fastqc v0.11.5 to check for additional errors. We assembled nuclear loci in aTRAM following previous studies (Allen et al. 2015; Boyd et al. 2017; Sweet and Johnson 2018). In particular, we mapped modest coverage (25-60X), multiplexed genomic data to reference loci from a closely related taxon. For our reference set of nuclear loci for wing lice, we used 1,039 exons of Columbicola drowni (Boyd et al. 2017) (raw data: SRR3161922). This data set was assembled de novo (Allen et al. 2015) using orthologous protein-coding genes from the human body louse genome (Pediculus humanus humanus (Kirkness et al. 2010) as a set of target sequences. We mapped Columbicola reads to the C. drowni references using Bowtie2 (Langmead and Salzberg 2012). For body lice, we obtained nuclear data using the same pipeline and software parameters, except that we used 1,095 loci from Physconelloides emersoni as the reference for mapping. To generate the input ultrametric gene trees for Phylonet v3.6.8 (Than et al. 2008; Yu and Nakhleh 2015; Wen et al. 2018), we first aligned each nuclear locus in MAFFT (Katoh 2002) (--auto) and removed columns with only ambiguous sequences (“N”). Then, we estimated gene trees in RAxML v8.1.3 (Stamatakis 2006) with a GTR + G substitution model for each gene alignment. Finally, we made trees ultrametric using the nnls method in the force.ultrametric function within the “phytools” R package (Revell 2011).
Quantifying introgression
We used two different approaches to quantify differences in the extent of introgression (i.e., ancient plus recent) between the two louse genera. We employed methods suitable to detect introgression between species and between individuals from the same species (i.e., we did not employ methods aimed to detect differences at the population level, e.g., TreeMix; Pickrell and Pritchard 2012). First, we used sppIDer (Langdon et al. 2018) to quantify the genomic contributions of different louse species in an individual louse genome (Fig. 2). We built our reference for each genus using all the nuclear loci from a single individual per species. For the reference, we selected those individuals for which we assembled the highest number of genes for each genus. We estimated the extent of introgression as the sum of the mean coverages of reads mapped from all the species excluding the focal louse species, divided by the mean coverage of the focal louse species (Fig. 2). Note that these mean coverage values are calculated using only those reads that mapped with a mapping quality (MQ) > 3 (Li et al. 2009; Langdon et al. 2018; Figs. S1-S12). Plots of coverage across the genomes suggested that reads mapping to other species were not artificial mappings (e.g., high coverage mappings to short repetitive regions; Figs. S1-S12). In addition, we used SPAdes v3.12.0 (default parameters) to perform a de-novo assembly of the putatively introgressed reads detected by sppIDer).
Second, we quantified introgression at the species level, while accounting for incomplete lineage sorting (ILS) using a maximum pseudo-likelihood framework with PhyloNet 3.6.1 (Than et al. 2008; Yu and Nakhleh 2015; Wen et al. 2018). Reticulations in this method can be attributed to hybridization events. We trimmed the unrooted gene trees to the same individuals used as reference taxa in sppIDer, and performed eleven independent analyses with a differing maximum number of reticulation nodes (i.e., from zero to ten). We conducted ten runs per analysis. We then selected the optimal network for each genus based on AIC values.
Statistics and reproducibility
We compared the sppIDer results using generalized linear models (GLMs). We used a Gaussian distribution of errors and an identity link function. We performed one GLM for each simulation iteration using the glm function of the “stats” R package (R Core Team 2013). The extent of introgression for each louse genus was the dependent variable, the genus identity was the independent variable, and we accounted for the introgression differences between louse species including louse identity as a fixed factor. We confirmed assumptions underlying GLMs by testing the normality of regression residuals for normality against a Q-Q plot. We also considered the possibility that some of the reads mapping to other species were technical contaminations, i.e., due to index-swapping (Carlsen et al. 2012; Esling et al. 2015; Schnell et al. 2015; Sinha et al. 2017). Previous studies have found that the misassignment of reads generally ranges from 1 to 9% (Carlsen et al. 2012; Esling et al. 2015; Schnell et al. 2015; Sinha et al. 2017). Thus, to account for possible contaminants, we wrote a simulation in R that randomly subtracted 9% (Carlsen et al. 2012; Esling et al. 2015; Schnell et al. 2015; Sinha et al. 2017) from the mean coverage value of a particular sample (i.e., we subtracted a random proportion of the mean coverage value for each sample until reaching 9%). We ran 100 iterations of the simulation and ran a GLM for each iteration (Table S1). Finally, we used the χ2 test to compare the number of species in pairwise comparisons of each genus with the number of reticulations found in each optimal phylogenetic network. Because we had an a priori prediction that Physconelloides should exhibit less evidence of reticulation than Columbicola, we used a one-tailed test; however, we also report the results of the two-tailed test equivalent.
Results
Both approaches revealed highly concordant results: higher levels of introgression among species of wing lice compared to body lice. In particular, using a read-mapping based method, the genomic signature of introgression was significantly higher in wing louse species than in body louse species (GLM with the mean values of the simulations; F = 21.0705, df = 69, P = 2.367 x 10−5, R2=0.58; Fig. 3, Supplementary Table S1, Figs. S1-S12). The contigs assembled from reads mapping to the non-focal species were in the size range of the loci used as reference (mean max contig length= 1214 bp; mean contig length=292 bp; Table S3). Even though wing lice showed more evidence for introgression, one body louse individual (included in the GLMs) exhibited the highest level of introgression (Fig. 3, Figs. S1-S12). However, the other individual from the same taxon, inhabiting the same host species and collected in the same geographic region, did not show these elevated levels of introgression (Supplementary Table S2, Figs. S1-S12).
Secondly, in a phylogenetic network framework, the optimal networks of wing lice were more reticulated than those of body lice even though the number of taxa included in the networks was lower (seven reticulations in Columbicola vs. four in Physconelloides, Fig. 4). Accordingly, the number of reticulations given the number of potential combinations under a one-tailed test was significantly higher in Columbicola (One-sided: χ2= 3.8132; df=1; P= 0.03; CI=0.03–1), and the p-value was still near 0.05 with a two-tailed test (Two-sided: χ2= 3.8132; df=1; P= 0.05; CI=-0.01–0.66). Also, the specific lineages involved in the reticulations were generally congruent with signatures of introgression from the read-mapping based approach (Fig. S1-S12).
Discussion
Estimates of introgression in two groups of ectoparasites that differ in their dispersal abilities, wing and body lice of doves, indicate that the lineage with higher dispersal ability (wing lice) shows more evidence of introgression. This evidence from wing and body louse genomes is consistent with the hypothesis that dispersal differences might drive differences in the level of introgression in this system of parasites. Admittedly, there may be some unknown factor, other than dispersal, differing between these two groups of lice that causes the difference in the level of introgression, but prior work on these groups of parasites points to dispersal as a crucial factor underlying many of the ecological and evolutionary patterns in these parasites. Further research on other taxa is needed to confirm the generality of these findings. This work is among the first studies of introgression in a host-symbiont system (Detwiler and Criscione 2010). Notably, recent studies have found that straggling and host-switching are relatively common processes in host-symbiont systems (De Vienne et al. 2013; Bourguignon et al. 2018; Nylin et al. 2018; DoÑa et al. 2019). Our study suggests that in a straggling/host-switching scenario, hybridization can provide further genetic variation with important ecological and evolutionary consequences (e.g., facilitating adaptation to current hosts or facilitating the colonization of new ones) (Barton 2018). Indeed, we may have found a potential recent hybridization event (i.e., the Physconelloides individual showing the highest level of introgression), though this requires further study to rule out methodological issues (e.g., wet-lab contamination).
In this vein, a careful examination of the introgression history of these taxa (and symbionts as a whole) is needed to better understand the patterns of introgression that we found. Questions such as how much introgression can be expected or how the introgressed regions are retained in parasite/symbiont genomes across time, among many others, require further attention. For instance, in this study, the levels of introgression detected by the sppIDer analyses (i.e., the magnitude but not the comparative pattern) may be unrealistic. It may be that some fraction of the level of introgression detected by sppIDer may be due to ILS, and not introgression.
However, both louse genera are expected to have relatively similar rates of ILS (if any). It is also possible that taxon age and interspecific divergence might affect introgression rates. Nevertheless, Sweet and Johnson (2018) compared the degree of genetic divergence of two pairs of species of both genera that inhabits the same host species and share a cospeciation event. In this case of taxa of the same age, the pair of Columbicola species had lower interspecific genetic distances than Physconelloides. This could be as a result of mutation rate differences between the two genera. However, this could also be due to higher gene flow among host infrapopulations due to higher dispersal capabilities of wing lice. The same pattern can be found overall across the species studied here, i.e., on average, lower uncorrected interspecific genetic distances among Columbicola than among Physconelloides species, though the range of the interspecific distances does overlap (Table S4). Thus, if present, ILS could potentially be more prevalent in Columbicola species (Pamilo & Nei, 1988; Maddison, 1997). The PhyloNet analysis, however, does control for ILS, and showed highly congruent results. In addition, some individual gene trees exhibit signatures suggestive of introgression with highly similar sequences shared by some individuals of different species and much less likely to be a consequence of ILS (Figs. S13-S14). Overall, the species of Columbicola and Physconelloides are from the same group of hosts and thus are overall comparatively similar in levels of divergence, so it seems unlikely that these small differences are driving the results.
Another caveat is that sppIDer can detect introgression from species that are not included in the reference data. In those cases, the reads may map to the closest taxon available in the reference set, and thus could artificially increase the level of introgression from a given species; Langdon et al. 2018). Accordingly, the levels of introgression detected by sppIDer in certain species could be an aggregate of introgression events from more than one species. Indeed, our PhyloNet analysis supports this scenario, with several reticulations from ghost lineages and species (Fig. 4). However, in this system we have nearly complete sampling of host taxa and are missing few, if any, extant species making this concern less likely.
Data availability
All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.
Supplementary Materials
Supplementary data are available at GitHub (https://jorge-dona.github.io/Comparing-rates-of-introgression-in-parasitic-feather-lice-with-differing-dispersal-capabilities/supplementary.html) and at Figshare (DOI: 10.6084/m9.figshare.9176204). Individual gene trees are available at Figshare (DOI: 10.6084/m9.figshare.9176204).
Acknowledgments
This study was supported by the US National Science Foundation (DEB-1239788, DEB-1342604, DEB-1926919, and DEB-1925487 to K.P.J). We declare that we have no conflict of interest.
Footnotes
Changes in the text to clarify it; new analyses on the length of the putatively introgressed reads included; Figure 4 revised; supplementary files updated
https://figshare.com/articles/Parasite_dispersal_influences_introgression_rate/9176204