Abstract
The effective population size (Ne) of an organism is expected to be proportional to the total number of individuals in a population. In parasites, we might expect the effective population size to be proportional to host population size and host body size, because both are expected to increase the number of parasite individuals. However, parasite populations are sometimes so extremely subdivided that high levels of inbreeding may distort these relationships. Here, we used whole-genome sequence data from dove parasites (71 feather louse species of the genus Columbicola) and phylogenetic comparative methods to study the relationship between parasite effective population size and both host population size and host body size. We found that parasite effective population size is largely explained by host body size but not by host population size. These results suggest that the subdivided nature of parasite populations, rather than the overall number of parasites, has a stronger influence on the effective population size of parasites.
Impact Summary
Parasites, among Earth’s most diverse, threatened, and under-protected animals, play a central role in ecosystem function. The effective population size (Ne) of an organism has a profound impact on evolutionary processes, such as the relative contributions of selection and genetic drift to genomic change. Population size is also one of the most important parameters in conservation biology. For free-living organisms, Ne is expected to be proportional to the total number of individuals in a population. In parasites, however, populations are sometimes so extremely subdivided that high levels of inbreeding may distort this relationship. In this study, we used whole-genome sequence data from dove parasites and phylogenetic comparative methods to investigate the relationship between parasite Ne and both host population size and host body size. Our results revealed a positive relationship between parasite Ne and host body size, but not host population size. These results suggest that inbreeding may be a major factor in parasite infrapopulations, and they have important implications for conservation.
Introduction
The effective population size (Ne) of an organism has a profound impact on evolutionary processes, such as the relative contributions of selection and genetic drift to genomic change (Wright, 1943; Waples, 2002; Charlesworth, 2009). For free-living organisms, it is expected that Ne is proportional to the total number of individuals in a population (Frankham, 1995; Waples, 2002). While population size estimates can often be readily obtained for free-living species, estimating the population size of parasites can be more challenging because this usually requires sampling from individual hosts (Criscione and Blouin, 2005; Criscione et al., 2005; Poulin, 2007; Clayton et al., 2015; Strobel et al., 2019).
Typically, we might expect that the size of a parasite population is proportional to that of the host, because parasites rely on their hosts for survival and reproduction (Poulin, 2007; Barrett et al., 2008; Clayton et al., 2015). However, population subdivision can also influence measures of Ne for a species (Wright, 1943; Charlesworth et al., 2003). Theory generally predicts that subdivided populations have a higher overall Ne than undivided ones (Charlesworth et al., 1997; Charlesworth et al., 2003; Charlesworth, 2009). On the other hand, in highly subdivided populations, levels of inbreeding can increase such that the Ne of these populations is low (Charlesworth et al., 1997; Charlesworth et al., 2003; Charlesworth, 2009).
In the case of parasites, populations can sometimes be so extremely subdivided that each individual host harbors a distinct parasite population, termed an infrapopulation (Bush et al., 1997; Huyse et al., 2005; Criscione and Blouin, 2005; Criscione et al., 2005; Poulin, 2007; Clayton et al., 2015). This subdivision is particularly pronounced in parasites that spend their entire life cycle on the host (i.e., permanent parasites; DiBlasi et al., 2018; Sweet and Johnson, 2018; Virrueta-Herrera et al., 2022). For example, lice, which are permanent parasitic insects of birds and mammals, have highly structured infrapopulations subject to high levels of inbreeding (Virrueta-Herrera et al., 2022). We might therefore expect infrapopulation size to influence the effective population size of a parasite. This effective population size, in turn, could influence the amount of genetic variation within an infrapopulation, as has been shown for feather mites (Doña et al., 2015).
Host body size has been shown to strongly impact infrapopulation size, with larger-bodied hosts harboring larger parasite infrapopulations (Poulin, 1999; Poulin, 2007; Clayton et al., 2015). For example, a positive effect of host body size on parasite abundance has been shown for avian feather lice, which feed on the feathers of their hosts (Rozsa, 1997; Clayton and Walther, 2001). Thus, we would expect that feather lice on larger-bodied avian hosts would have higher Ne, reflecting their larger infrapopulation sizes. This relationship, in turn, should affect the degree of inbreeding and be evident in the genetic variation in lice of different infrapopulation sizes.
Thus, two factors may influence a parasite’s Ne: 1) host population size and 2) host body size. We test the relative contributions of these two factors to parasite Ne by examining genome-wide variation in the wing lice (Phthiraptera: Columbicola) of pigeons and doves (Aves: Columbidae). Pigeons and doves vary dramatically in overall population size, with some species among the most abundant birds on Earth and others restricted to single small islands and highly endangered. In addition, pigeons and doves vary by over an order of magnitude in body mass, and smaller-bodied species have been shown to have smaller infrapopulations of these lice (Rozsa, 1997). Pigeons, doves, and their lice are therefore an excellent system in which to examine how parasite Ne correlates with both host population size and host body size. We used genome sequencing of 71 species of Columbicola to estimate a phylogeny for these parasites and to examine the relationship between a genome-wide measure of effective population size (θ) and the overall population size and body size of their respective hosts, accounting for phylogeny.
Materials and Methods
Taxon sampling and host data
We sampled 89 individual lice, representing 71 species of Columbicola (Table S1), which are feather lice (Insecta: Ischnocera) of pigeons and doves. We also included five feather louse outgroup taxa for the phylogenomic analyses, selected based on recent higher-level phylogenomic studies of feather lice (Table S1). We obtained host body size (body mass) data from the Birds of the World online database (Billerman et al., 2022); when male and female body masses were reported separately, we used their average. We obtained global-scale host population size data from recent estimates (Callaghan et al., 2021), specifically the “Abundance estimate” values in the “Dataset_S01.xlsx” supplemental file.
Genomic sequencing
Some of the genomic data we analyzed here have been previously published (Boyd et al., 2017; see Table S1 for details). For the newly sequenced samples, which had been stored in 95% ethanol at −80 °C, we performed single-louse DNA extractions and photographed each specimen as a voucher. We extracted total genomic DNA by first letting the ethanol evaporate and then grinding the louse with a plastic pestle in a 1.5 ml tube. For DNA extraction, we used a Qiagen QIAamp DNA Micro Kit (Qiagen, Valencia, CA, USA) with an initial incubation at 55 °C in buffer ATL with proteinase K for 48 h. Otherwise, we followed the manufacturer’s protocols and eluted purified DNA from the filter in a final volume of 50 µl buffer AE. We quantified total DNA using a Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA, USA) and the high sensitivity kit.
We prepared genomic libraries using the Hyper library construction kit (Kapa Biosystems) and sequenced them on an Illumina NovaSeq 6000 with S4 reagents to generate 150 bp paired-end reads. Libraries were tagged with unique dual-end adaptors and multiplexed at 48 libraries per lane, targeting approximately 30-60X coverage of the nuclear genome. We trimmed adapters and demultiplexed the sequencing data using bcl2fastq v2.20 to generate final fastq files. We deposited raw reads for each library in NCBI SRA (Table S1).
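The link between multiplexing level and target coverage is simple arithmetic: a library's share of the lane's read pairs, times the bases per pair, divided by genome size. The sketch below illustrates that calculation; the lane yield and genome size in the example are hypothetical placeholders, not values reported for this study, and the function assumes reads are split evenly among libraries and all map to the nuclear genome.

```python
def expected_coverage(read_pairs_per_lane: float, libraries_per_lane: int,
                      read_length_bp: int, genome_size_bp: float) -> float:
    """Expected mean depth for one library (sketch).

    Assumes reads are split evenly among multiplexed libraries and that
    every read maps to the nuclear genome (no contamination or duplicates).
    """
    read_pairs_per_library = read_pairs_per_lane / libraries_per_lane
    # Each read pair contributes two reads of read_length_bp bases.
    bases_per_library = read_pairs_per_library * 2 * read_length_bp
    return bases_per_library / genome_size_bp


# Hypothetical inputs: lane yield and genome size are placeholders only.
cov = expected_coverage(read_pairs_per_lane=1.6e9, libraries_per_lane=48,
                        read_length_bp=150, genome_size_bp=2.0e8)
print(f"{cov:.1f}X")  # 50.0X
```

Doubling the number of libraries per lane halves the expected depth, which is why the multiplexing level is chosen with a coverage target in mind.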
Single-copy orthologs assembly, species delimitation, phylogenomic and cophylogenetic analyses
Ortholog assembly
We used fastp v0.20.1 (Chen et al., 2018) for adapter and quality trimming (Phred quality >= 30). We then converted the trimmed fastq files to aTRAM 2.0 BLAST databases using the atram_preprocessor.py command of aTRAM v2.3.4 (Allen et al., 2018). As the reference, we used an amino acid sequence set of 2395 single-copy ortholog protein-coding genes (Johnson et al., 2018) from the human louse, Pediculus humanus (Kirkness et al., 2010). We assembled these single-copy ortholog genes using the atram.py command and the ABySS assembler with the following parameters: iterations = 3, max-target-seqs = 3000. We then used the Exonerate pipeline in aTRAM (atram_stitcher.py command; Slater and Birney, 2005) to stitch together exon sequences from these protein-coding genes.
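The quality threshold refers to per-base Phred scores, where Q = -10 log10(P_error), so Q30 corresponds to a 1 in 1000 chance of a base-call error. A minimal sketch of decoding and applying such a threshold, assuming the standard Phred+33 FASTQ encoding; `fraction_below` is a hypothetical helper for illustration and does not reproduce fastp's actual trimming algorithm.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def fraction_below(quality_string: str, threshold: int = 30) -> float:
    """Hypothetical helper: fraction of bases with Phred score < threshold."""
    scores = phred_scores(quality_string)
    return sum(q < threshold for q in scores) / len(scores)


# Under Phred+33, 'I' encodes Q40 and '5' encodes Q20.
print(error_probability(30))     # 0.001
print(fraction_below("IIII5"))   # 0.2
```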
COI-based species delimitation analyses
Several prior studies have indicated the potential for cryptic species within species of Columbicola (Johnson et al., 2007; Malenke et al., 2009; Sweet and Johnson, 2018), and we wanted to account for this in our comparative analyses. For assembly of the mitochondrial COI gene, we subsampled four million reads (two million read1 and two million read2) from each library using Seqtk v1.3 (Li, 2022). As the reference target for assembling COI sequences from all samples, we used a previously published COI sequence from Columbicola columbae (Johnson et al., 2007). For these assemblies, we ran aTRAM for a single iteration only. We then translated the COI DNA sequences to amino acids, aligned them, and back-translated them to DNA sequences. As a quality control step, we compared the COI sequences against NCBI using BLAST to identify any that were identical or nearly identical to previously generated Sanger sequences. We estimated a maximum likelihood phylogenetic tree from these COI sequences using model parameters estimated by IQ-TREE 2 v2.1.2 (Minh et al., 2020), and estimated ultrafast bootstrap support values with UFBoot2 (Hoang et al., 2017). Finally, we computed percent pairwise sequence divergences among all COI sequences (R function dist.dna, model “raw”, pairwise.deletion = T, APE v5.5; Paradis and Schliep, 2018) and examined their distribution to identify likely cryptic species; this distribution indicated that a 5% uncorrected p-distance threshold would be appropriate, as in prior studies of lice (Johnson et al., 2021).
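The divergence-threshold step can be sketched in a few lines: compute uncorrected p-distances with pairwise deletion (a site is skipped for a given pair if either sequence has a gap or missing base there), then group sequences whose distance falls below the threshold. This is an illustrative approximation of dist.dna(model = "raw", pairwise.deletion = TRUE), not APE itself; the single-linkage grouping rule, the strict `<` comparison, and the toy sequences are assumptions for illustration.

```python
from itertools import combinations

# Characters treated as gap or missing data; other IUPAC ambiguity codes
# are not handled in this sketch.
GAP_OR_MISSING = set("-?N")


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance with pairwise deletion."""
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_OR_MISSING or y in GAP_OR_MISSING:
            continue  # skip this site for this pair only
        compared += 1
        differences += x != y
    return differences / compared if compared else float("nan")


def cluster_by_threshold(seqs: dict[str, str], threshold: float = 0.05) -> list[set[str]]:
    """Single-linkage grouping (assumed rule): pairs closer than `threshold`
    are merged into the same putative species via union-find."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < threshold:  # NaN compares False
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Toy aligned sequences (hypothetical, 20 bp).
toy = {
    "louse_1": "ACGTACGTACGTACGTACGT",
    "louse_2": "ACGTACGTACGTACGTACG-",  # identical wherever both are scored
    "louse_3": "TTTTACGTACGTACGTACGT",  # 3/20 = 15% from louse_1
}
print(sorted(sorted(c) for c in cluster_by_threshold(toy)))
```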
Phylogenomic analyses
We translated assembled single-copy ortholog nucleotide sequences to amino acids and aligned them using MAFFT v7.471 (Katoh and Standley, 2013). After back-translation to nucleotide sequences, we trimmed individual gene alignments using trimAl v1.4.rev22 with a 40% gap threshold (Capella-Gutiérrez et al., 2009). We discarded any gene present in fewer than four taxa. We then concatenated the gene alignments into a supermatrix and analyzed it under maximum likelihood in IQ-TREE 2, using a partitioned analysis that included model selection for each partition. We estimated support using ultrafast bootstrapping (Hoang et al., 2017). We also ran a coalescent analysis in ASTRAL-III (Zhang et al., 2018) on individual gene trees estimated by maximum likelihood in IQ-TREE 2, and computed local posterior probabilities as a measure of branch support. Because the two trees were almost identical, we used only the partitioned concatenated tree for dating and phylogenetic comparative analyses.
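The gene-filtering and concatenation steps amount to a small bookkeeping task: drop genes present in too few taxa, pad absent taxa with gaps, and record each gene's column range so a partitioned analysis can assign a model per gene. A minimal sketch under those assumptions; the function names and toy data are hypothetical, and the output uses the generic NEXUS `charset` syntax, which IQ-TREE can read as a partition file.

```python
def build_supermatrix(gene_alignments: dict[str, dict[str, str]], min_taxa: int = 4):
    """Concatenate per-gene alignments into one supermatrix (sketch).

    gene_alignments maps gene name -> {taxon: aligned sequence}. Genes present
    in fewer than `min_taxa` taxa are dropped; taxa lacking a gene are padded
    with gaps. Returns (supermatrix, partitions), where partitions maps each
    gene to its 1-based inclusive (start, end) column range.
    """
    kept = {g: aln for g, aln in gene_alignments.items() if len(aln) >= min_taxa}
    taxa = sorted({t for aln in kept.values() for t in aln})
    rows = {t: [] for t in taxa}
    partitions = {}
    position = 1
    for gene, aln in kept.items():
        length = len(next(iter(aln.values())))
        if any(len(s) != length for s in aln.values()):
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        for t in taxa:
            rows[t].append(aln.get(t, "-" * length))
        partitions[gene] = (position, position + length - 1)
        position += length
    return {t: "".join(parts) for t, parts in rows.items()}, partitions


def nexus_charsets(partitions: dict[str, tuple[int, int]]) -> str:
    """Format partitions as a NEXUS sets block, one charset per gene."""
    lines = ["#nexus", "begin sets;"]
    lines += [f"  charset {g} = {s}-{e};" for g, (s, e) in partitions.items()]
    lines.append("end;")
    return "\n".join(lines)


# Toy example (hypothetical genes and taxa).
genes = {
    "gene1": {"A": "ATG", "B": "ATG", "C": "ACG", "D": "ATA"},
    "gene2": {"A": "GGCC", "B": "GGCA"},  # dropped: only 2 taxa
    "gene3": {"A": "TT", "B": "TA", "C": "TT", "D": "TC", "E": "TT"},
}
matrix, parts = build_supermatrix(genes)
print(matrix["E"])  # ---TT
print(nexus_charsets(parts))
```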
Cophylogenetic analyses
We used eMPRess v1.0 (Santichaivekin et al., 2020) to compare host and parasite trees. Following the cost scheme used by most published cophylogenetic studies of lice and other ectosymbionts (Doña et al., 2017; Matthews et al., 2018; Sweet and Johnson, 2018; Johnson et al., 2021, 2022; Boyd et al., 2022), we used event costs of 1 for duplication, 1 for sorting, and 2 for host switching. We obtained the host tree from a prior phylogenomic study (Boyd et al., 2022). Because this tree lacked fourteen of the focal species, we determined their placement from additional phylogenetic studies (Johnson and Weckstein, 2011; Sweet et al., 2017; Nowak et al., 2019). For the parasite tree, we used the phylogeny from the partitioned analysis (above). Based on the histogram of pairwise distances among maximum parsimony reconciliations (MPRs), we summarized the MPR space as a single cluster and selected a representative median MPR. From this reconstruction, we identified terminal cospeciation events between sister pairs of doves and lice for use in the molecular dating analysis (below).
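Under this cost scheme, the total cost of a candidate reconciliation is a weighted sum of its event counts, and an MPR is a reconciliation that minimizes that sum. The sketch below shows the arithmetic only; the event-count dictionaries are hypothetical, and the zero cost for cospeciation is an assumption (it is the usual convention, but it is not stated in the text).

```python
# Event costs from the text; cospeciation cost of 0 is an assumption.
COSTS = {"duplication": 1, "loss": 1, "host_switch": 2, "cospeciation": 0}


def reconciliation_cost(event_counts: dict[str, int], costs: dict[str, int] = COSTS) -> int:
    """Weighted sum of event counts for one candidate reconciliation."""
    return sum(costs[event] * n for event, n in event_counts.items())


# Two hypothetical reconciliations of the same host and parasite trees
# ("loss" corresponds to sorting events).
a = {"cospeciation": 10, "duplication": 2, "loss": 4, "host_switch": 3}
b = {"cospeciation": 12, "duplication": 1, "loss": 2, "host_switch": 5}
print(reconciliation_cost(a), reconciliation_cost(b))  # 12 13
```

With these costs, one host switch costs the same as a duplication plus a sorting event, so the scheme does not strongly favor either explanation for incongruence.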
Dating analysis
We produced an ultrametric tree using the least-squares dating (LSD2) method implemented in IQ-TREE (To et al., 2016). Because no fossil lice are currently known within Ischnocera, we used terminal cospeciation events between sister pairs of doves and lice (above) as calibration points for molecular dating (Johnson et al., 2021, 2022). The specific cospeciation events used as calibration points are listed in Table S2 (Supplemental information). For this analysis, we set a root age of 52 Mya (based on de Moya et al., 2019) and a minimum branch length constraint (u = 0.01) to avoid collapsing short but informative branches without introducing bias into the time estimates (see https://github.com/tothuhien/lsd2).
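The core idea of least-squares dating can be illustrated on a toy tree. Each branch length b (substitutions per site) should roughly equal the time span of that branch divided by a clock-like time-per-substitution factor s, so with fixed dates for the root, the tips, and any calibrated nodes, the equations age(parent) - age(child) - s·b = 0 are linear in the unknown node ages and s. This is a deliberately simplified, unweighted sketch; LSD2 additionally weights branches by their variance and enforces temporal and minimum-branch-length constraints. The tree and branch lengths are hypothetical.

```python
import numpy as np


def least_squares_dates(branches, fixed_ages):
    """Simplified, unweighted least-squares dating (not the LSD2 algorithm).

    branches: list of (parent, child, substitutions_per_site).
    fixed_ages: node -> age in Mya for the root, tips (0) and calibrations.
    Solves age[parent] - age[child] - s * b = 0 for the free node ages and
    s (Myr per substitution/site). Returns (ages, rate), with rate = 1/s.
    """
    nodes = {n for p, c, _ in branches for n in (p, c)}
    free = sorted(nodes - set(fixed_ages))
    index = {n: i for i, n in enumerate(free)}
    s_col = len(free)
    A = np.zeros((len(branches), len(free) + 1))
    rhs = np.zeros(len(branches))
    for row, (parent, child, b) in enumerate(branches):
        for node, sign in ((parent, 1.0), (child, -1.0)):
            if node in index:
                A[row, index[node]] += sign
            else:  # known age moves to the right-hand side
                rhs[row] -= sign * fixed_ages[node]
        A[row, s_col] = -b
    solution, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    ages = dict(fixed_ages)
    ages.update({n: float(solution[i]) for n, i in index.items()})
    return ages, 1.0 / solution[s_col]


# Hypothetical clock-like tree ((a,b)A, c)root with the root fixed at 52 Mya.
branches = [("root", "A", 0.1), ("root", "c", 0.2), ("A", "a", 0.1), ("A", "b", 0.1)]
ages, rate = least_squares_dates(branches, {"root": 52.0, "a": 0.0, "b": 0.0, "c": 0.0})
print(round(ages["A"], 2), rate)  # node A at 26 Mya, rate = 1/260 subs/site/Myr
```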
SNP calling and mlRho analyses
We used the Columbicola columbae chromosome-level genome assembly (Baldwin-Brown et al., 2021) as the reference for the SNP calling analyses. We aligned trimmed and filtered reads to the C. columbae reference genome using bwa v0.7.17 (Li and Durbin, 2009). We then removed PCR duplicates with picard v2.26.10 (Broad Institute, 2022) and sorted and indexed bam files with samtools v1.14 (Danecek et al., 2021). We called SNPs using bcftools multiallelic caller (Danecek and McCarthy, 2017). Lastly, we used vcftools to filter the vcf file with the following filtering parameters: <40% missing data, site Phred quality score >30, a minimum genotype depth of 10X, a maximum genotype depth of 60X, a minimum mean site depth of 10X and a maximum mean site depth of 60X. A total of 177,895 SNPs remained after filtering.
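The combined effect of the site- and genotype-level thresholds can be expressed as a single per-site predicate. The sketch below restates the listed criteria for one variant site; it is not vcftools itself, and two details are assumptions: that genotypes outside the depth window are counted as missing before the missingness filter is applied, and that mean site depth is averaged over called genotypes only.

```python
from statistics import mean


def keep_site(qual, depths, max_missing=0.4, min_qual=30.0,
              min_dp=10, max_dp=60, min_mean_dp=10, max_mean_dp=60):
    """Return True if a variant site passes the filters listed in the text.

    depths: per-sample read depth of each genotype call, or None if uncalled.
    """
    if qual <= min_qual:  # site Phred quality must exceed 30
        return False
    # Assumption: calls outside [min_dp, max_dp] count as missing.
    kept = [d for d in depths if d is not None and min_dp <= d <= max_dp]
    if 1 - len(kept) / len(depths) >= max_missing:  # require < 40% missing
        return False
    # Assumption: mean site depth is taken over called genotypes only.
    called = [d for d in depths if d is not None]
    return min_mean_dp <= mean(called) <= max_mean_dp


print(keep_site(50, [20, 25, 30, None, 15]))  # True: 20% missing, mean depth 22.5
print(keep_site(50, [5, 5, 5, 20, None]))     # False: 80% missing after depth masking
```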
We used mlRho v2.9 (Haubold et al., 2010) to calculate the sample-specific mean theta (θ), the population-scaled mutation rate (θ = 4Neμ). Because θ is directly proportional to Ne for a given per-site mutation rate μ, it can be used as an indicator of effective population size (Lynch, 2008; Meyer et al., 2012; Virrueta-Herrera et al., 2022). For this analysis, we converted the bam files from bwa into profile (.pro) files for each individual louse and then ran mlRho with maximum distance (M) = 0.
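The proportionality between θ and Ne rests on one assumption worth stating explicitly: comparing θ across species tracks relative Ne only if the per-site mutation rate μ is similar among the Columbicola species compared. This assumption is not measured in the study. Under it, the relationship rearranges as follows:

```latex
\theta = 4 N_e \mu
\quad\Longrightarrow\quad
N_e = \frac{\theta}{4\mu},
\qquad
\frac{N_{e,1}}{N_{e,2}} = \frac{\theta_1}{\theta_2}
\quad \text{when } \mu_1 = \mu_2 .
```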
Phylogenetic comparative methods
We used phylogenetic generalized least squares (PGLS) models, fitted with the gls function of the nlme v3.1-149 R package (Pinheiro et al., 2020), to examine associations between θ (a measure of parasite Ne) and host population size and host body size. We evaluated alternative phylogenetic correlation structures (corBrownian and corPagel, from the ape R package) and used AIC model comparisons to identify the best-fitting correlation structure for the models. We checked models by visually inspecting diagnostic plots (residuals vs. fitted values, and QQ plots to assess normality).
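At its core, a PGLS fit is a generalized least squares regression in which the residual covariance matrix V comes from the phylogeny. Under Brownian motion, the covariance between two species is their shared branch length from the root, and Pagel's λ rescales those off-diagonal covariances. A minimal NumPy sketch of that computation follows. It is not the nlme::gls implementation: it fits by maximum likelihood (gls defaults to REML), treats λ as fixed rather than estimated, and uses a hypothetical four-species tree with toy data.

```python
import numpy as np


def pagel_lambda(V, lam):
    """Pagel's lambda: scale off-diagonal (shared-history) covariances by lam."""
    V = np.asarray(V, dtype=float)
    scaled = lam * V
    np.fill_diagonal(scaled, np.diag(V))  # keep root-to-tip variances unchanged
    return scaled


def pgls(y, X, V):
    """GLS regression with phylogenetic covariance V (known up to a scale).

    Returns (coefficients, ML log-likelihood, AIC). AIC counts the
    coefficients plus sigma^2; add 1 if lambda is also estimated.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    V_inv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    resid = y - X @ beta
    sigma2 = resid @ V_inv @ resid / n  # ML estimate of the BM rate
    _, logdet = np.linalg.slogdet(V)
    loglik = -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet + n)
    aic = 2 * (p + 1) - 2 * loglik
    return beta, loglik, aic


# Hypothetical tree ((a:1,b:1):1,(c:1,d:1):1): entries are shared path lengths.
V_bm = np.array([[2, 1, 0, 0],
                 [1, 2, 0, 0],
                 [0, 0, 2, 1],
                 [0, 0, 1, 2]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])  # toy predictor, e.g. host body size
y = np.array([2.1, 3.9, 6.2, 7.8])  # toy response, e.g. parasite theta
X = np.column_stack([np.ones_like(x), x])
for lam in (0.0, 0.5, 1.0):
    beta, loglik, aic = pgls(y, X, pagel_lambda(V_bm, lam))
    print(lam, beta.round(3), round(aic, 2))
```

Setting λ = 0 removes phylogenetic covariance entirely and recovers an ordinary least squares slope, while λ = 1 corresponds to the Brownian (corBrownian-like) structure. Comparing AIC across such fits mirrors the model comparison described above.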
Results
We found a strong positive relationship between θ, a metric directly proportional to Ne, and host body size (PGLS, Brownian: R2pred = 0.44, p < 0.001; Pagel’s λ: R2pred = 0.48, p < 0.001; Fig. 1). In contrast, there was no significant relationship between θ and host population size (PGLS, Brownian & Pagel’s λ: p > 0.05). Including host population size in the best model led to a small improvement in the overall model fit (PGLS, Pagel’s λ including host population size, R2pred = 0.52), but the host population size term remained non-significant (PGLS, Pagel’s λ model including host population size, p > 0.05).
Discussion
For parasites such as lice, hosts represent their habitat (Clayton et al., 2015), and host body size largely explains parasite infrapopulation size (Rozsa, 1997; Clayton and Walther, 2001). Genome-scale data for parasitic lice of pigeons and doves revealed that θ, a metric associated with effective population size (Ne), is also highly correlated with host body size. In contrast, there was little association between parasite effective population size and host population size. Thus, it appears that the smaller infrapopulations on smaller-bodied hosts increase inbreeding to such a degree that Ne is also reduced on these hosts, overriding any effect of overall parasite population size.
Several studies have indicated that louse infrapopulations on single host individuals are highly inbred, showing strong genetic structure even between host individuals in close proximity (Ascunce et al., 2013; DiBlasi et al., 2018; Virrueta-Herrera et al., 2022). This inbreeding would reduce the effective population size on individual hosts. However, theoretical models predict that population structure should, at least to some extent, increase overall effective population size (Charlesworth et al., 1997; Charlesworth et al., 2003; Charlesworth, 2009), because alternative alleles can go to fixation in different infrapopulations, increasing the overall standing genetic diversity of the global population. Counter to this expectation, we found that the estimator of Ne is lower for parasites on small-bodied doves, which are expected to host smaller infrapopulations with higher levels of inbreeding.
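The theoretical expectation can be made concrete with Wright's island model. With n demes of size N exchanging migrants at rate m, equilibrium F_ST is approximately 1/(1 + 4Nm), and the effective size of the whole metapopulation is approximately N_T/(1 - F_ST), where N_T = nN. In this idealized model, which assumes equal deme sizes and no local extinction or recolonization, lower migration raises overall Ne. That is the expectation the observed pattern runs counter to. The deme numbers and sizes below are hypothetical.

```python
def fst_island(N: float, m: float) -> float:
    """Approximate equilibrium F_ST in Wright's island model (many demes)."""
    return 1.0 / (1.0 + 4.0 * N * m)


def metapopulation_ne(n_demes: int, N: float, m: float) -> float:
    """Approximate Ne of the whole metapopulation, N_T / (1 - F_ST).

    Idealized island model: equal deme sizes, conservative migration,
    and no local extinction or recolonization.
    """
    return n_demes * N / (1.0 - fst_island(N, m))


# Hypothetical: 10 infrapopulations of 20 lice each (N_T = 200).
for m in (0.5, 0.05, 0.005):
    print(m, round(metapopulation_ne(10, 20, m), 1))
```

Here Ne rises from about 205 to 700 as m drops a hundredfold. Models that add frequent local extinction of small demes can reverse this direction, which is one way to reconcile the theory with the pattern reported here.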
One factor facilitating the effect of host body size on Ne may be the low migration rates of permanent parasites. A moderate migration rate among parasite infrapopulations is expected to increase Ne. However, permanent parasites such as lice have minimal dispersal capabilities, so migration rates are expected to be very low. While host population size has been previously identified as a potential driver of parasite population dynamics (Doña and Johnson, 2020), the lack of a relationship between parasite Ne and host population size might reflect these very low migration rates. In this case, Ne would be mainly influenced by the inbreeding of infrapopulations and not by the overall size of the total parasite population, because low migration prevents the overall population from approaching panmixis.
Selection is also known to influence effective population size (Charlesworth et al., 1997; Charlesworth et al., 2003; Charlesworth, 2009). For loci under selection, the realized effective population size is lower than for loci whose frequencies are affected only by drift (Charlesworth et al., 1997; Charlesworth et al., 2003; Charlesworth, 2009). Louse species with smaller infrapopulations, and thus higher inbreeding, might suffer more from inbreeding depression. This would act as genome-wide negative selection, which is predicted to lower overall effective population size (Hedrick and García-Dorado, 2016). Whether lice, which normally experience high levels of inbreeding, suffer from inbreeding depression is unknown and would be a topic of interest for future investigation.
Another factor to consider is that smaller-bodied host species also typically have lower parasite prevalence (i.e., the proportion of host individuals inhabited by the parasite) (Bush et al., 1997). This pattern might arise because smaller infrapopulations are more susceptible to local extinction through environmental and demographic stochasticity, a known factor shaping Ne (Charlesworth, 2009; Doña and Johnson, 2020). Host body size could therefore influence the local extinction probability of parasites and thus help determine the Ne of permanent parasites (Farrell et al., 2021). Given the lower prevalence and intensity of lice on small-bodied hosts, the total number of lice in the global population may actually be smaller for species on small-bodied hosts than for those on large-bodied hosts. Although small-bodied doves might be expected to have generally larger population sizes, given the general inverse relationship between body size and population size in most organisms (White et al., 2007), we found no such relationship in our dataset (R2 = 0.003, p > 0.1), in agreement with previous results for other birds (Nee et al., 1991). Thus, while further research on global population estimates of louse species would help clarify these relationships, our results suggest that at lower taxonomic levels, host body size, not host population size, is the most explanatory factor of parasite Ne.
Considerations of effective population size also have implications for conservation. Parasites are among Earth’s most diverse, threatened, and under-protected animals (Carlson et al., 2017). The global parasite conservation plan identified risk assessment and the application of conservation genomics to parasites as two of the major goals for parasite conservation over the next decade (Carlson et al., 2020). Our finding that host body size, but not host population size, is a good predictor of parasite Ne could readily inform parasite conservation practice by drawing attention to the conservation of smaller-bodied hosts as a way to conserve their parasites.
Overall, our study shows that host body size plays a major role in shaping parasite population genomics and provides evidence for the essential role that individual hosts play as habitat for permanent parasites with very limited transmission abilities.
Author Contributions
J.D. designed the study, conducted the analyses, prepared figures, wrote the manuscript draft, and edited the manuscript. K.P.J. designed the study, obtained funding, wrote the manuscript draft, and edited the manuscript.
Data accessibility
Intermediary files generated in this study have been deposited in Figshare (reserved DOI: 10.6084/m9.figshare.21269640; private link for review: https://figshare.com/s/2f2de5dc909155da815a).
Acknowledgments
We thank J.M. Bates, B. Benz, S.E. Bush, D.H. Clayton, T. Chesser, R. Faucett, R. Moyle, V.Q. Piacentini, F. Sheldon, A.D. Sweet, J.D. Weckstein, and B. Zonfrillo for assistance in obtaining specimens. We thank B.M. Boyd and S. Virrueta Herrera for assistance with gDNA extractions. A. Hernandez and C. Wright at the University of Illinois Roy J. Carver Biotechnology Center provided assistance with Illumina sequencing. We thank K.K.O. Walden for assistance with submitting reads to NCBI SRA. Funding was provided by US NSF DEB-1342604, DEB-1925487 and DEB-1926919 grant awards to K.P.J., and European Commission grant H2020-MSCA-IF-2019 (INTROSYM:886532) to J.D.
Footnotes
Competing Interest Statement: None.