Abstract
Powassan virus is an emerging tick-borne virus of concern for public health, but very little is known about its transmission patterns and ecology. Here, we expanded the genomic dataset by sequencing 279 Powassan viruses isolated from Ixodes scapularis ticks from the northeastern United States. Our phylogenetic and phylogeographic reconstructions revealed that Powassan virus lineage II was likely introduced or emerged from a relict population in the Northeast between 1940 and 1975. Sequences strongly clustered by sampling location, suggesting a highly focal geographical distribution. Our analyses further indicated that Powassan virus lineage II emerged in the northeastern U.S. mostly following a south-to-north pattern, with a weighted lineage dispersal velocity of ~3 km/year. Since the emergence in the Northeast, we found an overall increase in the effective population size of Powassan virus lineage II, but with growth stagnating during recent years. The cascading effect of population expansion of white-tailed deer and I. scapularis populations likely facilitated the emergence of Powassan virus in the northeastern U.S.
Main
Reports of tick-borne diseases in the United States have been steadily rising, with more than 50,000 cases in 20191. In that same year, a record number of 43 human cases of infection with an emerging tick-borne pathogen, Powassan virus (Flaviviridae: Flavivirus), were reported2–5. Powassan virus infection can cause severe neuroinvasive disease with long-lasting sequelae and high fatality rates in humans. Since its initial identification in 19586, incidence rates of Powassan neuroinvasive disease in humans have dramatically risen in the United States, particularly during recent years in the Northeast7. As Powassan virus infection is difficult to clinically diagnose8 and most infections are asymptomatic3, the reported cases are likely a vast underestimation of the true burden. The increasing number of human infections, coupled with the lack of an effective vaccine or treatment, highlights the need to better understand local virus transmission patterns to guide targeted prevention and control measures.
Emergence of tick-borne diseases such as Lyme borreliosis, anaplasmosis, and babesiosis is associated with the spread of Ixodes scapularis ticks following expansion of suitable habitats9 and the reintroduction of their primary adult stage hosts, white-tailed deer (Odocoileus virginianus), into the northeastern United States10,11. Powassan virus consists of two genetically distinct lineages, of which lineage II, also thought to be primarily maintained by I. scapularis ticks and small mammals12,13, followed a similar path3–5. However, very little is known about its ecology and transmission patterns. Genetic approaches, including phylogenetic and phylogeographic inference, are powerful tools to understand patterns of pathogen transmission and spread. However, until recently there were only 23 near-complete Powassan virus genomes available from the United States. Thus, to inform future control efforts and mitigate public health risks, there is a critical need for new and innovative phylogeographic approaches to uncover how Powassan virus is being maintained in the Northeast and which factors facilitate or prevent its spread to other areas.
To investigate, we partnered with public health laboratories throughout the Northeast to sequence Powassan virus isolated from I. scapularis ticks in Connecticut, New York, and Maine. We sequenced samples collected from 2008 to 2019 from both historic endemic sites and novel regions on the leading edge of expansion in the Northeast. With this expanded genomic dataset, we performed discrete and continuous phylogeographic analyses to answer important questions on the emergence and spread of Powassan virus lineage II, such as: (1) When did Powassan virus lineage II emerge in the Northeast? (2) How is Powassan virus lineage II locally maintained? (3) What are the patterns and velocity of spread? (4) Can increased transmission explain the recent increase in reported human cases? (5) What is the impact of environmental factors on the dispersal dynamics? Overall, our study provides important insights into the emergence and spread of Powassan virus in the northeastern United States. Our findings help to better identify potential high-risk areas for exposure, which will in turn help to direct future control efforts.
Results
Powassan virus phylogeny
Powassan virus consists of two genetically and ecologically distinct lineages (lineage I and II; Fig. 1). Prior to this study, there were 23 near-complete Powassan virus genome sequences publicly available from the United States. In this study, we sequenced an additional 279 Powassan viruses (2 belonging to lineage I, and 277 belonging to lineage II) from positive tick pools identified by public health laboratories in Connecticut, New York, and Maine from 2008 to 2019 (Supplementary Tables 1–3). We created Nextstrain pages to visualize the Powassan genomic data, with a build for all available genomes14 and a more specific build for genomes available from the northeastern U.S.15. Lineage I sequences were publicly available from Russia, Canada (initial case in North America), and the United States, but Powassan virus lineage I had not been reported from the United States since the late 1970s15,17. As part of this study, we sequenced two lineage I genomes detected in I. scapularis ticks in 2019 from New York. Later, we sequenced two additional lineage I genomes detected in I. cookei (2020) and Dermacentor variabilis (2021), also from New York18. Identification of Powassan virus lineage I in various tick species highlights how increased virus genomic surveillance can help to expand our knowledge of virus ecology.
Almost all of the more recent Powassan viruses detected in the US, including those we sequenced for this study, belong to lineage II. Powassan virus lineage II, also referred to as “deer tick virus”, consists of two geographically separated clades comprising viruses from the Midwest and the Northeast (Fig. 1). Powassan virus lineage II is the most prevalent in the Northeast, and we first carefully assessed the presence of temporal signal in these data. While the determination coefficient of the root-to-tip regression performed with TempEst19 is relatively small (R² = 0.23; Fig. 1), we find very strong evidence in favor of temporal signal in the data set using a recently developed Bayesian method20 (log Bayes factor = 41.821), enabling the use of molecular clocks to estimate time-calibrated phylogenies. As part of this analysis, we find that an uncorrelated relaxed clock with an underlying lognormal distribution provides a better model fit to the data compared to a strict clock model (Table 1). We estimate that the evolutionary rate of this clade is 8.25×10⁻⁵ substitutions/site/year (95% highest posterior density [HPD] interval: 8.23-10.45×10⁻⁵). Our estimate is higher than previously estimated evolutionary rates for all Powassan viruses (3.3×10⁻⁵)22, as well as previous estimates based on envelope (2.2×10⁻⁴)5 and NS5 coding sequences (3.9×10⁻⁵-5.4×10⁻⁵)5,23, likely a reflection of the recent emergence of lineage II in the region. Our work increased the number of publicly available Powassan virus lineage II sequences by more than ten-fold, enabling us to better understand the patterns of emergence and spread in the northeastern United States.
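The root-to-tip regression mentioned above can be illustrated with a minimal sketch: the genetic distance of each tip from the root is regressed against its sampling date, the slope gives a rough rate estimate, and R² indicates how clock-like the data are. The function name and the example values are hypothetical and are not taken from the study data or from TempEst.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares regression of root-to-tip distance on sampling date.

    Returns (slope, intercept, r_squared, x_intercept). The slope is a crude
    estimate of the evolutionary rate and the x-intercept a crude root date.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    x_intercept = -intercept / slope
    return slope, intercept, r_squared, x_intercept


# Hypothetical sampling dates (decimal years) and root-to-tip distances
# (substitutions/site); illustrative values only.
dates = [2008.5, 2010.5, 2013.5, 2016.5, 2019.5]
distances = [0.0040, 0.0043, 0.0042, 0.0047, 0.0049]
slope, intercept, r2, root_date = root_to_tip_regression(dates, distances)
print(f"rate ~ {slope:.2e} subs/site/year, R^2 = {r2:.2f}, root ~ {root_date:.0f}")
```

A low R² in such a regression does not by itself rule out temporal signal, which is why a formal Bayesian test is used alongside it in the text.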
(a) Root-to-tip regression performed to assess the temporal signal within the Northeast clade (determination coefficient R² from the linear regression = 0.23). (b) Maximum likelihood tree obtained from the phylogenetic analysis of publicly available Powassan virus genomes from the United States, Canada, and Russia. Powassan virus lineage II consists of two geographically separated clades in the Northeast and Midwest. Bootstrap support values (based on 1,000 replicates) are provided for the main internal nodes of the tree.
Emergence in the Northeast
The emergence of Powassan virus lineage II into the Northeast US likely followed the re-emergence of I. scapularis ticks, its primary vector, into the region. Ixodes scapularis originally colonized the Northeast thousands of years ago24,25; however, deforestation and restriction of white-tailed deer populations (primary reproductive host for adult I. scapularis ticks) during the 1800s greatly reduced I. scapularis populations in the Northeast10,26. Reforestation and increasing white-tailed deer populations during the mid-1900s then led to a re-emergence of I. scapularis10,27,28.
Our expanded Powassan virus genomic data enabled us to reconstruct the dispersal history of Powassan virus lineage II in the northeastern US (Fig. 2a). Our discrete phylogeographic analysis estimates that the time to the most recent common ancestor (tMRCA) for the Northeast Powassan virus lineage II clade lies between 1940.3 and 1974.7 (95% HPD interval; mean 1957.9). This means that lineage II had emerged in the Northeast by at least this time period, corresponding to the re-emergence of I. scapularis ticks.
(a) Maximum clade credibility (MCC) tree with branches colored according to the locations inferred at the ancestral nodes. Tip nodes are colored according to their sampling location and are larger than internal nodes. (b) Sampling map and well-supported Markov jumps inferred by discrete phylogeographic inference. Sampling locations are displayed by dots with the size being proportional to the number of POWV genomic sequences sampled and included in our analyses. We only report Markov jumps associated with an adjusted Bayes factor support higher than 3, which corresponds to positive support according to the scale of interpretation as previously defined21.
Distinct transmission foci
Like tick-borne encephalitis virus in Europe29,30, Powassan virus is hypothesized to be primarily maintained within strict foci31,32, meaning that the virus does not routinely migrate between locations. To test this hypothesis, we examined our discrete phylogeographic reconstruction of Powassan virus lineage II in the Northeast (Fig. 2). We found that sequences strongly cluster by location (Fig. 2a) with relatively few transition events over the past 20 years (Fig. 2b).
We further explored the spatial structure using a continuous phylogeographic approach (Fig. 3). These findings again highlight the highly focal distribution, with rare long-distance dispersal events leading to the establishment of new foci (Fig. 3a-d). These transmission foci are particularly clear in Connecticut, where, besides a single dispersal event between locations 1 (Westport) and 2 (Redding), we do not observe mixing between the 5 distinct locations (Fig. 3d, inset). For example, we estimate that the lineage II viruses sequenced in location 2 (Redding) have been separated from location 3 (Bridgeport) for ~33 years (95% HPD interval: 24-41), despite being less than 20 km apart.
(a-d) Reconstruction of the dispersal history of POWV lineages inferred by a spatially-explicit phylogeographic analysis. We mapped branches of the maximum clade credibility (MCC) tree reported in Figure 2 and whose nodes, as well as associated 80% highest posterior density (HPD) regions, are colored according to their time of occurrence. (e) skygrid reconstruction of the evolution of the overall effective size of the viral population (Ne). (f) Evolution of the number of confirmed human cases. (g) Estimation of the white-tailed deer (Odocoileus virginianus) populations in the states of New York and Connecticut. (h) Environmental factors included in landscape phylogeographic analyses to test their impact on Powassan virus dispersal.
We found the same pattern of isolated transmission foci with limited mixing for New York and Maine, albeit with more short-distance dispersal events, particularly in New York (Fig. 2b, Fig. 3d). These differences may be explained by differences in sampling (e.g. Powassan virus sequenced from ticks in New York included adults collected from deer), environmental barriers to spread (e.g. separation of Connecticut locations by rivers), or other ecological factors.
Overall, our data suggest that after the initial emergence of Powassan virus lineage II in the Northeast, migration between nearby and long-distance locations was relatively rare. This supports the hypothesis that Powassan virus is primarily maintained in highly localized transmission foci.
Dispersal history
While we have shown that Powassan virus lineage II in the Northeast is primarily restricted to strict foci, we wanted to better understand the patterns and velocity of spread. Our spatially explicit continuous phylogeographic analysis indicates that Powassan virus lineage II emerged in the northeastern US mostly following a south-to-north pattern (Fig. 3a-d). We estimate that the virus first became established in southern New York and Connecticut by the late 1950s (95% HPD = [1940.3-1974.7]; Fig. 3a). This was followed by a few long-distance dispersal events to more northern regions, perhaps by infected ticks feeding on birds, which can migrate over longer distances. We estimate that the virus finally became established in Maine by 1991 (95% HPD = [1958-2011]) through multiple introductions. Our estimates of the relatively recent dispersal of Powassan virus lineage II into the northern part of the Northeast suggest that the virus is likely still emerging in parts of North America, following the northward expansion of I. scapularis33.
We then used our continuous phylogeographic results to estimate the dispersal velocity of Powassan virus lineage II through the northeastern US. We estimated a weighted lineage dispersal velocity of ~3 km/year (95% HPD = [2.5-3.8]), which corresponds to a relatively slow dispersal capacity when compared to the estimates of the same metric obtained from the continuous phylogeographic analysis of other viruses (Supplementary Table 4). For instance, Powassan virus lineage II dispersed faster than the rodent-borne Lassa virus in western Africa (~1 km/year)34, but considerably slower than the mosquito-borne West Nile virus when it invaded North America (~165 km/year)35. Our dispersal velocity estimates can help us to track the current and future spread of the virus.
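As an illustration of the metric, the weighted lineage dispersal velocity of a single tree can be sketched as the total great-circle distance travelled along all branches divided by the total branch duration. The sketch below assumes each branch is summarised by its start and end coordinates and dates; the tuple layout, function names, Earth radius, and example branches are assumptions of this illustration and do not reproduce the seraphim implementation or the study data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def weighted_dispersal_velocity(branches):
    """Sum of branch distances divided by sum of branch durations (km/year).

    Each branch is a tuple:
    (start_lat, start_lon, start_year, end_lat, end_lon, end_year).
    """
    total_distance = 0.0
    total_duration = 0.0
    for start_lat, start_lon, start_year, end_lat, end_lon, end_year in branches:
        total_distance += great_circle_km(start_lat, start_lon, end_lat, end_lon)
        total_duration += end_year - start_year
    return total_distance / total_duration


# Hypothetical branches (coordinates and dates are illustrative only).
branches = [
    (41.14, -73.36, 1990.0, 41.30, -73.38, 2000.0),
    (41.30, -73.38, 2000.0, 41.18, -73.19, 2015.0),
]
print(f"{weighted_dispersal_velocity(branches):.2f} km/year")
```

Dividing the summed distance by the summed duration, rather than averaging per-branch velocities, keeps very short branches from dominating the estimate. In practice the statistic would be computed for each posterior tree to obtain a median and 95% HPD interval.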
The table comes from the study of Klitting et al. (2021)75 and has been completed with the estimate obtained from the continuous phylogeographic reconstruction of POWV lineages in Northeastern USA. For each data set, we report both the posterior median estimate and the 95% HPD interval.
Population size
We next investigated if Powassan virus transmission has increased since its emergence in the Northeast, which could be a cause of the recent increase in reported human cases. We approached this by estimating the virus effective population size, which is influenced by changes in transmission intensity leading to the birth or death of virus lineages. Since the emergence in the Northeast, we found an overall increase in the effective population size of Powassan virus lineage II, but with growth stagnating since ~2005 (Fig. 3e). The latter could be a reflection of the virus becoming established across all of our study sites whereas the effective population size may still be increasing if we included additional sites in new emergence zones. Still, our data suggest that the increase in reported human cases in the region since 2010 (Fig. 3f) does not coincide with an increased virus effective population size (Fig. 3e). Thus, our findings do not support the hypothesis that the recent uptick in human cases is due to a significant increase in Powassan virus transmission; rather, it may be caused by an increase of human exposure to infected ticks.
As hypothesized as a significant factor for the timing of Powassan virus lineage II emergence in the Northeast, our overall estimates for the virus effective population size follow a similar trend as the population expansion of white-tailed deer in Connecticut and New York (Fig. 3g). Reforestation in the Northeast has led to dramatic increases in the white-tailed deer populations, followed by population expansion of I. scapularis, which in turn has facilitated the emergence of tick-borne pathogens such as Borrelia burgdorferi and Babesia microti27. Our findings suggest that the cascading effect of population expansion of white-tailed deer and I. scapularis populations may also have facilitated the emergence of Powassan virus in the Northeast.
Impact of environmental factors on the dispersal dynamic
We exploited our spatially-explicit phylogeographic reconstruction to investigate the impact of environmental factors on the dispersal dynamic of Powassan virus lineage II. As detailed in the Methods section, we tested the association between a series of environmental factors (Fig. 3h) and the dispersal location36 as well as velocity37. Our analyses revealed that inferred Powassan virus lineage II lineages tended to avoid circulating in areas associated with relatively higher elevation (Bayes factor [BF] > 20; Supplementary Table 5). This observation could be related to higher abundance of I. scapularis at lower altitudes. However, outcomes of this analysis are strongly influenced by the sampling effort and pattern: we are therefore able to describe the environmental conditions at the dispersal locations of inferred viral lineages, but we cannot draw conclusions on the actual impact of those conditions on dispersal36. Next, we investigated the impact of environmental factors on the dispersal velocity of Powassan virus lineage II. Our analyses did not highlight any environmental factor associated with the heterogeneity of Powassan virus lineage dispersal velocity across the study area (Supplementary Table 6). This means that none of the tested factors increased the correlation between dispersal duration and geographic distance, the latter thus remaining the main resistance factor to dispersal.
We report Bayes factor (BF) support for the association between environmental values and tree node locations. The results are based on 1,000 posterior trees obtained by spatially-explicit phylogeographic inference. Following Kass & Raftery (1995), we consider a BF value >20 as strong support for a significant correlation between the environmental distances and dispersal durations (in bold). “ENM” refers to ecological niche modeling.
The results are based on 1,000 posterior trees obtained by spatially-explicit phylogeographic inference. “C” and “R” indicate if the considered environmental raster was considered as a conductance (“C”) or resistance factor (“R”), and k is the rescaling parameter used to transform the initial raster (see the text for further details). For regression coefficients and Q values we report both the median estimate and the 95% HPD interval. The Bayes factor (BF) supports are only reported when p(Q > 0) is at least 90%. Following Kass & Raftery (1995), we consider a BF value >20 as strong support for a significant correlation between the environmental distances and dispersal durations (in bold).
Discussion
Despite a rapid increase in the number of Powassan virus infections in humans over recent years, very little was known about the patterns of virus emergence and spread. By sequencing 279 Powassan virus genomes and using phylogenetic and phylogeographic approaches, we have uncovered the patterns of virus emergence, transmission, and spread in the northeastern U.S. Our analyses revealed that Powassan virus lineage II likely emerged in the Northeast around 1940-1975, following the population growth of white-tailed deer and expansion of I. scapularis tick populations10,27. Powassan virus lineage II is maintained in highly localized transmission foci, with few migration events between relatively nearby locations. Our continuous phylogeographic analysis revealed that Powassan virus lineage II likely spread from southern Connecticut into more northern regions with a weighted lineage dispersal velocity of ~3 km/year. Although we found an overall upward trend in the virus effective population size over the last decades, the recent increase in reported human cases of Powassan virus infection does not coincide with a higher effective population size. This suggests that the recent uptick in human cases is likely not due to a significant increase in Powassan virus enzootic transmission, but may rather be due to other factors such as increased human exposure to infected ticks as well as an increase in case recognition. Our findings provide important insights into the local emergence patterns and transmission dynamics of Powassan virus in the northeastern U.S. Insights into the highly localized transmission foci that sustain Powassan virus transmission across multiple years will help to identify areas with high risk of spillover to the human population, which can be targeted for prevention education or control efforts.
Our reconstruction of the emergence of Powassan virus in the northeast follows similar patterns as the emergence reported for other tick-borne pathogens such as Borrelia, Babesia, and Anaplasma38. Similar to Powassan virus, these pathogens are maintained in transmission cycles involving I. scapularis as the main vector and white-footed mice (Peromyscus leucopus) as the main host. Introduction of these tick-borne pathogens in the northeastern United States follows the reforestation in the 20th century leading to rapid population expansions of both white-tailed deer and I. scapularis27,33. Despite the similarities in emergence and ecology, other tick-borne pathogens seem to have more widespread distributions as compared to the highly focal distribution of Powassan virus. Previous studies on Borrelia burgdorferi reported high genetic diversity within local populations39, lack of population genetic structure24, and no strict genetic clustering by location within the Northeast11. This suggests that despite the similarities in ecology, B. burgdorferi and Powassan virus are maintained by different mechanisms.
To explain the highly focal distribution, we have formulated two main hypotheses on how Powassan virus lineage II may be maintained in strict transmission foci (Fig. 4). Our first hypothesis is that vertical transmission from adults to the next larval stage plays a minor role in the Powassan virus transmission cycle. Adult I. scapularis ticks preferentially feed on larger mammals, such as white-tailed deer, and we would expect more mixing of Powassan virus clades if infected adults travel across larger distances when feeding on deer, particularly during early years of emergence when deer habitats were less fragmented. Inefficient vertical transmission from adults to larvae would explain why adult ticks may move while Powassan virus foci remain local. Previous studies have provided evidence for vertical transmission of Powassan virus (i.e. one of six infected females transmitted Powassan virus to its progeny)40, but it remains unclear what percentage of progeny within an egg batch becomes infected. Future studies can help to test this hypothesis by determining rates of vertical transmission in the laboratory, and determining infection rates of unfed larvae in the field.
Life cycle adapted from 38 and created with bioRender.com. Two hypotheses are proposed that may explain the focal distribution of Powassan virus.
Our second hypothesis is that backward transmission through life stages from nymphs to larvae is the main mechanism for Powassan virus maintenance (Fig. 4). Powassan virus clades better match the strong geographical structure of white-footed mice41,42 and other small mammals that share the same habitat rather than the weak structure of I. scapularis ticks25. This suggests that Powassan virus is mostly maintained by local transmission cycles that include larvae and nymphs feeding on small mammals, and adult ticks and other animals that can travel long distances are primarily dead-ends, although sporadic long-distance dispersal events may happen. Backward transmission from nymphs to larvae may either occur through direct co-feeding transmission in the absence of systemic infection of the rodent host, as has been reported for tick-borne encephalitis virus43,44, or through feeding on viremic hosts. To further investigate this hypothesis, we propose studies to determine the frequency of co-feeding between nymphs and larvae, to determine the size of transmission foci, and to perform comparative genetics between Powassan virus, I. scapularis, and small mammalian hosts, such as white-footed mice.
Our study has several limitations. First, sampling strategy was not standardized across the northeastern United States nor was it standardized across time. Powassan virus surveillance activities are organized at the state level, and therefore there are differences in effort between states. For instance, samples sequenced from Connecticut and Maine were collected from the same sites during multiple years, whereas a much larger number of varying sites were sampled from year to year in New York. Sampling in New York also included tick collections from hunter-harvested deer, whereas sampling in Connecticut and Maine was exclusively done via drag sampling. Spatial heterogeneity in sampling effort will impact phylogeographic reconstructions, with unsampled locations being excluded from the reconstructed dispersal history, and undersampling of locations may lead to underestimation of the degree of connectivity between sites. Second, we attempted to include as many publicly available genomes as possible in our analyses, but some genomes had to be excluded because they did not have complete metadata (e.g. missing collection date and location) or because they did not fit the molecular clock (e.g. Powassan sequences from human infections). Lastly, choice of molecular clock model may impact estimates for the evolutionary rate and the tMRCA. For example, when performing the analyses with a strict molecular clock model instead of a relaxed clock model, we get a lower estimated evolutionary rate of 3.80×10⁻⁵ substitutions/site/year (95% HPD = 2.60-5.57×10⁻⁵) and an earlier tMRCA of 1882.8 (95% HPD = 1833.4-1936.9). This highlights the importance of choosing the molecular clock model based on best model fit, and that we should be careful in our interpretation of the relaxed clock model estimates. Our estimates of emergence by the late 1950s should be interpreted as emergence by at least this time period, and not as an exact estimate of the emergence time.
Our study has important implications for our understanding of Powassan virus transmission dynamics and future control. Currently, no vaccines or specific treatments are available for Powassan virus infection, which leaves prevention of disease highly dependent on education and control. The identification of highly localized transmission foci provides opportunities both for better education of the general public about high-risk areas and for effective targeted control in Powassan virus hotspots. Eradication of Powassan virus in transmission foci that have been maintained for several years without introductions of new virus clades will likely result in highly effective and long-lasting control.
Methods
Sample collection
Tick collections, nucleic acid extraction, and Powassan virus screening were done at the Connecticut Agricultural Experiment Station (CAES), MaineHealth Institute for Research (MHIR), New York State Department of Health (NYSDOH), and Cornell University31,45–48. Briefly, the majority of ticks were collected by dragging a white cloth over the ground and low vegetation, and a smaller proportion were collected from vertebrate hosts (e.g. deer). All collected ticks were sorted by species, life stage, collection site and date before screening for pathogens. Individual or pooled ticks were homogenized and nucleic acid was extracted according to manufacturer’s instructions (Supplementary Table 1). Powassan virus was detected by RT-qPCR or nanochip assays31,45–48. Samples collected from 2008 through 2015 in Connecticut were passaged once on BHK-21 cells before nucleic acid extraction.
Untargeted metagenomic Illumina sequencing
We initially used an untargeted metagenomic approach to sequence Powassan virus isolates which were passaged on cells at CAES. Our protocol is adapted from49, and is openly available50. In brief, 10 μL of extracted nucleic acid was treated with DNase I (New England Biolabs, Ipswich, MA), followed by a clean up step using a ratio of 1.8:1 beads to sample. All clean up steps were done using MagBind TotalPure NGS magnetic beads (Omega Biotek, Norcross, GA) with automated protocols for the Kingfisher flex purification system (Thermo Fisher Scientific, Waltham, MA). First-strand cDNA was synthesized using SuperScript IV VILO (Thermo Fisher Scientific) and second-strand cDNA using Escherichia coli DNA ligase and polymerase (New England Biolabs), followed by a clean up step (1.8:1 beads to sample ratio). Libraries were prepared using the Nextera XT DNA library preparation kit for Illumina (Illumina, San Diego, CA), according to manufacturer’s instructions but using less than the recommended reagent volumes51. Individual and pooled libraries were quantified using the 1x dsDNA HS assay kit on the Qubit 4 (Thermo Fisher Scientific) and size distribution was determined using the high sensitivity DNA kit on the Bioanalyzer 2100 (Agilent, Santa Clara, CA). Pooled libraries were sequenced on the Illumina NovaSeq (paired-end 150) at the Yale Center for Genome Analysis. PCR duplicates were removed, reads were aligned to the reference genome using Bowtie2, and consensus genomes were called at a minimum frequency threshold of 0.75 and minimum coverage of 10X using Geneious Prime 2020.0.4.
Targeted amplicon-based sequencing
Although we were able to successfully sequence Powassan virus from cell culture-passaged samples using untargeted metagenomics, we developed an amplicon-based sequencing approach to improve coverage when sequencing from tick homogenates. We used an adapted protocol52, using Nextera XT for library prep as developed for SARS-CoV-2 amplicon-based sequencing53. In brief, cDNA was synthesized from 10 μL of RNA using SuperScript IV VILO (Thermo Fisher Scientific). Two separate primer pools were prepared by mixing equal volumes of each primer at a concentration of 10 μM (Supplementary Table 2). The two primer pools were used to generate tiled amplicons using Q5 high-fidelity 2X master mix (New England Biolabs), followed by a clean up step and quantification using the Qubit. Amplicons were diluted to 1 ng/μL and combined for library prep as described above. Pooled libraries were quantified on the Qubit and size distribution was determined on the Bioanalyzer. Pooled libraries were sequenced on the Illumina NovaSeq (paired-end 150) at the Yale Center for Genome Analysis. Consensus genomes were generated at a minimum frequency threshold of 0.75 and minimum coverage of 10X using iVar version 1.2.3. All sequencing data are publicly available under BioProject PRJNA889421 and listed in Supplementary Table 3.
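The consensus thresholds described above (minimum frequency 0.75, minimum coverage 10X) can be illustrated with a simplified sketch. It assumes the aligned reads have already been summarised as one string of observed bases per genome position, and it masks any position failing either threshold with "N". This masking is a simplification chosen for the illustration; iVar and Geneious may instead report ambiguity codes at mixed positions, so the function below is not a reimplementation of either tool.

```python
from collections import Counter


def call_consensus(pileup, min_freq=0.75, min_depth=10):
    """Call a consensus sequence from per-position base observations.

    pileup: list of strings, one per genome position, each holding the bases
    observed in reads covering that position (a hypothetical input format).
    A base is called only if depth >= min_depth and the most common base
    reaches min_freq; otherwise the position is masked with 'N'.
    """
    consensus = []
    for column in pileup:
        counts = Counter(column)
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # insufficient coverage
            continue
        base, count = counts.most_common(1)[0]
        consensus.append(base if count / depth >= min_freq else "N")
    return "".join(consensus)


# Hypothetical pileup for four positions.
example = ["A" * 10, "A" * 7 + "G" * 3, "C" * 9, "T" * 8 + "C" * 2]
print(call_consensus(example))
```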
Powassan virus phylogeny
We sequenced 279 Powassan virus genomes and estimated a maximum-likelihood tree using IQ-TREE version 1.6.12 with ultrafast bootstrap approximation (1,000 replicates)54 to determine phylogenetic relationships between publicly available and newly sequenced Lineage I and II genomes.
Temporal signal assessment
To evaluate if our Powassan virus data set contains sufficient temporal signal and would permit time-calibrated analyses using molecular clock models (and hence constitutes a measurably evolving population), we performed a Bayesian Evaluation of Temporal Signal (BETS) analysis20,55. This analysis involves assessing the model fit to the data of both a strict clock and an uncorrelated relaxed clock with an underlying lognormal distribution, both with and without the sampling dates associated with the genomes in our data set (Table 1). We employed generalized stepping-stone sampling56 to accurately estimate the log marginal likelihood of each of these four models. For each log marginal likelihood estimation, we ran an initial Markov chain of 500 million iterations, followed by 500 power posteriors that are explored during 1 million iterations, logging every 1000 iterations.
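Once the four log marginal likelihoods are available, the comparison reduces to simple arithmetic. The sketch below assumes that temporal signal is summarised as the difference between the best model with sampling dates and the best model without them; the dictionary layout and the numerical values are hypothetical and are not the values reported in Table 1.

```python
def bets_log_bayes_factor(log_ml):
    """Log Bayes factor for temporal signal from four log marginal likelihoods.

    log_ml maps (clock_model, with_dates) -> log marginal likelihood, where
    clock_model is 'strict' or 'relaxed' and with_dates is True or False.
    Returns best dated minus best undated log marginal likelihood.
    """
    best_dated = max(log_ml[("strict", True)], log_ml[("relaxed", True)])
    best_undated = max(log_ml[("strict", False)], log_ml[("relaxed", False)])
    return best_dated - best_undated


# Hypothetical log marginal likelihoods (illustrative values only).
log_ml = {
    ("strict", True): -20100.0,
    ("relaxed", True): -20080.0,
    ("strict", False): -20130.0,
    ("relaxed", False): -20125.0,
}
log_bf_dates = bets_log_bayes_factor(log_ml)
log_bf_clock = log_ml[("relaxed", True)] - log_ml[("strict", True)]
print(f"log BF (dates vs no dates) = {log_bf_dates:.1f}")
print(f"log BF (relaxed vs strict, with dates) = {log_bf_clock:.1f}")
```

A positive first value favors including sampling dates, and a positive second value favors the relaxed over the strict clock.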
Discrete phylogeographic reconstruction
To investigate the dispersal history of POWV lineages in Northeastern America, we first conducted a discrete phylogeographic analysis using the Bayesian stochastic search variable selection (BSSVS) model57 implemented in BEAST 1.1058. For this analysis, we considered each U.S. county of origin as a distinct location, except for the Connecticut area, where each sampling site was considered as a distinct discrete location. We modeled the branch-specific evolutionary rates according to a relaxed molecular clock with an underlying log-normal distribution59 and the nucleotide substitution process according to a GTR+Γ parameterisation60, and we specified a flexible skygrid model as the tree prior61. Three independent Markov chain Monte Carlo (MCMC) chains were run for 5×10⁸ iterations and sampled every 10⁵ iterations. The resulting posterior distributions were combined after discarding 10% of the sampled trees in each of them. We used the program Tracer 1.762 to assess convergence and mixing properties, and to verify that effective sample size (ESS) values associated with estimated parameters were all >200 after combining the outputs of the three independent analyses. We then used the program TreeAnnotator 1.1058 to obtain a maximum clade credibility (MCC) tree. We reported Markov jumps between discrete locations as estimated by the BSSVS analyses and supported by adjusted Bayes factor (BF) values >3, which correspond to at least “positive” statistical support following the scale of interpretation defined by Kass & Raftery21. The adjusted BF support takes into account the relative abundance of samples by location63 and is based on a tip-label swapping procedure similar to the tip-date randomization that can be performed to test for temporal signal64.
Continuous phylogeographic reconstruction
To reconstruct the dispersal history of POWV lineages in a spatially-explicit context, we performed a continuous phylogeographic analysis using the relaxed random walk (RRW) diffusion model65,66 implemented in the software package BEAST 1.1058, with a gamma distribution to model the among-branch heterogeneity in diffusion velocity. As for the discrete phylogeographic analysis, branch-specific evolutionary rates were modeled according to a relaxed molecular clock with an underlying log-normal distribution and the nucleotide substitution process according to a GTR+Γ parameterisation60, and we also specified a flexible skygrid model as the tree prior61. The Markov chain Monte Carlo (MCMC) algorithm was run for 15×10⁸ generations and sampled every 10⁵ generations. We used the program Tracer 1.7 to assess convergence and mixing properties and to verify that effective sample size (ESS) values associated with estimated parameters were all >200. We then used the program TreeAnnotator 1.10 to obtain a maximum clade credibility (MCC) tree, and the R package “seraphim”67,68 to extract the spatiotemporal information embedded within posterior trees and to estimate the weighted lineage dispersal velocity, the latter being defined as follows:
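$$v_{\mathrm{weighted}} = \frac{\sum_{i=1}^{n} d_i}{\sum_{i=1}^{n} t_i}$$
where d_i and t_i are the geographic distance traveled by phylogeny branch i and the duration of that branch, respectively, and n is the number of branches.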
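As an illustration, the weighted lineage dispersal velocity, taken here as the total distance traveled across branches divided by the total branch duration, can be computed with the following Python sketch. It is not the seraphim implementation; the great-circle (haversine) distance and the Earth radius of 6371 km are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def weighted_dispersal_velocity(distances_km, durations_yr):
    """Sum of branch distances divided by sum of branch durations (km/year)."""
    if len(distances_km) != len(durations_yr):
        raise ValueError("one distance and one duration are needed per branch")
    total_time = sum(durations_yr)
    if total_time <= 0:
        raise ValueError("total branch duration must be positive")
    return sum(distances_km) / total_time
```

Because distances and durations are summed before dividing, long branches contribute more than short ones, which makes the statistic less sensitive to very short branches than a simple mean of per-branch velocities.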
Landscape phylogeographic analyses
Landscape phylogeographic analyses aim at exploiting phylogeographic reconstructions to unravel the impact of environmental factors on the dispersal history and dynamics of viral lineages69. Specifically, we implemented two previously introduced analytical procedures to investigate the impact of environmental factors on the dispersal location36 and velocity37 of POWV lineages. Both analytical procedures rely on the comparison between inferred and randomized spatially-annotated trees, the latter sharing the time-scaled topology of the inferred trees but with phylogenetic branch positions that had been randomized across the study area. In practice, phylogenetic node positions were randomized within the study area, under the constraint that branch length (i.e. the geographic distance connecting both branch nodes), branch duration, tree topology, and root position remained unchanged67. The purpose of these randomizations is thus to obtain spatially-annotated trees corresponding to the trees inferred by continuous phylogeography, but along which we generated a new diffusion process that has not been impacted by any environmental factor.
We first investigated whether POWV lineages tended to avoid or preferentially circulate within areas associated with particular environmental conditions. For this purpose, we extracted and subsequently averaged the environmental values at the tree node positions to obtain, for each environmental factor, a posterior distribution of mean environmental values at tree node positions. We then compared values obtained through inferred trees and their corresponding randomized trees using an approximated Bayes factor (BF) support70: BF = (pe/(1-pe))/(0.5/(1-0.5)). To test if a particular environmental factor e tended to attract viral lineages, pe was defined as the frequency at which the environmental values from inferred trees were greater than values from randomized trees; and to test if a particular environmental factor e tended to repel viral lineages, pe was defined as the frequency at which the environmental values from inferred trees were lower than values from randomized trees. We considered BF values > 20 and 3 < BF < 20 as strong and positive statistical supports, respectively21.
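The approximated Bayes factor described above can be sketched in Python as follows. The function names and the pairing of each inferred tree with one randomized tree by list position are assumptions made for illustration; the input lists would hold the mean environmental value at node positions for each posterior tree.

```python
def support_frequency(inferred, randomized, direction="attract"):
    """Frequency p_e at which inferred values exceed (or fall below) randomized ones."""
    if len(inferred) != len(randomized) or not inferred:
        raise ValueError("need paired, non-empty lists of values")
    if direction == "attract":
        hits = sum(i > r for i, r in zip(inferred, randomized))
    elif direction == "repel":
        hits = sum(i < r for i, r in zip(inferred, randomized))
    else:
        raise ValueError("direction must be 'attract' or 'repel'")
    return hits / len(inferred)


def approx_bayes_factor(p_e):
    """BF = (p_e / (1 - p_e)) / (0.5 / (1 - 0.5)); infinite when p_e equals 1."""
    if p_e >= 1.0:
        return float("inf")
    prior_odds = 0.5 / (1 - 0.5)
    return (p_e / (1 - p_e)) / prior_odds


def interpret(bf):
    """Apply the thresholds used in the text: >20 strong, >3 positive."""
    if bf > 20:
        return "strong"
    if bf > 3:
        return "positive"
    return "none"
```

Since the prior odds equal 1, the Bayes factor reduces to the posterior odds p_e/(1-p_e): a factor is supported at the "positive" level once inferred values exceed randomized ones in more than 75% of trees, and at the "strong" level above roughly 95%.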
Second, we investigated to what extent POWV lineage dispersal velocity was impacted by environmental factors acting as conductance or resistance factors. For each branch in the inferred and randomized trees, we calculated an “environmental distance” using two path models: the least-cost path71 and Circuitscape72 algorithms, the latter accommodating uncertainty in the travel route. An environmental distance is calculated first from the raster of the environmental variable, and second from a uniform “null” raster whose cell values are all set to “1”. The environmental distance is a spatial distance that is weighted according to the values of the underlying environmental raster, and therefore constitutes a proxy for geographical distance when computed on the null raster. Each environmental variable was considered twice: once as a potential conductance factor that facilitates movement, and once as a potential resistance factor that impedes it. For each environmental variable, we also generated and tested several distinct rasters by transforming the original raster cell values with the following formula: vt = 1 + k(vo/vmax), where vt and vo are the transformed and original cell values, and vmax is the maximum cell value recorded in the raster. The rescaling parameter k here allows the definition and testing of different strengths of raster cell conductance or resistance, relative to the conductance/resistance of a cell with a minimum value set to “1”, which corresponds to the “null” raster. For each environmental variable, we generated three distinct rasters using the following values for the rescaling factor k: k = 10, 100, and 1000. For these analyses, we estimated the statistic Q, defined as the difference between the coefficients of determination obtained (i) when branch durations are regressed against the environmental distances computed on an environmental raster and (ii) when branch durations are regressed against the environmental distances computed on the null raster.
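A minimal Python sketch of the raster transformation and of the Q statistic is given below. It assumes that the environmental distances have already been computed for each branch (for example with a least-cost path or Circuitscape run) and that a raster is a plain list of rows; the function names are hypothetical and the regression is a simple univariate least-squares fit.

```python
def rescale_raster(raster, k):
    """Apply vt = 1 + k * (vo / vmax) to every cell of a raster (list of rows)."""
    v_max = max(max(row) for row in raster)
    if v_max <= 0:
        raise ValueError("raster needs a positive maximum cell value")
    return [[1 + k * (v / v_max) for v in row] for row in raster]


def r_squared(x, y):
    """Coefficient of determination of the univariate linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variance: the regression explains nothing
    return (sxy * sxy) / (sxx * syy)


def q_statistic(durations, env_distances, null_distances):
    """Q = R2(durations ~ environmental distances) - R2(durations ~ null distances)."""
    return r_squared(env_distances, durations) - r_squared(null_distances, durations)
```

A positive Q indicates that distances weighted by the environmental raster explain branch durations better than distances computed on the null raster, that is, better than geographical distance alone.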
We estimated a Q statistic for each environmental raster and each of the 1,000 trees sampled from the posterior distribution. An environmental factor was only considered as potentially explanatory if both its distribution of regression coefficients and its associated distribution of Q values were positive73, i.e. with at least 90% of positive values. In this case, the statistical support associated with the resulting Q distribution was assessed by comparing it with the corresponding null distribution of Q values obtained when computing environmental distances for phylogenetic branches of randomized trees. Similar to the procedure used for the investigation of the impact of environmental factors on the dispersal locations of POWV lineages, the comparison between inferred and randomized distributions of Q values was formalized by approximating a Bayes factor support37.
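The screening step and the subsequent comparison can be sketched in Python as follows. The 90% rule follows the text; the Bayes factor is assumed here to take the same posterior-odds form as for the dispersal-location test, with each inferred tree paired with one randomized tree, which may differ in detail from the cited procedure37.

```python
def fraction_positive(values):
    """Share of strictly positive values in a list."""
    return sum(v > 0 for v in values) / len(values)


def is_candidate(coefficients, q_values, min_fraction=0.9):
    """True if regression coefficients and Q values are both mostly positive."""
    return (fraction_positive(coefficients) >= min_fraction
            and fraction_positive(q_values) >= min_fraction)


def q_bayes_factor(q_inferred, q_randomized):
    """Odds that Q from an inferred tree exceeds Q from its randomized tree."""
    if len(q_inferred) != len(q_randomized) or not q_inferred:
        raise ValueError("need paired, non-empty lists of Q values")
    p = sum(qi > qr for qi, qr in zip(q_inferred, q_randomized)) / len(q_inferred)
    return float("inf") if p == 1 else p / (1 - p)
```

In this sketch, only factors that pass `is_candidate` would be carried forward to `q_bayes_factor`, mirroring the two-stage logic of the paragraph above.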
Data availability
All data are included in this article, the supplementary files, and in BioProject PRJNA889421.
Funding
This publication was made possible by CTSA Grant Number UL1 TR001863 from the National Center for Advancing Translational Science (NCATS), a component of the National Institutes of Health (NIH) awarded to CBFV. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of NIH. GB acknowledges support from the Internal Funds KU Leuven under grant agreement C14/18/094 and the Research Foundation - Flanders (“Fonds voor Wetenschappelijk Onderzoek - Vlaanderen,” G0E1420N, G098321N).
Author contributions
CBFV, SD, and NDG designed the study; DEB, APD, RMR, SCW, JFA, CBL, REL, MAP, LDK, JLG-K, LBG, RPS, PMA, and ATC collected data/samples; CBFV and JRF performed sequencing; CBFV, AFB, GB, SD, and NDG analyzed the data; CBFV, SD, and NDG drafted the manuscript; all authors reviewed and approved the manuscript.
Competing interests
NDG is a consultant for Tempus Labs and the National Basketball Association for work related to COVID-19. All other authors declare no competing interests.
Acknowledgements
We would like to thank Anne Piantadosi, Erica Normandin, Pardis C. Sabeti, Rebekah McMinn, Greg Ebel, Heidi Goethert, Sam R. Telford III, Sebastian Lequime, Alexander A. Fisher, and Marc A. Suchard for their input in discussions on the methods and results of this study.