Abstract
The World Health Organization lists Lassa fever among the diseases that pose the greatest risks to public health. This severe viral hemorrhagic fever is caused by Lassa virus, a zoonotic pathogen that repeatedly spills over to humans from its rodent reservoirs. It is currently not known how climate change, transformations in land use, and human population growth could affect the endemic area of this virus, which is currently limited to parts of West Africa. By exploring the environmental data associated with virus occurrence, we show how temperature, precipitation and the presence of pastures determine ecological suitability for virus circulation. We project that regions in Central and East Africa will likely become suitable for Lassa virus over the next decades and estimate that the total population living in areas suitable for Lassa virus may grow from about 100 million to 700 million by 2070. By analysing geotagged viral genomes, we find that if Lassa virus were introduced into a new suitable region, its spread might remain spatially limited over the first decades. Our results highlight how the endemic area of Lassa virus may expand well beyond West Africa in the next decades due to human impact on the environment, putting hundreds of millions more people at risk of infection.
Introduction
Along with other viral infections that have gained prominence in recent years1–3, Lassa fever (Lassa) is listed by the World Health Organization (WHO) as one of the diseases that pose the greatest public health risk4,5. Lassa is a viral hemorrhagic fever with variable but generally high case fatality rates6 for which efficacious countermeasures are currently lacking7,8.
To date, Lassa cases have only been reported in West Africa with Guinea, Liberia, Nigeria and Sierra Leone containing most endemic hotspots9. Nigeria, in particular, has seen a significant increase in incidence in recent years, and confirmed more than a thousand cases in 202010. Increasingly, neighbouring countries, including Benin, Ghana, Ivory Coast, Mali and Togo, have also been reporting infections11–14, suggesting that the true range of Lassa virus may span a sizable part of West Africa.
Lassa is caused by Lassa virus15, a member of the Arenaviridae family (genus Mammarenavirus). Most infections occur through exposure to the excreta of infected Mastomys natalensis. These rodents often live in close contact with human communities16 and are regarded as the primary reservoir for the virus17. Humans likely contribute little to virus transmission and are considered dead-end hosts, based on studies of rodent biology17, ecology15,20, transmission dynamics21,22, and viral genomes23–25. While the virus can only spread where its reservoir is present, the range of M. natalensis extends beyond that of Lassa virus, spanning most of sub-Saharan Africa26,27. The reasons for this difference in range between the virus and its reservoir have long been debated and may be multifactorial: Lassa virus might circulate exclusively within A-I28, one of the six M. natalensis phylogroups or subspecies, which is only found in West Africa29–31; other viruses present in M. natalensis populations may prevent Lassa virus circulation through competition32,33; closely related mammarenaviruses inducing cross-reactive immunity34–36 may prevent Lassa virus infection; and finally, Lassa virus prevalence may be influenced by different environmental determinants than its reservoir.
Like the rest of the world, African countries will increasingly be affected by climate change, with warming temperatures and rarer but more extreme precipitation37–39. These changes, combined with increasing pressure on land resources due to considerable projected human population growth, are expected to result in important transformations of land use throughout Africa40–42. In this study, we modelled how the endemic range of Lassa virus may evolve over the next five decades in response to climate change, human population growth, and land use changes. To identify factors driving suitability for virus circulation, we analysed the environmental data associated with the occurrence of Lassa virus for a set of putative explanatory factors. We find that annual mean temperature, annual precipitation and the presence of pastures are the main factors determining ecological suitability for Lassa virus circulation. Using projections of climate, land use, and population up to 2070, we estimate future ecological suitability and show that within decades, the range suitable for Lassa virus may extend well beyond West Africa. Using population projections, we estimate that the extended part of the suitable range will be home to hundreds of millions of people. To determine whether specific environmental factors could slow or even halt the spread of the virus following its introduction into a new region (as proposed for the Niger and Benue rivers23,43), we analysed geotagged viral genomes together with environmental data. We find no strong evidence that any of the environmental factors we investigated, including main rivers, limit virus dispersal. We show, however, that over the first decades following a successful introduction into a new region, the propagation of the virus could remain spatially limited unless it spreads significantly faster than in current endemic areas.
By combining ecological niche modelling with spatially-explicit phylogeography, our study showcases how climate and land use change may transform the future risk of Lassa in Africa.
Results
Temperature, precipitation and pastures/rangeland are the main determinants of ecological suitability for Lassa virus
M. natalensis plays a critical role in the circulation of Lassa virus, both as the main virus reservoir and as the source of most human infections17,23,25. Interestingly, while M. natalensis occupies a wide range spanning most of sub-Saharan Africa17, Lassa virus has never been found outside of West Africa. This difference in range is poorly understood, but may in part be explained by the virus being more sensitive to environmental factors than its reservoir, as in the case of Sin Nombre virus44. For this other rodent-borne virus, environmental conditions can affect the abundance of the host, driving the population density of the reservoir below the threshold needed for virus maintenance44–46. Hence, to investigate the factors that may determine ecological suitability (a measure of how suitable environmental conditions are) for Lassa virus and for M. natalensis, we conducted separate analyses for the virus and its reservoir species. We explored the environmental data associated with the occurrence of the virus (or its reservoir species) and found that annual mean temperature, annual precipitation and pastures/rangeland land coverage are the main determinants of ecological suitability for Lassa virus, while for M. natalensis, precipitation seems to be the most important factor.
To identify factors that determine ecological suitability for Lassa virus and M. natalensis, we built ecological niche models, considering temperature, precipitation, seven types of land cover and human population as potential determinants. Using boosted regression trees47 (BRT), we searched for associations between known occurrences of the virus and its reservoir and the environmental conditions at those sites. As inputs for our models, we used occurrence records collated from online databases and the literature, together with environmental data obtained from the Inter-Sectoral Impact Model Intercomparison Project phase 2b (ISIMIP2b)48. To assess how each factor contributed to our models, we calculated its relative importance (RI). For BRT models, the RI of a factor is based on the number of times it is selected for splitting a tree, weighted by the squared improvement to the model resulting from each split, and averaged over all trees47.
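As an illustration of how relative importance can be computed, the sketch below implements a deliberately simplified form of boosting in pure Python. Each tree is a single split (a stump) fitted to the residuals under squared-error loss, and each factor's relative influence is the sum of the squared-error improvements of the splits that use it, normalised to 100%. This is not the model used in this study, which presumably relied on deeper trees, a loss suited to presence/absence data and dedicated BRT software. The function names (`fit_stump`, `boosted_stumps`, `predict`) and the toy occurrence data are hypothetical.

```python
import random


def fit_stump(X, r):
    """Return (improvement, feature, threshold, left_value, right_value) for the
    single split that most reduces the squared error of the residuals r."""
    n = len(r)
    total = sum(r)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            x_left, x_right = X[order[k - 1]][j], X[order[k]][j]
            if x_left == x_right:
                continue  # no threshold separates tied values
            n_l, n_r = k, n - k
            mean_l, mean_r = left_sum / n_l, (total - left_sum) / n_r
            # Reduction in the sum of squared errors produced by this split
            improvement = n_l * n_r / n * (mean_l - mean_r) ** 2
            if best is None or improvement > best[0]:
                best = (improvement, j, (x_left + x_right) / 2, mean_l, mean_r)
    return best


def boosted_stumps(X, y, n_trees=200, learning_rate=0.05):
    """Fit a boosted ensemble of stumps; return the model and each feature's
    relative influence in percent."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    influence = [0.0] * len(X[0])
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        improvement, j, thr, v_l, v_r = stump
        influence[j] += improvement  # credit this split's improvement to feature j
        stumps.append((j, thr, v_l, v_r))
        for i, row in enumerate(X):
            pred[i] += learning_rate * (v_l if row[j] <= thr else v_r)
    total = sum(influence)
    relative = [100 * v / total for v in influence] if total > 0 else influence
    return (base, learning_rate, stumps), relative


def predict(model, row):
    """Prediction of the boosted ensemble for one row of predictors."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (v_l if row[j] <= thr else v_r)
                      for j, thr, v_l, v_r in stumps)


# Toy presence/absence data: occurrence depends only on column 0 ("temperature")
random.seed(1)
X = [[random.uniform(15, 35), random.uniform(0, 3000), random.uniform(0, 1)]
     for _ in range(300)]
y = [1.0 if row[0] > 25 else 0.0 for row in X]
model, ri = boosted_stumps(X, y)
print([round(v, 1) for v in ri])
```

In the toy data, occurrence depends only on the first column, so nearly all of the relative influence should be credited to it. With real data, correlated predictors share influence, and the values also depend on tree depth, learning rate and the number of trees.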
We found that for Lassa virus, three main factors contributed to the models: temperature (RI = 20.7%), precipitation (RI = 24.5%), and pastures and rangeland land coverage (RI = 25.3%; Fig. 1). For M. natalensis, we found that precipitation was the main contributor (RI = 50.4%; Fig. 1). These findings suggest that temperature, precipitation and the presence of pastures/rangeland may be the main factors influencing ecological suitability for Lassa virus, but not for its reservoir species, M. natalensis, for which only precipitation appears to be critical.
To assess the relationship between each of our environmental factors and ecological suitability, we plotted response curves, which show how ecological suitability varies with one specific factor while all others are kept constant at their mean. Ecological suitability values vary between 0 (unsuitable conditions) and 1 (highly suitable conditions). We found that temperatures below 25°C or pastures and rangeland land coverage below 20% seem unsuitable for the virus (ecological suitability ∼0; Fig. 1) but still appear relatively suitable (ecological suitability >0.4) for its reservoir species. These results indicate that, although M. natalensis may be found in areas with mean daily temperatures below 25°C or limited pastures and rangeland land coverage, Lassa virus is unlikely to be present there.
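Any fitted model can produce a response curve of this kind. One factor is swept across its observed range while the others are held at their mean values, as described above. In this sketch, the model is assumed to be a Python function returning a suitability between 0 and 1. The function `toy_model`, its logistic shape centred on 25°C, and the four data points are hypothetical stand-ins for a fitted BRT model and real occurrence data.

```python
import math


def response_curve(model, X, feature, n_points=5):
    """Evaluate `model` on a grid over one feature's observed range, with all
    other features fixed at their mean (requires n_points >= 2)."""
    n_features = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
    lo = min(row[feature] for row in X)
    hi = max(row[feature] for row in X)
    curve = []
    for k in range(n_points):
        value = lo + (hi - lo) * k / (n_points - 1)
        point = list(means)      # every factor at its mean...
        point[feature] = value   # ...except the one being varied
        curve.append((value, model(point)))
    return curve


def toy_model(point):
    """Hypothetical suitability model: logistic rise around 25 degC.
    Precipitation (point[1]) is ignored in this toy."""
    temperature = point[0]
    return 1.0 / (1.0 + math.exp(-(temperature - 25.0)))


# Hypothetical (temperature in degC, annual precipitation in mm) records
X = [[18.0, 900.0], [22.0, 1500.0], [27.0, 2100.0], [32.0, 1200.0]]
for value, suitability in response_curve(toy_model, X, feature=0):
    print(f"{value:5.1f} degC -> suitability {suitability:.2f}")
```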
Ecological niche modelling projects a likely expansion of the range suitable for Lassa virus
Our ecological niche modelling analyses showed that temperature, precipitation, and pastures/rangeland coverage are the main factors influencing ecological suitability for Lassa virus circulation. Due to climate change and increasing human pressure on land resources caused by population growth, these variables are expected to change in the next decades40–42. With these expected transformations, the overall area suitable for Lassa virus (also called the ecological niche of the virus49) may change substantially. To investigate this, we used climate and land cover projections for 2030 to 2070 to estimate the future ecological suitability for the virus across Africa. We found that the ecological niche of Lassa virus will likely expand as new regions become suitable, notably in Central and East Africa.
We used our ecological niche models to identify the areas suitable for Lassa virus throughout Africa, based on either current or projected values of temperature, precipitation, land cover and human population from ISIMIP2b48. We considered environmental values projected at three time points (2030, 2050 and 2070) under three climate scenarios: representative concentration pathways (RCPs) 2.6, 6.0 and 8.5, which describe the evolution of global warming under different trajectories of atmospheric greenhouse gas concentrations50. For the present-day situation, our ecological niche maps for Lassa virus and M. natalensis were in agreement with previous estimates51,27, showing Lassa virus suitability across West Africa, predominantly in the current endemic countries of Guinea, Sierra Leone, Liberia, and Nigeria (Fig. 2A).
At future time points, we found that the ecological niche of Lassa virus may substantially expand under both RCP 6.0 and RCP 8.5 (Figs. 2A and S1). RCP 2.6 and RCP 8.5 are the most extreme scenarios and correspond to stringent mitigation (RCP 2.6) and high-end emissions (RCP 8.5), respectively, while RCP 6.0 represents a medium-high emission scenario48. Focusing on RCP 6.0, we projected that by 2070, most of the region between Guinea and Nigeria will become suitable (ecological suitability >0.5) for Lassa virus (Fig. 2A). In addition, we found that several regions will likely become suitable in Central Africa, including in Cameroon and the Democratic Republic of the Congo (DRC), but also in East Africa, notably in Uganda. For M. natalensis, we found that irrespective of the scenario, the ecological niche will likely remain stable in range, with suitability values increasing over time across the entire niche (Figs. 2A and S1). These results show that, under a medium-high emission scenario (RCP 6.0), the ecological niche of Lassa virus may expand well beyond current endemic countries, notably into parts of Central and East Africa.
To investigate the factor(s) driving the expansion of the niche of Lassa virus, we mapped across Africa the current and projected values of the main factors influencing ecological suitability (Figs. S4 and S5). In Central and East Africa, areas showing an increased suitability for the virus under RCPs 6.0 and 8.5 also exhibited an increase in temperature and pastures/rangeland land coverage (Figs. 2A, S4 and S5). These two factors may therefore be the primary drivers of the expansion of the range suitable for Lassa virus.
The population living where conditions are suitable for Lassa virus may increase dramatically by 2070, driven by a substantial population growth in both current and future areas suitable for the virus
Having found that the size of the ecological niche of Lassa virus will likely expand in the coming decades, we next investigated how this could affect the future number of people at risk of infection. To estimate the current and future human population in the Lassa virus niche, we considered population projections in areas with an estimated ecological suitability above 0.5 (Fig. S3). We focused again on three future time points (2030, 2050 and 2070) and three climate scenarios (RCPs 2.6, 6.0 and 8.5). We found that under RCP 6.0, the human population living in the niche of Lassa virus, where conditions are suitable for virus circulation, may increase from 92 million today (95% highest posterior density (HPD) interval: [83-98]) to 453 [414-498] million by 2050, and to 700 [624-779] million by 2070 (Figs. 2B and S6, Table S1).
This increase, however, could be driven by demographic growth in currently suitable areas rather than by the spatial expansion of the virus's ecological niche51. To investigate this, we first examined the current population of the areas projected to be suitable for Lassa virus in 2070 (scenario RCP 6.0) and found that they are currently home to 179 [159-199] million people. Compared with the 700 million projected for 2070, this indicates that the population living within the 2070 niche of the virus is expected to more than triple by 2070 (Fig. 2B, Table S1). When comparing the number of people that will live in current or future parts of the niche in 2070, we found that population growth should be comparable in both areas (Table S1). More specifically, our results show that by 2070, 363 [333-384] million people may be exposed to Lassa virus infection in currently suitable areas, and that expansion of the ecological niche of the virus may put 337 [260-405] million more people at risk of infection.
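The population figures above amount to summing projected population over grid cells whose suitability exceeds 0.5. The split between currently suitable and newly suitable areas is a comparison of two such masks. The sketch below assumes aligned per-cell lists of suitability and population. The toy grid is made up, and in practice the sums would presumably be repeated for each model replicate to obtain the reported intervals. The "current part" is assumed here to mean cells that are suitable both today and in 2070, which is one plausible reading of the text. The last lines only check that the RCP 6.0 means reported above are internally consistent.

```python
def population_at_risk(suitability, population, threshold=0.5):
    """Total population of grid cells whose ecological suitability exceeds the threshold."""
    return sum(pop for s, pop in zip(suitability, population) if s > threshold)


def split_population(suit_now, suit_future, pop_future, threshold=0.5):
    """Split the future population of the future niche between cells that are
    already suitable today and cells that only become suitable later."""
    in_current, in_new = 0.0, 0.0
    for now, future, pop in zip(suit_now, suit_future, pop_future):
        if future <= threshold:
            continue  # cell lies outside the projected niche
        if now > threshold:
            in_current += pop
        else:
            in_new += pop
    return in_current, in_new


# Hypothetical toy grid: one entry per cell, population in millions
suit_now = [0.8, 0.6, 0.3, 0.2, 0.1]
suit_2070 = [0.9, 0.7, 0.6, 0.55, 0.3]
pop_now = [8.0, 6.0, 2.0, 1.5, 1.0]
pop_2070 = [20.0, 15.0, 9.0, 6.0, 4.0]

at_risk_now = population_at_risk(suit_now, pop_now)
at_risk_2070 = population_at_risk(suit_2070, pop_2070)
current_part, new_part = split_population(suit_now, suit_2070, pop_2070)
print(at_risk_now, at_risk_2070, current_part, new_part)

# Consistency check on the RCP 6.0 means reported in the text (millions)
reported_current, reported_expansion, reported_today_in_2070_niche = 363, 337, 179
reported_total_2070 = reported_current + reported_expansion
print(reported_total_2070, round(reported_total_2070 / reported_today_in_2070_niche, 1))
```

The reported means add up to 363 + 337 = 700 million. That total is roughly 3.9 times the 179 million people living in the same area today, which is consistent with the population more than tripling.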
Phylogeographic inference reveals that Lassa virus circulation is remarkably slow in endemic areas
In our ecological niche modelling analyses, we found that within a few decades, Lassa virus may circulate beyond its current endemic range in West Africa. If the virus is successfully introduced into new suitable regions, we estimated that hundreds of millions more people may be at risk of infection. The emergence of Ebola virus in West Africa and of West Nile virus in North America shows that zoonotic viruses can travel over long distances and become established in new regions52–55. However, such events remain poorly understood and thus challenging to predict. While it is not possible to assess whether Lassa virus is likely to become established in a new environment, we can investigate how the virus may spread following a potential future introduction, using the parameters of virus dispersal in endemic areas as a priori estimates. By analysing the spatiotemporal spread of the virus using geotagged viral genomes, we show that Lassa virus dispersal in endemic areas is remarkably slow compared to other zoonotic viruses.
To infer the spatiotemporal spread of Lassa virus since the emergence of the four major clades25, we analysed publicly available genomic sequences associated with a sampling date and location using a spatially-explicit Bayesian phylogeographic approach56. The genome of Lassa virus consists of a large (L) and a small (S) segment, which may reassort during coinfections in M. natalensis57,58. As reassortment may result in distinct evolutionary histories for the L and S segments, we analysed them in separate phylogeographic inferences. We also divided our analyses between four main clades (Fig. S7): the “MRU clade” groups the subclades circulating in the Mano River Union (MRU) and Mali (also called lineages IV and V); “NGA clade II”, “III” and “IV” correspond to the main clades circulating in Nigeria (also called lineages II, III and VI, respectively)59. The trees inferred by our phylogeographic analyses capture the spatiotemporal spread of the virus (Fig. 3): each branch represents a dispersal event between an estimated start and end location, over an estimated duration.
To investigate the spatial spread of the main clades, we mapped the trees inferred by our phylogeographic analyses, separately for the MRU and the Nigerian clades. We observed that in Nigeria, the main clades are confined to distinct areas: clades II and III circulate south and north of the Niger and Benue rivers, respectively, while clade IV is limited to states in the southwest (Osun, Ekiti, Ondo and Kwara; Fig. 3). We also noted that sequences from the MRU clade grouped into three main clusters circulating in eastern Sierra Leone, Guinea and Mali, respectively (Fig. 3). The strong geographic structure we observed in our phylogenetic trees was consistent across the S and L segments (Figs. 3, S9, and S10) and aligns with previous studies23,24,43. These findings suggest that, although Lassa virus has been spreading for hundreds of years (Figs. 3, S9, and S10), its genetic diversity remains strongly structured geographically.
To approximate how fast Lassa virus circulates in endemic areas (Mano River Union and Nigeria), we estimated the weighted lineage dispersal velocity60, which corresponds to the total distance covered by the dispersal events in our trees divided by the sum of their durations. We found that Lassa virus circulates with a posterior mean lineage dispersal velocity between 0.8 and 1 km/year (95% HPD intervals for the S and L segments: [0.7-1.0] and [0.9-1.0], respectively; Table 1). Our estimates of the weighted lineage dispersal velocity for each of the main clades show that virus circulation is slowest for the MRU clade and fastest for the Nigerian clade II (Fig. S8), but remains low overall (<1.5 km/year). These results suggest that Lassa virus circulation is slow, which may in part explain why the main clades are confined to different areas within overall suitable regions (Fig. 2).
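The weighted lineage dispersal velocity is a ratio of two sums taken over the branches of one tree. Computing it for every posterior tree gives a distribution that can be summarised with an HPD interval. The sketch makes three assumptions:
- each branch is given as start and end coordinates in decimal degrees (latitude, longitude) plus a duration in years;
- distances are great-circle distances computed with the haversine formula;
- the HPD interval is the shortest interval containing 95% of the sorted samples.

The branch coordinates and posterior values are hypothetical.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def weighted_dispersal_velocity(branches):
    """Sum of branch distances divided by sum of branch durations (km/year)."""
    total_km = sum(great_circle_km(*start, *end) for start, end, _ in branches)
    total_years = sum(duration for _, _, duration in branches)
    return total_km / total_years


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the sorted posterior samples."""
    values = sorted(samples)
    n_in = max(1, int(math.ceil(mass * len(values))))
    widths = [(values[i + n_in - 1] - values[i], i)
              for i in range(len(values) - n_in + 1)]
    _, start = min(widths)
    return values[start], values[start + n_in - 1]


# One hypothetical tree: ((lat, lon) start, (lat, lon) end, duration in years)
tree = [((7.9, -11.2), (8.0, -11.1), 20.0),
        ((8.0, -11.1), (8.1, -11.3), 15.0)]
print(round(weighted_dispersal_velocity(tree), 2))

# Hypothetical per-tree velocities from a posterior sample of trees
print(hpd_interval([0.82, 0.91, 0.88, 0.95, 0.79, 0.86, 0.90, 0.84]))
```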
To compare the velocity of Lassa virus circulation with that of other zoonotic viruses, we compiled and ranked all published estimates of weighted lineage dispersal velocities (Table 1). We found that Lassa virus exhibits the second-slowest lineage dispersal velocity, after Nova virus, while Ebola virus appeared to be the fastest. Our results indicate that Lassa virus circulation in endemic areas is particularly slow compared to other zoonotic viruses, possibly due to the small scale of the movements of its reservoir17,26.
Lassa virus dispersal velocity and trajectory do not seem to be strongly affected by the environment
Our phylogeographic inferences show that Lassa virus circulation is remarkably slow in endemic areas, which may explain, at least in part, why the spatial spread of the main clades is limited, even within overall suitable regions. Nevertheless, since Lassa virus depends on M. natalensis for transmission, any environmental feature limiting the mobility of the reservoir may also affect virus dispersal. Main waterways, in particular, have been proposed to act as barriers preventing the spread of Lassa virus, based on phylogenetic evidence that virus diversity is distinct on either side of the Niger and Benue rivers in Nigeria23,43 (Fig. 3). Furthermore, for other viruses such as rabies virus, there is evidence that environmental factors including elevation or cropland coverage affect lineage dispersal velocity74,75. To investigate how the environment may limit the propagation of Lassa virus following its potential future introduction into a new suitable region, we explored: (i) the impact of main waterways on the trajectory of Lassa virus dispersal, and (ii) the impact of nine different environmental factors on the velocity of Lassa virus circulation. We found no strong evidence that the environmental factors we considered impede Lassa virus dispersal, and major waterways appear to have only a limited impact on it.
To determine whether main waterways act as barriers to virus dispersal, we investigated whether Lassa virus lineages tended to avoid crossing rivers in our phylogeographic reconstructions. Using a least-cost path algorithm76, we computed the cost for the virus to travel through a landscape crossed by rivers based on both stream network data (Table S6) and the virus dispersal trajectory. We compared the cost of the observed spread inferred by our phylogeographic analyses to the cost computed under a null dispersal model that is unaware of rivers. We then estimated the statistical support for our test by approximating a Bayes factor (BF) in favor of river-crossing avoidance. We repeated our test for a range of stream sizes, considering different threshold values of the Strahler number (S), a proxy for river stream size based on a hierarchy of tributaries77. We found only moderate evidence (3 < BFs < 20)78 that the virus dispersal trajectory tends to avoid crossing rivers when considering rivers of intermediate sizes (S >4 and <5), and no evidence (BF <3) in the case of larger rivers (S >5, Table S2). Overall, our results provide no strong evidence that waterways act as barriers to the dispersal of Lassa virus.
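The Strahler number follows a simple recursive rule. Source streams have order 1. A downstream segment takes order k + 1 when at least two of its tributaries share the highest order k, and otherwise inherits that highest order. Stream network datasets often provide this order precomputed. The minimal sketch below assumes a hypothetical network stored as a mapping from each segment to the segments flowing directly into it.

```python
def strahler(network, segment):
    """Strahler order of `segment` in a network mapping each segment to the
    list of segments flowing directly into it."""
    tributaries = network.get(segment, [])
    if not tributaries:
        return 1  # source stream
    orders = [strahler(network, t) for t in tributaries]
    highest = max(orders)
    # Order increases only where two or more tributaries of the highest order meet
    return highest + 1 if orders.count(highest) >= 2 else highest


# Hypothetical river network: 'mouth' is the most downstream segment
network = {
    "mouth": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "b1": ["b2", "b3"],
}
print({s: strahler(network, s) for s in ["a1", "a", "b1", "b", "mouth"]})
```

Filtering rivers by a minimum Strahler number, as in the test above, therefore keeps only segments that sit downstream of repeated confluences of comparably sized tributaries.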
We next examined how environmental conditions may affect the velocity of Lassa virus circulation, considering a set of nine environmental factors for which we collected geo-referenced data from public databases (Fig. S11, Table S6). For the virus dispersal events inferred by our phylogeographic analyses, we investigated whether the duration of the dispersal correlated with the environmental factors in our testing set. To assess these correlations, we computed an “environmental distance”, which corresponds to the distance of the dispersal event weighted according to the environmental conditions along the path of dispersal. Our procedure only considers time-invariant environmental values, which do not reflect the climatic and land cover conditions during the earliest part of Lassa virus dispersal history. We therefore restricted our analyses to the most recent dispersal events (corresponding to the tip branches of the trees from our phylogeographic reconstructions; Fig. S7). We only found moderate evidence (3 < BFs < 20) that the presence of savannas may slow down viral circulation (Tables S3 and S4). These results suggest that the environmental factors considered in our analysis have no dramatic impact on the velocity of Lassa virus circulation.
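One simple way to compute such an environmental distance is to sum raster values over the cells crossed by a straight line between the start and end of a branch. A high-value layer (for instance savanna coverage treated as a resistance) then lengthens the effective distance, while a uniform raster reduces the same computation to a proxy for geographic distance. The sketch compares how well environmental and uniform (null) distances explain branch durations, using the coefficient of determination of a simple linear regression. The straight-line path, the raster, the branches and the comparison of R² values are illustrative assumptions. This section does not detail the exact path model or test statistic used.

```python
def straight_line_cells(start, end, steps=100):
    """Raster cells (row, col) crossed by the straight segment from start to end,
    approximated by dense sampling; coordinates are non-negative cell units."""
    cells = []
    for k in range(steps + 1):
        t = k / steps
        cell = (int(start[0] + t * (end[0] - start[0])),
                int(start[1] + t * (end[1] - start[1])))
        if cell not in cells:
            cells.append(cell)
    return cells


def environmental_distance(raster, start, end):
    """Sum of raster values over the cells crossed by the straight path."""
    return sum(raster[r][c] for r, c in straight_line_cells(start, end))


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical 5x5 rasters: a high-resistance band in column 2, and a uniform null
savanna = [[5.0 if c == 2 else 1.0 for c in range(5)] for _ in range(5)]
null = [[1.0] * 5 for _ in range(5)]

# Hypothetical tip branches: (start, end, duration in years), in cell units
branches = [((0.5, 0.5), (0.5, 1.5), 2.0),
            ((1.5, 0.5), (1.5, 3.5), 8.0),
            ((2.5, 3.5), (2.5, 4.5), 2.5),
            ((3.5, 1.5), (3.5, 3.5), 6.5)]
durations = [d for _, _, d in branches]
r2_env = r_squared([environmental_distance(savanna, s, e) for s, e, _ in branches], durations)
r2_null = r_squared([environmental_distance(null, s, e) for s, e, _ in branches], durations)
print(round(r2_env, 3), round(r2_null, 3), round(r2_env - r2_null, 3))
```

In the toy data, branches crossing the high-resistance band take longer. The environmental distances therefore explain durations better than the null distances do (a positive R² difference), which is the kind of signal such a test looks for.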
Simulations of Lassa virus spread show virus propagation may remain limited following introduction into a new suitable region
In the post-hoc analyses of our phylogeographic inferences, we did not identify any environmental factor that may prevent or notably slow down virus spread in a suitable environment. Hence, based on our analysis, the main factor we can expect to limit Lassa virus propagation after introduction into a new suitable region would be its slow lineage dispersal velocity. To illustrate how a slow lineage dispersal velocity may limit the spatial extent of virus spread following a potential introduction, we simulated virus dispersal based on the parameters inferred by our phylogeographic analyses (Fig. 4). We ran simulations over a 20-year period in two areas: one projected to become suitable for virus circulation by 2050 under scenario RCP 6.0, and the other under RCP 8.5. To simulate virus dispersal, we randomly sampled dispersal events inferred by our phylogeographic analyses for the Nigerian clade II, for which we have the largest number of sequences. We set the trajectory of dispersal events by selecting the ending location with a probability equal to the local ecological suitability (as projected in our ecological niche modelling analyses). By mapping the results of 1,000 simulations of virus dispersal, we show that Lassa virus would likely remain confined within a range of ∼200 km² (Fig. 4), even when starting within a large suitable area (e.g. with scenario RCP 8.5; Figs. 4 and S14). Our simulations show how, if Lassa virus circulation is as slow as in current endemic areas, virus propagation would remain spatially limited over the first decades following its introduction into a new ecologically suitable area.
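The simulation procedure can be sketched as a loop that repeatedly draws one of the inferred dispersal events (a distance and a duration). It proposes an end point at that distance in a random direction and accepts it with probability equal to the local suitability, until the 20-year period is exhausted. This is a simplified, single-lineage version under stated assumptions:
- planar coordinates in km;
- rejection sampling with a cap on the number of proposals;
- no lineage branching;
- a hypothetical suitability surface and event list.

The function names are illustrative.

```python
import math
import random


def simulate_lineage(events, suitability, start, years=20.0, rng=random, max_tries=1000):
    """Simulate one lineage for `years`. Each step draws an inferred dispersal
    event (distance_km, duration_years), proposes a uniformly random direction
    and accepts the end point with probability equal to its suitability.
    Positions are planar (x, y) coordinates in km."""
    x, y = start
    elapsed = 0.0
    path = [(x, y)]
    while True:
        distance, duration = rng.choice(events)
        if elapsed + duration > years:
            break  # the next event would overrun the simulated period
        for _ in range(max_tries):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            nx, ny = x + distance * math.cos(angle), y + distance * math.sin(angle)
            if rng.random() < suitability(nx, ny):
                x, y = nx, ny
                break
        # If every proposal was rejected, the lineage stays in place for this event.
        elapsed += duration
        path.append((x, y))
    return path


def toy_suitability(x, y):
    """Hypothetical surface: suitable within 300 km of the introduction point."""
    return 0.9 if math.hypot(x, y) < 300.0 else 0.05


# Hypothetical events (distance in km, duration in years), all about 1 km/year
events = [(2.0, 2.0), (5.0, 4.0), (1.0, 1.5), (8.0, 7.0)]
rng = random.Random(42)
reach = []
for _ in range(1000):
    path = simulate_lineage(events, toy_suitability, (0.0, 0.0), rng=rng)
    reach.append(max(math.hypot(px, py) for px, py in path))
print(f"max distance from introduction after 20 years: {max(reach):.1f} km")
```

Total simulated time is capped at 20 years, and every toy event moves at most 1.25 km/year. No simulated lineage can therefore end up more than 25 km from its introduction point, which is the kind of constraint a slow lineage dispersal velocity imposes.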
Discussion
Previous molecular dating studies have shown that Lassa virus has been circulating for at least 1,000 years and originated in present-day Nigeria, from where it spread to the West, reaching into the MRU region11,25,79,80. Lassa virus is considered endemic in Guinea, Liberia, Nigeria and Sierra Leone, but the virus likely circulates in other neighboring countries along its presumed dispersal path11–14. Although we do not attempt to precisely map the current range of Lassa virus, our ecological suitability estimates for the current period appear broadly similar to the results obtained in an earlier study27 and show areas suitable for virus circulation across West Africa (Fig. 2A). Consistent with reports of Lassa virus infections in humans and rodents outside of endemic hotspots, our results suggest that the virus may be present in most coastal West African countries and in Mali, which calls for strengthened Lassa fever surveillance throughout the region.
M. natalensis is considered the primary reservoir of Lassa virus16, and it remains unclear why the distribution of this rodent species extends far beyond that of the virus, which is limited to West Africa26. Our analyses show that different environmental factors determine ecological suitability for the virus and its host, suggesting that the absence of Lassa virus beyond West Africa could be partly due to environmental constraints. To estimate future ecological suitability for Lassa virus circulation, we used projected environmental data at a low resolution (0.5 decimal degrees), due to the coarse scale of climate change projections48,81. This limited resolution reduced our ability to account for small-scale environmental variations that could affect suitability for Lassa virus. However, the good performance of our models (areas under the receiver operating characteristic curve between 0.74 and 0.85) suggests that our approach provides a reasonable estimate of the distribution of Lassa virus infections.
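The area under the receiver operating characteristic curve (AUC) can be read as a probability. It is the probability that a randomly chosen occurrence site receives a higher predicted suitability than a randomly chosen absence or background site, with ties counted as one half. The minimal sketch below implements that rank-based computation with made-up suitability scores.

```python
def auc(presence_scores, absence_scores):
    """Probability that a presence site scores higher than an absence site
    (ties count one half); equals the area under the ROC curve."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted suitabilities at occurrence and absence sites
presence = [0.91, 0.75, 0.62, 0.40]
absence = [0.55, 0.30, 0.20, 0.45, 0.10]
print(auc(presence, absence))
```

An AUC of 0.5 corresponds to a model that ranks sites no better than chance, and 1 to perfect separation. Values between 0.74 and 0.85 therefore indicate that occurrence sites are usually, but not always, ranked above absence sites.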
In addition to environmental aspects, several other factors may also contribute to the difference in distribution between Lassa virus and its reservoir. As reported for other mammarenaviruses, the virus may only be present in the M. natalensis subtaxon A-I30,31. Records of Lassa virus infection in M. natalensis subtaxon A-II and in other rodent species including M. erythroleucus or Hylomyscus pamfi82 suggest, however, that susceptibility to Lassa virus infection may not be species- or subtaxon-specific. Other possible explanatory factors include intra-host competition between different viruses32,83, or cross-immunity due to the circulation of closely related viruses34,35. As most other Old World arenaviruses that circulate in M. natalensis are found only in East Africa30,31, there are few field data with which to assess these two mechanisms. Only in Mayo-Ranewo, eastern Nigeria, have rodent trapping studies identified a Mobala-like virus in M. natalensis84. This virus does not seem to effectively restrict the transmission of Lassa virus, as infections are often reported in that part of Nigeria85,86.
In our phylogeographic analyses, we find that Lassa virus mostly spreads on a small spatial scale, with relatively few long-distance dispersal events (Fig. 3), but we do not identify environmental factors that seem to strongly restrict or slow down virus spread. Using a phylogeographic simulation procedure, we also show that a slow lineage dispersal velocity would likely result in a limited spatial propagation if Lassa virus were successfully introduced into a new ecologically suitable area. The slow spread of Lassa virus is likely due to the small scale of the movements of its reservoir, as suggested by genetic studies showing that M. natalensis rodents rarely travel outside of their commensal habitat and are prone to high levels of consanguinity26,17. It is, however, surprising that Lassa virus (and, by implication, its reservoir) seems unrestricted by the environment. Of note, our results were not always consistent between the S and L segments, possibly due to the lower number of L sequences (255) in our data set compared to S sequences (411). More generally, the number of genomic sequences in our datasets may offer limited power for the tests we used to assess the possible impact of rivers and other environmental factors on virus spread. Hence, a larger sampling of Lassa virus genomes throughout the virus range would allow for a better evaluation of the role of the environment in limiting spread.
In our study, we use phylogeographic simulations to highlight how, in the absence of restrictions from the environment, a slow lineage dispersal velocity may limit the propagation of Lassa virus in case of introduction into a new ecologically suitable area. We use these simulations for illustration rather than prediction, as the dispersal dynamics upon virus emergence in a new region are unclear. The virus may spread swiftly through an immunologically naïve rodent population, but the low mobility of the rodent reservoir could still limit the velocity of virus dispersal on a larger scale. Many other elements may come into play, such as a possible change in reservoir host species, or the co-circulation of closely related viruses. Nevertheless, it is worth pointing out that following the emergence of Lassa virus in the MRU, virus circulation remained as slow as in Nigeria, if not slower, as highlighted by our estimates of weighted lineage dispersal velocity (Fig. S8).
Our ecological niche analyses highlight a risk of expansion of Lassa virus towards regions in Central and East Africa that could be home to up to 337 million people by 2070 (Table S1). To reach the largest ecologically suitable regions we identify in DRC and Uganda, the virus would have to spread over several hundreds of kilometers and cross regions with low ecological suitability. Such long-distance movements likely allowed Lassa virus to reach the Mano River Union from Nigeria several hundred years ago25. This early part of the virus dispersal history, however, remains poorly understood, and it is thus hard to predict whether the virus is likely to travel across the African continent again. To provide a very conservative estimate of the future risk of exposure to Lassa virus, we can focus on population growth in the endemic range and leave aside its possible expansion. We estimated that population growth in endemic countries alone could put 341 [308-360] million people at risk of infection by 2070 (Table S1), compared to an estimated 92 [83-98] million today. A limitation to these estimates is that our population projections do not take into account migrations due to environmental and climate change pressures, which could affect projections in regions where extreme weather conditions are expected.
A large part of the population growth expected in endemic areas is driven by Nigeria (∼91%), a country that has reported an unusual increase in the number of Lassa fever cases over the last two years85,86. This uptick was not attributed to increased inter-human transmission23,24 or to the emergence of a specific viral strain23,24, but it raised the question of whether it reflected more intense circulation within the reservoir or improved surveillance and public awareness. To discriminate between these two hypotheses, we investigated the evolution of the overall genetic diversity of Lassa virus in the main Nigerian clade (clade II) over the past decades, using a coalescent approach that accounts for preferential sampling. We found that the effective population size of Nigerian clade II increased in recent years (segment S; Fig. S13), suggesting that the recent uptick in cases in Nigeria was not solely the result of improved surveillance. Hence, even if Lassa virus does not expand to new regions in the near future, the virus still actively circulates in increasingly populated endemic areas, and there is thus an urgent need for more efficient prophylactic and therapeutic countermeasures.
With anthropogenic climate change and an increasing impact of human activities on the environment, extensive studies of the ecology and spread of zoonotic and vector-borne diseases are needed to anticipate possible future changes in their distribution87,88. We showed that changes in temperature, precipitation and pastures/rangeland land coverage may expand the ecological niche of Lassa virus beyond current endemic areas, potentially exposing hundreds of millions more people to Lassa. By simulating virus spread, we highlight that if virus propagation does not accelerate following introduction into new regions, the emerging circulation foci could remain limited to a small spatial scale over the first decades. Our study provides an example of how ecological niche modelling and spatially-explicit phylogeography can be effectively combined to investigate the future risk of a major zoonotic disease.
Methods
Ecological niche modelling of Mastomys natalensis and Lassa virus
We employed the boosted regression trees89 (BRT) approach implemented in the R package “dismo”90 to perform ecological niche modelling analyses of both Lassa virus and its host, M. natalensis. BRT is a machine learning method that can model complex non-linear relationships between the probability of occurrence and various predictor variables90,91. This approach generates a collection of sequentially fitted regression trees that optimise the predicted probability of occurrence based on predictor values89,91, which can also be interpreted as a measure of ecological suitability. In a comprehensive review of distribution modelling methods, Elith et al.89 found BRT to perform best along with the maximum entropy method92. The BRT approach requires both presence and absence data. When absence data are unavailable, as is the case for Lassa virus and its host, they can be approximated by random pseudo-absence points sampled from the study area (also referred to as the “background”). For Lassa virus, we only sampled pseudo-absences in raster cells in which the presence of M. natalensis has been recorded. This procedure avoids treating under-sampled areas as ecologically unsuitable for the virus and also accounts for potential heterogeneity in sampling effort or surveillance93,94. Similarly, for M. natalensis, we only sampled pseudo-absences in raster cells in which the presence of at least one individual of another species belonging to the Muridae family has been recorded. Because a single occurrence record is sufficient to consider a raster cell as a presence, we discarded all but one occurrence record per raster cell. We applied the same filtering step to the pseudo-absence points and discarded pseudo-absences falling in raster cells with occurrence data. To select the optimal number of trees in the BRT models, we used a spatial cross-validation procedure based on five spatially separated folds generated with the “blockCV” R package95.
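The cell-level filtering and background sampling described above can be sketched as follows. This is a minimal Python illustration, not the study's R code: the regular longitude/latitude grid, its resolution `res`, and the helper names (`cell_of`, `thin_to_cells`, `sample_pseudo_absences`) are hypothetical, and pseudo-absences are drawn as whole background cells rather than as random points.

```python
import random


def cell_of(lon, lat, res):
    """Map a coordinate to the (column, row) index of a regular grid with
    cells of `res` degrees (hypothetical grid for illustration)."""
    return (int((lon + 180.0) // res), int((90.0 - lat) // res))


def thin_to_cells(points, res):
    """Keep a single record per raster cell: return the set of occupied cells."""
    return {cell_of(lon, lat, res) for lon, lat in points}


def sample_pseudo_absences(presences, background_records, n, res, seed=0):
    """Draw up to n pseudo-absence cells among cells where the background
    group was recorded, excluding cells that already hold a presence."""
    presence_cells = thin_to_cells(presences, res)
    candidates = sorted(thin_to_cells(background_records, res) - presence_cells)
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))
```

For the virus, `background_records` would hold M. natalensis records; for the rodent, records of other Muridae species.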
We employed a spatial rather than a standard cross-validation because the latter may overestimate the ability of the model to make reliable predictions when occurrence data are spatially autocorrelated96, as is frequently the case. All BRT analyses were run and averaged over 10 cross-validated replicates, with a tree complexity of 5, an initial number of trees of 100, a learning rate of 0.005, and a step size of 10. We evaluated the inferences using the area under the receiver operating characteristic (ROC) curve, also simply referred to as the “area under the curve” (AUC). Among replicates, AUC values ranged from 0.68 to 0.73 for M. natalensis (mean = 0.71), and from 0.74 to 0.85 for Lassa virus (mean = 0.79).
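Because the AUC equals the probability that a randomly chosen presence receives a higher predicted suitability than a randomly chosen (pseudo-)absence, it can be computed directly from the two sets of predictions. The sketch below uses that rank-based (Mann-Whitney) formulation with ties counted as one half; the function name is hypothetical, and this is not the code path used by “dismo” or “blockCV”.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve as P(score_presence > score_absence),
    with ties counted as 1/2 (Mann-Whitney U divided by n1 * n0)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))
```

The pairwise loop is quadratic and meant only to make the definition explicit; averaging the per-replicate values gives the mean AUC reported above.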
We obtained occurrence data for M. natalensis from the Global Biodiversity Information Facility (http://www.gbif.org, accessed 2019-07-19), the Integrated Digitized Biocollections (https://www.idigbio.org, accessed 2020-01-04), the Field Museum of Natural History Zoological collections (https://collections-zoology.fieldmuseum.org, accessed 2019-12-13), and the African Mammalia database (http://projects.biodiversity.be/africanmammalia, accessed 2019-12-14). This data set was supplemented with data available in the scientific literature (search for the term “Mastomys natalensis” in PubMed and Google). Duplicate records as well as records located in the ocean were excluded from the final data set, totalling 2,504 unique M. natalensis occurrence records. For 26 of those records, the location was not provided as spatial coordinates but as a locality (at or below administrative level 4). For these records, the latitude and longitude correspond to those of the locality (determined as described in the subsection Selection and curation of viral sequences; see below). Occurrence data for the Muridae family were obtained from the GBIF database. Duplicate records and records located in the ocean were excluded from the data set, totalling 10,806 unique Muridae occurrence records for the African continent. Occurrence data for Lassa virus were obtained by combining the data set from Fichet-Calvet & Rogers97 with records associated with sequences from our Lassa virus sequence data set (see the subsection Selection and curation of viral sequences below for further detail), records of infected M. natalensis from our host occurrence data set, and data available in the scientific literature (search for the term “Lassa virus” in PubMed). Duplicate records were discarded from the data set, resulting in 310 unique Lassa virus occurrence records.
For two of those records, the location was not provided as spatial coordinates but as a locality (at or below administrative level 4), so the latitude and longitude correspond to those of the locality (determined as described in the subsection Selection and curation of viral sequences; see below). Our BRT models were trained on current environmental factors and then used to obtain estimates of future ecological niches for both Lassa virus and M. natalensis.
The BRT analyses were based on several environmental factors: harmonised present-day and future climate, land cover and population data available through the Inter-Sectoral Impact Model Intercomparison Project phase 2b (ISIMIP2b)48. The climate information consists of daily gridded near-surface air temperature and surface precipitation fields derived from four bias-adjusted98 global climate models (GCMs; GFDL-ESM2M99, HadGEM2-ES100, IPSL-CM5A-LR101, and MIROC5102) participating in the fifth phase of the Coupled Model Intercomparison Project (CMIP5103). We considered simulations conducted under historical climate forcings and RCPs 2.6, 6.0 and 8.5. In addition, we considered observed gridded temperature and precipitation from the concatenated products GSWP3 and EWEMBI48 to assess current (1986-2005) conditions. For land cover, we used version 2 of the Land Use Harmonisation (LUH2104), which provides historical and projected land cover states under a range of shared socioeconomic pathways (SSPs), from which we considered SSP1-26, SSP4-6.0 and SSP8-85. Finally, we retrieved gridded population projections105 under SSP2-26. For each combination of product (GCM, GSWP3-EWEMBI, LUH2, gridded population), scenario (historical, RCP, SSP) and analysis window (1986-2005, 2021-2040, 2041-2060, and 2061-2080), we computed the grid-scale temporal mean. For each scenario and time period, we estimated an index of human exposure (IHE), which corresponds to human population estimates (log10-transformed) in raster cells with an ecological suitability for Lassa virus greater than or equal to 0.5. We then used these IHE values to calculate the number of people at risk of exposure to Lassa virus.
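A minimal sketch of the head count behind the exposure estimates, assuming that suitability and population are co-registered grids of equal shape and that the number of people at risk is the untransformed population summed over cells with suitability of at least 0.5 (the log10 transform used for the IHE is left out). `people_at_risk` is a hypothetical helper.

```python
def people_at_risk(suitability, population, threshold=0.5):
    """Sum the population of cells whose ecological suitability is >= threshold.

    Both grids are lists of rows with identical shape; cells set to None are
    skipped, and NaN suitabilities never pass the comparison.
    """
    total = 0.0
    for s_row, p_row in zip(suitability, population):
        for s, p in zip(s_row, p_row):
            if s is None or p is None:
                continue
            if s >= threshold:
                total += p
    return total
```

Running it once per GCM, scenario and niche-model replicate would give the set of values that is then summarised as a mean and credible interval.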
To investigate the specific effect of human population growth in current and future suitable areas, we also re-estimated future IHE values using (i) current population estimates with future projections of ecological suitability for Lassa virus, to estimate population growth throughout current and future suitable areas, and (ii) future projections of human population with current projections of ecological suitability for Lassa virus, to estimate the future population living in currently suitable areas (Table S1). For each estimate, we calculated the mean and 95% HPD interval across all climate models and ecological niche model replicates.
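The 95% HPD interval reported for each estimate is the shortest interval containing 95% of the pooled values (across climate models and niche-model replicates). The sketch below computes its usual empirical estimate from sorted samples; it assumes a unimodal distribution, and the function names are hypothetical.

```python
import math


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    # number of samples the interval must contain (epsilon guards float error)
    k = max(1, math.ceil(prob * n - 1e-9))
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    i = widths.index(min(widths))
    return xs[i], xs[i + k - 1]


def summarise(samples, prob=0.95):
    """Mean and HPD interval of pooled estimates."""
    return sum(samples) / len(samples), hpd_interval(samples, prob)
```

Unlike an equal-tailed interval, the HPD interval shifts towards the bulk of the values when the distribution is skewed.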
Selection and curation of viral sequences
All publicly available sequences for Lassa virus were downloaded from the NCBI Nucleotide database (keywords: “lassa NOT mopeia NOT natalensis”). They were combined with recently generated sequences from Nigeria, sequenced as described previously by Kafetzopoulou and colleagues106 and publicly available on virological.org (https://virological.org/t/2019-lassa-virus-sequencing-in-nigeria-final-field-report-75-samples/291). We filtered the data by: (i) excluding laboratory strains (adapted, passaged multiple times, recombinant, or obtained from antiviral or vaccine experiments), (ii) excluding sequences without a timestamp, (iii) keeping only sequences from a single timepoint (if multiple timepoints were available for a patient), (iv) removing duplicates (when more than one sequence was available for a single strain), and (v) excluding sequences from identified hospital epidemics or sequences for which the location corresponded to the site of hospitalisation. The remaining sequences were trimmed to their coding regions and arranged in sense orientation separately for the S segment (NP-NNN-GPC) and the L segment (L-NNN-Z). The sequences were aligned using MAFFT107 and inspected manually. At this step, we discarded low-quality sequences (manual curation) and very short sequences (combined ORF length < 500 nt). Because the sequence data from Siddle and colleagues23 and from Kafetzopoulou and colleagues24 overlap, we excluded sequences differing by zero or one mismatch between the two sets to ensure that our data sets contained no duplicates. Two types of alignments were generated. The alignments with all curated sequences, regardless of the availability of detailed location information, included 756 S segment sequences and 551 L segment sequences, respectively. The alignments with detailed location information included 411 S segment sequences and 255 L segment sequences, respectively.
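The cross-data-set duplicate check (zero or one mismatch) can be sketched as a pairwise mismatch count on the aligned sequences. Two assumptions are ours and not stated in the text: gaps and ambiguous bases are ignored when counting mismatches, and the flagged copy is taken from the second data set. Function names are hypothetical.

```python
UNAMBIGUOUS = set("ACGT")


def mismatches(seq_a, seq_b):
    """Count differences between two aligned sequences, ignoring columns where
    either sequence has a gap or an ambiguous base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS and a != b
    )


def cross_set_duplicates(set_a, set_b, max_mismatches=1):
    """Identifiers in set_b with a near-identical counterpart in set_a.

    Both inputs map sequence identifier -> aligned sequence.
    """
    return {
        id_b
        for id_b, seq_b in set_b.items()
        if any(mismatches(seq_a, seq_b) <= max_mismatches for seq_a in set_a.values())
    }
```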
For the sequences with detailed location information, when only a location name and no spatial coordinates were provided, spatial coordinates were determined using a combination of online platforms (Table S5). When several coordinates were available for one location, those matching across several data sets were kept; if the location was found in only one data set, the coordinates corresponding to the highest administrative level were kept.
Inferring the dispersal history of Lassa virus lineages
We performed spatially-explicit phylogeographic reconstructions using the relaxed random walk (RRW) diffusion model56 implemented in BEAST 1.10108, coupled with the BEAGLE 3 library109 to improve computational performance. We modelled the nucleotide substitution process according to a GTR+Γ parameterisation110 and branch-specific evolutionary rates according to a relaxed molecular clock with an underlying log-normal distribution111. These phylogeographic analyses were based on the alignments of sequences associated with known spatial coordinates. For both the demographic and phylogeographic reconstructions, we ran a distinct BEAST analysis for each segment (L and S), sampling Markov chain Monte Carlo (MCMC) chains every 10^5 generations. We used Tracer 1.7111 to identify the number of sampled trees to discard as burn-in and to inspect convergence and mixing, ensuring that the effective sample size (ESS) values associated with estimated parameters were all >200. We used TreeAnnotator 1.10112 to obtain a maximum clade credibility (MCC) tree for each BEAST analysis. Finally, we used the R package “seraphim”60 to extract the spatiotemporal information embedded within trees obtained by spatially-explicit phylogeographic inference, as well as to estimate the weighted lineage dispersal velocity.
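The weighted lineage dispersal velocity is commonly defined (e.g. in the seraphim literature) as the sum of the geographic distances travelled along all branches divided by the sum of the branch durations. The Python sketch below applies that definition to branches given as start/end coordinates and durations, using the haversine great-circle distance on a spherical Earth of radius 6,371 km; it is an illustration, not seraphim's implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def weighted_dispersal_velocity(branches):
    """branches: iterable of (start_lon, start_lat, end_lon, end_lat, duration_years).

    Returns total distance / total duration, in km per year.
    """
    total_distance = 0.0
    total_time = 0.0
    for lon1, lat1, lon2, lat2, dt in branches:
        total_distance += great_circle_km(lon1, lat1, lon2, lat2)
        total_time += dt
    return total_distance / total_time
```

Weighting by duration in this way gives long branches more influence than averaging per-branch velocities would, which makes the estimate less sensitive to very short branches.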
Impact of environmental factors on the dispersal dynamics of Lassa virus lineages
Based on the spatially-explicit phylogeographic reconstructions, we performed two different kinds of analyses to investigate the impact of several environmental factors on the dispersal history and dynamics of Lassa virus lineages. First, we tested whether main rivers act as potential barriers to Lassa virus dispersal (see Table S6 for the source of the original rivers shapefile). For this purpose, we used the least-cost path algorithm76 to compute the total cost for viral lineages to travel through a landscape crossed by rivers. This algorithm uses an underlying environmental raster to compute the minimum cost to move from one position to another. Here, we generated rasters by assigning a value of “1” to raster cells not crossed by a main river and a value of 1+k to raster cells crossed by a main river (raster resolution: ∼0.5 arcmin). Because the raster cells not crossed by a main river were assigned a uniform value of “1”, k defines the additional resistance to movement when a cell does contain such a potential landscape barrier61,113. To assess the impact of this rescaling parameter, we tested three different values for k: 10, 100 and 1000. Furthermore, as the notion of “main river” is arbitrary, we used different threshold values of the Strahler number S associated with each river to select the main rivers to consider in each analysis. In hydrology, S can be used as a proxy for stream size because it measures branching complexity, i.e. the position of a river within the hierarchical river network. In practice, we compared the total cost computed for posterior trees with the total cost computed on the same trees along which we simulated a stochastic diffusion process under a null dispersal model73.
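To illustrate how the river rasters and the least-cost path interact, the sketch below builds a raster with values 1 and 1+k and runs Dijkstra's algorithm over 4-connected cells. Assumptions for illustration only: moving between two neighbouring cells costs the mean of their values, cells are unit-sized, and diagonal moves are not allowed; the R implementation used in the study may define transitions differently. The total cost TC of a tree is then the sum of these values over its branches.

```python
import heapq


def river_resistance_raster(river_mask, k):
    """Cells not crossed by a main river get 1, cells crossed by one get 1 + k."""
    return [[1.0 + k if crossed else 1.0 for crossed in row] for row in river_mask]


def least_cost(raster, start, end):
    """Dijkstra's algorithm over 4-connected cells; start and end are (row, col).

    Moving between two neighbouring cells costs the mean of their values
    (an assumption made for this sketch).
    """
    n_rows, n_cols = len(raster), len(raster[0])
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cell == end:
            return cost
        if cost > best.get(cell, float("inf")):
            continue  # outdated queue entry
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                new_cost = cost + (raster[r][c] + raster[nr][nc]) / 2.0
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return float("inf")


def total_cost(raster, branches):
    """TC of a tree: sum of least-cost values over its (start, end) branches."""
    return sum(least_cost(raster, start, end) for start, end in branches)
```

With a river running down the middle column and k = 10, crossing it costs 12 instead of 2, so trees whose branches avoid the river accumulate a lower TC.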
Hereafter referred to as “simulated trees”, these trees were obtained by simulating a relaxed random walk process along the branches of trees sampled from the posterior distribution obtained by spatially-explicit phylogeographic inference73. Because this stochastic diffusion process did not take the position of rivers into account, we expect the total cost to be lower for inferred trees if viral lineages did tend to avoid crossing rivers. For each inferred or simulated tree, we computed the total cost TC, i.e. the sum of the least-cost values computed for each phylogenetic branch considered separately. Each “inferred” TC value (TCinferred) was then compared to its corresponding “simulated” value (TCsimulated) by approximating a Bayes factor (BF) support as follows: BF = [pe/(1-pe)]/[0.5/(1-0.5)], where pe is the posterior probability that TCsimulated > TCinferred, i.e. the frequency at which TCsimulated > TCinferred in the samples from the posterior distribution. The prior odds are equal to 1 because we can assume an equal prior expectation for TCinferred and TCsimulated.
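The BF approximation above can be written directly from paired samples of TCinferred and TCsimulated. A minimal sketch; returning infinity when every simulated value exceeds its inferred counterpart (pe = 1) is our own convention, since the posterior odds are then unbounded for a finite sample. The function name is hypothetical.

```python
def bayes_factor(tc_inferred, tc_simulated):
    """BF = [pe / (1 - pe)] / [0.5 / (1 - 0.5)], with pe the fraction of
    posterior samples in which TC_simulated > TC_inferred."""
    if len(tc_inferred) != len(tc_simulated):
        raise ValueError("expected one simulated value per inferred tree")
    pe = sum(s > i for i, s in zip(tc_inferred, tc_simulated)) / len(tc_inferred)
    if pe == 1.0:
        return float("inf")  # posterior odds unbounded for a finite sample
    prior_odds = 0.5 / (1 - 0.5)  # equal prior expectation, i.e. 1
    return (pe / (1 - pe)) / prior_odds
```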
Next, we tested the impact of several environmental variables, again described as rasters, on the dispersal velocity of Lassa virus lineages (Table S6, Figure S11): main rivers (defined by selecting rivers with an S value higher than 2, 3, 4, 5, and 6), forest areas, grasslands, savannas, croplands, annual mean temperature, annual precipitation, and human population density. Except for the generated river rasters (see above), all these rasters had a resolution of ∼2.5 arcmin. For one-dimensional landscape features such as rivers, we had to resort to higher resolution rasters (∼0.5 arcmin) to obtain sufficiently precise pixelations (rasterizations) of this non-continuous environmental factor; raster cells assigned to rivers would otherwise be exceptionally large given the size of the study area, which could lead to artifactual results. Environmental rasters were tested as potential conductance factors (i.e. facilitating movement) as well as potential resistance factors (i.e. impeding movement). For each environmental variable, we also generated several distinct rasters with the following formula: vt = 1 + k*(vo/vmax), where vt is the transformed cell value, vo the original cell value, and vmax the maximum cell value recorded in the raster. The rescaling parameter k allows the definition and testing of different strengths of raster cell conductance or resistance, relative to the conductance/resistance of a cell with a minimum value set to “1”70. For each environmental factor, we again tested three different values for k (i.e. 10, 100 and 1000). The procedure can be summarised in three successive steps114: (i) based on environmental rasters, we computed the environmental distance for each branch in inferred and simulated trees. These distances were computed using two different algorithms: the least-cost path and the Circuitscape algorithm, the latter using circuit theory to accommodate uncertainty in the route taken115.
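The rescaling formula vt = 1 + k*(vo/vmax) maps every raster onto values between 1 and 1 + k. The sketch below applies it cell by cell, assuming non-negative original values with vmax > 0 and `None` for missing cells; the helper that turns a conductance raster into a per-cell resistance (its reciprocal) is a hypothetical illustration of how the same transformed raster can be read either way.

```python
def rescale_raster(values, k):
    """Apply vt = 1 + k * (vo / vmax) to every non-missing cell."""
    v_max = max(v for row in values for v in row if v is not None)
    if v_max <= 0:
        raise ValueError("expected non-negative values with a positive maximum")
    return [[None if v is None else 1.0 + k * (v / v_max) for v in row] for row in values]


def cell_resistance(vt, role):
    """Per-cell resistance for a least-cost search: the transformed value itself
    when tested as a resistance factor, its reciprocal when tested as a
    conductance factor (hypothetical helper for illustration)."""
    if role == "resistance":
        return vt
    if role == "conductance":
        return 1.0 / vt
    raise ValueError("role must be 'resistance' or 'conductance'")
```

Larger k values stretch the contrast between the least and most favourable cells, which is why several values are tested rather than one.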
For computational tractability, the high-resolution river rasters were only tested as resistance factors with the least-cost path algorithm. (ii) We estimated the correlation between time durations and environmental distances associated with each phylogenetic branch. Specifically, we estimated the statistic Q, defined as the difference between the coefficient of determination obtained when branch durations are regressed against environmental distances computed on the environmental raster, and the coefficient of determination obtained when branch durations are regressed against environmental distances computed on a uniform “null” raster, i.e. a raster with a value of “1” assigned to all its cells. We estimated Q for each tree and thus obtained two distributions of Q values: one for inferred and one for simulated trees. We only considered an environmental raster as potentially explanatory if both its distribution of regression coefficients and its associated distribution of Q values were positive116. (iii) We evaluated the statistical support associated with a positive Q distribution (i.e. with at least 90% positive values) by comparing it with its corresponding null distribution of Q values based on simulated trees. We formalised this comparison by approximating a BF support as defined above, but this time defining pe as the posterior probability that Qestimated > Qsimulated, i.e. the frequency at which Qestimated > Qsimulated in the samples from the posterior distribution74.
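Steps (ii) and (iii) can be sketched as below: Q is the difference between two coefficients of determination, and the support compares inferred and simulated Q values pairwise. Assumptions for illustration: one simulated Q per inferred tree, BF prior odds of 1, and `None` returned when fewer than 90% of inferred Q values are positive; the additional check on the sign of the regression coefficients is not reproduced. Function names are hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation: the regression explains nothing
    return sxy * sxy / (sxx * syy)


def q_statistic(durations, env_distances, null_distances):
    """Q = R2(durations ~ env distances) - R2(durations ~ null-raster distances)."""
    return r_squared(env_distances, durations) - r_squared(null_distances, durations)


def q_support(q_inferred, q_simulated, min_positive=0.9):
    """BF support for a positive Q distribution, with pe = P(Q_inferred > Q_simulated),
    or None if fewer than `min_positive` of the inferred Q values are positive."""
    n = len(q_inferred)
    if sum(q > 0 for q in q_inferred) / n < min_positive:
        return None
    pe = sum(a > b for a, b in zip(q_inferred, q_simulated)) / n
    return float("inf") if pe == 1.0 else pe / (1 - pe)
```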
Phylogeographic simulations
We implemented a phylogeographic approach to simulate virus dispersal over a 20-year period following a successful introduction event into a new ecologically suitable area in 2050 under scenarios RCP 6.0 and RCP 8.5. We simulated viral lineage dispersal events by randomly sampling from the dispersal events inferred by our phylogeographic analyses. These simulations were performed under the assumption that underlying environmental factors have no notable impact. To set the trajectory of lineage dispersal events, we selected the ending location with a probability defined by the local ecological suitability. The starting point of these simulations was selected arbitrarily within the most suitable area of the extended part of the ecological niche estimated for Lassa virus in 2050, and thus differed between simulations performed under scenarios RCP 6.0 and RCP 8.5.
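A toy version of this simulation, for illustration only: a single lineage chains dispersal events resampled from a list of (distance, duration) pairs, draws a random bearing for each, and accepts the candidate end point with probability equal to the ecological suitability there. Planar kilometre coordinates, the `suitability` callback, the retry limit, and the single-lineage simplification are all assumptions, not the study's implementation.

```python
import math
import random


def simulate_spread(start, events, suitability, years, seed=1, max_tries=1000):
    """Chain resampled dispersal events for one lineage over `years` years.

    start: (x_km, y_km) introduction point on a planar grid (assumption).
    events: list of (distance_km, duration_years) pairs, e.g. inferred branches.
    suitability: callable (x_km, y_km) -> value in [0, 1].
    Returns a list of (time, x_km, y_km) positions.
    """
    if any(duration <= 0 for _, duration in events):
        raise ValueError("event durations must be positive")
    rng = random.Random(seed)
    x, y = start
    t = 0.0
    path = [(t, x, y)]
    while True:
        distance, duration = rng.choice(events)  # resample an inferred event
        if t + duration > years:
            break  # the next event would end after the simulation window
        for _ in range(max_tries):
            bearing = rng.uniform(0.0, 2.0 * math.pi)
            nx = x + distance * math.cos(bearing)
            ny = y + distance * math.sin(bearing)
            # keep the end point with probability equal to local suitability
            if rng.random() < suitability(nx, ny):
                break
        else:
            nx, ny = x, y  # no accepted destination: the lineage stays put
        x, y, t = nx, ny, t + duration
        path.append((t, x, y))
    return path
```

Stopping at the first event that overruns the 20-year window, and following a single lineage rather than a branching tree, are simplifications made to keep the sketch short.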
Inferring the demographic history of Lassa virus lineages
We performed demographic reconstructions using the flexible skygrid coalescent model117 implemented in BEAST 1.10108. The skygrid model estimates how the effective population size of the virus changed through time. For these analyses, we also modelled the nucleotide substitution process according to a GTR+Γ parameterisation110 and branch-specific evolutionary rates according to a relaxed molecular clock with an underlying log-normal distribution111. For Nigerian clade II, for which we inferred a recent increase in the overall effective population size, we also performed a preferential sampling analysis118. By modelling sampling times as a process dependent on effective population size, this complementary analysis explicitly accounts for heterogeneous sampling density through time, which can improve estimates of the effective population size118.
Data Availability Statement
R scripts and related files needed to run all the ecological niche modelling and landscape phylogeographic analyses, as well as BEAST XML files, are all available at https://github.com/sdellicour/lassa_spreads.
Competing interests
The authors declare no competing interests.
Acknowledgements
The authors thank David Pigott for sharing their environmental suitability predictions for Mastomys natalensis. Simon Dellicour is supported by the Fonds National de la Recherche Scientifique (FNRS, Belgium) and was previously funded by the Fonds Wetenschappelijk Onderzoek (FWO, Belgium). This research was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Numbers U01AI151812, R01AI153044 and U19AI135995. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The research leading to these results has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 725422-ReservoirDOCS), from the Wellcome Trust through project 206298/Z/17/Z (The ARTIC Network), and from the European Union’s Horizon 2020 project MOOD (grant agreement no. 874850). This research was supported by the Research and Innovation Programme of the European Union under H2020 grant agreement n°871029-EVA-GLOBAL. Philip Lemey acknowledges support from the Research Foundation - Flanders (Fonds voor Wetenschappelijk Onderzoek - Vlaanderen, G066215N, G0D5117N and G0B9317N).
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.
- 110.
- 111.
- 112.
- 113.
- 114.
- 115.
- 116.
- 117.
- 118.