Using drivers and transmission pathways to identify SARS-like coronavirus spillover risk hotspots

The emergence of SARS-like coronaviruses is a multi-stage process from wildlife reservoirs to people. Here we characterize multiple drivers—landscape change, host distribution, and human exposure—associated with the risk of spillover of zoonotic SARS-like coronaviruses to help inform surveillance and mitigation activities. We consider direct and indirect transmission pathways by modeling four scenarios with livestock and mammalian wildlife as potential and known reservoirs before examining how access to healthcare varies within clusters and scenarios. We found 19 clusters with differing risk factor contributions within a single country (N = 9) or transboundary (N = 10). High-risk areas were mainly closer (11-20%) rather than far ( < 1%) from healthcare. Areas far from healthcare reveal healthcare access inequalities, especially Scenario 3, which includes wild mammals and not livestock as secondary hosts. China (N = 2) and Indonesia (N = 1) had clusters with the highest risk. Our findings can help stakeholders in land use planning, integrating healthcare implementation and One Health actions.


Introduction
The process of infectious disease emergence from animals begins with the cross-species transmission (spillover) of a microbe (e.g., virus, bacteria, fungus, protozoans) to a new animal host in which it is pathogenic [1][2][3] . Identifying areas of higher spillover probability is an important strategy for pandemic prevention, and has largely focussed on estimating host distributions 4,5 and modeling frameworks for adding proxies for disease risk and spread in the face of limited data [4][5][6] . Yet, successful emergence events are complex multi-stage processes with many possible pathways leading from the original wildlife reservoir to sustained transmission in people 7 . The probability of any of these pathways occurring and resulting in infection emergence varies temporally and spatially, so cross-scale mapping of the multiple, diverse drivers of disease emergence is needed to better allow decision-makers to know where to focus surveillance and mitigation strategies 8 .
Human infectious diseases almost all came from other species 3 . COVID-19, Ebola virus disease, HIV/AIDS and Zika virus disease are recent examples, whereas those like measles arose after the Neolithic Agricultural Revolution 9 . Zoonotic disease emergence has accelerated in recent decades, likely as a result of diverse interacting drivers such as accelerated land use change 10 , human encroachment of natural habitats, increasing and changing contacts among and between wildlife and domestic animals, and has been mostly linked to mammals and birds 11 . Bats are among the natural hosts of viruses in the coronavirus (family Coronaviridae) subgenus Sarbecovirus (Severe acute respiratory syndrome (SARS)-related coronaviruses), which includes SARS-CoV-1 and SARS-CoV-2 12,13 . Bat hosts of sarbecoviruses are broadly distributed but the highest diversity is in Southeast Asia 5 . Human infection with Sarbecovirus from bats may be more frequent than reported from traditional surveillance 14 and likely includes secondary hosts 15,16 . Viral infection prevalence contributes to the risk of spillover 2 , and can be influenced by biological factors such as birthing cycles 17,18 and external stimuli such as human changes to land use 19 (but see 20,21 ).
Large scale risk assessments in which areas with similar risk profiles are identified provide invaluable information 12,22 and can be rapid, while the development of local, detailed and intricate spillover and outbreak risk assessments are costly and can take a long time 23,24 . Since detailed and validated data for recent reports on outbreak risk reduction are lacking for most regions of the globe (e.g. the Sendai framework, https://sendaimonitor.undrr.org/), a broad evaluation targeting Sarbecovirus emergence can be advantageous to discuss diverse contexts across the region where most natural hosts of sarbecoviruses occur. Human encroachment has led to decreased distances between bat roosts and human settlements 25 , so part of the relevant hazard for inferring spillover risk can be spatially quantified from remotely sensed proxies for socioecological risk factors.
Here, we identify where putative drivers for emergence risk overlap, focusing on the biological possibility of the emergence of a Sarbecovirus. Our goal is to aid mitigation and surveillance activities throughout South, East and Southeast Asia, by identifying both where efforts should focus and which risk factors should be prioritized based on where outstanding values overlap the most (hotspots) and based on healthcare access. Specifically, we asked: Where are the hotspots of risk that capture a range of hypothesized transmission scenarios representing different pathways to the emergence of a novel SARS-like coronavirus? Can we identify spatially cohesive clusters of hypothesized risk drivers that, when combined, increase risk of zoonotic spillover 22,26 ?; How do the access to healthcare for high-risk areas vary according to transmission scenario?
We predicted that there would be co-occurring hotspots for most risk drivers converging in biodiverse regions with bat hosts in regions with greatest pressure from anthropogenic land use change. The four scenarios evaluated represent different nested transmission pathways. We assume that the risk of emerging new SARS-like outbreaks is associated with social, biological and environmental components and, because there are unobserved dynamics for emerging viruses 27 , we evaluated four nested spillover pathway scenarios based on landscape change and potential hosts 28 : Scenario 1 (direct transmission -known bat hosts) represents direct transmission from bats to people, facilitated by the landscape condition, human population, and known bat hosts. Although molecular investigations suggest that direct transmission of sarbecoviruses from bats to humans may be possible 29 , it has yet to be better documented 14,30,31 . Rather, the involvement of an intermediary or bridging host appears more likely, perhaps because this allows for recombination and viral evolution, and/or leads to greater exposure to human populations. Consequently, we developed Scenarios 2, 3, and 4 to represent indirect pathways that build on Scenario 1 by adding livestock (Scenario 2, indirect transmission -mammalian livestock) and wild mammals (Scenario 3, indirect transmission -wild mammals). Scenario 4 (indirect transmission -all mammals) is a global scenario comprising landscape condition, human population, known bat hosts, mammalian livestock and wild mammals.
We expected clusters to occur across country borders and differing pathways to alter risk hotspot distributions.

Characterization of risk driver hotspots
The study region comprises a 25796-pixel grid for the terrestrial area evaluated. Univariate hotspot areas differ in magnitude ( Figure 1) and extent. Most hotspots concentrate at latitudes between 20 and 40 degrees. The univariate hotspots with the largest spatial extent are those obtained for agricultural and harvest land, followed by high integrity forests and areas with high deforestation potential. The majority of the included region comprises coldspots for primary bat hosts. Drivers with the greatest extent of coldspots were livestock (pigs and cattle) followed by known bat hosts. The largest extent of intermediate areas was for built up land, which presented no coldspots due to the ubiquitous nature of human occupation in terrestrial areas. The largest differences in results for all Bovidae livestock versus cattle-only hotspots (see Methods) are in central China, parts of north (Hebei, Shanxi, and Henan) China and central India ( Figure S2). The complete overlap of hotspots considering all ten univariate hotspots at one grid cell never occurred.

Hotspot co-occurrence in clusters
We identified spatially cohesive clusters by including all hypothesized risk drivers ( Figure 3). The optimal number of multivariate spatial clusters is nine when 10% of the human population is used as a minimum bound variable and 19 for 5% of the human population. There is an incremental benefit reduction from adding clusters, from nineteen groups on ( Figure S4). The clusters from the cut-off value of 5% are nested within the 10% clusters ( Figure S5), and we present the clusters for 19 areas.
The detected clusters were commonly transboundary (Table 1), located across a maximum of six countries, and a median of 2 countries. Besides 10 transboundary clusters, nine clusters are restricted to a single country, 6 in China, 2 in India, and 1 in Indonesia (Java). The top ranking clusters in terms of hotspots were all located in a single country (China or Indonesia): Beijing (cluster 19), Java (cluster 17), and Sichuan and Yuzhong District, Chongqing (cluster 16). The clusters with highest scores were among the smaller clusters in geographical extent. Inner-West China (cluster 1), South Lhasa and Arunachal Pradesh (cluster 15), and Philippines, East Timor, West Papua (cluster 9) had the highest scores for coldspots. Areas with the highest scores for the Intermediate class were Assam, West Burma block, Steppe and Sri Lanka (cluster 2), followed by Southwest Indochina (cluster 11) and North India (cluster 14). Clusters with the all Bovidae livestock version are in Figure S6, and they were very similar to the cattle-only versions from the main text, except for the Beijing area and the division of the two larger clusters in India, West India and East India.

Potential outbreak detection and spread
When we cross the risk factor spatial information with healthcare access measured as travel time, the largest differences between combinations of quantiles of the two covariates are in the lowest and highest quantiles of both variables ( Figure 4A, GIF 1). We calculated the areas with high-risk values that are far or close to healthcare for all scenarios ( Figure S7) within the spatial clusters from the skater analysis. From the entire study region, areas closer to healthcare that had high hotspot overlap (areas in yellow in Figure 4, Figure S7) covered an area ranging from 11.96% in Scenario 1, to 20.28% in Scenario 2, 14.66% in Scenario 3, and 13.67% in Scenario 4. Areas far from healthcare that present high hotspot overlap (in red Figure 4, Figure S7) were much rarer and varied according to scenarios, always covering less than 1% of the studied region, ranging from 0.1% in Scenario 1, to 0.30 in Scenario 2, 0.91% in Scenario 3-with significantly higher travel times to healthcare ( Figure 4B, p <0.05)-and 0.22% in Scenario 4 (Table S1, Figure 4B). The relationship between travel time to healthcare and human population counts ( Figure S8) shows that areas far from healthcare tend to have lower population counts (out proxy for exposure), but the relationship is non-linear.

Discussion
Urgent actions are needed to decrease disease emergence risk 32,33 . Using a macroscale approach, we assessed the distribution of locations with a greater risk of experiencing Sarbecovirus spillover events using landscape conditions and exposure of potential hosts (wildlife, domestic, human). Landscape conditions coupled with predictions of the distribution of known hosts and proxies for potential hosts and processes linked to human exposure to novel viruses can be a powerful tool for sample prioritization when limited viral spillover information is available, such as for sarbecoviruses 14 .
The overlap of risk factor hotspots represents pressure points on natural ecosystems that have been extensively altered in terms of agriculture, deforestation, and livestock production. In some cases the environment with One Health approaches. These approaches emphasise nature-based mitigation strategies, looking at the socio-economic drivers that shape local landscape conditions. Our analyses also show that risk factor clusters are commonly multinational, and action plans are a complex task to implement. However, transboundary, coordinated action between nations that share territorial limits is paramount if configuration of hotspots is taken into account when managing, protecting and restoring land to mitigate disease emergence risk.
Remote areas that present little spatial overlap in risk factor hotpots (blue, Figure 4) may represent conditionally safer areas. In those areas, priority should be assessing and reducing other disaster and disease risks. In areas of high potential assessed risk (khaki, orange and red, Figure 4), actions should be focused on the drivers of spillover. Recent literature suggests three broad, costeffective actions to minimize pandemic risk: better surveillance of pathogen spillover, better management of wildlife trade, and substantial reduction of deforestation (i.e. primary prevention) 33 .
Landscape planning should have priority, as these can have other benefits 35,36 and can include preventive measures to reduce levels of contact between people and potential wild and domestic animal hosts. Biosecurity measures and surveillance and integrated wildlife monitoring 37 are also key where multi-component risk levels are higher 38 . Syndromic, virological, serological, and behavioral risk surveillance of people with regular proximity with known reservoir or potential amplifier hosts 38 can be of great value in these hotspots, but the ultimate prevention should be in primary prevention.
Beyond viral monitoring and discovery, prevention can be achieved by reducing deforestation, intensive concentrated livestock production, wildlife trade and increasing sustainable management of agricultural areas 33 .
Surveillance effort correlates with detecting infections and where human populations intersect with wildlife, risk increases 39,40 . In addition, evidence from Brazil suggests zoonotic risk increases with remoteness (along with increased wild mammal species richness) and decreases in areas with greater native forest cover 41 . Our results suggest high-risk areas are often (11-20%) associated with faster travel times to healthcare, compared to remote areas (<1%) (yellow and red respectively, Figure   4). The problem posed by remote sites for emergence mitigation is that while spillover probability and initial ease of spread may be lower, so too is detection probability 39 because of the distance to healthcare. This may allow localized, remote outbreaks to establish and spread in human populations before detection [42][43][44] . Our findings can be helpful in allocating efforts for surveillance, sustainability and conservation actions and long term plans for ecological intervention, including in areas with high emergent risk scores. Importantly, additional layers of prioritisation could be added to implement mitigation actions on hotspots, for instance, where climate change vulnerability is also high, such as in Java 45 . Also, regions of China, in terms of mobility are outstandingly connected, which highlights the need to reduce pressures arising from multiple hotspots.

Scenario 2 (indirect transmission through livestock) had the highest number of regions with
high-risk areas close to healthcare (yellow, Figure 4). These areas are extensive across the study region in all scenarios, and should be prioritised for temporal screening for viruses in livestock, the understanding of known hosts, and investments in improving public health responses to spread. Highrisk areas far from healthcare (red) represent small regions of our study area (<1%) in all scenarios, where Scenario 1 had the fewest and Scenario 3 had the highest areas. These are areas with higher possibilities for spillover, that would also be likely to go undetected during the early stages of humanto-human transmission and spread. In those regions, urgent action to prevent contact, reduce deforestation, and enhance biodiversity protection should take place, as well as improvements in healthcare access. Human populations that are more vulnerable to risks could be targets for equitable distribution of promising solutions, such as pan-coronavirus vaccines 46 .
Our findings are a snapshot of macroscale spatial trends that can be used for prioritising more detailed analysis depending on the context and policy priorities. The United Nations Development Programme (UNDP) recommends the creation of 'Maps of Hope' for maintaining essential life support areas 47 , but the relationship between biodiversity loss, fragmentation, and zoonotic disease is seldom considered in the designation of such areas. We advocate for a One Health approach in which the risks of pathogen emergence are explicitly integrated into initiatives addressing habitat management, restoration and protection 47 , and have demonstrated that this risk can be mapped at large scales with insights into variability in the distribution of key drivers 48 .
We acknowledge the complexity of pathogen responses to land use modification 49  interactions that is difficult to account for. Ecological analyses at finer spatial and temporal scales than used here can elucidate cascading events that result in zoonotic spillover. For example, Hendra virus spillover from bats to horses in Australia seems to be driven by interactions between climatic change altering the flowering phenology of important nectar sources, exacerbating food shortages resulting from native habitat loss and degradation, and nutritional stress in bats that may increase Hendra virus shedding. Native resource declines have concurrently promoted urbanization of many bat populations, increasing the human-bat interface and potential for spillover events to horses, which can act as intermediary hosts, or even potentially direct to humans 52 . Our analyses may capture the macroscale processes, but not these local events.
Similarly, while knowing that the top-priority traded mammals 53  in outbreak response such as conflict 56 and other societal challenges associated with health and the environment might also be considered in local contexts.
The use of remote sensing layers can bring insights for land use planning when considering complex processes such as disease emergence. This process may benefit not only the understanding of risks but also local actions informed by broad patterns 6 . Recent models suggest that the implementation of smaller-scale land-use planning strategies guided by macro-scale patterns may help to reduce the overall burden from emerging infectious diseases 57 , while also taking into account biodiversity conservation. This could be evaluated from multiple perspectives, including in the context of other planetary boundaries and how zoonotic disease risk inserts within it 58 , considering we have already passed the 1-degree warmer planet threshold 59 .
In conclusion, we found evidence that hotspots of multiple conditions that contribute to risk of emerging Sarbecovirus are commonly transboundary, but areas scoring the highest values of risk occurred in China and Indonesia. We also identify, for each transmission scenario, that most high risk areas are not far from healthcare, but in all scenarios there are several that are far away from healthcare, especially Scenario 3 that includes wild mammals as secondary hosts. This work contributes to strengthening evidence of spatial clusters of multiple risk factors for disease emergence. We

Materials and Methods
We use South, East and Southeast Asia (including western New Guinea) as our study region, where most Sarbecovirus hosts are concentrated 5,14 and where many unknown sarbecoviruses are estimated to exist 26 . We define our study region as the terrestrial area of the following countries: Bangladesh, Bhutan, Brunei, Cambodia, China, India, Indonesia, Lao PDR, Malaysia, Myanmar, Nepal, Philippines, Singapore, Sri Lanka, Thailand, Timor-East, and Vietnam.

Characterization of univariate risk indicator hotspots
We identified spatial clusters of components of risk. We assume our inferred risk arises not from individual factors having outstanding high values (hotspots), but instead it arises when they are combined, facilitating conditions for viral spillover. In that sense our inference of risk is an emergent property of the system (Emergent risk). We adapted a broad-scale risk estimation framework (https://mcr2030.undrr.org/quick-risk-estimation-tool) focusing on the potential for sarbecoviruses to emerge. The broad risk factors were five landscape-level conditions and five biological layers, according to four scenarios (see data sources in Table S2). The analysis is naive about the influence of individual drivers on the risk of spillover in the sense that all factors were weighted equally in our scenario evaluations. We selected the following factors for land use change and landscape conditions: Intensity of 1) built-up land, 2) mining and energy, 3) agricultural and harvest land, 4) forest quality, and 5) local forest loss risk. As a measure of human or animal exposure, we used wildlife (known bat hosts and all other wild mammals), livestock (pigs and cattle), and human populations. We use the average of known sarbecovirus bat hosts as our primary host layer 5,14,63,64 . In all scenarios we also include human population counts because this increases the risk of contact and therefore transmission 65 . We included built-up area, energy and mining and agriculture and crop harvest landscape changes in all scenarios because these have been linked to increased coronavirus shedding in bats putatively due to stress and human contact rates 26,66 . We similarly used forest quality and risk of cover loss SARS-CoV-2 has also been detected in wildlife (spillback events) 55,67 .
To avoid collinearity, we only selected variables with product-moment correlation coefficient (r) values < ±0.7 ( Figure S1). The study region was divided into a spatial grid composed of 0.25 decimal degrees-sized tiles (~27 km

Hotspot convergence in clusters
We evaluated the spatial clustering among hotspots including all ten selected drivers (Scenario 4, five landscape descriptors, five potential host components). We opted for doing a single cluster analysis because we cannot weigh the importance of the single variables for influencing an ultimate spillover event. The variables comprised here describe landscape condition, human population, cattle, pig, bat hosts and all other wild mammals. We assume areas that contain most hotspots or that are on the verge of becoming hotpots (intermediate areas) for the components evaluated are at higher risk of emerging new sarbecoviruses. A multivariate spatial cluster analysis was applied to Getis-ord G*i values for every variable after the univariate hotspot analysis using rgeoda 0.0.9 76 . We used the multivariate skater (Spatial `K'luster Analysis by Tree Edge Removal) hierarchical partitioning algorithm 77 to infer contiguous clusters of similar values in the region based on the optimal pruning of a minimum spanning tree. Spatial clusters represent emergent, cohesive risk combinations distributed in space. Contiguity was assessed by a queen weights matrix after transforming pixels to geographical coordinates. Distance functions were set to euclidean. We evaluated the k number of clusters from 1 to 40. To find the optimal number of clusters, we evaluated the total within-cluster sum of squares variation, visually inspecting the point of inflection in the curve towards stabilization.
As the reduction in increment was very smooth, we present the number of clusters for skater informed by the max-p algorithm. We used max-p to find the solution for the optimal number of spatiallydefined clusters setting as a bounding variable (a variable that allows for a minimum value summed for each cluster) the human population amounts at 5% and 10%. The algorithm was computed at 99 interactions with 123456789 as a random seed.
To interpret variation of hotspots between clusters, we counted the number of variables for which the median of the distribution is a hotspot (i.e. falling within the hotspot interval at 95% Gi*). We then discuss the clusters based on the number of drivers that are already hotspots and the median values that fall in intermediate zones, so closer to becoming hotspots, and coldspots for each cluster. Finally, to understand the overall variation (and among clusters) we provide a principal component analysis (PCA) biplot through Scenario 4 to discuss major axes of variation between optimal number of clusters. We ran the hotspot analyses with cattle-only and with the summed values for Bovidae livestock (presented in the Supplemental Material). All geographical coordinates were warped to World Mercator (EPSG: 3395) and World Geodetic System 1984 datum before spatial analysis.

Emergent risk and its relationship with access to healthcare
After identifying the hotspots within the scenarios, we match their proximity to detection by matching the emergent risk score (i.e. number of hotspots) for every pixel with the level of motorized access to healthcare (hospitals and clinics). Access to healthcare measured as travel time was considered as both a proxy for connectivity and an indicator of the likelihood of detection, following infection spillover and spread. We built bivariate maps and three-by-three quantile (N=9) combinations considering the intensity of hotspots from their overlay and the values for access to healthcare, all rescaled from zero to one. We compared travel time in high-risk areas among scenarios using Wilcoxon's test. All analyses were done in QGIS 3.10.7 78 , R 4.1.3 79 and bash 80 . Code for the analyses can be found at https://github.com/renatamuy/hotspots/.