Quantifying transmission of emerging zoonoses: Using mathematical models to maximize the value of surveillance data

Understanding and quantifying the transmission of zoonotic pathogens is essential for directing public health responses, especially for pathogens capable of transmission between humans. However, determining a pathogen’s transmission dynamics is complicated by challenges often encountered in zoonotic disease surveillance, including unobserved sources of transmission (both human and zoonotic), limited spatial information, and unknown scope of surveillance. In this work, we present a model-based inference method that addresses these challenges for subcritical zoonotic pathogens using a spatial model with two levels of mixing. After demonstrating the robustness of the method using simulation studies, we apply the new method to a dataset of human monkeypox cases detected during an active surveillance program from 1982-1986 in the Democratic Republic of the Congo (DRC). Our results provide estimates of the reproductive number and spillover rate of monkeypox during this surveillance period and suggest that most human-to-human transmission events occur over distances of 30km or less. Taking advantage of contact-tracing data available for a subset of monkeypox cases, we find that around 80% of contact-traced links could be correctly recovered from transmission trees inferred using only date and location. Our results highlight the importance of identifying the appropriate spatial scale of transmission, and show how even imperfect spatiotemporal data can be incorporated into models to obtain reliable estimates of human-to-human transmission patterns. Author Summary Surveillance datasets are often the only sources of information about the ecology and epidemiology of zoonotic infectious diseases. Methods that can extract as much information as possible from these datasets therefore provide a key advantage for informing our understanding of the disease dynamics and improving our ability to choose the optimal intervention strategy. We developed and tested a likelihood-based inference method based on a mechanistic model of the spillover and human-to-human transmission processes. We first used simulated datasets to explore which information about the disease dynamics of a subcritical zoonotic pathogen could be successfully extracted from a line-list surveillance dataset with non-localized spatial information and unknown geographic coverage. We then applied the method to a dataset of human monkeypox cases detected during an active surveillance program in the Democratic Republic of the Congo between 1982 and 1986 to obtain estimates of the reproductive number, spillover rate, and spatial dispersal of monkeypox in humans.


Overview of the approach 214
We first validated the inference framework using a simulation study, then applied the 215 Across 125 simulations (25 simulations for each of five parameter sets), estimated values 245 clustered around the true parameter values. The true value for R was included in the 95% 246 credible interval (CI) 119 times (95.2%) and for λ z was included 121 times (96.8%) (Fig 3A). On 247 average, the posterior mean estimate of R differed from the true value by 8.6%; the analogous 248 percent errors for λ z and σ estimates were 6.3% and 7.0%, respectively (S1 Table). Inferences were performed A) when the true number of localities under surveillance was known, 254 B) when the true number was unknown and it was assumed that the number of observed 255 localities was the total number of localities, and C) when the true number of localities was 256 unknown and the corrected-denominator method was used to control for the locality observation 257 process. 258 259 However, this method assumes that the true number of localities under surveillance is 260 known. In real-world situations, 'silent' localities that experience zero cases often do not appear 261 in the dataset, resulting in an unknown true number of surveilled localities. We investigated 262 possible biases in parameter estimates that could arise from assuming that the number of 263 localities that reported one or more cases represents the total number of localities under 264 surveillance. To do so, we used the same set of simulated datasets as described above, but 265 removed knowledge about the number of silent localities. In these datasets, silent localities make 266 up between 21% and 85% of all localities under surveillance, with the proportion driven 267 primarily by the spillover rate. Estimates for the reproductive number R were not strongly 268 impacted (95.2% of the 95% CIs contained the true value with an average percent error of 8.4%), 269 but the spillover rate λ z was consistently overestimated (Fig 3B). The true value for λ z was 270 contained in none of the simulations' 95% CIs and the posterior mean had an average percent 271 error of 153% (S1 Table). 272 To further investigate the effect of this data truncation (whereby localities with zero cases 273 do not appear in the dataset), we performed inference assuming that the observed localities 274 represented all, 1/2, or 1/5 of the total localities under surveillance. While this assumption had a 275 relatively small impact on the estimated R, it greatly impacted the inferred λ z (which is measured 276 as the number of spillover events per locality per day and is therefore strongly affected by 277 We tested the performance of the corrected-denominator method against simulated 288 datasets, looking at the same parameter sets as in the first section. The inferred parameter values 289 cluster well with their corresponding true values ( Fig 3C): mean percent error in R estimates was 290 8.4% and in λ z estimates was 14.0%. Across the 125 simulations, the true parameter value was 291 included in the 95% CI 116 times (92.8%) for R and 117 times (93.6%) for λ z (S1 Table). 292 Because an estimate of the true number of localities under surveillance would help 293 determine the size of the population that could be detected for a given system, we assessed how 294 well we could approximate this value. Given the number of localities with one or more cases and 295 the mean parameter estimates, it is possible to calculate the expected total number of localities 296 under surveillance (see S1 Text). Estimates of the true number of localities calculated for the 297 simulated datasets center on the correct value (S2 Fig). The magnitude of estimate error is driven 298 by the spillover rate, which largely determines the proportion of localities that are observed by 299 surveillance. The mean percent error across simulations with spillover rate of 0.0001, 0.00036, 300 and 0.0007 were 25.4%, 7.9%, and 2.4%, respectively, while simulations with spillover rates of 301 0.004 and above almost always recorded at least one case in each locality during the five year 302 surveillance period and therefore tended to estimate the exact true number of localities. 303 Inferring the sources of transmission events. We investigated how well sampled transmission 304 trees recovered the source of individual cases as well as higher-order measures, such as the 305 fraction of cases originating from zoonotic, within-locality, and between-locality transmission. 306 We tested our method using 125 simulated datasets, with 25 datasets simulated for each of five 307 sets of true parameter values (these are the same datasets as discussed above, simulated with R 308 between 0.2 and 0.6 and spillover rate between 0.0001 and 0.0007). Two hundred plausible 309 transmission trees were sampled for each simulated dataset. 310 When comparing the overall fraction of cases attributed to each source type (zoonotic 311 versus within-locality versus between-locality transmission), the sampled transmission trees 312 closely match the true transmission patterns (Fig 4). On average, the difference between the true 313 fraction of cases caused by zoonotic spillover and the fraction inferred in a tree was 0.022 314 (standard deviation 0.018), the difference for within-locality transmission was 0.006 (standard 315 deviation 0.005), and the difference for between-locality transmission was 0.022 (standard 316 deviation 0.018). values. If the fraction of transmissions for each source is perfectly inferred, points will lie exactly 324 on the transition between bar colors. B. Box plots summarize the error in the inferred fraction of 325 cases originating from each source type. The error size is small across all parameter sets, 326 especially for within-locality human-to-human transmission. The upper whisker was calculated 327 as min(max(x), Q 3 +1.5*IQR) and the lower whisker was calculated as max(min(x),Q 1 -1.5*IQR). 328

329
The success at recovering individual transmission links was high overall but varied 330 slightly depending on the true parameters underlying the simulation (S3 Fig). On average, 331 sampled transmission trees inferred 85.9% of all sources correctly. Better performance was 332 observed for lower spillover rates and lower R, presumably due to the fewer opportunities for 333 misattribution of cases. Some transmission links were more likely to be captured than others: on 334 average 90.9% and 90.1% of sampled trees correctly inferred links with zoonotic and within-335 locality sources, respectively, but only 36.8% of trees correctly identified the source of between-336 locality transmission events. 337

Epidemiological insights into monkeypox 338
Applying the corrected-denominator method to 1980s monkeypox surveillance data. 339 Between 1982 and 1986, the active monkeypox surveillance program in the Democratic 340 Republic of the Congo detected 331 human cases in 171 localities [43]. For each human case, we 341 know the name of the locality as well as the district or administrative subregion (henceforth 342 referred to simply as 'district') and region to which it belongs. However, the total number of 343 localities that would have been detected by surveillance had they experienced a case is unknown. 344 We therefore used the corrected-denominator method to generate estimates under four different 345 assumptions about which administrative unit most suitably represents the broader contact zone. 346 The country-level, region-level, and district-level models correspond to progressively smaller 347 choices of broader contact zones, while the locality-level model assumes that all instances of 348 human-to-human transmission occur within a locality. We anticipate that assuming an 349 excessively large broader contact zone could result in overestimating R and underestimating λ z if 350 too many spurious human-to-human transmission events are inferred from pairs of cases that just 351 happen to occur within a generation-time interval of one another, while assuming an 352 inappropriately small broader contact zone could result in the opposite parameter biases if the 353 model is unable to detect actual incidents of human-to-human transmission because the cases In the monkeypox analysis, the size of the administrative unit used as the broader contact 356 zone has a strong effect on the resulting parameter estimates (Fig 5A). When larger 357 administrative units are assumed to represent the broader contact zone, a given pair of cases is 358 more likely to belong to the same broader contact zone, giving the model more opportunities to 359 infer inter-locality human-to-human transmission events and resulting in larger estimated 360 reproductive number R and a smaller spillover rate λ z . Mean values of the posterior distribution 361 of R range from 0.29 when transmission is assumed to occur only within localities to 0.52 when 362 transmission is assumed to occur among all localities in the country (Table 1)   Both the choice of broader contact zone and the assumed total number of localities have a 407 large impact on estimates of R and λ z ( Fig 5B). As noted above, models assuming smaller 408 broader contact zones allow fewer opportunities for human-to-human transmissions to be 409 inferred, and these models estimate substantially lower R values and correspondingly higher 410 spillover rates. In contrast, assuming that a smaller fraction of surveilled localities were observed corrected-denominator method, the district-level model had the best DIC score, followed by the 431 region and country-level models ( Table 1). The locality-level model received a much larger DIC 432 value, indicating that the data strongly support models that allow transmission between localities.
Similarly, for each of the three assumptions about the true number of surveilled localities, the 434 district-scale model performed best in DIC model comparisons (Table 1). 435 Inferring the sources and distances of transmission events. We used the district-level 436 corrected-denominator method to sample 20,000 transmission trees for the monkeypox dataset. 437 The sampled transmission trees attributed an average of 60.8% (standard deviation of 2.2%) of 438 cases to zoonotic spillover, 28.5% (standard deviation of 0.9%) of cases to within-locality 439 human-to-human transmission, and 10.7% (standard deviation of 2.1%) of cases to between-440 locality human-to-human transmission. . The highest success was seen for pairs of epidemiologically-linked cases whose dates of 475 symptom onset were between 7 and 25 days apart ( Fig 7B). Although it is generally believed that 476 the generation interval for human-to-human transmission of monkeypox is between 7 and 23 477 days [43,53], several case pairs that occurred more than 23 days apart were epidemiologically 478 linked through contact-tracing. It is possible that these links, which were often missed in the 479 sampled transmission trees, are not true instances of human-to-human transmission. Cases that 480 occurred in different localities were also less likely to be linked in a sampled transmission tree, 481 though even for these inter-locality pairs, the district-level model tended to perform better than 482 the other three models (S6 Fig Comparison of the transmission tree generated using only contact-tracing data with the 500 trees created using the district-level and locality-level models highlights how much our 501 perception of the transmission dynamics depends on assumptions about spatial spread (Fig 8). 502 Most of the within-locality transmission links detected through epidemiological contact-tracing 503 appear in the locality-level model's tree, though the locality-level tree suggests substantially 504 more human-to-human transmission events than captured in the contact-tracing tree. However, 505 the locality-level tree misses all inter-locality links. The district-level model's tree captures most 506 of the links indicated by the locality-level tree, and also suggests that inter-locality transmission 507 is occurring, though it has low power to determine exactly which case pairs are linked through 508 inter-locality transmission. indicates how frequently the case was inferred to have a zoonotic source, so transmission links 516 often go from black points (cases caused by zoonotic spillover) to white points (cases infected by 517 a human source). 518

Sensitivity analyses 520
We conducted a variety of sensitivity analysis tests using simulated datasets to assess 521 how robust the method was over a range of parameter values and assumption violations (full 522 descriptions are provided in S1 Text). The method continued to perform well even at very high 523 spillover rates (S7 Fig, S2 Table) and when the offspring distribution used in simulations 524 differed from the one assumed in the inference (S8 Fig, S3 Table). In some situations, assuming 525 a larger broader contact zone than the one used for simulations could lead to an overestimation of 526 R and an underestimation of λ z (S4 and S5 Tables). This outcome is consistent with what was 527 observed in the monkeypox analysis where assuming a larger spatial scale for the broader contact 528 zone corresponded to a higher estimate of R and a smaller estimate of the spillover rate (Fig 5). 529 When simulations were run with highly structured, non-homogeneous spillover, substantial 530 biases in the inference results occurred because this spillover process gives rise to clusters of 531 primary cases that the model mistakes as arising from human-to-human transmission (S9 Fig). 532

Principal findings 534
In this work, we developed and tested a method to infer fundamental epidemiological 535 parameters and transmission patterns for zoonotic pathogens from epidemiological surveillance 536 data with aggregated spatial information. When tested against simulated datasets, the method 537 successfully recovered estimates of R and spillover rate close to the true values and also inferred 538 the fraction of cases resulting from zoonotic, within-locality, and between-locality sources with a 539 high degree of accuracy. The 'unknown denominator problem' that occurs when the total number of localities under surveillance is unknown can cause large biases in parameter estimates, so we 541 modified the inference method to account for this observational process and enable unbiased 542 estimation in the presence of this common data gap. 543 We applied the method to a rich surveillance dataset of human monkeypox in the Congo 544 basin from the 1980s and found that human-to-human transmission of monkeypox between 545 localities plays an important role in the pathogen's spread. Of the four assumptions we tested for 546 the spatial scale of the broader contact zone, the district-level model was best supported by DIC 547 model comparisons and validation with contact-tracing. In addition, the signal of elevated inter-548 locality transmission occurring over ≤ 30 kilometers suggests that most inter-locality 549 transmissions occur in a relatively small neighborhood, consistent with the limited transportation 550 infrastructure in the DRC. This further corroborates that the district-level model, which is the 551 smallest spatial aggregation scale that still permits inter-locality transmission, is likely the most 552 appropriate choice for capturing inter-locality transmission patterns of human monkeypox. 553 The district-level model estimates a reproductive number for human monkeypox of 0.38 554 (0.31-0.45 95% CI). This value is slightly higher than previous estimates of R for the 1980s DRC 555 monkeypox dataset, which was estimated as 0. formation methods were liable to miss transmissions that occurred between localities. Indeed, the 560 estimate obtained using the locality-level model (R = 0.29) closely matches previous estimates. It 561 is also possible that the district-level model may overestimate the amount of human-to-human 562 transmission in the same way that the region-and country-level models picked up a higher signal 563 of human-to-human transmission than the district-level model due to their larger broader contact 564 zone sizes. The size of the DRC's districts and administrative subregions used for the district-565 level model vary in size, but average around fifteen thousand square kilometers, or around one 566 hundred forty kilometers across, encompassing a much greater distance than most human-to-567 human transmission events likely occur over. We therefore expect that the true value of R is 568 bounded by the estimates of the locality-level and the district-level models. 569 In addition to providing an estimate of monkeypox's reproductive number, the methods 570 give insight into the frequency of spillover and the spatial scale of human-to-human 571 transmission. The district-level model estimates a mean spillover rate of around 0.11 spillover 572 events per locality per year, which corresponds to roughly one spillover event every nine years in 573 each locality. It also estimated that around 70% of human-to-human transmissions occur within a 574 locality. This finding contrasts with the assumption that human-to-human transmission occurs 575 within a locality, which is commonly used to generate transmission clusters, and suggests that 576 estimates generated using that assumption may substantially underestimate the amount of 577 human-to-human transmission occurring in the system. The importance of inter-locality contacts 578 has been reported for the neighboring country of Uganda The potential biases introduced when analyzing data reported at a course spatial scale 589 have been explored in a wide range of contexts [56][57][58], yet the implications of using this type of 590 spatial information to infer the transmission dynamics of an infectious disease is not obvious. 591 When spatial information is only reported at the level of large spatial zones like districts, regions, 592 or countries, no finer-scale information is available to inform which human cases transmitted 593 infection to one another between different localities. Here we explored how the size of these 594 spatial zones would affect inference for the monkeypox system by repeating the analysis using 595 spatial information at the district, region, or country resolution. The large differences in 596 parameter estimates generated under different broader contact zone assumptions in the 597 monkeypox analysis illustrates how sensitive inference results can be to the spatial scale 598 assumed for human-to-human transmission, and suggests that reporting spatial data at too large a 599 scale or ignoring inter-locality transmissions can lead to substantial estimate biases. 600 In the context of monkeypox in the DRC, analysis of simulations using the exact 601 geographic coordinates reported for 80% of localities in the monkeypox surveillance dataset 602 replicated the increasing estimates of R and decreasing estimates of spillover rate as the spatial 603 aggregation scale increased (S4 and S5 Tables). However, the magnitude of the effect in 604 simulated datasets was smaller than in the monkeypox analysis. This could be a result of the 605 particular assumptions about inter-locality transmission patterns used in the simulations, but it 606 also opens the question of whether outside large-scale factors such as seasonality or fluctuations in surveillance effort might induce temporal autocorrelation among unlinked human cases, 608 giving rise to temporal clustering of cases that the model interprets as human-to-human 609 transmission. 610 This analysis serves to emphasize the importance of selecting an appropriate spatial scale 611 and using caution when interpreting results obtained using spatially aggregated data. Many 612 methods implicitly assume a certain scale of spatial transmission, often ignoring the possibility 613 of longer-range transmissions, so careful consideration of whether that scale is appropriate for 614 the system is essential. 615 In general, recording precise spatial locations of cases is vital for increasing the 616 inferential power of modeling analyses. Developing methods that maintain spatial information 617 without risking a breach in confidentiality is a nontrivial challenge, but progress has already been 618 made in generating possible solutions such as geographic masking or the verified neighbor 619 approach [59,60]. 620

Model assumptions and future directions 621
In this work, we assumed that the spillover rate was homogenous through time and space, 622 but more complex disease dynamics in the reservoir or spatiotemporal heterogeneity in animal-623 human contacts may cause nontrivial deviations from this assumption in real-world systems. Of 624 particular concern is the possibility that outbreaks in the reservoir could cause periods of 625 amplified local spillover, which could create a clustering pattern of human cases potentially 626 indistinguishable from human-to-human transmission. Without information about disease 627 dynamics in the reservoir, accounting for this heterogeneous spillover will be challenging, but This work assumes that R is constant across all localities; however, to obtain a full picture 635 of pathogen emergence risk, it may be necessary to consider the heterogeneity in transmission 636 intensity among different human populations, as well as the interplay between where R is highest 637 versus where spillover tends to occur [61]. In some zoonotic systems, for instance, spillover 638 predominantly occurs into remote villages and towns that are in close proximity to forested 639 regions. However, we generally expect these villages to have lower levels of human-to-human 640 transmission than the more densely-packed cities [62-64]. A pathogen may even be incapable of 641 supercritical spread until it reaches such a city. Therefore, to assess the probability a pathogen 642 will successfully emerge and to determine which populations to target with control measures, it 643 may be necessary to establish not only the spillover rate and R across different populations, but 644 also the rate of dispersal of the pathogen between those populations [61]. 645 Several assumptions may need to be modified when applying this method to other 646 zoonotic systems. Because we assume that the source of human-to-human transmission events 647 will show symptoms before the recipient, the likelihood function can treat human cases as 648 occurring independently conditional on preceding cases. For zoonotic diseases in which infected cases contribute non-negligibly to transmission), the likelihood expression would need to be 651 modified substantially, and the lack of independence between cases might make a simulation-652 based inference approach necessary. 653 We assume that sufficiently few infections occur relative to the population size that 654 depletion of susceptible individuals does not affect transmission dynamics. While appropriate 655 when there are few human infections or in the early stages of invasion, this assumption could 656 bias estimates if applied in a system with sufficiently high levels of human infection or where 657 transmission occurs primarily among highly clustered contacts, such as individuals within a 658 household. We also note that in the monkeypox example we are estimating the effective 659 reproductive number, which takes into account existing population immunity. If the goal instead 660 were to establish the basic reproductive number (the reproductive number for the pathogen in a 661 fully susceptible human population), accounting for past exposure to the pathogen or other cross-662 immunizing pathogens or vaccines would be necessary. 663 The current methods assume that all human cases that occur during the surveillance 664 period inside the surveillance area are observed. This assumption is plausible for the analysis of 665 the 1980s monkeypox dataset, given the unusually high resources and experience level of this 666 surveillance effort in the aftermath of the smallpox eradication program and the use of serology 667 to detect missed cases retrospectively [43]. However, most zoonotic surveillance systems operate 668 with limited resources and have a much lower detection rate. Ignoring unobserved cases will lead 669 to underestimation of the spillover rate, while the effect on estimation of R will depend on the a detected case triggers a retrospective investigation that detects all cases in that transmission 674 chain. 675

Conclusions 676
This work expands our ability to assess and quantify important zoonotic pathogen traits 677 from commonly available epidemiological surveillance data, even in the absence of exact spatial 678 information or a complete count of localities under surveillance. We anticipate that these 679 methods will have greatest value in the common circumstance when the source of cases, 680 particularly whether a case came from an animal or human source, cannot be readily established. 681 In such situations, the ability to infer the pathogen's reproductive number, spillover rate, and 682 spatial spread patterns from available surveillance data, will greatly enhance our understanding 683 of the pathogen's behavior and could provide valuable insights to help guide surveillance design 684 and outbreak response. 685

Model 687
In broad terms, the model describes the probability of observing a set of symptom onset 688 times and locations of human cases given the timing and location of previous cases and 689 parameters that underlie the transmission process. Human infections can arise from either 690 animal-to-human transmission ('zoonotic spillover') or human-to-human transmission (Fig 1B).

All sources of infection are assumed to generate new cases independently of one another. 694
The number of human cases that become symptomatic on each day in each locality caused by 695 zoonotic spillover is assumed to follow a Poisson distribution with mean λ z . For simplicity and 696 because reservoir disease dynamics are rarely well characterized, we assume the Poisson process 697 is homogenous through time and across localities, but this assumption could be modified for a 698 system where more information is available about the reservoir dynamics (e.g., [34]). New The factor that describes the amount of transmission that occurs between localities v and 723 w (H(v,w)) could reflect Euclidean distance, travel time, inclusion in different spatial zones, or 724 any other available measurement. To accommodate the imperfect spatial information available 725 for many zoonotic surveillance systems, this study focused on developing methods for the 726 situation when only a locality name and an aggregated spatial zone (such as district or country) is 727 reported for cases, rather than an exact position. We assume that inter-locality transmission 728 occurs only among localities within the same broader contact zone (Fig 1A). Because 729 transmission will be greater within a locality than between localities, a proportion σ of secondary 730 cases are assumed to occur in the same locality as the source case and a proportion (1-σ) of 731 secondary cases are assumed to occur amongst the outside localities that are within the same 732 broader contact zone as the source case. This outside transmission is assumed to be divided 733 equally among all localities within the index case's broader contact zone: 734 where Z v indicates the broader contact zone of locality v and v is the total number of localities 736 in the broader contact zone of locality v. For a given locality v, the sum of H(v,w) across all w 737 equals one. To observe the effect of assuming different broader contact zones, the monkeypox 738 case study was repeated under four different assumptions about the spatial scale of human-to-739 human transmission: locality, district, region, and country-level. 740

Model inference 741
Likelihood function. Using the model described above, a likelihood function was used to 742 evaluate a parameter set (θ = {R, λ z , σ}) given the data (D = N t,v cases observed on each day t and 743 locality v): 744 where T is the number of days surveillance was conducted and V is the total number of localities 746 under surveillance. 747 While this approach works well when the total number of surveilled localities is known 748 (see Fig 3A), localities often only appear in the dataset if they have reported cases; as a result we 749 may not know the total number of localities under surveillance. Ignoring localities with zero 750 cases can lead to biased parameter estimates (see Fig 3B). We explored several alternative 751 approaches to account for these silent localities; the preferred approach rescales the likelihood 752 function to reflect that localities with zero cases are not included in the data. Several zoonotic spillover. For a case i observed to occur on day t in locality v, the probability that case j 791 was the source of case i (P ij ) was taken to be the proportion of µ t,v (the expected total number of 792 cases on that day and locality; defined in equation 2) contributed by case j. By sampling d 2 793 transmission trees from each of these transmission-probability matrices, we calculated the 794 proportion of cases that resulted from spillover, within-locality transmission, and between-795 locality transmission in each sampled tree. When testing the method using 125 simulated 796 datasets, 200 sampled transmission trees were generated for each dataset, with d 1 =20 and d 2 797 =10. For the monkeypox dataset, 20,000 transmission trees were generated with d 1 =200 and d 2 798

=100. 799
For inferred inter-locality human-to-human transmission events in the monkeypox 800 dataset, if the GPS coordinates were known for both localities in a transmission pair, the 801 transmission distance was calculated using the gdist function in the R package Imap [71]. The 802 'null distribution,' used for comparing the number of inferred inter-locality transmission events 803 with the number expected to occur if spatial location played no role in transmission, was 804 calculated by pooling all cases for which locality GPS coordinates are known, sampling all inter-805 locality pairs permitted by the model, and recording the distance between the localities in each 806 pair. 807

Simulation of test datasets 808
To test the effectiveness of the methods, datasets with known parameter values were 809 simulated using the model explained above. Simulations were run over 1825 days 810 (approximately 5 years) and 325 surveilled localities. The localities were assumed to be 811 partitioned across thirty districts and six regions, with the distribution of localities across districts 812 and regions similar to that observed for the monkeypox dataset. The generation time interval (the 813 number of days between symptom onset of the source and recipient cases) was assumed to 814 follow a negative binomial distribution with a mean of 16 days and a dispersion parameter of 815 728.7 (as described above), with a maximum generation time interval of 40 days. A number of 816 parameter sets, as well as different underlying model structures, were used for simulations (S6 817   Table). Simulation parameters were chosen to approximate the monkeypox dataset, with σ set at 818 0.75, R ranging from 0.2 to 0.6, and λ z ranging from 0.0001 to 0.1. Unless otherwise specified, 819 simulations were performed assuming the district-level model. Details on the models used for 820 sensitivity analyses that use the exact spatial location of cases or allow highly structured and (4) 1213 It is now necessary to calculate the probability a surveilled locality experiences one or 1214 more cases. This probability is equivalent to one minus the probability of no cases occurring at a 1215 locality during the surveillance period. The following section explains how the probability of 1216 zero cases occurring at a given locality (here denoted p) is calculated. 1217 For zero cases to occur in a locality, there must be no zoonotic spillover into that locality 1218 as well as no human-to-human transmission from an outside locality. The zoonotic component is 1219 relatively straightforward to calculate, as it is simply the probability of zero spillover events on 1220 each of the T days (which equals ). The probability of no transmission from an outside 1221 human source is a bit more complicated and can be broken down by the generation of the outside 1222 case to avoid double-counting. The generation of a case indicates how many human-to-human 1223 transmission events occurred leading to the case. We refer to cases resulting from zoonotic 1224 spillover as primary cases. Individuals infected by primary cases are second generation cases, 1225 individuals infected by second generation cases are third generation cases, etc. For there to be no 1226 cases in a locality, no transmission may have occurred into that locality from outside cases in any 1227 generation: The number of cases caused by a given case (of any generation) in the target locality is 1229 described by a Poisson distribution with expected value equal to This expression still includes the parameter , though fortunately the sensitivity of results to the 1260 value of this parameter is relatively low. We therefore approximate using the expected 1261 number of localities under surveillance in the broader contact zone. This calculation is explained 1262 in the following section. 1263 Estimating total number of localities under surveillance 1264 We wish to use the estimated parameter values for R, λ z , and σ in conjunction with the 1265 number of observed localities in a broader contact zone (W w ) to estimate the total number of 1266 localities under surveillance in that broader contact zone (V w ). If we let q be the probability a 1267 locality is observed (has one or more cases during the surveillance period), then we expect V w *q 1268 ≈ W w . From the section above, we approximate q = 1-p as: 1269 So we estimate V w as the value that satisfies the equation: 1271 is equal between all locality pairs, we expect that the actual amount of shared transmission 1276 between two localities is strongly influenced by the distance between those localities. We 1277 conducted two simulations using localities with set geographic locations and inter-locality transmissions depending on the spatial relationship of the localities. We took the 178 GPS 1279 records available from monkeypox surveillance in the DRC during the 1980s and simulated 1280 transmission across localities with the same coordinates and the same district and region 1281 boundaries. Two types of inter-locality transmission rules were explored. In the first of these, 1282 inter-locality transmissions were assumed to occur equally into a source locality's five closest 1283 neighbors. In the second set of simulations, inter-locality transmissions from a source locality 1284 were assumed to occur equally among all outside localities within 30 km of the source locality. 1285

Simulations with highly structured and non-homogeneous spillover patterns 1286
To illustrate how highly structured and non-homogeneous spillover could bias parameter 1287 estimates, we simulated an extreme case of a zoonotic epidemic traveling through time and 1288 space. We imagined that disease dynamics in the reservoir would occur in a single location for 1289 25 days before moving to a new spot, in an extreme form of a traveling zoonotic epidemic. For 1290 each 25 day period, three localities (selected to be in the same district when possible) would be 1291 selected to experience all of the spillover in the entire system. Aside from this extreme spillover 1292 pattern, the simulation followed the district-level model. 1293 estimates for monkeypox from [1]) and examined how well our inference method, which 1322 assumes a Poisson offspring distribution, estimated the true parameter values. Estimates for these 1323 datasets were only marginally less accurate than estimates for datasets generated with a Poisson 1324 offspring distribution (with an average percent error of 10.9% as opposed to 8.2% for R and of 1325 11.6% as opposed to 10.4% for spillover rate estimates) (S8 Fig, S3 Table). As such, there are 1326 unlikely to be strong biases introduced from a mis-specified offspring distribution for the 1327 monkeypox dataset, though this bias could increase if applied to pathogens with more extreme 1328 transmission variance. 1329

Sensitivity of parameter inference to broader contact zone assumption 1330
To examine how assuming different broader contact zones would affect inference results, 1331 we compared parameter estimates obtained under three choices of broader contact zones for data 1332 simulated under two inter-locality transmission rules. We simulated disease spread in a system 1333 where localities were placed in the same arrangement as seen in 178 localities with GPS 1334 coordinates included in the monkeypox surveillance system, district and region arrangement 1335 were the same as in the 1980s surveillance, and human-to-human transmission could occur either 1336 between a locality and its five closest neighbors or between localities located within 30 km of 1337 one another. Inference results again showed increasing estimates of R and decreasing estimates S1 The expected number of new infections that become symptomatic on day t in locality v caused by an infectious individual who became symptomatic on day s in locality w