## Abstract

Ebola is a viral haemorrhagic fever with high mortality that has caused a number of severe outbreaks in Central and West Africa, the largest of which was in 2014-16 and resulted in 11,325 deaths. Although most previous outbreaks have been relatively small in comparison, managing outbreaks places huge strains on already limited resources. Mathematical models matched to early case reporting data can be used to identify outbreaks that are at high risk of spreading. Here we consider the Ebola outbreak in Equateur Province in the Democratic Republic of the Congo, which was declared on 8 May 2018. We use a simple stochastic metapopulation model to capture the dynamics in the three affected health zones (Bikoro, Iboko and Wangata) and the capital city Kinshasa. We are able to rapidly simulate a large number of realisations and use a likelihood-free method to determine parameters by matching reported and simulated cases. This likelihood-free matching has a number of advantages over more traditional likelihood-based methods as it is less sensitive to errors in the data and is a natural extension to the prediction framework. Using data from 8 to 24 May 2018 we are able to capture the exponential increases in the number of cases in three locations (Bikoro, Iboko and Wangata) and the probability of transmission to the capital city Kinshasa, although our estimated basic reproductive ratio of 4.02 is higher than for previous outbreaks. Using data until 28 June 2018 we are able to infer the decrease in transmission due to public-health intervention measures, such that the reproductive ratio is predicted to drop below one around 16 May 2018, leading to decreasing numbers of cases. We believe this method of fitting models to data offers a generic approach that can deliver rapid results in real time during a range of future outbreaks.

**Author summary** Ebola is an infectious disease that, if left untreated, is often fatal. In addition, the consequence of managing Ebola outbreaks places huge strains on already limited healthcare resources in affected countries. Mathematical models can be useful in identifying high-risk outbreaks, and deciding where to allocate resources. However, many existing models of Ebola cannot capture the spatial spread of the outbreak, require highly detailed data, or are too complicated to be used in real-time during an outbreak.

In this paper we describe a framework that can capture the spatial spread of the 2018 Ebola outbreak in Equateur province in the Democratic Republic of the Congo, and is simple enough that it can be re-evaluated as new data become available. We used this framework to understand how transmission changed as a result of public-health interventions, and the risk of transmission to Kinshasa, which was a major public-health concern. This framework is highly flexible and can easily be adapted to new geographic regions, to use different data sources, or to answer pressing public-health questions. This research provides important insights into this Ebola outbreak, and has the potential to generate substantial impact on public-health decision-making.

## 1 Introduction

### 1.1 Background

Ebola virus disease (herein referred to simply as Ebola) is a viral haemorrhagic fever that, if left untreated, is often fatal: the case fatality ratio is approximately 50%, but can range between 25% and 90% [1]. In addition to the high number of deaths, the consequence of managing Ebola outbreaks places huge strains on already limited healthcare resources and can reduce the control of other life-threatening diseases [2, 3, 4].

Fruit bats are thought to be the natural Ebola host [5], although non-human primates can also be infected. Animal-to-human transmission occurs through contact with the blood or organs of infected animals [1]; previous outbreaks have been linked to the handling and consumption of bushmeat [6]. Human-to-human transmission occurs via direct contact with blood or other bodily fluids of symptomatic individuals, or with contaminated materials such as bedding, clothing or needles [1]: as a result, healthcare workers and caregivers within the community are at high risk of infection [1, 2]. Transmission can also occur during traditional burial ceremonies that involve direct contact with the body [7], as individuals remain highly infectious even after death [1]. High levels of onward transmission during previous outbreaks have been attributed to funerals [8].

The incubation period is typically between 2 and 21 days, during which time individuals are not infectious. Early symptoms include sudden onset of fever, fatigue, muscle pain, headache and sore throat, followed by vomiting, diarrhoea, rash, reduced kidney and liver function and, in some cases, internal and external bleeding [1]. Symptoms of Ebola are non-specific, and typical of other diseases endemic to Central and West Africa such as malaria or typhoid fever. As a result, individuals may not seek immediate treatment and cases may be misdiagnosed, particularly at the beginning of an outbreak when surveillance is limited.

To date, outbreaks have occurred in Central and West Africa, the largest of which was in 2014-16 and primarily located in Guinea, Liberia and Sierra Leone: the outbreak resulted in 11,325 deaths from 28,652 cases [9]. Other outbreaks have also been recorded in the Democratic Republic of the Congo (DRC), Uganda, South Sudan and Gabon, although each of these outbreaks involved at most 500 cases and was located in predominantly rural areas.

### 1.2 2018 outbreak of Ebola virus disease in Equateur Province

The 2018 outbreak of Ebola in Equateur Province was the twenty-sixth outbreak globally and the ninth within the DRC. On 3 May 2018, 21 suspected cases of Ebola, including 17 deaths, were identified in the Ikoko-Impenge health area within Bikoro Health Zone in Equateur Province and reported to the DRC Ministry of Health. On 5 May, five active cases in hospitalised patients were identified and tested for Ebola by a team from the Ministry of Health, the World Health Organisation (WHO) and Médecins Sans Frontières (MSF); of these five active cases, two tested positive for *Zaire ebolavirus*, one of six known ebolaviruses. The outbreak was officially declared by the WHO on 8 May 2018, and concluded on 24 July 2018, 42 days (two maximum incubation periods) after the last confirmed case on 2 June 2018. At the time of writing a further unrelated outbreak of Ebola in North Kivu Province in the DRC is ongoing.

Data on the number of Ebola cases in each of the affected health zones were released daily by the DRC Ministry of Health [10] and less frequently by WHO [11]. Cases were classified as suspected, probable or confirmed according to WHO guidelines [12] and could undergo reclassification: for example, suspected or probable cases could be reclassified as confirmed positive or negative for Ebola. Cases were confirmed by laboratory testing using reverse transcription polymerase chain reaction (RT-PCR); this technique requires access to specialised equipment and skilled technicians and so there is some delay between case detection and confirmation. A total of 52 cases were reported during the outbreak, of which 38 were classed as confirmed and 14 classed as probable [13].

Three health zones were affected over the course of the outbreak: Bikoro, where the outbreak was first detected; Iboko, a rural health zone to the east of Bikoro; and Wangata, located within Mbandaka, a city on the Congo river with a population of approximately 1,200,000 people [14] (Figure 1). Of the 38 confirmed cases in the outbreak, 4 were reported in Wangata health zone, 10 in Bikoro and 24 in Iboko. Within Bikoro and Iboko health zones, the cases appear to be localised to a small number of remote villages [13]; however, data on the exact location of each case were not publicly available at the time of writing.

Largely driven by experience and fears from the Ebola outbreak in West Africa, there was a rapid local and international response in an effort to prevent further transmission within or out of the DRC. In particular, there was considerable concern over the outbreak reaching the capital city Kinshasa, which has a population of approximately 11,000,000 people and serves international flights to countries in Africa and Europe. Organisations within the affected regions implemented various control measures, including: enhanced community surveillance; the use of rapid diagnostic tests; contact identification and tracing; safe and dignified burials (to reduce transmission during funerals); and vaccination [13]. Contact identification and tracing began on 5-6 May 2018 during the first field investigation. By 27 June 2018 all contacts had completed the 21-day follow-up period and no individuals were being actively monitored.

In addition to usual control measures, an experimental Ebola vaccine rVSV-ZEBOV was approved by the WHO for use as part of a targeted vaccination campaign [15]. This vaccine was studied in Phase 3 trials in Guinea during the 2014 outbreak in West Africa and was found to be highly protective against Ebola [16]; however, the exact efficacy of the vaccine is unknown. A ring vaccination strategy began on 21 May 2018 targeting contacts of confirmed Ebola cases, and contacts of these contacts, plus high-risk individuals such as local and international health-care workers in affected areas. By the end of the outbreak a total of 3,481 individuals had been vaccinated [13].

Real-time mathematical modelling of outbreaks has significantly increased in the last ten years, driven by developments in data collection and new statistical methods [17, 18, 19, 20, 21, 22], although efforts are constrained by limited or poor-quality data. This can be circumvented by using novel online data sources to augment traditional surveillance data [20, 22], or by using likelihood-free methods, such as approximate Bayesian computation (ABC) [18, 23]. ABC relies only upon efficient simulations of the epidemic and so is viable in low-resource settings where additional data are not available.

Despite the many challenges, mathematical modelling can provide important quantitative guidance to public health bodies about the likely scale of an outbreak and the efficacy of current intervention policies, and can help to identify routes that can still sustain onward transmission. This information may be particularly beneficial in low-resource settings to target limited resources to regions of greatest need. Due to the severity of the disease and motivated by the need to allocate resources effectively, mathematical modelling of Ebola outbreaks has focussed predominantly on evaluating the efficacy of interventions [24, 25, 26, 27, 28, 29]. Modelling has also been used to assess the risk of international spread [30]; however, only a few of these studies were explicitly focussed on modelling in real-time [28, 30]. To date, and to the authors’ knowledge, there has been no modelling to assess interventions in real-time during the outbreak in Equateur province; however, a recent epidemiological study of the outbreak [31] showed a decrease in the delay between symptom onset and hospitalisation and sample testing.

Compartmental models of Ebola infection are designed to capture possible transmission in different settings, namely the community, hospitals and funerals. The majority of results (including those presented here) are based upon a six-compartment model introduced by Legrand et al. [32]; Kucharski et al. [24] use a variant of the six-compartment model that includes a compartment specifically for Ebola community care centres and Ebola treatment centres. Almost all existing Ebola modelling literature is concerned with the 2014-15 outbreak in West Africa, although some earlier analyses focus on the larger outbreaks in the DRC in 1995 [32, 25, 26] and Uganda in 2000 [32, 25].

Due to the close contact required for onward transmission and the severity of the symptoms, cases of Ebola are typically spatially clustered, with occasional long-distance transmission as a result of human movement. Despite this, only a few models allow for spatial dynamics [27, 30]. Merler et al. [27] develop an agent-based model for the 2014-16 outbreak in West Africa; however, parameterisation of agent-based models appears to require highly detailed sociodemographic data and computationally expensive Markov chain Monte Carlo simulations, limiting their usefulness during an outbreak. On the other hand, Gomes et al. [30] use a metapopulation modelling framework, whereby the population of the world is divided into sub-populations according to a Voronoi-like tessellation around transportation hubs. Metapopulation models require less detailed data and computational power than network or agent-based models, and so are an excellent framework to use in real-time outbreak modelling.

For the outbreak in Equateur province we are interested in assessing the early growth of cases, whether (and when) there was any notable change in the dynamics, and the risk of Ebola reaching the capital city Kinshasa. To account for the spatial clustering of observed cases and to allow us to quantify the risk of case importation to Kinshasa, we used a metapopulation framework with four sub-populations representing the three affected health zones and Kinshasa. For the within-population model we adapted the widely-used six-compartment model [32], with a non-constant transmission parameter to allow for possible changes in transmission due to the various interventions that were implemented.

## 2 Methods

### 2.1 Data

Over the course of the outbreak we obtained reports from the DRC Ministry of Health [10] and WHO [11], which were used to produce time series of the number of cases in each of Wangata, Bikoro and Iboko health zones. Due to the uncertainty of the true status of both suspected and probable cases, and the possibility of reclassification of these cases, in the following analysis we consider cumulative confirmed cases only. We stress that these data only give the date of laboratory confirmation, not the date of symptom onset or case detection, and therefore these dates are multiple steps removed from the underlying epidemiological dynamics. A time series of the cumulative data for the three health zones is given in Figure 2. We note that there is an error in the data for Bikoro on 17 May 2018, where the number of confirmed cases drops from 13 to 10. Although there is no explanation from the DRC Ministry of Health, this error is most likely due to cases being reported before they were officially confirmed by laboratory testing.

### 2.2 Model

Our model can be described in two parts: a compartmental model that describes the epidemiological dynamics of Ebola infection within a population [32]; and a metapopulation model that describes the spatial dynamics [33].

#### 2.2.1 Epidemiological dynamics

We use a stochastic compartmental model introduced by Legrand et al. [32] to describe the dynamics of Ebola infection within a single isolated population, with the following 6 compartments: susceptible individuals (*S*), who can be infected after contact with infectious individuals; exposed individuals (*E*), who are infected but not yet infectious to others; infectious individuals (*I*) within the community; hospitalised infectious individuals (*H*); dead individuals (*F*), who are still infectious and may transmit infection during burial; and removed individuals (*R*), who are either recovered or dead and safely buried. Individuals move between these compartments according to certain rates, determined by a set of ten parameters, including a strictly positive scaling parameter that scales each of the transmission parameters *β*_{I}, *β*_{H} and *β*_{F}, associated with transmission in the community, hospitals and funerals respectively. Figure 3 shows a schematic representation of the model, while Table 1 summarises the possible events and transition rates.

#### 2.2.2 Spatial dynamics

To describe the spatial dynamics of Ebola infection we use a metapopulation model, whereby the total population is split into *K* interacting sub-populations of sizes *N*_{i}, *i* = 1,…,*K*. We define *σ*_{ij} to be the proportion of epidemiologically relevant contacts that individuals from population *i* have with individuals in population *j*, which we will simply refer to as the coupling from population *i* to population *j*. We naturally have that ∑_{j} *σ*_{ij} = 1 and so the within-population coupling (which we expect to be close to one) can be expressed as *σ*_{ii} = 1 – ∑_{j≠i} *σ*_{ij}. The force of infection in population *i*, the rate at which susceptible individuals become infected, can then be written in terms of the coupling parameters as a sum over populations *j* of the infectious pressure in population *j*, weighted by *σ*_{ij}.

As such the transmission is assumed to be due to the movement of healthy susceptible individuals visiting infected locations, such that the risk to individuals in population *i* is related to the coupling terms *σ*_{ij}.
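One way to write a force of infection consistent with this description is sketched below. It is a sketch under stated assumptions rather than a quotation of the model: the symbol $s$ for the transmission scaling parameter is hypothetical notation, and dividing by the local population size $N_j$ (frequency-dependent transmission, as in the single-population model of [32]) is assumed.

$$
\lambda_i \;=\; s \sum_{j=1}^{K} \sigma_{ij}\,
\frac{\beta_I I_j + \beta_H H_j + \beta_F F_j}{N_j}
$$

Under this form a susceptible individual from population *i* makes a fraction $\sigma_{ij}$ of their relevant contacts in population *j* and is exposed to the infectious pressure there; with $\sigma_{ii}$ close to one, most of the risk comes from the home population.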

The coupling is usually parameterised using mobility data [34, 35], or some suitable and available proxy, such as mobile phone data [36]. Such mobility data is not available for the DRC, and some remote regions (including some rural areas of Equateur province) are not covered by mobile phone networks. In the absence of such data, we use a generalised gravity model [37] to generate the coupling parameters.

We define *v*_{ij} to be the number of visits from population *i* to *j*. According to the generalised gravity model this is proportional to a product of powers of the two population sizes and of the distance *d*_{ij} between populations *i* and *j*, where the parameters *a*, *b* and *c* are to be inferred. The coupling, *σ*_{ij}, should then be proportional to the fraction of individuals from population *i* visiting population *j*, which is *v*_{ij}/*N*_{i}. In addition, we need to ensure that the within-population coupling *σ*_{ii} = 1 – ∑_{j≠i} *σ*_{ij} is always positive, placing a limit on the maximum size of external couplings. To this end, we normalise the proportion of visits by the maximum proportion over all populations; however, from this definition we have min_{i} *σ*_{ii} = 0. Therefore, we introduce an additional scaling parameter *A* ∈ [0,1] such that min_{i} *σ*_{ii} ≥ 0. Combining each of these elements, the coupling *σ*_{ij}, *i* ≠ *j*, is defined as this normalised proportion of visits multiplied by *A*, where *a*, *b*, *c* and *A* are parameters to be inferred from the epidemiological dynamics.
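The construction of the coupling matrix can be illustrated with a short Python sketch. It assumes the common generalised gravity form, visits from *i* to *j* proportional to `N_i**a * N_j**b / d_ij**c`, and assumes that the normalisation is by the largest total outgoing fraction over all populations; the function name `gravity_coupling` and the example populations and distances are hypothetical and are not the values of Table 3.

```python
from typing import List


def gravity_coupling(pop: List[float], dist: List[List[float]],
                     a: float, b: float, c: float, A: float) -> List[List[float]]:
    """Coupling matrix sigma[i][j] from an assumed generalised gravity model."""
    k = len(pop)
    # Assumed gravity form: visits i -> j proportional to N_i^a * N_j^b / d_ij^c.
    visits = [[0.0 if i == j else pop[i] ** a * pop[j] ** b / dist[i][j] ** c
               for j in range(k)] for i in range(k)]
    # Fraction of population i that travels to any other population.
    out_frac = [sum(visits[i]) / pop[i] for i in range(k)]
    norm = max(out_frac)  # largest outgoing fraction over all populations
    # Off-diagonal couplings: normalised visit fraction, scaled by A in [0, 1].
    sigma = [[A * visits[i][j] / pop[i] / norm for j in range(k)] for i in range(k)]
    # Within-population coupling is whatever remains, so each row sums to one.
    for i in range(k):
        sigma[i][i] = 1.0 - sum(sigma[i][j] for j in range(k) if j != i)
    return sigma


# Illustrative inputs only (three hypothetical populations, distances in km).
example_pop = [1000.0, 2000.0, 500.0]
example_dist = [[0.0, 10.0, 20.0], [10.0, 0.0, 15.0], [20.0, 15.0, 0.0]]
example_sigma = gravity_coupling(example_pop, example_dist, 0.5, 0.5, 1.0, 0.1)
```

With this normalisation the population with the largest outgoing fraction has within-population coupling exactly 1 - *A*, so choosing *A* ≤ 1 keeps every diagonal entry non-negative.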

#### 2.2.3 Parameters and inference

We use the above model to describe the dynamics of the Ebola outbreak in *K* = 4 populations: Wangata, Bikoro and Iboko health zones, and Kinshasa. We begin each realisation on 5 April 2018 (identified as the date of symptom onset for the first case) with a single infected individual in Bikoro and run the simulations until some later date *T*_{1}: we use 25 May 2018 as our initial modelling endpoint, and 29 June 2018 as the later modelling endpoint. These simulations are rapid: when *T*_{1} = 25 May 2018 we can perform around 1,000,000 simulations an hour. Individual realisations of the model are simulated using the tau-leaping algorithm [38], with time intervals of *τ* = 1 day, implemented in C++.
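To make the simulation step concrete, the following Python sketch applies tau-leaping with *τ* = 1 day to a single population with the six compartments *S*, *E*, *I*, *H*, *F*, *R*. It is not the C++ implementation used here: the event list is a simplified stand-in for Table 1, every rate in `PARAMS` is a hypothetical placeholder, and confirmed cases are counted as moves from *I* to *H* or *F*, following the definition of confirmed cases used in the inference.

```python
import math
import random
from typing import Dict, List, Tuple


def poisson(rng: random.Random, mean: float) -> int:
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    if mean <= 0.0:
        return 0
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


# Hypothetical daily rates, for illustration only (not the values of Table 2).
PARAMS = {"scale": 1.5, "beta_I": 0.08, "beta_H": 0.01, "beta_F": 0.10,
          "alpha": 1 / 7, "to_H": 0.10, "to_F": 0.05, "to_R": 0.05,
          "H_to_F": 0.10, "H_to_R": 0.10, "F_to_R": 0.50}


def tau_leap(state: Dict[str, int], p: Dict[str, float], days: int,
             seed: int = 0, tau: float = 1.0) -> Tuple[Dict[str, int], List[int]]:
    """Simulate `days` steps; return final state and cumulative confirmed cases."""
    rng = random.Random(seed)
    s = dict(state)
    n_total = sum(s.values())
    confirmed, history = 0, []
    for _ in range(days):
        force = p["scale"] * (p["beta_I"] * s["I"] + p["beta_H"] * s["H"]
                              + p["beta_F"] * s["F"]) / n_total
        # Event rates are evaluated once, at the start of the step.
        events = [("S", "E", force * s["S"]), ("E", "I", p["alpha"] * s["E"]),
                  ("I", "H", p["to_H"] * s["I"]), ("I", "F", p["to_F"] * s["I"]),
                  ("I", "R", p["to_R"] * s["I"]), ("H", "F", p["H_to_F"] * s["H"]),
                  ("H", "R", p["H_to_R"] * s["H"]), ("F", "R", p["F_to_R"] * s["F"])]
        for src, dst, rate in events:
            # Poisson number of events in the step, capped so no count goes negative.
            n = min(poisson(rng, rate * tau), s[src])
            s[src] -= n
            s[dst] += n
            if src == "I" and dst in ("H", "F"):
                confirmed += n
        history.append(confirmed)
    return s, history
```

A call such as `tau_leap({"S": 9999, "E": 0, "I": 1, "H": 0, "F": 0, "R": 0}, PARAMS, days=50)` returns one stochastic realisation; a metapopulation version would hold one such state per population and replace `force` with a coupling-weighted sum over populations.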

There is insufficient data resolution (in terms of individual dynamics or who-infected-whom) or duration of outbreak to effectively infer all the epidemiological parameters needed for the simulation model. We therefore take the majority of the parameter values from [32], which were estimated from the 1995 Ebola outbreak in Kikwit city in the south-west of DRC (Table 2), and infer the transmission scaling parameter and four spatial parameters (*A*, *a*, *b*, *c*) from the data.

To parameterise the spatial component of the model, we need an estimate of both population sizes and pairwise distances between populations. We use estimates of population sizes from global health bodies [14, 44] and define pairwise distances to be great-circle distances between populations (Table 3).

We perform parameter inference using approximate Bayesian computation, a flexible likelihood-free approach that is straightforward to compute. A traditional likelihood-based methodology would require us to infer infection times and subsequent disease progression (together with associated times) for each case, and is likely to be affected by the temporal aggregation of observed cases as a result of detection and testing. Therefore for each realisation we calculate an error *e* between the realised (*C*^{sim}) and observed (*C*^{obs}) cumulative confirmed cases from 11 May 2018 until some later date *T*_{1}. We define confirmed cases in our model to be individuals who have moved from the infected class to either the hospitalised or funeral class. We define the error as a weighted root mean square error, summed over the four sub-populations, with a location-specific normalisation constant in the denominator.

The denominator in this expression is motivated by considering a Poisson distribution. In a Poisson distribution the variance is equal to the mean, therefore we would normalise by dividing through by the observed value at each point; however, given the associated uncertainties in the data, this placed far too much emphasis on correctly matching to the early dynamics when the cases were low. We therefore normalise by the maximum of the observed cases in each location, providing some degree of normalisation between the different sized outbreaks. This approach would fail for Kinshasa where no cases were reported, so we take the normalisation constant to be one.
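A minimal Python sketch of such an error measure follows. The exact placement of the normalisation constant is an assumption (here the mean squared difference in each location is divided by the constant before the square root is taken), and the function name `weighted_rmse` and the example series are hypothetical.

```python
import math
from typing import Dict, List


def weighted_rmse(sim: Dict[str, List[int]], obs: Dict[str, List[int]]) -> float:
    """Sum over locations of a root mean square error with per-location weights."""
    total = 0.0
    for name, observed in obs.items():
        simulated = sim[name]
        # Normalise by the maximum observed count; use 1 where no cases were observed.
        weight = max(max(observed), 1)
        mse = sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)
        total += math.sqrt(mse / weight)
    return total


# Illustrative cumulative counts only, not the reported data.
example_obs = {"Iboko": [0, 2, 4], "Kinshasa": [0, 0, 0]}
example_sim = {"Iboko": [0, 2, 4], "Kinshasa": [1, 1, 1]}
example_error = weighted_rmse(example_sim, example_obs)
```

In this example the Iboko series matches exactly and contributes nothing, while a single spurious case in Kinshasa contributes an error of one because its normalisation constant is one.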

Parameter inference for the five parameters (the transmission scaling, *A*, *a*, *b*, *c*) is performed as a two-step process. Firstly, parameter values are chosen from appropriate prior distributions reflecting our belief about their values: Uniform(0,1) for *a*, *b*, and *A*; and Uniform(0, 3) for *c* and the transmission scaling. From 10,000,000 random parameter choices and stochastic simulations we choose the 1,000 that have the lowest error, noting that the error is dependent on both the parameter values and the stochastic nature of any one simulation. Secondly, we use these 1,000 best parameter sets, and select nearby parameters to test (normally distributed about one of the best 1,000); we continually update the 1,000 best parameters to reflect newly tested parameters. In this manner, we can home in on parameter sets that are associated with the lowest errors, and we are no longer constrained by our initial choice of sample distribution.
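The two-step search described above might be sketched in Python as follows. This is an illustration under assumptions, not the implementation used here: `abc_search`, its arguments and the proposal standard deviation `step_sd` are hypothetical names, the rule of replacing the current worst retained set is one possible reading of "continually update", and `error_of` stands in for running one stochastic simulation and computing its error.

```python
import random
from typing import Callable, List, Sequence, Tuple

Params = List[float]


def abc_search(error_of: Callable[[Params, random.Random], float],
               bounds: Sequence[Tuple[float, float]],
               n_initial: int, n_keep: int, n_refine: int,
               step_sd: float, seed: int = 0) -> List[Tuple[float, Params]]:
    """Return the n_keep (error, parameters) pairs with the lowest errors found."""
    rng = random.Random(seed)
    # Step 1: draw from uniform priors, simulate once each, keep the best n_keep.
    pool = []
    for _ in range(n_initial):
        theta = [rng.uniform(lo, hi) for lo, hi in bounds]
        pool.append((error_of(theta, rng), theta))
    pool.sort(key=lambda pair: pair[0])
    best = pool[:n_keep]
    # Step 2: propose near a randomly chosen retained set (normal perturbation)
    # and replace the current worst retained set whenever the proposal beats it.
    for _ in range(n_refine):
        _, parent = rng.choice(best)
        theta = [rng.gauss(x, step_sd) for x in parent]
        err = error_of(theta, rng)
        if err < best[-1][0]:
            best[-1] = (err, theta)
            best.sort(key=lambda pair: pair[0])
    return best


def toy_error(theta: Params, rng: random.Random) -> float:
    """Toy stand-in for 'simulate and score': distance from a known target."""
    return (theta[0] - 0.3) ** 2


toy_best = abc_search(toy_error, [(0.0, 1.0)], n_initial=200, n_keep=10,
                      n_refine=200, step_sd=0.05, seed=1)
```

Proposals in the second step are not clipped to the prior bounds, which mirrors the remark that the search is no longer constrained by the initial sampling distribution; any validity checks (for example a positive transmission scaling) would need to live inside `error_of`.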

#### 2.2.4 Model extension

Our simple model is likely to predict long-term exponential growth of infection until the susceptible population sizes become depleted. In practice sustained exponential growth is rarely observed as a range of control and mitigation measures are usually implemented to limit the spread of infection. We believe that the national and international response to the outbreak in Equateur Province is therefore likely to have substantially reduced the per-capita transmission and hence brought the outbreak under control.

We capture possible changes in transmission by including a step-change in the transmission scaling, governed by two additional parameters. The transmission scaling now becomes a function of time: it keeps its original value until time *T*_{C} and is multiplied by *δ* thereafter, where *T*_{C} defines the time at which control effects begin, and (1 – *δ*) determines the reduction in transmission from all infectious classes.
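The step-change can be written as a small Python function. This is a sketch: the function name `transmission_scaling` is hypothetical, and whether the change applies on *T*_{C} itself or from the following day is an assumed convention. The values in the example are the later estimates reported in Section 3.2 (scaling 1.49, initial *R*_{0} 4.02, reduction 94.8%, *T*_{C} = 15 May 2018).

```python
from datetime import date


def transmission_scaling(day: date, base: float, delta: float, t_c: date) -> float:
    """Scaling equals `base` before t_c and `delta * base` from t_c onwards."""
    return base if day < t_c else delta * base


# Estimates quoted in Section 3.2; delta is one minus the reported reduction.
delta_hat = 1 - 0.948
before = transmission_scaling(date(2018, 5, 1), 1.49, delta_hat, date(2018, 5, 15))
after = transmission_scaling(date(2018, 6, 1), 1.49, delta_hat, date(2018, 5, 15))

# Because R0 scales with the transmission scaling, the same factor applies to it.
r0_after = 4.02 * delta_hat
```

Here `r0_after` is about 0.21, in agreement with the reproductive ratio reported for the end of the simulation.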

These two new parameter values are inferred as detailed above, using the associated errors. We start with the first 50 days of the outbreak (to 24 May 2018) and determine the best 1,000 parameter sets (for all 7 parameters) for 10,000,000 parameter updates; these are then fed into simulations for the first 55 days (to 29 May 2018) and we again iterate the fitting procedure for a further 10,000,000 steps. This incremental fitting process mimics what would occur in real-time as more data become available on a daily basis and parameters are refined. The re-fitting procedure is easily completed by an overnight run. We perform incremental re-fitting until we reach 85 days (28 June 2018), by which point we have had three weeks without any additional confirmed cases in the region.

## 3 Results

### 3.1 Early time estimates

We run our simple model (without any change in transmission) until 25 May 2018 and retain the 1,000 realisations with the smallest total error. We obtain estimates of the five unknown parameters (Figure 4 and Figure B.1) and time series fit to the observed cumulative cases (Figure 5). We note that these time series represent stochastic realisations that accurately match to the data by minimising the error, rather than capturing the expected behaviour associated with the parameters.

From the model fitting process we obtain best-fit distributions of the five unknown parameters: the transmission rate scaling and the four spatial coupling parameters *a*, *b*, *c* and *A*. The posterior distributions are summarised in Figure 4; individual posterior distributions for these parameters and the pairwise relationships between them are given in the Supplementary Information (Figure B.1). The transmission rate scaling expresses the transmission rate in the 2018 outbreak in Equateur Province relative to the parameters matched to the 1995 outbreak in Kikwit; its posterior distribution is given in Figure 4a. We get an estimate for the scaling of 1.98 (95% credible interval (CI) [1.17, 2.95]; this is the interval that contains 95% of all the parameter values) and so an estimate for the basic reproductive ratio, *R*_{0}, of 5.34 (95% CI [3.17, 7.96]). We recombine the four spatial parameters to obtain meaningful distributions of the coupling between populations. In Figure 4b we include only the most relevant interactions: coupling between Bikoro and Iboko; from Bikoro and Iboko to Wangata; and from each of Wangata, Bikoro and Iboko to Kinshasa. From these results we observe that the coupling between populations is primarily dominated by distance, with the largest coupling between the closest populations, Bikoro and Iboko. In addition, we find that coupling is slightly larger towards the bigger population, so that the coupling from Iboko to Bikoro is larger than that from Bikoro to Iboko, since Bikoro is larger than Iboko.
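The link between the scaling and *R*_{0} can be checked with a line of arithmetic. Because the scaling multiplies all three transmission parameters, *R*_{0} is proportional to it; the symbols $s$ (the scaling) and $R_0^{\text{ref}}$ (the reproductive ratio implied by the reference parameters) are hypothetical notation, and the reference value below is inferred from the reported figures rather than quoted from Table 2.

$$
R_0 = s\,R_0^{\text{ref}}, \qquad
R_0^{\text{ref}} \approx \frac{5.34}{1.98} \approx 2.70, \qquad
\frac{3.17}{1.17} \approx 2.71, \qquad
\frac{7.96}{2.95} \approx 2.70 .
$$

The point estimate and both ends of the credible interval give the same ratio to within rounding, as expected if the interval for *R*_{0} is obtained by rescaling the interval for the scaling.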

From the model fitting process we also obtain time series fits to the observed cumulative cases and final distribution of cases corresponding to 25 May 2018 for the metapopulation as a whole (Figure 5a) and for the four sub-populations separately (Figure 5b). For the time series we obtain a good qualitative fit to the observed confirmed cases both at the sub-population and metapopulation scale. The individual replicates (shown in grey) are tightly clustered and generally envelop the cumulative reported cases; however, our model noticeably overestimates the reported number of confirmed cases early in the outbreak (up to approximately 19 May); we believe that this overestimate is due to the process by which cases are classed as confirmed. In general, the model estimates improve over time. We compare the final distribution of the realised cumulative confirmed cases to the observed cumulative confirmed cases at the end of the simulation, corresponding to 25 May 2018. Overall our model replicates the observed data well: our model estimates the total number of confirmed cases across the entire metapopulation to be 34 (mean 34.0, 95% CI [27, 41]), compared to 35 observed confirmed cases. We can also examine the distribution of cases in each of the four sub-populations. For Kinshasa and Wangata our model replicates the observed data extremely well: we estimate the number of confirmed cases to be 0 (mean 0.03, 95% CI [0, 1]) and 5 (mean 4.62, 95% CI [3, 7]) in Kinshasa and Wangata, respectively, compared to 0 and 4 observed confirmed cases. Our model is less accurate for Bikoro and Iboko, although the data still fall within our credible intervals: our model estimates the number of confirmed cases in Bikoro to be 13 (mean 12.54, 95% CI [9, 16]), compared to 10 observed confirmed cases; in Iboko our model estimates the number of confirmed cases to be 17 (mean 16.81, 95% CI [13, 22]), compared to 21 observed confirmed cases.

### 3.2 Effect of intervention

Our simple model assumes that the transmission rates remain constant over the course of the outbreak and would continue to predict exponential growth of cases if allowed to continue running forward from 25 May 2018; however, this contradicts the observed decline in the rate of observing new cases as various public-health measures come into effect. We therefore consider the step-change model described in Section 2.2.4 in which the transmission rate scaling undergoes a discrete drop. This addition to the model introduces two additional unknown parameters to infer: the time at which the transmission rate changes, *T*_{C}, and the percentage reduction in transmission, 1 – *δ*.

We run our step-change model for multiple endpoints: 24 May 2018 and in 5 day increments to 28 June 2018. At each end point we retain the best 1,000 realisations with the smallest total error and obtain estimates of the seven unknown parameters (Figure 6 and Figure B.2).

We estimate *R*_{0} as more observed data are included in the model fitting procedure (Figure 6a). As additional data are observed with few or no new cases, our model estimates an earlier and larger decrease in *R*_{0}, providing increasing evidence for a change in transmission. Early in the outbreak (using data up to 24 May 2018) there is only a slight signature of a decline in *R*_{0}; between 29 May and 3 June 2018 there is sufficient change in transmission that *R*_{0} drops below the threshold for continued transmission (*R*_{0} = 1), and by 28 June 2018 *R*_{0} has decreased to well below 1. In the remainder of the analysis we consider the best-fit distributions of the seven unknown parameters for the model run until 28 June 2018. Our estimates of the five parameters present in the simple model (the transmission rate scaling, *a*, *b*, *c*, *A*) change as a result of including more data. We now get an estimate for the transmission rate scaling of 1.49 (95% CI [0.90, 2.33]) and so an initial estimate for *R*_{0} of 4.02 (95% CI [2.43, 6.29]). Our mean estimate for *R*_{0} is lower than for our simple model (4.02 compared to 5.34 previously), and the credible interval is narrower (range of 3.86 compared to 4.78 previously). Again, we recombine the four spatial parameters (*a*, *b*, *c*, *A*) to obtain meaningful distributions of the coupling (Figure 6e); our estimates are in broad agreement with values from our simple model, but again more tightly distributed. In addition to the five original parameters, we estimate the time at which there is a change in transmission, *T*_{C} (Figure 6b), and the percentage reduction in transmission, 1 – *δ* (Figure 6c). We get an estimate for *T*_{C} of 15 May 2018 (95% CI [9 May 2018, 22 May 2018]) and an estimate for 1 – *δ* of 94.8% (95% CI [85.4, 99.8]). Together with other parameters, we estimate that the reproductive ratio first drops below one on 16 May 2018 (95% CI [9 May 2018, 22 May 2018]), eight days after the outbreak was officially declared. At the end of the simulation on 28 June 2018, we have an estimate for *R*_{0} of 0.21 (95% CI [0.13, 0.33]).

As before, we also obtain time series fits to the observed cumulative cases and the final distribution of cases corresponding to 28 June 2018 for the metapopulation as a whole (Figure 7a) and for the four sub-populations separately (Figure 7b). For the time series we now more robustly capture the bulk shape of the outbreak including the transition from exponential growth to disease eradication, although we still slightly overestimate the number of confirmed cases during the early stages of the outbreak. We compare the final distribution of the realised cumulative confirmed cases to the observed cumulative cases on 28 June 2018 (Figure 7, RHS). We now estimate the total number of confirmed cases to be 39 (mean 39.4, 95% CI [34, 45]), compared to 38 observed confirmed cases. At the sub-population level, our mean estimates for the final size are very similar to the observed number of confirmed cases and are more tightly distributed than for the simple model. In Kinshasa we estimate the number of confirmed cases to be 0 (mean 0.005, 95% CI [0, 0]), which matches the 0 actual observed confirmed cases; in Wangata we estimate 4 confirmed cases (mean 4.18, 95% CI [3, 6]), which also matches the 4 observed cases; in Bikoro we estimate 11 confirmed cases (mean 11.06, 95% CI [9, 14]) compared to 10 observed cases; and in Iboko we estimate 24 confirmed cases (mean 24.14, 95% CI [20, 29]), which matches the 24 observed cases.

## 4 Discussion

Ebola presents a significant burden to resource-poor countries in Central and West Africa. In addition to the high number of deaths that Ebola can cause, managing outbreaks places huge strains on already limited resources and can reduce the control of other life-threatening diseases. Mathematical models can be used to allocate resources effectively and to identify areas at risk of importing new cases. Here we develop a spatial modelling framework for the Ebola outbreak in Equateur Province in the DRC that allows us to understand the risk of transmission to Kinshasa, using data available in real-time during the outbreak. Two models are fitted to observed confirmed cases: the first assumes no change in transmission and the second assumes a stepwise change in transmission at some time.

Our first model is fitted to the first three weeks of the outbreak. From these results we predict a low risk of the outbreak reaching the major population centre of Kinshasa (Supplementary Information). If we assume constant transmission then we would predict long-term exponential growth, which was not observed; however, our second model is able to capture the exponential growth phase and disease eradication by early June (approximately four weeks after the outbreak was officially declared). We estimate a 94.8% reduction in transmission and that the basic reproductive ratio *R*_{0} drops below one on 16 May 2018, indicating that the control measures are having their desired impact. The timing of the change in transmission is eight days after the outbreak was officially declared and approximately five days after intensive contact tracing had begun. The magnitude of the change is in qualitative agreement with WHO reports detailing the extensive local and international response to the outbreak.

Our model combines a well-established compartmental model for Ebola [32] with a spatial metapopulation structure. Metapopulation modelling has previously been used to assess the risk of international spread of Ebola during the 2014 outbreak in West Africa [30]; however, this approach relies upon the Global Epidemic and Mobility Model [45] and hence cannot be easily adapted for use at smaller scales. Given the temporal disconnect between the epidemiologically important infection times and the observed confirmation times, we adopt a likelihood-free approach where repeated stochastic simulations are used to minimise the error between model and data. Our framework has several practical advantages: it can be readily used with any form of stochastic model, and for small outbreak sizes many realisations can be generated quickly, such that many millions of simulations can be performed and analysed.
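The likelihood-free idea described above can be illustrated with a minimal rejection-style sketch. This is not the authors' metapopulation model or their exact matching rule: the toy single-population simulator, the uniform prior on a hypothetical transmission probability `beta`, the sum-of-squares error, and the choice to keep a fixed number of closest draws are all assumptions made for illustration, and the "observed" series is synthetic.

```python
import random


def simulate_cumulative_cases(beta, days, rng, gamma=0.2):
    """Toy stochastic outbreak (hypothetical stand-in for the full model).

    Each day, every infectious individual causes one new case with
    probability `beta` and stops being infectious with probability `gamma`.
    Returns the cumulative case count at the end of each day.
    """
    infectious = 1
    cumulative = 1
    series = []
    for _ in range(days):
        new_cases = sum(rng.random() < beta for _ in range(infectious))
        removals = sum(rng.random() < gamma for _ in range(infectious))
        infectious += new_cases - removals
        cumulative += new_cases
        series.append(cumulative)
    return series


def abc_rejection(observed, n_sims, n_keep, rng, prior=(0.0, 0.5)):
    """Draw beta from a uniform prior, simulate, and keep the closest draws."""
    scored = []
    for _ in range(n_sims):
        beta = rng.uniform(*prior)
        simulated = simulate_cumulative_cases(beta, len(observed), rng)
        # Sum-of-squares error between simulated and observed cumulative cases.
        error = sum((s - o) ** 2 for s, o in zip(simulated, observed))
        scored.append((error, beta))
    scored.sort()  # smallest error first
    return [beta for _, beta in scored[:n_keep]]


rng = random.Random(1)
# Synthetic "observations" generated from a known beta; NOT outbreak data.
observed = simulate_cumulative_cases(0.4, 15, rng)
accepted = abc_rejection(observed, n_sims=2000, n_keep=100, rng=rng)
mean_beta = sum(accepted) / len(accepted)
print(f"kept {len(accepted)} of 2000 draws, mean beta = {mean_beta:.2f}")
```

The retained draws approximate a posterior distribution for `beta`. Because each realisation is cheap when the outbreak is small, the number of simulations can be increased substantially, which is the practical advantage noted above.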

Although our model is able to capture the dynamics of the outbreak in Equateur Province, the inferred values of the transmission scaling parameter, and thus our estimate for *R*_{0}, are somewhat surprising. For the simple model without change in transmission for the first three weeks of data we get an estimate for *R*_{0} of 5.34 (95% CI [3.17, 7.96]); for the step-change model using all the data, our new estimate of *R*_{0} is 4.02 (95% CI [2.44, 6.28]). Both values are appreciably larger than other estimates of 1.03 [31], and larger than estimates for recent outbreaks: 1.38-3.65 for DRC, 1995; 1.34-2.7 for Uganda, 2000-1 [46]; 1.51-2.53 for West Africa, 2013-15 [47, 48]. We believe that this overestimate of *R*_{0} is a result of the process by which cases are confirmed. Due to the delay between symptom onset and laboratory confirmation, confirmed cases are both spatially and temporally clustered, particularly early on in the outbreak. This delay is shown to be longer at the beginning of the outbreak when surveillance is low [31]. The effect of the delay between symptom onset and laboratory confirmation can be seen as large jumps in the number of confirmed cases in each of Wangata, Bikoro and Iboko. It is unlikely, however, that all cases confirmed on the same day share the same date of symptom onset, which is clear if we compare our data to Barry et al. [31]. This temporal clustering of confirmed cases distorts the data, and is likely the main factor that leads to our overestimate of *R*_{0} compared to previous outbreaks. In principle, we could formulate a model that could mimic the temporal aggregation of cases, but this would generate an additional parameter that would need to be estimated and would place an extra layer of filtering between the epidemiology and results. Alternatively, the overestimate of *R*_{0} could be addressed by modifying and refitting our model to data on the timing of symptom onset, if available.

Our modelling is motivated by the need for real-time analysis of Ebola outbreaks and interventions in a spatial setting. Our analysis is constrained by the quality and detail of the limited data publicly available during the outbreak in Equateur Province. In an attempt to minimise uncertainty around the true status of probable and confirmed cases we have restricted our analysis to confirmed cases only. However, even when we only consider confirmed cases the data we use contains at least one error: the number of confirmed cases in Bikoro drops from 13 to 10 on 17 May 2018. Due to the relatively small number of cases and the limited amount of data publicly available, we only infer parameters associated with the spatial component of the model and a single epidemiological parameter that scales the transmission rate; other epidemiological parameters are taken from the literature [32]. We believe that inference of all the epidemiological parameters of our model is not possible with the type of data that is publicly available; instead we would require information on the history and treatment of cases at the individual level.

Our quantitative results are also limited by assumptions and approximations made during the modelling process. To define the coupling we use the pairwise straight-line distances between populations; however, straight-line distances are likely a poor proxy for ease of travel between populations: in some remote areas (such as parts of Bikoro and Iboko health zones), it may take a significant amount of time to travel over short distances due to poor road infrastructure. In addition, it is not clear that each of the four sub-populations is acting as a homogeneously mixing population, and hence additional spatial structure may be acting on the dynamics, although without more detailed reporting this is impossible to assess. Modelling at a finer spatial scale would also require additional information on population structure and would increase the computational power and time required for simulation and analysis, which is at odds with our aim to generate results in real-time in low-resource settings.

Our analysis has demonstrated that practically useful mathematical models can be matched to publicly available data early in an outbreak, especially if previous analysis has helped to set the time-course of disease progression. The likelihood-free method we have adopted is highly convenient, allowing us to quickly and easily perform matches between a rapid stochastic simulation model and available data. As such, the modelling framework that we have described offers a template for early model inference in other outbreaks. In particular, the framework can easily be modified to accommodate different compartmental models, spatial scales, or data sources. Using this framework, and with very limited publicly available data, we have been able to attribute a very low risk to the infection reaching Kinshasa, which would exacerbate wider dissemination; we have also been able to rapidly identify changes in the transmission rate due to public-health interventions and predict that these interventions are sufficient to curtail the spread of infection. Obviously, as more data become available, especially individual-level data on cases, there is a desire to develop bespoke models fitted to the details of the ensuing outbreak; however, rapid early predictions before too much infection has arisen, such as outlined here, have the potential to generate a substantial impact on public-health decision-making.

## A Formulation of parameters in epidemic model

There are ten free parameters in the model and five additional parameters that are calculated from the first ten. The model parameters that can be derived from the free parameters are: the mean time from hospitalisation to death; the mean time from hospitalisation to end of infectious period for survivors; *θ*_{1}, the rate of transition from infectious to hospitalised, which is calculated from *θ* such that *θ*% of infectious cases are hospitalised; *δ*_{1} and *δ*_{2}, the effective case fatality ratios in the infected and hospitalised class, respectively, both of which are calculated from *δ* such that the case fatality ratio is *δ*. These five parameters are calculated as follows:

## B Posterior distributions of individual coupling parameters

Figures B.1 and B.2 show the pairwise relationships and posterior distributions for the free parameters in the simple model and the step-change model, respectively. In the simple model, the parameters are the transmission scaling parameter and four spatial parameters (*A*, *a*, *b*, *c*). In the step-change model, we also infer the percentage reduction in transmission (1 – *δ*) and the time at which the transmission rate changes (*T*_{C}). Parameters are inferred using an approximate Bayesian likelihood method; full details of this method are given in Section 2.2.3.

## C Force of infection at the sub-population level

Figure C.1 shows the cumulative force of infection over time in each of the four sub-populations. The force of infection at population *i* at time *t* is
where *C*^{obs}(*t*) is the cumulative observed confirmed cases in population *j* at time *t*, (1 – *δ*) is the reduction in transmission, *T*_{C} is the time at which the transmission rate changes, *σ*_{ij} is the coupling from population *i* to population *j*, and the transmission scaling parameter is the same one inferred in the model fits.

In all four sub-populations the force of infection increases as new cases are reported, and decreases around mid-May as transmission is reduced as a result of intervention measures. The force of infection plateaus once no new cases are observed.