Catching a wave: on the suitability of traveling-wave solutions in epidemiological modeling

Ordinary differential equation models such as the classical SIR model are widely used in epidemiology to study and predict infectious disease dynamics. However, these models typically assume that populations are homogeneously mixed, disregarding possible variations in disease prevalence due to spatial heterogeneity. To address this issue, reaction-diffusion models have been proposed as an alternative approach for modeling spatially continuous populations in which individuals move in a diffusive manner. In this study, we explore the conditions under which such spatial structure must be explicitly considered to accurately predict disease spread, and when the assumption of homogeneous mixing remains adequate. In particular, we derive a critical threshold for the diffusion coefficient below which disease transmission dynamics display spatial heterogeneity. We validate our analytical results with individual-based simulations of disease transmission across a two-dimensional continuous landscape. Using this framework, we further explore how key epidemiological parameters such as the establishment probability of a disease, its maximal incidence, and its final epidemic size are affected by incorporating spatial structure in SI, SIS, and SIR models. We discuss the implications of our findings for epidemiological modeling and identify design considerations and limitations for spatial simulation models of disease dynamics.


Introduction
A fundamental understanding of how infectious diseases spread across time and space is crucial for predicting the course of epidemics and guiding potential control measures.Compartmental models have provided the workhorse for infectious disease modeling for almost a century [1,2,3].In these models, the population is divided into compartments based on their infection status, such as susceptible (S), infectious (I), and recovered (R) individuals.One of the simplest compartmental models is the so-called SI model, where individuals can only be susceptible or infected [1].Once infected, an individual perpetually remains in that state.Two classic extensions of this model are the SIS and SIR models [1]: In the former, infected individuals can return to the susceptible class, whereas in the latter they can recover, often implying that they are immune to future infections.
Compartmental models are typically studied with ordinary differential equations (ODEs) describing the expected changes in the proportions of the different population compartments over time [1,2,4,5,3].For example, consider a simple SI model where individuals come into contact with each other at a constant rate r.Let i(t) specify the fraction of infectious individuals in the population.Whenever a susceptible individual meets an infected one, it contracts the pathogen with probability α (for simplicity, we will use the terms 'pathogen' and 'disease' interchangeably throughout this paper, acknowledging their conceptual distinction).If we assume homogeneous mixing among the individuals, the change in the proportion of infectious individuals over time will be given by: Epidemiological models of this type can be extended to include additional compartments, such as a recovered class, in which case their dynamics may need to be described by a set of ODEs.Over the past century, such models have provided invaluable insights into the factors that shape disease transmission by providing conceptual results and threshold quantities (e.g., herd immunity, basic reproduction number, and replacement number) to anticipate disease spread and the impact of countermeasures, thereby guiding decisionmaking and public health policies during disease outbreaks [4,2,3,5,6].
The limitations of ODE models often lie in the underlying assumptions they make for the transmission process.One common assumption of ODE models is homogeneous mixing among individuals [7].Yet, this is rarely the case for real-world populations, which can be structured in multiple ways [8,9,10,11,12].In particular, most populations are geographically structured, such that individuals that are close to each other are more likely to come into contact than individuals that are far apart.Besides spatial structure, kinship and social structure can also have a substantial influence on contact patterns [13,14].As a result, local disease prevalence can vary substantially between different parts of the population, which can have a considerable impact on the expected epidemiological dynamics [15,11,16].While classic ODE models have clear advantages for modeling disease transmission, such as their well-understood mathematical properties, computational efficiency, and ability to provide analytical solutions [11], omitting the influence of population structure in these models can lead to inaccurate conclusions about the anticipated disease dynamics.
In this study, we will focus specifically on the potential effects of spatial population structure, which can be incorporated into disease transmission models in several ways.One approach is the use of a meta-population model that divides the population into subpopulations or "patches", each representing a homogeneous host population such as a city.The dynamics within the patches, and the coupling between them, can then be described using a system of ODEs [17,18,19].Such coupling could be based on geographic distance, empirical observations such as mobile phone data [20], or geolocated social media posts, for instance [21].Alternatively, individual-based or agent-based approaches can be used to model all individuals in the population and their spatial position explicitly [22,16,23].The interactions between individuals (e.g., the rate and strength of contacts) can be regulated by distance-based kernels [5,24], which can result in a highly variable infection probability across space [11].Network models represent another class of approaches that allow for very fine-scaled modeling of population structure [25].In these models, each node usually represents an individual, with edges between nodes representing contacts between individuals.A common assumption of this approach is that a susceptible individual can only contract the disease if an edge connects it to an infectious individual [16,26].However, network models are complex, and their analysis can require an extensive amount of empirical data to model larger populations [16].
For populations inhabiting a continuous geographic landscape, where individuals primarily move within short distances compared to the overall range of the landscape, reaction-diffusion models offer an alternative approach for describing spatial disease dynamics [27,28].These models gained popularity in the 1930s, when both Fisher and Kolmogorov used them to explain the spread of a beneficial allele across a one-dimensional habitat [29,30].In such models, the beneficial allele can propagate through the population in the form of a "traveling wave", with a constant and well-defined velocity.Since then, diffusion models have been widely used in mathematical biology to study not only the spread of adaptive alleles but also invasive organisms and infectious diseases [28,31,32,27,33,34,35,36,37,3,38].In the context of epidemiology, reaction-diffusion models enable us to describe the proportion of infected individuals as a function of space and time.The diffusion term models the random dispersal of infectious individuals or contacts [39], while the reaction term describes the local rates of disease transmission, similar to a compartmental ODE model [27].
In this paper, we use diffusion theory to study under which scenarios a simple compartmental ODE model is sufficient to capture disease dynamics and when spatial structure has to be taken into account.Within the framework of an individual-based simulation model, we derived a simple critical threshold D c for the diffusion coefficient.This threshold marks the point below which the limited dispersal of individuals should significantly affect simulated disease dynamics.To validate our findings, we conduct individualbased simulations using SLiM [40], which allow us to smoothly transition from spatially unstructured to structured populations.Using this framework, we investigate the effect of spatial structure on disease dynamics in three classic compartmental disease transmission models: the SI, SIS, and SIR model [1,4].Furthermore, we highlight potential caveats of simulating continuous disease transmission models in a discrete, individual-based, and biologically realistic manner.

ODE model
Consider a simple Susceptible-Infected (SI) model in a closed population of constant size N without birth or death events [1,5].There are only susceptible and infectious individuals in the population, and infectious individuals do not recover (i.e., they never leave the infectious compartment).Let I(t) and S(t) = N − I(t) denote the overall counts of infectious and susceptible individuals in the population at time t, and s(t) = S(t)/N and i(t) = I(t)/N their respective population frequencies with i(t) + s(t) = 1.If we assume a homogeneously mixing population in which individuals come into contact with each other at a constant contact rate r per time unit, and a disease that establishes in a susceptible individual after contact with an infectious individual with probability α, then the change in the proportion of infectious individuals is given by Equation (1).The solution to this ODE model is a logistic growth function [41]: Whenever I(0) > 0, we have lim t→∞ i(t) = 1, meaning the disease will ultimately spread throughout the entire population.Figure 1A provides an illustration of this logistic spread in the ODE model.Note that the parameters r and α always appear in tandem, with the compound parameter (rα) −1 setting the timescale of the dynamics.However, since r and α regulate distinct processes in the context of our individual-based simulation framework described in section 2.4, they are not summarized into one compound variable in this manuscript.
Assuming one infected individual in the beginning (I(0) = 1) and a sufficiently large population (N ≫ 1), the expected time for the disease to reach 50% frequency in the population is: Since i(t) is symmetric around the inflection point at i(t 1/2 ) = 1/2, the expected time for the disease to spread through the whole population under the assumptions N ≫ 1 and I(0) = 1, is therefore: We call t ODE the "fixation time" of the ODE model.

Reaction-diffusion model
ODE models such as the simple compartmental SI model outlined above are widely used in epidemiological modeling [42], yet their underlying assumption of homogeneous mixing makes it difficult to incorporate spatial population structure.For populations that inhabit a continuous landscape, diffusion theory provides an alternative modeling framework that explicitly includes spatial heterogeneity in disease prevalence [3,43,44,29,30,28].In a diffusion model, the frequency of infected individuals becomes a function not only of time t but also of spatial position x.
Let us consider an SI model in which individuals move randomly according to some dispersal kernel that decays with distance at least as quickly as an exponential function [38].In such a case, we can model dispersal by a diffusion term [38,29,45,28], while disease transmission can be modeled by a reaction term based on the local compartment frequencies.Together, this yields a second-order partial differential equation for the SI diffusion model of the following form: The diffusion coefficient D accounts for the random dispersal of individuals.In a one-dimensional habitat, 2D = σ 2 , where σ 2 is the variance in the expected spatial displacement of individuals per time unit due to their random movement [29,46].The reaction term is analogous to the ODE model from Equation (1), except that the frequency of infected individuals now depends on the spatial position x as well.Equation 5 is also known as the Fisher-Kolmogorov or Fisher-KPP equation [29,30,3].
We can express Equation 5 in dimensionless form by re-scaling time, t = (rα)t, and space, x = x/( D/(rα)) where (rα) −1 is the characteristic timescale of the system: However, we maintain the habitat size L and the simulation step-size (i.e., "tick") ∆t as the units of space and time.Thus, we retain the parameterization of the Fisher-KPP equation (Equation 5), facilitating mapping between the continuous reaction-diffusion model and the finite individual-based simulation framework (section 2.4) while preserving parameter interpretability in both contexts.
Under the assumption that the sum of susceptible and infectious individuals are conserved at a constant carrying capacity, and that there is no change in dispersal due to infection, one solution to Equation 5 is a so-called "traveling wave" [29,47,48,27,3,38,33] in which the disease spreads outward from an initial introduction point.The minimum velocity c 0 of such a wave is defined as: Figure 1B provides an illustration of these dynamics in a one-dimensional habitat, where the disease is introduced at the center and then spreads outward in the form of two symmetric traveling waves to the left and right.The "width" of each wave, i.e., the length of the region between its front (where the frequency of infected individuals is still close to zero) and top (where almost everyone is already infected), is approximately w ≈ 2 D/(rα) [29].It is straightforward to extend this diffusion model to two dimensions.Specifically, if dispersal is isotropic, displacement along the x and y axes can be treated independently, with each following the one-dimensional model.The diffusion coefficient then equals half the variance in the displacement in each of the x and y coordinates: 2D = σ 2 x = σ 2 y .Note that the overall displacement in a two-dimensional space, ∆ = ∆ 2 x + ∆ 2 y , will typically be larger than the displacement in each individual dimension.
In this two-dimensional diffusion model with isotropic dispersal, the disease spreads from an initial introduction point in the form of a circle whose radius grows with velocity v.With decreasing curvature at the wavefront, v asymptotically approaches the same minimal wave speed c 0 as the traveling wave in the one-dimensional model [8,49].If the disease is introduced in a small number of individuals in the center of a square habitat patch with a side length L that is much larger than the wave width (which requires L ≫ 2 D/rα [50]), the wave reaches the corners of the square patch when the circle grows to a radius of L/ √ 2. Thus, the expected fixation time of our SI diffusion model in the twodimensional habitat patch is approximately:

Critical threshold
The ODE model assumes a homogeneously mixing population, while the diffusion model can explicitly account for spatial heterogeneity in disease transmission due to limited dispersal.Naturally, the question arises of how the epidemiological dynamics predicted by these models differ, and under which parameters one is more accurate than the other.To answer this question, we can contrast their expected disease fixation times, t ODE and t DIFF .Setting both equal, we obtain a critical value for the diffusion coefficient under which both models predict the same fixation time: When D < D c the ODE model predicts a shorter fixation time than the diffusion model because the slow dispersal of individuals becomes a limiting factor for how fast a disease can spread throughout the whole habitat.In this low-dispersal regime, we expect the diffusion model to more accurately describe epidemiological dynamics than the ODE model, which should overestimate the rate of disease spread.However, it is crucial to note that our estimation of t DIFF in Equation ( 8) solely accounts for spatial progression, neglecting the time required for the build-up of the wavefront and the time until disease fixation after the wavefront has reached the habitat boundaries.These processes are governed by the logistic growth function of the ODE model.Thus, we use a heuristic approach that approximates i(t) by summing up the expected times until a specific frequency is attained under a traveling wave model with constant speed (Equation 7), and a logistic growth model (Equation 2).
As D becomes on the order of D c , the width of the traveling wave in the diffusion model approaches the habitat size, the disease no longer spreads as a circle with a "sharp" edge [50], and spatial progression ceases to limit disease spread.In this case, our estimate of t DIFF in Equation ( 8) is no longer accurate and the fixation time should gradually approach the prediction of the ODE model.In the high-dispersal regime where D ≫ D c , diffusion is strong enough that the population is effectively well mixed over the timescale relevant for the spread of the disease and the ODE model should describe the epidemiological dynamics most accurately.

Individual-based simulation framework
To test the theoretical predictions derived above, we implemented an individual-based distributed-infectives simulation model for disease transmission in SLiM (version 4.0) [40].By varying the dispersal rates of individuals, our simulation model can smoothly transition from spatially unstructured to highly structured populations, allowing us to investigate disease dynamics across the high and low-dispersal regimes, and to study the predicted transition around the critical diffusion coefficient D c .We focused on an abstract, directly transmitted disease in a closed population (i.e., no birth or death processes), comprised of N = 100, 000 individuals.These individuals move around in a continuous two-dimensional spatial domain, modeled as a square habitat patch with a side length of 1 (i.e., L = 1) and periodic boundaries to avoid edge effects [51,15].
The ODE and diffusion models are both continuous-time models.Our individual-based simulation model in SLiM, on the other hand, measures time in discrete units of "ticks" (i.e., ∆t = 1).Each such "tick" is associated with the opportunity for new infections and recovery of individuals.To minimize differences between the discrete-time simulations and the continuous-time mathematical models, we must ensure that ∆t is sufficiently small compared to the characteristic timescale (rα) −1 .However, this requirement does not limit the generality of our models.We can freely choose the time interval in the real world that a tick in our simulations corresponds to, be it a year or a millisecond.By choosing a sufficiently small period, we can always ensure that ∆t ≪ (rα) −1 .
The disease is transmitted via local contacts between infectious and susceptible individuals.In each tick, we assume that an individual has contact with all individuals that are currently within its interaction distance δ, which we chose such that each individual on average has 15 contacts per tick (i.e., r=15).Note that the average number of contacts per tick is independent of the infection status (i.e., infectious individuals, on average, have as many contacts as susceptible or recovered individuals).For a square habitat with edge length normalized to L = 1 and toroidal boundary conditions, and with 100,000 individuals, this corresponds to an interaction radius of δ ≈ 0.00691.Each contact between a susceptible individual and an infectious individual results in disease transmission with a probability of α=0.001, yielding (rα) −1 ≈ 66.67 ticks ≫ 1.When a susceptible individual has contracted the disease, it becomes infectious in the next tick.At the end of each tick, we simulate isotropic dispersal by sampling the x and y displacements for each individual independently from a normal distribution with mean µ = 0 and variance σ 2 = 2D.The infection status of an individual has no effect on D.
We implemented three compartmental disease transmission models in this simulation framework: (i) the SI model, (ii) the SIS model, and (iii) the SIR model [5].In the SI model, once an individual is infected, it remains infectious until the end of the simulation.In the SIS model, infected individuals can recover and return to the susceptible class in each tick with probability γ.In the SIR model, they recover and gain lasting immunity with probability γ in each tick.
Each simulation run is initialized by uniformly distributing 100,000 sus-ceptible individuals across the habitat.At tick 25, the disease is introduced into the population by infecting the individual closest to the center of the square habitat (i.e., I(0) = 1).For each simulation, we recorded the number of individuals in each compartment at the beginning of each tick until either the disease spread through the whole population, no infectious individuals were left, or the simulation ran for 50,000 ticks.This upper threshold of 50,000 ticks results in a very lenient timeline that is not expected to interfere with the simulated SI disease transmission dynamics.For our simulation parameters, the disease is expected to fix in the SI model between approximately 1,500-14,000 ticks (Equations 4 and 10).

Low-diffusion limit
A key abstraction of the diffusion approach is that it models a continuous density of individuals across the habitat.The relative frequencies of infectious and susceptible individuals are specified at any given position in the habitat, and disease transmission can occur exactly at that position.In our individual-based simulation model, however, a finite number of individuals are dispersed across a continuous habitat.It is unlikely that two of them will ever be at exactly the same position.Thus, one has to decide how close two individuals need to be for one to be able to infect the other, which we implemented by defining an interaction radius δ.
Ideally, δ should be very small (i.e., much smaller than the average dispersal distance of individuals) so that the movement of individuals remains the primary driver of disease spread.Yet, there is a lower limit on δ for any given contact rate in our individual-based simulation model.For example, if we want to ensure that each individual on average has, say, two contacts, this requires a certain minimum interaction radius so that each individual encounters enough individuals within its interaction radius.In our simulated continuous two-dimensional habitat, which is represented by a square with a side length of L and includes N uniformly distributed individuals, the interaction radius δ is defined as δ = L r/(N π).
One important consequence is that, when D becomes sufficiently small, the spread of the disease is no longer driven primarily by the dispersal of individuals, but instead by "hopping" between neighboring individuals with overlapping interaction circles.These "hopping" dynamics can lead to the disease fixing even when individuals do not move at all, thereby setting an upper bound for the fixation time in the limit D → 0.
We can still describe disease spread under these hopping dynamics by the diffusion model, but we need to reinterpret the diffusion term.In particular, we interpret each transmission event as a "dispersal" step of the disease from the location of the infecting individual to the location of the newly infected individual.To derive a rough estimate of the diffusion coefficient (D 0 ) of the transmission, we consider the limit where individuals no longer move at all.Starting from the first infected individual at the center of the habitat, the disease spreads outward in a circular pattern, with the traveling wave now driven entirely by transmission events rather than by individual dispersal.The minimum speed of this traveling wave for the SI model is given by c 0 = 2 √ D 0 rα.One complication is that, in the hopping model, D 0 varies across space because it depends on the occurrence of new transmission events.Thus, D 0 approaches zero in areas where either everyone is already infected or everyone is still susceptible.It is maximal right at the wavefront, where it determines the speed of the wave.
Let us, therefore, consider an infected individual located exactly at the wavefront.If its neighbors are uniformly distributed across its interaction circle, around half of these individuals, on average, are already infected, while the others are still susceptible.The overall rate at which transmission events from the focal infected individual occur then is Pr(transmission) = rα/2.If a transmission event occurs, the variance of the displacement along the x-axis is σ 2 x = δ 2 /4.The effective diffusion coefficient of the hopping process thus results as 2D 0 ≈ Pr(transmission)σ 2 x = rαδ 2 /8.According to Equation ( 8), this yields a fixation time of: This constitutes an approximate upper bound of the fixation time in the limit D → 0, where the fixation time from dispersal alone, given by Equation ( 8), diverges to infinity.By comparing the fixation time estimates t 0 and t DIFF , we determine D l , a threshold for individual dispersal below which disease spread in our individual-based simulation model is primarily driven by transmission events:

SI Model
To test how well our mathematical predictions describe disease spread in the individual-based simulations, we start with the SI model.Qualitatively, we expect that in the low-dispersal regime (D < D c = 3.536 × 10 −6 ), the disease spreads from its introduction at the center of the habitat as a circle with a steadily growing radius.By contrast, in the high-dispersal regime (D ≫ D c ), the population should be sufficiently mixed so that the frequency of infected individuals increases in all areas of the habitat at a similar rate.We next tested our predictions for the disease fixation time under varying dispersal rates (Figure 3A).In the high-dispersal regime, the observed fixation time t sim is independent of D and well approximated by the predictions from the ODE model given in Equation ( 4).The time-resolved proportion of infectious individuals follows the logistic growth curve (Figure 3B, bottom panel).When D becomes smaller than D c , the fixation time starts to increase and becomes inversely proportional to D, as predicted by Equation (8).Once D becomes smaller than D l (D l = 4.476 × 10 −8 ), the fixation time approaches the prediction under the hopping dynamics derived in Equation (10), t sim levels off and again becomes independent of D.
Overall, we observe that fixation times can differ up to one order of magnitude in our simulation model depending on the dispersal rate.This highlights the potential risk of underestimating the expected epidemic duration if predictions are based solely on the ODE model without accounting for population structure.).Here, the proportion of infectious individuals, i(t), increases after an initial lag phase approximately as the area of a circle whose radius expands at the predicted minimum wave speed of the diffusion model (blue dashed line).The initial lag between the expected and the observed proportion of infectious individuals is likely caused by the local establishment of the disease and an initial reduction of wave speed due to the curvature of the wavefront [8,45].i(t) of the simulations is well approximated by a heuristic approach (solid blue line) that accounts for both disease propagation via a constant traveling wave, and the local rise in i(t) described by the ODE model (dotted line).The bottom panel shows an example from the high-dispersal regime (D = 10 −2 ).Here, the proportion of infectious individuals is well approximated by the expected logistic growth function of the ODE model (dotted blue line).Solid red lines represent the median proportion of infectious individuals across 50 simulation runs, and the shaded areas represent the range between the 2.5 and 97.5 percentiles.All data were simulated using the following parameters: N = 100, 000, α = 0.001, r = 15, L = 1.

SIS Model
In the SI model, infected individuals do not recover from the infection, leading to an inevitable spread and ultimate fixation of the disease as long as rα > 0. The SIS model allows infected individuals to transition from infection back to the susceptible state at rate γ.In a deterministic ODE model of an unstructured population, such a disease is expected to reach an equilibrium frequency of 1 − 1/R 0 as long as R 0 > 1, where R 0 = rα/γ represents the basic reproduction number of the disease (i.e., the average number of secondary infections caused by a single infectious individual when introduced into a completely susceptible population) [52,5].If R 0 < 1, the disease is typically incapable of establishing in the population and will quickly die out.
In addition to the critical threshold D c established for the SI model, one can analogously determine a critical threshold D c in the SIS model, below which we expect individual dispersal to affect the disease dynamics.For the SIS model, the frequency of infected individuals i(t) follows a logistic growth function with an equilibrium frequency of 1 − 1/R 0 = 1 − γ/(rα) in a homogeneously mixing population: Equation 12 allows the derivation of an expected "time until equilibrium" estimate for the ODE model, similar to the derivation of the "fixation time" estimate for the SI model (Equation 4).Under the assumptions N ≫ 1 and I(0) = 1, this "time until equilibrium" can be estimated for the SIS model in a homogeneously mixing population as: For a spatial SIS model, the minimal wave speed c 0 accounts for the recovery rate γ of infectious individuals back to the susceptible state, yielding c 0 = 2 D(rα − γ) [29].This approximation for c 0 can then be employed to calculate the "time until equilibrium" estimate for the spatial SIS model: Following the lines discussed for the SI model above, we can contrast the expected times until equilibrium for the ODE (t ODE ) model and our estimation of t DIFF to obtain a critical value for the diffusion coefficient above which we expect a negligible effect of spatial progression on disease dynamics.
To investigate the effects of spatial structure and stochasticity on an SIS model, we modified our individual-based simulations such that infected individuals reenter the susceptible class with a probability γ per tick, yielding R 0 = rα/γ in our simulation framework.Qualitatively, we find that the disease initially still spreads in a circular fashion in the low-dispersal regime and more homogeneously in the high-dispersal regime, as in the SI model (Figure 4).However, the perpetual reentering of infectious individuals into the susceptible class breaks up this strict circular clustering more quickly in the SIS model as compared to the SI model.Over the course of the simulated time period, the frequency of infectious individuals converges to the equilibrium frequency, 1 − 1/R 0 , in both the high-dispersal and low-dispersal regimes.We note that, in our stochastic agent-based simulation model with finite population size, this equilibrium is only quasi-stable: Over much longer time scales than considered here, the infectious individuals will eventually vanish from the population.
While a deterministic ODE model predicts that any disease with R 0 > 1 should successfully establish after introduction, the stochastic fluctuations in our individual-based simulation model can lead to the loss of the disease in certain simulation runs, even for relatively high R 0 values (e.g., approximately 25% of simulation runs with R 0 = 3, as shown in Figure 5A).We did not find a strong influence of the dispersal rate on the probability of disease establishment for R 0 > 2. However, in the high-dispersal regime, the frequency of infected individuals approaches the expected equilibrium frequency more rapidly compared to the low-dispersal regime (Figure 5B-D).Overall, we observed that the rate of disease spread can again differ by more than an order of magnitude in our simulation model depending on the dispersal rate (Figure 5C).).In the low-dispersal regime, the proportion of infectious individuals, i(t), increases after an initial lag phase approximately as the area of a circle whose radius expands at the predicted minimum wave speed of the diffusion model (blue dashed line).The initial lag between the expected and the observed proportion of infectious individuals is likely caused by the local establishment of the disease and an initial reduction of wave speed due to the curvature of the wavefront [8,45].i(t) of the simulations is well approximated by a heuristic approach (solid blue line) that accounts for both disease propagation via a constant traveling wave, and the local rise in i(t) described by the ODE model (dotted line).In the high-dispersal regime, the proportion of infectious individuals is well approximated by the expected logistic growth function of the ODE model with an equilibrium frequency of 1−1/R 0 [5] (dotted blue line).Solid red lines represent the median proportion of infectious individuals across all simulations, and the shaded area represents the range between the 2.5 and 97.5 percentiles.All data were simulated using the following parameters: N = 100, 000, α = 0.001, r = 15, γ = [1/70, 1/80, 1/140, 1/150, 1/200, 1/300].More than 99% of our simulation runs categorized as established spread across the entire habitat.

SIR Model
In the SIS model, individuals can transition between the susceptible and infectious states but do not acquire resistance to the disease.Conversely, the SIR model incorporates lasting resistance by introducing a "recovered state", to which individuals transition after being infected.Once in the recovered state, individuals cannot return to the susceptible or infectious state.According to a deterministic SIR model, a disease is expected to invade a completely susceptible and unstructured population whenever its basic reproduction number is R 0 > 1 [4].In such cases, the final epidemic size (e final ), representing the overall proportion of individuals who have been infected during the epidemic, can be obtained by solving the transcendental equation e −R 0 e final = 1 − e final [3].
To study the SIR model, we modified our simulations of the SIS model such that infected individuals transition to the recovered class with a constant probability γ per tick, rather than returning to the susceptible class.This modification does not alter the basic reproduction number of the disease, which remains at R 0 = rα/γ.Qualitatively, we observed that, similar to the SI and SIS models, the high-dispersal regime closely approximates a homogeneously mixed population (Figure 6) whereas in the low-dispersal regime, the disease spreads in a circular pattern across space, as theoretically predicted [54].However, the traveling wave by which infections spread is now closely followed by another wave describing the emergence of recovered individuals.
The clustering of recovered individuals in proximity to the wavefront of infectious individuals results in an overall reduction in the rate of disease transmission compared to a homogeneously mixed population and has various implications for disease dynamics: First, similar to the SIS model, for R 0 < 2 there is a higher likelihood of the disease being eradicated before it can establish in the low-dispersal regime.This results in a lower establishment probability for smaller diffusion coefficients for R 0 < 2, where for extremely low diffusion coefficients (D = 10 −10 ) the establishment probability remains low until R 0 surpasses 3 (Figure 7A).Second, as previously demonstrated [10,9], higher values of R 0 are required in the low-dispersal regime to approach the theoretically expected final epidemic size (Figure 7B).If dispersal is limited, the clustering of recovered individuals near the infectious wavefront can give rise to isolated 'pockets' of susceptible individuals devoid of any contacts with infectious individuals.This phenomenon leads to a delay or even avoidance of their infection, thereby contributing significantly to the reduction in the final epidemic size.This effect is illustrated in Figure 6, where for D = 10 −6 distinct clusters of susceptible individuals surrounded by recovered ones are observable.Third, due to new infections occurring solely at the wavefront in the low-dispersal regime, the maximal disease incidence (i max ), representing the maximum proportion of the population ever infected at a single point in time, is smaller compared to the maximal incidence in the high-dispersal regime (Figure 7C).Fourth, analogous to the SI and SIS models, reduced dispersal slows down disease spread in general.Simulations with a diffusion coefficient of D = 10 −6 require nearly an order of magnitude more time to reach their respective final epidemic size compared to simulations with D = 10 −2 .Finally, the time-resolved frequency of infectious individuals, i(t), resembles a bell-shaped curve in the high-dispersal regime, as predicted by the ODE model [4].In the low-dispersal regime, on the other hand, i(t) exhibits almost linear growth until late in the simulation, as is predicted from a circular wave progressing with constant speed [29,45] (Figure 7D-E).More than 99% of our simulation runs categorized as established spread across the entire habitat.

Discussion
In this study, we explored the conditions under which it is essential to include spatial structure for precise disease spread predictions when individuals move diffusively across a habitat.We established a critical threshold D c for the diffusion coefficient, below which disease transmission dynamics is expected to exhibit substantial spatial heterogeneity, while for D ≫ D c , the assumption of homogeneous mixing remains adequate.To validate our analytical results, we conducted individual-based simulations on a continuous two-dimensional landscape.Furthermore, we investigated the impact of continuous spatial structure on key epidemiological parameters, including disease establishment probability, maximal incidence, and final epidemic size.
The use of reaction-diffusion models in epidemiology is attractive due to their ability to approximate epidemic states as individuals move and interact in a spatial domain.Furthermore, they often allow calculating the expected speed of disease propagation based on measurable life-history traits, such as individual dispersal, average contact rate, and infection risk [36].However, modeling diffusive disease transmission with a Fisher-KPP equation and its constant traveling wave solution (as we have done in this study) hinges on several assumptions.First, a traveling wave model with constant speed assumes that disease transmission is a highly localized process [25,55], with long-range pathogen transmission being very rare.Additionally, distributedinfectives models, as we focused on in this study, assume uniform individual dispersal across the entire population, disregarding possible variations in movement patterns among individuals [32].In reality, movement patterns can vary significantly and include processes such as long-range dispersal, migration [56], or preferential movement towards specific locations [57].All of these processes could profoundly affect disease transmission dynamics and substantially limit the accuracy of a traveling wave model assuming a constant speed.For example, if a dispersal kernel is not radially symmetric (as assumed in this study), the advance of the wave front is going to deviate from a spherical shape [8].If long-distance dispersal is common, the disease can spread from multiple independent locations, where it has been introduced by long-distance migrants [25].This can ultimately result in a faster spread and accelerating wave speeds over time, as observed for winddispersed plant pathogens or avian influenza [45,27,33,58,25].In such scenarios, conventional reaction-diffusion models assuming an exponentially bounded dispersal distribution [59,58,38] run the risk of significantly underestimating the spread of diseases [45,58,60,33].This, in turn, can constrain the ability to identify and optimize appropriate disease interventions [61].Unfortunately, it can be quite challenging to estimate the occurrence of rare long-distance dispersal events in empirical systems [36].
Second, conventional distributed-infectives reaction-diffusion models often assume that the contact rate and disease establishment proportion are constant for the whole population.However, it is important to recognize that these parameters can exhibit variability among individuals and in response to environmental conditions, which are not explicitly accounted for in our abstract disease model.For instance, the contact rate r can vary significantly among individuals of different age groups [36,62,63], the number of contacts can be impeded by complex landscape features such as mountains, rivers, or other barriers [36,34], and vary with population density or changes in host behavior over time [64].The disease establishment probability α is often strongly influenced by environmental factors such as temperature, relative humidity, and the level of urbanization [65].These individual and environmental factors can play a crucial role in shaping the dynamics of disease transmission [66], yet they are not explicitly incorporated into our current model framework.
Third, the conventional Fisher-KPP equation treats space as a continuous and unbounded domain, disregarding edge effects that may exist in real-world habitats with boundaries [28,29,5,51].To mitigate such edge effects, we implemented periodic boundary conditions in our individual-based simulation framework.This approach ensured that individuals on the boundaries of the simulated population were connected to those on the opposite side, creating a continuous and seamless space.However, one consequence is that susceptible individuals can encounter disease waves closing in from several directions as the disease spreads through the population.Although this may have led to a slight acceleration of disease propagation in scenarios with low dispersal towards the end of the simulation run, the overall impact on the rate of disease fixation was relatively small compared to the total simulation time.If a biological system meets the assumptions of a reaction-diffusion model with a constant traveling wave solution, this mathematical framework can provide an appealing approach to studying disease dynamics.For exam-ple, Noble used a reaction-diffusion model to study the spread of the Black Death across 14th-century Europe [32], finding that the analytically predicted patterns aligned well with historical records.Disease transmission was treated as a distributed-infectives process with a value of rα ≈ 1 − 4 per year and D ≈ 25, 900 km 2 per year, yielding an expected wave "width" of 2 D/(rα) ≈ 161−322 km, which is relatively small compared to the investigated habitat (the distance between Messina and Oslo is approximately 3,200 km).Furthermore, long-distance dispersal was likely very rare in medieval Europe, supporting the suitability of a constant traveling wave model for studying this particular epidemic.In another study, Murray & Brown used a reaction-diffusion model to study the potential spread of rabies among foxes in the UK [67].Rabies is transmitted by close contacts and foxes are highly territorial, which limits their individual dispersal.Similar to Noble's work, the rα and D estimates resulted in a relatively small wave "width" when compared to the studied habitat (the southern part of England) [67].Murray's predictions for a potential spread of rabies in the southern UK were in good agreement with the observed spatial spread of rabies across mainland Europe [67].
While disregarding long-distance dispersal was likely suitable in the aforementioned studies, the scenario should be markedly different for modern human populations.With the growing interconnectedness between human communities, recent epidemics require a comprehensive framework capable of accounting for both short-and long-distance dispersal [68,58,38].Integrating these dispersal mechanisms will be crucial for capturing the realistic dynamics of disease spread and invasion processes [12].For recent human epidemics, Brockmann et al. demonstrated how substituting geographic distances with a probabilistically inspired "effective distance" that accounts for different levels of connectivity between human communities, allows the application of relatively simple diffusion models to study complex disease dynamics [12].However, one has to be aware that this type of network data are rarely available for non-human species.Furthermore, besides distance and connectivity, individual movement is often influenced by various factors such as behavior, social structures, and environmental conditions [38,23,36,12].These factors can significantly contribute to the complexity of human movement patterns, making it particularly challenging to robustly integrate them into disease models [69].If movement patterns are complex, network models or individual-based simulation frameworks might be better suited for study-ing disease dynamics, although these increasingly complex models may also limit the ability to draw general conclusions [38,36].
The critical threshold D c (Equation 9) allows for a back-of-the-envelope calculation of the maximum amount of individual dispersal that would be required to cause a noticeable slowdown in disease spread due to limited dispersal.As an example, consider a simple toy model of a theoretical measles outbreak in a completely susceptible population.Measles is a highly contagious disease that can cause new infections in up to 90% of close contacts (α = 0.9) [70].Assuming an average of four close contacts per day (r = 4) [71], and 979 individuals per km 2 (i.e., the average population density in urbanized areas of the US; N = 979) [72], we obtain D c = 0.002 km 2 per day.This means that in order for our theoretical measles outbreak to be substantially slowed down due to limited dispersal in an area with L = 1 km, the standard deviation of the expected spatial displacement of individuals in each of the x and y coordinates needs to be less than 69 meters per day on average.This toy example demonstrates that spatial dynamics may be omitted for highly contagious diseases over relatively small habitat ranges [50], but have to be accounted for when describing disease dynamics across larger communities such as individual cities [73,37].
In order to assess the accuracy of our mathematical predictions derived from diffusion theory, we conducted individual-based simulations.This highlighted some important differences and complexities that arise when attempting to simulate a continuous reaction-diffusion process.First, in our simulation framework, we defined contacts between individuals based on an interaction radius parameter.This approach allowed disease spread to occur by "hopping" between neighboring individuals with overlapping interaction areas even when individuals remained stationary.As a result, there is an upper bound on the fixation time in our individual-based simulations, which may not be immediately evident in other disease models.One might consider mitigating the influence of the interaction radius on the disease dynamics by implementing a k-nearest-neighbor approach to model contacts.However, caution should be exercised when using a small value of k (e.g., k = 1), in combination with very low dispersal, as this combination could introduce a significant delay in disease spread if the neighbor structure changes only gradually.
Second, to mitigate edge effects, we incorporated periodic boundaries into our simulation framework.Unlike real-world scenarios, where distinct boundaries can profoundly affect disease spread, simulations with periodic boundaries enable us to isolate the impact of individual dispersal on disease transmission.In contrast to non-periodic boundaries, which may introduce bias based on the initial positions of the first infected individuals relative to the edges, periodic boundaries remove this potential for bias by removing the concept of edges altogether.Reflecting boundaries, however, may be more appropriate for simulating specific real-world scenarios.
In conclusion, our study highlights the need for careful interpretation and understanding of spatial factors in epidemiological dynamics, while also demonstrating the complexities and design choices that arise in modeling such dynamics.

Data Accessibility
The individual-based SI, SIS, and SIR disease transmission models are implemented in the open-source software SLiM (version 4.0) [40] and are available on GitHub under https://github.com/AnnaMariaL/SpatialDieaseSim.Videos demonstrating disease spread in structured and unstructured populations are available on YouTube under https://tinyurl.com/bdddam58.

Figure 1 :
Figure 1: Schematic plots of expected disease spread under the ODE model and a traveling wave solution of the reaction-diffusion model.(A) In the ODE model, the frequency of infected individuals, i(t), increases logistically according to Equation (2).The vertical dashed lines represent t 1/2 as defined in Equation (3).(B) In the diffusion model for a one-dimensional habitat, the disease spreads from an initial release at the center of the habitat via two traveling waves with velocity v (minimum velocity c 0 = 2 √ Drα) in either direction.

D = 10 - 6 DFigure 2 :
Figure 2: Two exemplary simulation runs in the SI model.The left column shows a run in the low-dispersal regime (D = 10 −6 ), while the right column shows a run in the high-dispersal regime (D = 10 −2).The top, middle, and bottom plots show population snapshots taken when 99%, 75%, and 25% of the population were still susceptible.As predicted, the disease progresses in a circular pattern in the low-dispersal regime.By contrast, with large dispersal, infectious individuals are homogeneously distributed across space (right column).Data were simulated using the following parameters: N = 10, 000 (ten-times smaller than our standard model for better visualization), α = 0.01, r = 15, L = 1.Videos of simulation runs in the SI model can be found under https://tinyurl.com/bdddam58.

Figure 3 :
Figure 3: Disease dynamics in the SI model.(A) Time until disease fixation in the simulations (t sim ) for varying diffusion coefficients (D).Black dots represent median values and error bars the 2.5 and 97.5 percentiles estimated over 50 simulation runs for each D(for D = 0, the spread stagnated in 3 of the 50 runs due to small clusters of susceptible individuals with interaction areas devoid of infectious individuals, which we excluded from the analysis).The vertical solid line indicates the critical threshold D c according to Equation(9).In the high-dispersal regime (D ≫ D c ), fixation times converge to the prediction of the ODE model (t ODE , dotted line), whereas in the limit of no dispersal (D → 0), they are bounded by the prediction of the hopping model (t 0 , dot-dashed line; D l is indicated as vertical two-dashed line).In the low-dispersal regime where D < D c , the disease advances as a traveling wave, and t sim is well approximated by t DIFF (dashed line).(B) The top panel shows the time-resolved disease dynamics for the low-dispersal regime (D = 10 −6 ).Here, the proportion of infectious individuals, i(t), increases after an initial lag phase approximately as the area of a circle whose radius expands at the predicted minimum wave speed of the diffusion model (blue dashed line).The initial lag between the expected and the observed proportion of infectious individuals is likely caused by the local establishment of the disease and an initial reduction of wave speed due to the curvature of the wavefront[8,45].i(t) of the simulations is well approximated by a heuristic approach (solid blue line) that accounts for both disease propagation via a constant traveling wave, and the local rise in i(t) described by the ODE model (dotted line).The bottom panel shows an example from the high-dispersal regime (D = 10 −2 ).Here, the proportion of infectious individuals is well approximated by the expected logistic growth function of the ODE model (dotted blue line).Solid red lines represent the median proportion of infectious individuals across 50 simulation runs, and the shaded areas represent the range between the 2.5 and 97.5 percentiles.All data were simulated using the following parameters: N = 100, 000, α = 0.001, r = 15, L = 1.

D = 10 - 6 DFigure 4 :
Figure 4: Two exemplary simulation runs in the SIS model.The left column shows a run in the low-dispersal regime (D = 10 −6 < D c ), while the right column shows a run in the high-dispersal regime (D = 10 −2).The top, middle, and bottom plots show population snapshots taken when 99%, 75%, and 25% of the population were still susceptible.As predicted, in the low-dispersal regime, the disease initially advances by a growing circle.By contrast, in the high-dispersal regime infectious individuals are homogeneously distributed across space (right column).In contrast to the SI model, infectious individuals hardly cluster in the SIS model late in the simulations even in the low-dispersal regime due to the perpetual reentering of infectious individuals into the susceptible class with a probability γ per tick.Data were simulated using the following parameters: D = [10 −6 ; 10 −2 ], N = 10, 000, α = 0.01, r = 15, L = 1, γ = 1/30.Videos of simulation runs in the SIS model can be found under https://tinyurl.com/bdddam58.

Figure 5 :
Figure 5: Disease dynamics in the SIS model.(A) Percentage of 50 simulation runs that reached at least a frequency of 1% infectious individuals for varying R 0 values (panels) and three exemplary diffusion coefficients.Only simulation runs with an i(t) of at least 1% are included in (B) to (D).(B) Observed equilibrium frequencies at the end of the simulations.Boxplots are shown with whiskers indicating the most extreme values within 5× IQR of the boxes, but the variance among runs was so small that they are not visible.The expected equilibrium frequency for each R 0 value under an ODE model[5] is depicted by the dotted blue line.Observed equilibrium frequencies were estimated by fitting simulated data to a three-parameter logistic model using the drc R package[53].(C) Time to reach the equilibrium.We defined the onset of the equilibrium as the first time the frequency of infectious individuals reaches 99.95% of the observed equilibrium frequency.The expected time of equilibrium onset under an ODE model is depicted as dotted blue line.(D) Time-resolved disease dynamics for R 0 = 4.5 and two exemplary diffusion coefficients (top panel: low dispersal with D = 10 −6 < D c = 2.87 × 10 −6 , bottom panel: high dispersal with D = 10 −2 ).In the low-dispersal regime, the proportion of infectious individuals, i(t), increases after an initial lag phase approximately as the area of a circle whose radius expands at the predicted minimum wave speed of the diffusion model (blue dashed line).The initial lag between the expected and the observed proportion of infectious individuals is likely caused by the local establishment of the disease and an initial reduction of wave speed due to the curvature of the wavefront[8,45].i(t) of the simulations is well approximated by a heuristic approach (solid blue line) that accounts for both disease propagation via a constant traveling wave, and the local rise in i(t) described by the ODE model (dotted line).In the high-dispersal regime, the proportion of infectious individuals is well approximated by the expected logistic growth function of the ODE model with an equilibrium frequency of 1−1/R 0[5] (dotted blue line).Solid red lines represent the median proportion of infectious individuals across all simulations, and the shaded area represents the range between the 2.5 and 97.5 percentiles.All data were simulated using the following parameters: N = 100, 000, α = 0.001, r = 15, γ = [1/70, 1/80, 1/140, 1/150, 1/200, 1/300].More than 99% of our simulation runs categorized as established spread across the entire habitat.

Figure 6 :Figure 7 :
Figure 6: Two exemplary simulation runs in the SIR model.The left column shows a run in the low-dispersal regime (D = 10 −6 ), while the right column shows a run in the high-dispersal regime (D = 10 −2 ).The top, middle, and bottom plots show population snapshots taken when 99%, 75%, and 25% of the population were still susceptible.As predicted, in the low-dispersal regime the disease advances approximately by a growing circle, with infectious individuals being clustered at the wavefront, and recovered individuals being clustered around the disease origin in the middle of the area.By contrast, in the high-dispersal regime susceptible, infectious, and recovered individuals are homogeneously distributed across space (right column).Data were simulated using the following parameters: D = [10 −6 ; 10 −2 ], N = 10, 000, α = 0.01, r = 15, L = 1, γ = 1/25.Videos of simulation runs in the SIR model can be found under https://tinyurl.com/bdddam58.