Understanding the role of urban design in disease spreading

Cities are complex systems whose characteristics impact the health of people who live in them. Nonetheless, urban determinants of health often vary within spatial scales smaller than the resolution of epidemiological datasets. Thus, as cities expand and their inequalities grow, the development of theoretical frameworks that explain health at the neighborhood level is becoming increasingly critical. To this end, we developed a methodology that uses census data to introduce urban geography as a leading-order predictor in the spread of influenza-like pathogens. Here, we demonstrate our framework using neighborhood-level census data for Guadalajara (GDL, Western Mexico). Our simulations were calibrated using weekly hospitalization data from the 2009 A/H1N1 influenza pandemic and show that daily mobility patterns drive neighborhood-level variations in the basic reproduction number R0, which in turn give rise to robust spatiotemporal patterns in the spread of disease. To generalize our results, we ran simulations in hypothetical cities with the same population, area, schools and businesses as GDL but different land use zoning. Our results demonstrate that the agglomeration of daily activities can largely influence the growth rate, size and timing of urban epidemics. Overall, these findings support the view that cities can be redesigned to limit the geographic scope of influenza-like outbreaks and provide a general mathematical framework to study the mechanisms by which local and remote health consequences result from characteristics of the physical environment.

Cities are complex systems whose characteristics impact the health of people who live in them. Nonetheless, urban determinants of health often vary within spatial scales smaller than the resolution of epidemiological datasets. Thus, as cities expand and their inequalities grow, the development of theoretical frameworks that explain health at the neighbourhood level is becoming increasingly critical. To this end, we developed a methodology that uses census data to introduce urban geography as a leading-order predictor in the spread of influenza-like pathogens. Here, we demonstrate our framework using neighbourhood-level census data for Guadalajara (GDL, Western Mexico). Our simulations show that daily mobility patterns can drive neighbourhood-level variations in the basic reproduction number $R_0$, which in turn give rise to robust spatiotemporal patterns in the spread of disease. To generalize our results, we ran simulations in hypothetical cities with the same population, area, schools and businesses as GDL but different land use zoning. Experiments in these synthetic cities demonstrate that the agglomeration of daily activities can influence the growth rate, size and timing of urban epidemics. Overall, these findings support the view that cities can be redesigned to limit the geographical scope of influenza-like outbreaks.

Introduction
Empirical studies have identified inter-city variations in the timing, intensity and severity of influenza-like outbreaks [1][2][3][4]. Aiming to understand the mechanisms through which city characteristics yield such health consequences, epidemiologists have resorted to a variety of methods. Epidemiological data reveal compelling statistical correlations, but do not resolve intracity variations in health that are driven by lifestyle inequalities at the neighbourhood level [2,4][5][6]. By contrast, agent-based computational models use massive mobility datasets to recreate the behaviour of individuals as they interact and spread infections [7][8][9][10][11][12][13][14]. While this approach allows for household-level analyses and its elevated complexity makes it suitable for targeted experiments [15], it is not the best tool for the development of general strategies in public health [16]. Furthermore, the information necessary to calibrate agent-based simulations is not openly available for most of the world's cities. Thus, although epidemics can be seeded anywhere, these intricate models have been overwhelmingly applied to populations in the developed world. Meanwhile, the fundamental mechanisms that drive health inequalities within metropolitan areas remain elusive.
Urban design determines the densities and relative locations of housing, jobs and services inside a city. Consequently, it influences the transportation choices of the population [17][18][19] and thus helps shape interaction networks through which diseases are spread. For example, the agglomeration of jobs and services drives large fractions of a city's population to gather in small fractions of its area, increasing contact rates between the residents of distant neighbourhoods [20,21]. In what follows, we quantify the agglomeration of urban mobility using two Gini coefficients ($0 \le G_{\text{origins}}, G_{\text{destinations}} \le 1$) that use neighbourhood-level census data to measure spatial inequalities in the area-density of housing and activities throughout a city. The extreme value $G = 0$ represents a homogeneous distribution of the population (trip origins) or its activities (trip destinations); conversely, $G = 1$ indicates that all housing or activities are concentrated within a single location. In epidemiological terms, $G = 1$ produces homogeneous mixing conditions, which allow individuals to interact indiscriminately with all other members of the population. Under this scenario, disease spreading may be modelled using simple ordinary differential equations where the total population $N = S + I + R$ is split into susceptible $S$, infected $I$ and recovered $R$ groups. However, reality deviates from these conditions ($G = 1$) and successful models must introduce heterogeneous contact networks to link the members of a population [22].
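As a rough illustration of how a Gini coefficient of area-density could be obtained from neighbourhood-level counts, the sketch below builds a Lorenz curve (cumulative share of area against cumulative share of residents or visitors, with neighbourhoods sorted by density) and integrates it with the trapezoid rule. The function name `gini_area_density`, the integration rule and the sample numbers are assumptions made for illustration; they are not taken from the procedure used to obtain the values reported in this article.

```python
def gini_area_density(counts, areas):
    """Area-weighted Gini coefficient of a quantity spread over neighbourhoods.

    counts[i]: residents (or daily visitors) in neighbourhood i
    areas[i]:  area of neighbourhood i, assumed positive (any consistent unit)
    """
    total_count = sum(counts)
    total_area = sum(areas)
    # Sort neighbourhoods from lowest to highest area-density.
    order = sorted(range(len(counts)), key=lambda i: counts[i] / areas[i])

    cum_area = 0.0
    cum_count = 0.0
    lorenz_integral = 0.0
    for i in order:
        next_area = cum_area + areas[i] / total_area
        next_count = cum_count + counts[i] / total_count
        # Trapezoid rule for the area under the Lorenz curve.
        lorenz_integral += 0.5 * (cum_count + next_count) * (next_area - cum_area)
        cum_area, cum_count = next_area, next_count

    # G = 1 - 2 * (area under the Lorenz curve).
    return 1.0 - 2.0 * lorenz_integral


# Hypothetical four-neighbourhood city with equal areas.
even_city = gini_area_density([10, 10, 10, 10], [1, 1, 1, 1])
hub_city = gini_area_density([0, 0, 0, 100], [1, 1, 1, 1])
```

On a finite grid, full concentration in one cell gives $G = 1$ minus that cell's share of the total area, so the limit $G = 1$ is only approached as the occupied cell becomes small relative to the city.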
Computational models of disease spreading have incorporated complex network structures to simulate person-to-person contacts through which infections spread [7][8][9][10][11][12][13]. To inform this type of computational model, observational studies have tracked people's interactions in day-to-day settings. These studies have established that contact rates and the resulting risk of infection can vary widely with age, sex, employment status and other characteristics [23][24][25][26]. Thus, output from state-of-the-art epidemiological models can drastically change as additional layers representing sociodemographic inequalities are incorporated to describe urban populations [15].
Given the elevated complexity of processes that influence the evolution and sampling of infectious outbreaks in metropolitan areas, statistical analyses of surveillance data stand to benefit from theoretical explorations of the links between health and place [27]. In this article, we formulate a mathematical framework to evaluate the notion that crowding, which is driven by urbanization and mobility patterns, can change the growth rate and size of infectious outbreaks [4,14]. To do so, our model introduces urban geography as the leading-order component of a distributed-contacts susceptible-infected-recovered (DC SIR) epidemiological model. Our method uses spatially resolved census data to infer transportation patterns in cities and represent disease spreading as a non-local process. As used here, economic data help bypass the need for large mobility datasets in urban health simulations. In sum, our approach sacrifices the hyperrealism of agent-based simulations to instead resolve the spatial patterns that arise when a heterogeneous set of metapopulations interact. Although our model allows for the inclusion of additional layers of complexity, here we only characterize the role of urban design in disease spreading and use the Guadalajara Metropolitan Area (GDL) as an example. City-wide hospitalization data from the 2009 Influenza A/H1N1 pandemic were used to calibrate disease parameters (electronic supplementary material, figure S1) but cannot inform the validity of neighbourhood-level patterns in our results. Lastly, we ran simulations in hypothetical cities with the same area, population density, number of schools and businesses as GDL. This allowed us to demonstrate that changes in the spatial distributions of housing, education and economic activities can yield large variations in the size and early growth rate of epidemics across cities that would be deemed identical from a large-scale perspective.

Modelling framework

(a) Inferred mobility patterns
High-resolution maps of residential density provide the background distribution of a city's population, introducing a first layer of spatial dependence to the SIR transmission framework. Likewise, the locations and prominence of activity hubs determine common trip destinations where human interactions occur and diseases are spread. In our gravity model, trips originate at home and the probability density $P(x, y)$ that a person residing at $x$ will visit $y$ on a given day is the joint result of two factors: the distance $r_{xy}$ between both sites and the overall popularity of $y$ as a destination. The effect of distance $r_{xy}$ on the likelihood of displacements between two points has been studied empirically [28], yielding the general form

$$P(r_{xy}) \propto (r_{xy} + r_0)^{-b} \exp(-r_{xy}/\kappa). \qquad (2.1)$$

Simplifying daily mobility as radial displacements out of a place of residence $x$, $P(r_{xy})$ yields a first estimate of the spatial distribution of individuals as they go through their daily routines. Moreover, the model parameters $r_0$, $b$, $\kappa$ can be used to capture the effects of transportation infrastructure, as the quality of these services largely determines the willingness of individuals to travel long distances [29]. TAS (equation (2.2)) further refines our approximation of human mobility patterns in cities. TAS is an estimate of the daily number of visitors driven into a location $y$ by education and economic activities [30]. Its value may be calculated using school enrolment data and the number of workers employed at $y$ by different economic sectors, allowing different types of establishments to be weighted by the traffic they each induce. Our model incorporates TAS through equation (2.2), where parameters Re, Se, In and Pr denote the number of jobs registered at $y$ by retail, service, industrial and primary activity organizations, respectively [30]. Similarly, St is the number of students enrolled at educational institutions inside the same area.
Equation (2.3) thus defines a probabilistic gravity model that represents mobility as the weighted influence areas of individuals who try to minimize their displacements but will travel further given an economic incentive (higher TAS). Here, the normalization factor $c(x)$ allows us to adjust for the average number of daily trips made by residents of different neighbourhoods.
In what follows, we use the empirical values $\kappa = 80$ km, $b = 1.75$ found by González et al. [28] but prescribe $r_0 = 5$ km. This is a conservative choice, as it facilitates longer trips and thus homogenizes the transportation habits of the city's inhabitants.
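To make the gravity model more concrete, the following sketch computes one row of an origin-destination table from cell centroids and TAS values. It assumes a truncated power law in trip length with the parameters $r_0$, $b$ and $\kappa$ quoted above, and it assumes that distance and TAS combine multiplicatively, with the row rescaled to a prescribed number of daily trips (the role played by $c(x)$). The function names, the multiplicative form and the sample coordinates are illustrative assumptions rather than the exact expressions used in the article.

```python
import math

# Parameter values quoted in the text (kilometres, dimensionless exponent).
R_OFFSET_KM = 5.0   # r_0
B_EXPONENT = 1.75   # b
KAPPA_KM = 80.0     # kappa


def distance_kernel(r_km, r0=R_OFFSET_KM, b=B_EXPONENT, kappa=KAPPA_KM):
    """Truncated power law in trip length r_km (assumed functional form)."""
    return (r_km + r0) ** (-b) * math.exp(-r_km / kappa)


def origin_destination_row(origin, centroids, tas, trips_per_day=1.0):
    """Expected daily trips from one origin to every destination cell.

    Assumption: weight(y) = TAS(y) * kernel(r_xy), rescaled so that the row
    sums to trips_per_day.
    """
    weights = []
    for (cx, cy), attraction in zip(centroids, tas):
        r = math.hypot(cx - origin[0], cy - origin[1])
        weights.append(attraction * distance_kernel(r))
    norm = sum(weights)
    return [trips_per_day * w / norm for w in weights]


# Hypothetical destinations at 0, 10 and 40 km from the origin.
cells = [(0.0, 0.0), (10.0, 0.0), (40.0, 0.0)]
equal_tas = origin_destination_row((0.0, 0.0), cells, [1.0, 1.0, 1.0], trips_per_day=1.18)
hub_at_10km = origin_destination_row((0.0, 0.0), cells, [1.0, 10.0, 1.0], trips_per_day=1.18)
```

With equal TAS the nearest cell attracts the most trips, whereas a tenfold larger TAS at 10 km is enough to outweigh the distance penalty in this toy configuration.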
Before integrating these concepts into an epidemiological transmission model, we must distinguish between the behaviour of susceptible and infected groups. We define the mobility of symptomatic groups via equation (2.4), where $0 < \alpha(x) < 1$ is an isolation parameter whose spatial dependence allows the representation of neighbourhood-targeted intervention strategies as well as social and economic factors that influence the adaptive behaviour of sick individuals. The term $H(x, y)$ accounts for the added probability of visits to the nearest healthcare facility, as sick individuals will likely seek diagnosis and treatment.

royalsocietypublishing.org/journal/rspa Proc. R. Soc. A 477:
In summary, we use the gravity model in equation (2.3) to infer mobility patterns in cities. TAS represents the number of people who visit a location on a given day and is calculated using employer and educational databases alike (equation (2.2)). Although the prominence of different destinations varies throughout the week and within a single day (for example, restaurants and schools have strongly marked daily cycles), our simulations consider TAS to be fixed in time. Lastly, we incorporate all state-run healthcare facilities (henceforth hospitals) by assigning infective individuals to the nearest hospital and assuming that only 8% of infectives will visit these hospitals during their infective period. Sensitivity analyses were performed to ensure that our main conclusions are not drastically affected by the frequency of trips to the hospital and are included in §5. An overview of all spatial components of our framework, as estimated for GDL, is shown in figure 1 and further described in §2c.

(b) Metapopulation model
Let us define three identical domains in $\mathbb{R}^2$ to represent the urban region under study. Members of the population $N$ are then segregated under an SIR transmission scheme and distributed following population density functions for each group (for example, $S(x, t)$ for susceptibles and $I(\tilde{x}, t)$ for infectives). This approach links people to their place of residence (henceforth $x \in \Omega$ for susceptibles and $\tilde{x} \in \tilde{\Omega}$ for infectives), allowing us to formulate a DC SIR model (equations (2.5)-(2.7), adapted from [31,32]). Here, $\beta(y)$ is the probability of contagion given a susceptible-infective interaction that happens at $y$ and $\gamma$ is the recovery rate. The interaction kernel $k(x, y, t)$ yields the expected number of interactions between a member of $S(x, t)$ and all infected individuals present at a trip destination $y \in \Omega$ and time $t$ [31,32]. Thus, $k(x, y, t)$ introduces the heterogeneous contact networks through which diseases are spread.

To define the interaction kernel $k(x, y, t)$, we use origin-destination density functions $P_S(x, y)$, $P_I(\tilde{x}, y)$ for members of $S(x, t)$ and $I(\tilde{x}, t)$, respectively. Next, we compute the total number of infected individuals expected to visit $y$ at a given time, which is the integral over all trip origins, $\int_{\tilde{\Omega}} P_I(\tilde{x}, y)\, I(\tilde{x}, t)\, d\tilde{\Omega}$. Taking the product with $P_S(x, y)$ thus gives an expression for the expected number of SI contacts occurring at $y$ for an individual who resides at $x$:

$$k(x, y, t) = P_S(x, y) \int_{\tilde{\Omega}} P_I(\tilde{x}, y)\, I(\tilde{x}, t)\, d\tilde{\Omega}. \qquad (2.8)$$

A similar procedure yields the reproductive number $R_t(\tilde{x}, t)$, defined as the number of secondary cases caused by a single infected individual who lives at $\tilde{x}$ and contracted the disease at time $t$. Instead of calculating the number of SI interactions for a susceptible in transit, we now use the spatial distribution of susceptibles at a given time. This is given by the integration of susceptible mobility over all trip origins, $\int_{\Omega} P_S(x, y)\, S(x, t)\, d\Omega$. Multiplying this by the origin-destination density for infected mobility $P_I(\tilde{x}, y)$ then yields the expected number of SI interactions occurring at $y$ for a member of $I(\tilde{x}, t)$:

$$q(\tilde{x}, y, t) = P_I(\tilde{x}, y) \int_{\Omega} P_S(x, y)\, S(x, t)\, d\Omega. \qquad (2.9)$$

Assuming that all variables are slowly varying over the first generation of disease transmission, we can approximate the basic reproduction number $R_0(\tilde{x})$ as in equation (2.10). Notice that $k(x, y, t)$ and $q(\tilde{x}, y, t)$ represent the vulnerability and disease spreading capacity of individuals during an infectious outbreak. Moreover, their spatial dependence implies that a person's role during an epidemic is a function of their place of residence but determined by characteristics of the locations they visit. With this mathematical foundation, one could consider developing parametrizations of environmental conditions such as relative humidity (function of $y$) and demographic factors that influence contact patterns (functions of $x$, $\tilde{x}$).

Our DC SIR simulations of GDL classify the population by place of residence and by age group. Thus, we defined SIR subgroups $S_j$, $I_j$, $R_j$ and mobility functions $P_{S_j}$, $P_{I_j}$ to represent the people in each age group $j = 1, 2$ (adults and children). Consequently, the integrand over trip destinations $\Omega$ in equation (2.5) was modified to account for interactions between a particular susceptible subgroup and infectives $I_j$ of all ages, each of them with an age-specific transmission potential $\beta_j$. In our simulations, $\beta_1$ and $\beta_2$ were calibrated to minimize the least-squares misfit between simulation results and data from the 2009 A/H1N1 pandemic (electronic supplementary material, figure S1). However, note that our main results depend primarily on the structure of mobility matrices and do not change significantly with variations in $\beta_1$, $\beta_2$.
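The construction of the kernel $q$ and of the basic reproduction number can be sketched on a small grid. The example below assumes piecewise-constant cells, a single age group, a single transmission parameter `beta` expressed per unit area-density of visitors, a fully susceptible population and an infective period of $1/\gamma$ days. The function names and the two-cell city are hypothetical and only illustrate the bookkeeping; they do not reproduce the article's calibrated simulations.

```python
def visitor_counts(od, residents):
    """Expected visitors per destination cell.

    od[h][m] is the daily probability that a resident of cell h visits cell m.
    """
    n_dest = len(od[0])
    return [sum(row[m] * n for row, n in zip(od, residents)) for m in range(n_dest)]


def basic_reproduction_numbers(od_s, od_i, population, areas, beta, gamma):
    """Approximate R_0 for an infective living in each cell (assumed discretization)."""
    # Area density of susceptible visitors at each destination.
    density = [v / a for v, a in zip(visitor_counts(od_s, population), areas)]
    r0 = []
    for row in od_i:  # one row per place of residence of the infective
        daily_cases = sum(beta * p * d for p, d in zip(row, density))
        r0.append(daily_cases / gamma)  # infective period lasts 1/gamma days
    return r0


population = [100.0, 200.0]
areas = [1.0, 1.0]  # km^2

# Everyone stays in their own neighbourhood.
stay_home = [[1.0, 0.0], [0.0, 1.0]]
r0_local = basic_reproduction_numbers(stay_home, stay_home, population, areas, 0.001, 0.25)

# Everyone spends the day in cell 0 (a single activity hub).
one_hub = [[1.0, 0.0], [1.0, 0.0]]
r0_hub = basic_reproduction_numbers(one_hub, one_hub, population, areas, 0.001, 0.25)
```

In this toy setting, concentrating all trips in one cell raises the reproduction number of every resident above the largest value found when people stay local, which is the qualitative effect of agglomeration discussed in the text.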
Because our current focus is to explain the observed epidemiological impacts of urban geography and crowding [2,4,14], contact rates in our model are a linear function of visitor density. Although this assumption can fail to quantify face-to-face contacts within dense gatherings [33,34], a network model that follows this same approach was able to reproduce the evolution of COVID-19 case counts and their uneven demographic distribution within 10 large US cities [35]. Moreover, the contact rate variations that arise in our simulations are comparable with the statistical uncertainty of observational estimates [23,25,26] that have not offered a systematic explanation of this variability. Thus, contact rates under the formulation in (2.5) produce plausible outcomes and may help explain unforeseen mechanisms driving health inequality. Future studies under the present framework can further refine the representation of heterogeneous contact rates by assigning full spatial and temporal dependence to the transmission parameter $\beta(x, y, \tilde{x}, t)$.

(c) Inclusion of census data
We solved equations (2.5)-(2.7) in finite differences by mapping all their variables onto a 1580-element triangular mesh [36] representing GDL (figure 1). Neighbourhood-level data from the 2010 census [37] were used to calculate the total adult (age > 15) and infantile (age ≤ 15) populations at each grid element. TAS($y$) was estimated for adults and children from two publicly available datasets. The 2015 National Statistical Directory of Economic Units (DENUE) lists all registered employers in Mexico along with their sector, number of workers and location [38]. Similarly, the National System for School Information (SNIE) locates all of the country's schools and universities and lists their enrolment at each educational stage [39]. DENUE employment data were combined with SNIE enrolment at the high school and university levels to calculate TAS for the adult population via equation (2.2). On the other hand, TAS for infantile mobility was inferred using SNIE enrolment data for educational stages up to the middle school level, roughly corresponding to the age threshold between our age groups. Next, time-constant origin-destination matrices were obtained for all grid elements using equations (2.3) and (2.4). Resulting maps of population density and TAS for adults and children are shown in figure 1.
Mobility parameters in equation (2.1) were established a priori as $r_0 = 5$ km, $b = 1.75$, $\kappa = 80$ km [28]. Sensitivity analyses showed that variations in these values primarily impact $G_{\text{destinations}}$, whose consequences are shown in figure 2, but do not change the fundamental phenomena described in this study. Similarly, the normalizing factor $c(x)$ (equation (2.3)) was chosen to yield an average of 1.18 and 0.85 daily trips per person for adults and children, respectively. To compute the mobility of infected individuals via equation (2.4), we assume that $\alpha = 0.8$ and add an 8% probability that infectives will visit healthcare facilities once during their infective period. While these values may underestimate rates of patient self-isolation and their tendency to seek treatment and diagnosis, further observational data are required to improve the representation of adaptive behaviour and hospital transmission within this modelling framework.
Numerical solutions to equations (2.5)-(2.7) were obtained using a forward finite-difference scheme that considered piecewise constant functions defined over the triangular mesh shown in figure 1. Continuous input variables were transformed to be constant at all grid elements, which are henceforth denoted $\Omega_j \subset \Omega$ and whose area is $A_j = \int_{\Omega_j} d\Omega$. For instance, the number of susceptibles in $\Omega_j$ at time $t_k$ was calculated as $S_j(t_k) = \int_{\Omega_j} S(x, t_k)\, d\Omega$. Origin-destination matrices were defined on the grid using the corresponding density functions (equation (2.3)), and $\beta(y)$ was mapped onto the grid in the same way. With all variables defined within the numerical grid, integrals over $\tilde{\Omega}$ and $\Omega$ in equation (2.5) were replaced by sums over infective origins $\tilde{\Omega}_h$ and trip destinations $\Omega_m$. Thus, the temporal evolution in the number of susceptibles at $\Omega_j$ over one time step $\Delta t$ was computed from the resulting discrete form of equation (2.5).

To simplify the analysis of model output, all parameter values were set to be constant in time and space. Although setting time-constant parameters overlooks seasonal variations in viral transmission as well as daily and weekly cycles in urban mobility, the primary results highlighted in this article result directly from the structure of $k(x, y, t)$ and $q(\tilde{x}, y, t)$ given the gravity formulation of $P_S(x, y)$ and $P_I(\tilde{x}, y)$.
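A forward finite-difference update of this kind can be sketched as follows. The code assumes a single age group, a constant `beta`, origin-destination tables stored as nested lists and a force of infection proportional to the area density of infective visitors at each destination. The function names, the clipping of new infections and the two-cell example are assumptions for illustration and should not be read as the scheme actually implemented for GDL.

```python
def visitor_density(od, residents, areas):
    """Area density of visitors in each destination cell."""
    n_dest = len(areas)
    return [sum(row[m] * n for row, n in zip(od, residents)) / areas[m]
            for m in range(n_dest)]


def dc_sir_step(S, I, R, od_s, od_i, areas, beta, gamma, dt=1.0):
    """Advance susceptible, infective and recovered counts by one time step dt."""
    inf_density = visitor_density(od_i, I, areas)
    S_new, I_new, R_new = [], [], []
    for j in range(len(S)):
        # Sum over trip destinations of contagion risk times infective density.
        force = sum(beta * p * d for p, d in zip(od_s[j], inf_density))
        infections = min(S[j], dt * force * S[j])   # cannot exceed susceptibles
        recoveries = min(I[j], dt * gamma * I[j])   # cannot exceed infectives
        S_new.append(S[j] - infections)
        I_new.append(I[j] + infections - recoveries)
        R_new.append(R[j] + recoveries)
    return S_new, I_new, R_new


# Hypothetical two-cell city in which all daily trips end in cell 0.
one_hub = [[1.0, 0.0], [1.0, 0.0]]
S1, I1, R1 = dc_sir_step([99.0, 200.0], [1.0, 0.0], [0.0, 0.0],
                         one_hub, one_hub, [1.0, 1.0], beta=0.001, gamma=0.25)
```

Even though the initial case lives in cell 0, residents of cell 1 are infected during the first step because both groups meet at the shared hub, which illustrates the non-local character of transmission in this formulation.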

Results
Our modelling framework was tested using data for GDL, where census and economic data [37][38][39] were processed to derive the daily mobility patterns of children (age ≤ 15) and adults (age > 15). Lorenz curves in figure 3 represent the different degrees of crowding driven by housing (G origins = 0.32) and daily activities (G destinations = 0.56). As is true for large cities [20,21], inferred trips were largely directed towards a few, hyper-affluent areas. In our model, these areas receive 25% of all daily trips but occupy less than 5% of the metropolis; meanwhile, housing places 25% of the population throughout 13.5% of the city's most densely populated regions. The condition G destinations > G origins requires that some neighbourhoods have a net loss of occupants during workdays, as their inhabitants leave and agglomerate around major activity hubs. This is true for 60% of all neighbourhoods in GDL, which then act as net sources of mobility whose role is to supply more affluent destinations with visitors.
Values of β 1 , β 2 for adults and children were calibrated using city-wide hospitalization data from the 2009 A/H1N1 influenza pandemic (electronic supplementary material, figure S1). In our model, this parameter yields a linear relationship for the probability of falling ill given the mean area density of infectives that one encounters throughout the day. Obtained values of β 1 , β 2 indicate that the daily risk of contagion increased by 1.1 ± 0.1% for every 1000 infective adults per km 2 added to one's surroundings. Similarly, the addition of 1000 infective children per km 2 raised the probability of falling ill on a given day by 3.2 ± 0.3%. With β 1 and β 2 fixed everywhere, spatial patterns in the evolution of epidemics result entirely from the number and age of people who visit each one of the city's neighbourhoods (equations (2.8) and (2.9)). Hence, spatial variations in our simulation results and in the basic reproductive number R 0 (x) are a direct consequence of the city's transportation network (equation (2.10)), which is by definition a product of urban design and land-use patterns (equations (2.1)-(2.3)).
Spatial variations in the basic reproductive number R 0 (x) (figure 4) had a profound impact on the spatio-temporal evolution of outbreaks in our model. Firstly, the age and place of residence of patient zero (initial conditions) significantly influenced the early rate of epidemic growth. In fact, the R 0 of patient zero can delay the peak of an epidemic by as much as 9 days (figure 5a,b). Secondly, people with the highest R 0 (x) also had the highest probability of falling ill. In particular, the increased contact rates of downtown residents allowed them to spread pathogens 50% more efficiently than their suburban counterparts (figure 4), but also put them at a higher risk of contagion. Regardless of initial conditions, this particular situation gave rise to a spatial pattern in which influenza spread as waves of disease that emanate out of the city centre (figure 5c; electronic supplementary material, video S1). Similar patterns have been observed in agent-based simulations before [10] and may thus be a general characteristic of flu-like epidemics in urban populations.
To better appreciate the structure of wave-like patterns and inequalities in the spread of disease, we define the relative incidence $\Upsilon(x, t)$ (equation (3.1)) and use simulation results to inspect its statistical dependence on the distance $r_{\tilde{x}x^*}$ between neighbourhoods $\tilde{x}$ and the city's largest retail hub $x^*$, where TAS peaks. Boxplots in figure 5c show that our analyses predict the incidence of influenza to be highest in downtown GDL and decreasing towards the suburbs.
It is clear from figures 4 and 5 that mobility hotspots play an important role in setting the growth rate and evolution of epidemics in our model. To gain more insight, we relocated businesses, schools and housing across the GDL numerical grid and ran simulations in the resulting geographies. This allowed us to compare outbreaks across cities with unique transportation networks but the same demographics, area, number of schools, businesses and daily trips as GDL. In one set of experiments, we modified the Gini coefficients of mobility by relocating businesses, schools and housing across areas with low and high TAS or population density. As a result, we modulated the agglomeration of urban mobility but preserved the spatial structure of GDL. In another experiment, we redistributed housing, schools and businesses at random to evaluate whether the effects of agglomeration can be generalized to all cities despite their spatial structure. All simulations used the same values of β (previously calibrated for GDL) and an infective period 1/γ of 4 days. Results are summarized in figure 2.
Our simulations suggest that, for all cities, the basic reproduction number R 0 varies as a function of G destinations and is virtually independent of G origins (figure 2a,b). This relationship is nonlinear for adults and linear for children, mainly because G destinations is set by commercial and economic superclusters that primarily attract adults. Furthermore, our results suggest that housing and activities in metropolitan areas with G destinations < 0.58 can be redistributed so that the mean basic reproduction number of adults remains under the threshold value R 0 = 1. Namely, in this illustrative case where β is fixed everywhere and transmission dynamics are solely driven by urban design, cities can be redesigned to render their populations incapable of sustaining epidemics.
At the end of an outbreak, the attack rate z measures the fraction of the population that contracted the disease. Under homogeneous mixing conditions (G destinations = G origins = 1), R 0 and z are linked by equation (3.2) [40]. At all values of R 0 , this relationship predicts higher attack rates z than observed in our simulations (figure 2c). Disagreements are largest for low values of R 0 , when fractions of the urban population have R 0 < 1 and thus do not contribute to exponential growth. This highlights the importance of understanding population heterogeneity: because R 0 and incidence may covary in space (figures 4 and 5c), population-averaged parameters may differ from the characteristic values obtained from public health reports, whose samples are biased towards subgroups with a higher incidence. Differences between the GDL and random experiments in figure 2 suggest that, at equal R 0 , realistic urban layouts (where activity areas are clustered near each other) may act to decrease the size of epidemics when compared with cases where housing and activities are distributed randomly. However, notice that a greater G destinations is required for randomly designed cities to have the same values of R 0 as GDL (figure 2a).
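For reference, the homogeneous-mixing benchmark can be evaluated numerically. The sketch below assumes that equation (3.2) is the classical final-size relation $z = 1 - \exp(-R_0 z)$ and solves it by fixed-point iteration; the function name, the tolerance and the starting guess are illustrative choices.

```python
import math


def final_size(r0, tol=1e-12, max_iter=10_000):
    """Attack rate z solving z = 1 - exp(-r0 * z) under homogeneous mixing."""
    if r0 <= 1.0:
        return 0.0  # below threshold: no epidemic in the large-population limit
    z = 1.0  # start above the positive root and iterate downwards
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


z_two = final_size(2.0)
```

Attack rates obtained in this way are the upper benchmark against which the heterogeneous simulations in figure 2c are compared.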

Discussion
Our parametrization of contact heterogeneity in metropolitan areas produces smaller epidemics than homogeneous mixing models (figure 2) and predicts that, depending on where they live, people play unequal roles in disease spreading (figure 4). These inequalities relate the identity of patient zero to the timing of influenza-like outbreaks (figure 5a,b) and can give rise to spatiotemporal patterns in the spread of disease (figure 5c; electronic supplementary material, video S1). In fact, these processes mirror those of large-scale transportation networks, whose heterogeneity leads to spatial inequalities in the size, timing and growth rate of epidemics [1,41,42]. As similar insight is used for the development of optimal intervention strategies in global-scale infectious outbreaks [43], results like those presented in figure 4 may be useful to health officials who seek to optimize resources during infectious outbreaks. For example, health inequalities inferred here suggest that the efficiency of vaccination campaigns is likely to vary depending on whether they target the inhabitants of the city centre, its visitors, or people living in suburban areas (figures 4 and 5). Similarly, the need for treatment and diagnosis is expected to differ across neighbourhoods (figure 5c). Although our figures present the health impacts of urban geography as a function of place of residence, inequalities result primarily from the remote influence of people in other neighbourhoods and the characteristics of places where interactions occur (equations (2.5)-(2.10)). The inclusion of non-residential processes is essential to the accurate representation of environmental health impacts [44,45] and is achieved here through interaction kernels $k(x, y, t)$ and $q(\tilde{x}, y, t)$. Although we used a constant disease parameter, the representation of local and remote environmental conditions can be further refined by assigning full spatial dependence to transmission parameters so that $\beta = \beta(x, y, \tilde{x}, t)$.
Generally speaking, our analytical framework is designed to map geographical information onto social space, establishing connections that are weighed by a risk of contagion β. This assigns realistic network characteristics to a spatially explicit disease transmission model, which simplifies the interpretation of simulation results and may enable additional theoretical and statistical analyses [46,47].
Our results highlight the extent to which the assumption of homogeneous mixing in metropolitan areas can bias model results. Most notably, epidemic size in simulations that use the same transmission parameters β 1 , β 2 for adults and children living in hypothetical, seemingly identical cities, covers the entire range 0 < z ≤ 1 (figure 2c). This suggests that macroscopic quantities such as population size and density hold little to no dynamical significance in the evolution of infectious outbreaks. Instead, the spatial organization of human mobility (quantified here using the Gini coefficient G destinations , figure 3) seems to control epidemic growth ( figure 2a,b) and is statistically correlated with population size [4,14]. Dalziel and collaborators explored this notion [14] by comparing agent-based model results across 48 Canadian cities, but were not able to isolate the effects of spatial heterogeneity from all other information embedded in massive mobility datasets that were used as model input.
Relationships between G destinations , R 0 and the attack rate z shown in figure 2 suggest that, through its influence on transportation networks, urban design may be optimized to modify epidemic growth rates and thus reduce the probability of seeding large outbreaks. The existence of invasion thresholds under which populations cannot sustain epidemic growth is a general characteristic of metapopulation models [48]. While many authors have conjectured about the possibility of designing cities to minimize disease prevalence [4,14,49,50], we present the first dynamical argument to link these ideas and the invasion threshold proposition of Colizza & Vespignani [48]. More specifically, our results suggest that, by evenly distributing activity hubs throughout a city (instead of clustering them in the city centre), city planners can segregate subsets of the population and potentially inhibit the rapid transmission of pathogens across distant neighbourhoods (figure 2). Consistent with known statistical relations between the transmission potential of influenza and the spatial organization of human behaviour [4,14], our framework explains a plausible mechanism behind health inequalities across the neighbourhoods of the world's cities.
The mechanism through which urban design impacts epidemic growth is illustrated in figure 6: when daily activities are centralized in a handful of hyper-affluent areas (figure 6a), mixing patterns become more homogeneous [21] and thus favour interactions between residents of distant neighbourhoods (figure 6c). However, when activity hubs are distributed throughout the city (figure 6b), economic opportunities become locally available to greater fractions of the population. Ultimately, this reduces the probability of interactions between residents of distant neighbourhoods (figure 6d) and can thus inhibit the spatial spread of influenza-like pathogens. This mechanism is evidenced by results in figure 2. Positive relationships in figure 2 suggest that the size and R0 of urban epidemics increase with the centralization of daily activities. Moreover, the results of simulations made using randomized activity layouts (pink lines) show that when … Notice that children's R0 was not affected by the breakup of activity hubs in figure 2b. This lack of effect may be due to the fact that children's TAS is already scattered through GDL in a seemingly random manner, while the TAS of adults is highly organized (figure 1). While the centralization of labour markets is thought to favour economic efficiency [19,51], our theoretical analyses suggest that this may have negative epidemiological consequences. By contrast, the decentralization of economic activities may help slow the propagation of influenza-like outbreaks (figures 2 and 6), reduce pollution and improve overall welfare [52].

Conclusion
Agent-based models are the standard method to run realistic simulations of disease spreading in cities [7-10,12-14]. These intricate models have allowed for the evaluation of intervention strategies [12] and health inequalities [8,15], but require input from massive mobility datasets that are not openly available for most of the world's cities. Consequently, USA [53] and European cities are over-represented in agent-based studies, and these methods remain inaccessible for many research groups and public health organizations. Likewise, large-scale epidemiological models often assume homogeneous mixing conditions in cities and neglect all processes that occur at the neighbourhood level [1,4,11,54]. When compared with agent-based model output, this simplification can lead to overestimating the size of epidemics and miscalculating their timing, and can thus introduce considerable bias [11]. Although far simpler, the parametrization of small-scale heterogeneity developed in this study yields the same conclusions (figures 2 and 5). Thus, we believe that the mathematical framework presented here is an adequate alternative to introduce neighbourhood-level processes in large-scale epidemiological simulations.
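For reference, the homogeneous-mixing benchmark against which such bias is measured can be made explicit. In a single well-mixed SIR population, the attack rate z satisfies the classical final-size relation z = 1 − exp(−R0 z). The Python sketch below solves this relation by fixed-point iteration. It is a textbook result assumed here for illustration, not the metapopulation model of this study, and the function name `final_size` is hypothetical.

```python
import math


def final_size(r0, tol=1e-12, max_iter=10_000):
    """Attack rate z solving z = 1 - exp(-r0 * z) for a well-mixed SIR model.

    For r0 <= 1 the only non-negative solution is z = 0 (no epidemic).
    """
    if r0 <= 1.0:
        return 0.0
    # Start from the upper bound z = 1; the iterates decrease
    # monotonically towards the positive root.
    z = 1.0
    for _ in range(max_iter):
        z_new = 1.0 - math.exp(-r0 * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z


# Attack rates for a few illustrative values of the reproduction number.
sizes = {r0: final_size(r0) for r0 in (0.9, 1.5, 2.0, 3.0)}
```

Because this relation ties z to R0 alone, a homogeneous-mixing model cannot produce different epidemic sizes for cities that share the same transmission parameters, in contrast with the range of outcomes in figure 2c.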
Because it uses standard census and economic data to infer daily mobility patterns, the method described in §2 enables researchers to investigate transmission dynamics in virtually any city. Moreover, formulating our model in terms of economic and educational activities allows simulations to consider different economic scenarios, which is crucial for long-term planning and the representation of lockdowns [35].
Unfortunately, mobility parameters used here (equations (2.1)-(2.3)) were not calibrated with origin-destination data (which exist for GDL during the 2009 A/H1N1 pandemic [13] but are not openly available) and may thus be inaccurate. Nonetheless, our analyses rely on the patterns that arise from fundamental processes driven by the agglomeration of economic activities and highlight the role of small-scale inequalities that are not resolved by most epidemiological datasets [5]. Agent-based simulations of influenza outbreaks in the city of Buffalo [10] show spatio-temporal patterns (their fig. 9) that are very similar to those presented here (electronic supplementary material, video S1; figure 4) and thus suggest that urban design may play a leading-order role in setting the dynamics of influenza-like outbreaks in cities throughout the world.
In reality, spatial inequalities in health result from the complex interactions of social, environmental and biological processes among which urban design is only one. However, most datasets in public health lack the detail and coverage necessary to resolve neighbourhood-level inequalities in the spread of disease [5]. Thus, small-scale characteristics of model output are rarely validated and many of their underlying assumptions often go unchecked.
The main results described in this article stem from the fundamental assumption that transmission rates increase with the density of visitors at places of interest (equations (2.6) and (2.8)). Although publicly available data for GDL cannot validate this assumption, zip code-level analyses [4] have demonstrated that systematic spatial disparities in the spread of influenza are related to crowding caused by daily mobility patterns. More recently, a mobility-based network model using this same assumption was able to reproduce observed COVID-19 case counts across 10 US cities, as well as observed variations in disease prevalence among racial and socioeconomic groups [35].
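The density assumption can be illustrated with a minimal Python sketch. It is not a transcription of equations (2.6) and (2.8): the function `force_of_infection`, the mobility matrices, the areas and the value of `beta` are all hypothetical, chosen only to show how exposure that scales with the density of infected visitors at each destination couples neighbourhoods through daily mobility.

```python
def force_of_infection(mobility, infected, area, beta):
    """Per-capita infection pressure on the residents of each neighbourhood.

    mobility[i][j]: fraction of residents of i who spend the day in j
    infected[k]:    number of infectious residents of neighbourhood k
    area[j]:        area of destination j (sets the visitor density)
    beta:           transmission coefficient (illustrative value)
    """
    n = len(infected)
    # Density of infectious visitors present at each destination j.
    density = [
        sum(mobility[k][j] * infected[k] for k in range(n)) / area[j]
        for j in range(n)
    ]
    # Residents of i are exposed in proportion to where they spend the day.
    return [
        beta * sum(mobility[i][j] * density[j] for j in range(n))
        for i in range(n)
    ]


infected = [10, 0, 0]   # outbreak seeded in neighbourhood 0 only
area = [1.0, 1.0, 1.0]

# Centralized layout: everyone travels to a single hub (neighbourhood 2).
central = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
# Distributed layout: everyone stays in their own neighbourhood.
local = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

lam_central = force_of_infection(central, infected, area, beta=0.1)
lam_local = force_of_infection(local, infected, area, beta=0.1)
```

In this toy configuration, the centralized layout exposes residents of all three neighbourhoods to the seeded outbreak, whereas the distributed layout confines exposure to neighbourhood 0.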
Although our analyses focus on the possible effects of urban design on disease spreading, complete understanding of heterogeneous epidemic growth must account for complex socioenvironmental factors that modulate disease exposure [5,15]. The theoretical formulation developed in this article provides a sociodemographic backdrop onto which successive layers of complexity can be added. Namely, the introduction of a spatially dependent β(x, y, x′) can account for the demographics of susceptibles (x), infectives (x′) and environmental conditions under which interactions occur (y).
For example, the present analysis does not thoroughly explore the role of disease transmission in and around hospitals where sick people agglomerate to seek treatment and diagnosis. However, estimates made using equation (2.10) suggest that hospital visits may introduce variations to spatial patterns in the spread of disease (figure 7). This is most likely to happen when hospitals are located near major commercial, educational and activity hubs that increase the number of susceptibles who risk interacting with dense groups of sick individuals. Thus, the distribution of healthcare services relative to mobility hotspots may exacerbate the effects of urban design on disease spreading. Future analyses also stand to benefit from improved and spatially dependent representations of reactive behaviour, the ability to self-isolate and time-varying mobility matrices, among other factors.
Mathematical representations of processes that drive health inequality are valuable because, before spatial patterns found in epidemiological data can be attributed to social and geographical phenomena, scientists must have clear references that help identify the telltale signs of such phenomena impacting disease spreading. Without prior theoretical explorations, purely statistical studies of public health data are unlikely to disentangle the dozens of variables that influence heterogeneous disease prevalence. For example, researchers have noted that health inequality cannot be fully explained by people's place of residence, as people's health is also influenced by the places they visit [44]. Yet very few studies account for non-local processes that relate characteristics of neighbourhoods to the health of residents elsewhere [45]. Our theoretical framework (equations (2.5)-(2.7)) provides an analytical, tractable way to avoid this 'residential effect fallacy' and fully account for the remote influence of neighbourhoods and their people throughout a given city.
The development of frameworks that seek to understand the complex drivers of inequality is crucial to help design new policies and research that can improve urban health [27,49,50]. As future process-based studies explore the numerous urban determinants of health, multifactorial indices that describe a population's vulnerability may help tailor plans to improve cities' resilience and preparedness against new crises. This study proposes the Gini coefficient (figure 3) as a viable indicator for the influence of urban geography on epidemic growth (figure 2). Likewise, our simulations show that the unequal distribution of daily activities throughout a given city can drive robust patterns in the spread of disease. As a result, urban design may give rise to systematic health inequalities between the neighbourhoods and subgroups that make up urban societies (figures 4 and 5; electronic supplementary material, video S1).
Data accessibility. Software and data used to perform numerical simulations shown in this study can be accessed at https://github.com/inciente/Urban-Epidemiology/tree/master/RUDDS.