## Abstract

The interaction among multiple microbial strains affects the spread of infectious diseases and the efficacy of interventions. Genomic tools have made it increasingly easy to observe the diversity of pathogenic strains, but interpreting such diversity has remained difficult because it is entangled with host and environmental factors. Here, we focus on host-to-host contact behavior and study how it changes populations of pathogens in a minimal model of multi-strain interaction. We simulated a population of identical strains competing by mutual exclusion and spreading on a dynamical network of hosts according to a stochastic susceptible-infectious-susceptible model. We computed ecological indicators of diversity and dominance in strain populations for a collection of networks illustrating various properties found in real-world examples. Heterogeneities in the number of contacts among hosts were found to reduce diversity and increase dominance by making the distribution of strains among infected hosts more uneven, while strong community structure among hosts increased strain diversity. We found that the introduction of strains associated with hosts entering and leaving the system led to the highest pathogenic richness at intermediate turnover levels. These results were finally illustrated using the spread of *Staphylococcus aureus* in a long-term health-care facility where close proximity interactions and strain carriage were collected simultaneously. We found that network structural and temporal properties could account for a large part of the variability observed in strain diversity. These results show how stochasticity and network structure affect the population ecology of pathogens and warn against interpreting observations as unambiguous evidence of epidemiological differences between strains.

**Author summary** Pathogens are structured in multiple strains that interact and co-circulate in the same host population. This ecological diversity affects, in many cases, the spread dynamics and the efficacy of vaccination and antibiotic treatment. Understanding its biological and host-behavioral drivers is thus crucial for outbreak assessment and for explaining trends of new-strain emergence. We used stochastic modeling and network theory to quantify the role of host contact behavior in strain richness and dominance. We systematically compared multi-strain spread on different network models displaying properties observed in real-world contact patterns. We then analyzed the real-case example of *Staphylococcus aureus* spread in a hospital, leveraging a combined dataset of carriage and close proximity interactions. We found that contact dynamics have a profound impact on a strain population. Contact heterogeneity, for instance, reduces strain diversity by reducing the number of circulating strains and leading a few strains to dominate over the others. These results have important implications for disease ecology and for the epidemiological interpretation of biological data.

## Introduction

Interactions between strains of the same pathogen play a central role in how they spread in host populations [1–7]. In *Streptococcus pneumoniae* and *Staphylococcus aureus*, for instance, several dozen strains can be characterized, and differences in transmissibility, virulence and duration of colonization have been reported for some of them [8,9]. Strain diversity may also affect the efficacy of prophylactic control measures such as vaccination or treatment. Indeed, strains may be associated with different antibiotic resistance profiles [3, 5, 10, 11], and developed vaccines may only target a subset of strains [2, 3, 12]. With the increasing availability of genotypic information, it has become easy to describe the ecology of pathogen populations and to monitor patterns of extinction and dominance of pathogen variants [13–17]. However, the reasons for multi-strain coexistence patterns (e.g. coexistence between resistant and sensitive strains) or dominance of certain strains (e.g. in response to the selection pressure induced by treatment and preventive measures) remain elusive. One may invoke selection due to different pathogen characteristics, but environmental and host population characteristics, which lead to differences in host behavior, settings and spatial structure, may also affect the ecology of strains [14–19]. In particular, human-to-human contacts play a central role in infectious disease transmission [20]. These contacts are increasingly well described thanks to extensive high-resolution data (including mobility patterns [21–23], sexual encounters [24], and close proximity interactions in schools [25,26], workplaces [27] and hospitals [16,28–31]), which make it possible to base epidemiological assessment on contact data with real-life complexity [32,33]. For instance, the frequency of contacts can be highly heterogeneous, so that more active individuals are both more vulnerable to infection and more likely to act as super-spreaders once infected [24,33–35]. 
Organizational structure of certain settings (school classes, hospital wards, etc.) and other spatial proximity constraints lead to the formation of communities that can delay epidemic spread [36,37]. Individual turnover in the host population is also described as a key factor in controlling an epidemic [20,38]. It is likely that, since they impact the spread of single pathogens, the same characteristics could affect the dynamics in multi-strain populations. It was shown, indeed, that network structure impacts transmission with two interacting strains [39–46], the evolution of epidemiological traits [47–49] and the effect of cross-immunity [50,51]. Yet in these cases, complex biological mechanisms - such as mutation, variations in transmissibility and infectious period, cross immunity - were used to differentiate between pathogens, thereby making the role of network characteristics difficult to assess in its own right.

For this reason, we focused on the dynamical pattern of human contacts and examined whether it contributes to shaping the population ecology of interacting strains under minimal epidemiological assumptions regarding transmission. We described a neutral situation where all strains have the same epidemiological traits and compete via mutual exclusion (concurrent infection with multiple strains is assumed to be impossible) in a Susceptible-Infected-Susceptible (SIS) framework. We studied the spread of pathogens in a host population during a limited time window, disregarding long-term evolution dynamics of pathogens. More precisely, new strains were introduced through host turnover rather than *de novo* mutation or recombination in pathogens. We quantified the effect of network properties on the ecological diversity in strain populations with richness and dominance indicators. We assessed in turn heterogeneities in contact frequency, community structure and host turnover by comparing simulation results obtained with network models exhibiting a specific feature. We then interpreted *S. aureus* carriage in patients of a long-term care facility in the light of these results.

## Results

### Effects of contact heterogeneity

We simulated the stochastic spread of multiple strains on a dynamical contact network of individuals (nodes of the network). Individuals can be either susceptible or infected with a single strain at a given time, and, for each strain, *β* and *μ* indicate the transmission and the recovery probability respectively. We assumed continuous turnover of individuals, who enter the system with probability λ_{in}, and associated injection of previously unseen strains, carried by incoming individuals with probability *p _{s}*. In order to probe the effect of contact heterogeneity on strain ecology we compared a homogeneous model (HOM) in which all nodes have the same activity potential, i.e. they have equal rate of activation to establish contacts, with a heterogeneous model (HET) where the average activity potential is the same, but the rate of activation is heterogeneous across individuals [34]. Then, for each network model we characterized the structure of pathogen population at the equilibrium through ecological diversity measures, including species richness and evenness/dominance indices [52,53].
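The update rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' simulation code: the function name `sis_step`, the data layout (a dictionary mapping each node to a strain label or `None`) and the synchronous update order are assumptions made for the example. Turnover and strain injection are left out.

```python
import random

def sis_step(state, contacts, beta, mu, rng):
    """One synchronous update of a multi-strain SIS process with mutual exclusion.

    state    : dict node -> strain label, or None if the node is susceptible
    contacts : iterable of (i, j) undirected links active during this time step
    beta, mu : per-contact transmission and per-step recovery probabilities
    """
    new_state = dict(state)
    # Transmission along active links, in both directions.
    for i, j in contacts:
        for src, dst in ((i, j), (j, i)):
            if state[src] is not None and state[dst] is None:
                # Mutual exclusion: only a node that is still susceptible can
                # acquire a strain; the first successful transmission wins
                # (an assumption of this sketch).
                if new_state[dst] is None and rng.random() < beta:
                    new_state[dst] = state[src]
    # Recovery of nodes that were infected at the start of the step.
    for node, strain in state.items():
        if strain is not None and rng.random() < mu:
            new_state[node] = None
    return new_state
```

Because all strains share the same *β* and *μ*, the strain label is only carried along; nothing in the update depends on which strain is involved, which is what makes the model neutral.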

We show sample epidemic trajectories in Fig 1A and average quantities in panels B-D of Fig 1. The prevalence, summarized in Fig 1B for different transmissibility values, displays a well-known behavior for both static and dynamic networks: contact heterogeneities lower the transmissibility threshold above which total prevalence is significantly above zero, thus allowing the spread of pathogens with low transmissibility. At the same time, however, heterogeneities hamper the epidemic spread when *β* is large, reducing the equilibrium prevalence [35]. Fig 1 shows that richness (i.e. the number of distinct strains co-circulating) is not linked to the prevalence in a straightforward way. For sufficiently large *β*, the reduction in richness of HET with respect to HOM is substantial even with mild contact heterogeneity, when prevalence is barely affected (Fig 1C). The scaling between prevalence and richness is not linear as *β* varies (Fig 1D), and the relation between the two quantities varies appreciably among contact networks. At a fixed value of prevalence, heterogeneous networks have lower richness: for example, a prevalence value of 250 corresponds to 21% lower richness in HET than in HOM, as highlighted in Fig 1D. This fact can be explained by the balance between injection of new strains and extinction of already circulating ones. The extinction of a stochastic SIS process is certain, since the disease-free state is the unique absorbing state. When multiple SIS processes spread on the same network, the persistence time of a single process is short, in the sense that it scales linearly with the size of the system [54] (in contrast to the lifetime of a single SIS process, which scales exponentially with the size of the system when *β* is above the threshold value [55]). Here we find that network heterogeneity shortens the persistence time of a strain (see also Fig S1 in the supporting information). 
Indeed active nodes involved in a larger number of contacts get infected more frequently [35]. Strains introduced by low-activity nodes are likely to be surrounded by nodes already infected, thus limiting transmission. As a consequence they encounter extinction more easily. In other words, contact heterogeneities strengthen the competition induced by mutual exclusion.

The presence of hubs not only reduces richness for sufficiently large *β*, but also affects more profoundly the distribution of strain abundances, i.e. the strain-specific prevalence, leading to stronger fluctuations (Fig 2A). On the one hand, hubs accelerate the extinction of certain strains; on the other hand, they act as super-spreaders and amplify the prevalence of other strains. This results in a situation of dominance where certain strains, despite having no biological advantage, outcompete the others and reach a significant proportion of the population. This situation is summarized by the Berger-Parker index, defined as the relative abundance of the most abundant strain (an alternative indicator, the Shannon evenness, is shown in Fig S2 in the Supplementary Information). Fig 2B-C shows how this quantity varies with increasing strain transmissibility. As expected, at low *β* values the short transmission chains produced by different strains barely interact. The competition becomes more pronounced as *β* increases and, consistently, the effect of the network topology becomes more relevant.
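The indicators used in this section can be computed directly from a snapshot of which strain each infected host carries. The sketch below is illustrative only: the function name and input format are assumptions, and the Shannon evenness is written in its common Pielou form (Shannon entropy divided by the logarithm of richness), which may differ from the exact normalization used for Fig S2.

```python
import math
from collections import Counter

def diversity_indices(carried_strains):
    """Richness, dominance and evenness from a list with one strain label per infected host."""
    counts = Counter(carried_strains)
    prevalence = sum(counts.values())   # number of infected hosts
    richness = len(counts)              # number of distinct strains
    if prevalence == 0:
        nan = float("nan")
        return {"prevalence": 0, "richness": 0,
                "berger_parker": nan, "shannon_evenness": nan}
    # Berger-Parker index: relative abundance of the most abundant strain.
    berger_parker = max(counts.values()) / prevalence
    # Shannon entropy of the relative abundances.
    shannon = -sum((c / prevalence) * math.log(c / prevalence)
                   for c in counts.values())
    # Pielou evenness is undefined when only one strain is present.
    evenness = shannon / math.log(richness) if richness > 1 else float("nan")
    return {"prevalence": prevalence, "richness": richness,
            "berger_parker": berger_parker, "shannon_evenness": evenness}
```

For example, four infected hosts carrying strains `a, a, a, b` give a richness of 2 and a Berger-Parker index of 0.75, whereas four hosts each carrying a different strain give a richness of 4 and a Berger-Parker index of 0.25.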

We tested whether additional mechanisms of strain injection led to different results. In Fig S3 we assumed that new strains infect susceptible nodes already present in the system with probability *q*_{s}, thereby mimicking transmissions originating from an external source, as can happen in real cases. Fig S3 shows the same qualitative behavior described here.

### Effect of community structure

We considered a community model (COM) with *n*_{C} communities in which all nodes are as active as in HOM, but direct a fraction *p*_{IN} of their links within their community and the rest to nodes in the remaining *n*_{C} − 1 communities. The closer *p*_{IN} is to 1, the stronger the partition into communities.

Fig 3A,B shows that a network with communities displays a higher richness for large *β*, even when community structure barely affects prevalence (Fig 3B). However, the effect is important only when communities are fairly isolated (*p*_{IN} = 0.99) and the injection from the outside is infrequent; otherwise the effect is masked by strain injection, which occurs uniformly across communities. In particular, for the values *p*_{IN} = 0.78 and *p*_{s} = 0.079, chosen to match the real-case scenario discussed later in the text (i.e. the spread of *S. aureus* within a hospital), the difference with the homogeneous case is very small. For low *β*, the behavior of the Berger-Parker index follows the trend in richness. The initial decrease in this indicator is due to the increase in richness, which occurs at constant prevalence and is thus associated with a decrease in the average abundance [56] (green curve in Fig 3C, corresponding to *p*_{IN} = 0.99 and *p*_{s} = 0.01). At larger values of *β*, instead, increased competition induces higher dominance levels.

The increase in strain diversity is due to the reduced competition among strains introduced in different communities. Indeed, when coupling among communities is low, strains may spend most of their time within the community they were injected in, thus avoiding strains injected in other communities. Fig 3D confirms this hypothesis by showing the Inverse Participation Ratio (IPR) [57], which quantifies how uniformly abundance is distributed across communities. Values close to zero indicate a uniform distribution, while values close to 1 indicate that, on average, a strain is confined within a single community for most of the time (more details are reported in the Materials and Methods section). The strength of the community structure does not affect the distribution of the total prevalence (squares in the plot); however, it alters the average IPR value computed from the abundance of single strains, so that strains become more localized as *p*_{IN} increases. Notice that a certain degree of localization is also present in the homogeneous network, because some injected strains cause very few generations of infection before going extinct.
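As an illustration of how localization can be measured, the sketch below computes the IPR in its textbook form, the sum of squared abundance fractions across communities. This is an assumption: the exact definition and normalization used for Fig 3D are given in the paper's Methods and are not reproduced here. In the textbook form a uniform spread over *n*_{C} communities gives 1/*n*_{C} rather than exactly zero, so the hypothetical `rescale` option maps the range linearly onto [0, 1].

```python
def inverse_participation_ratio(abundance_by_community, rescale=False):
    """IPR of one strain from its number of cases in each community.

    Textbook form: sum over communities of (fraction of cases)^2.
    It equals 1 when all cases sit in one community and 1/n when they
    are spread evenly over n communities.
    """
    total = sum(abundance_by_community)
    if total == 0:
        raise ValueError("the strain has no cases")
    ipr = sum((x / total) ** 2 for x in abundance_by_community)
    if rescale:
        n = len(abundance_by_community)
        if n < 2:
            raise ValueError("rescaling needs at least two communities")
        # Map [1/n, 1] linearly onto [0, 1] (an illustrative choice).
        ipr = (ipr - 1.0 / n) / (1.0 - 1.0 / n)
    return ipr
```

A strain with cases `[10, 0, 0, 0]` has an IPR of 1, while `[5, 5, 5, 5]` gives 0.25 in the textbook form and 0 after rescaling.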

### Effect of turnover of individuals

Another important factor is node turnover, which has a profound impact on the ecological dynamics of strains for two reasons: on the one hand, incoming individuals contribute to richness by injecting new strains; on the other hand, the removal of infected nodes from the population breaks transmission chains and hampers the persistence of strains. The result of the interplay between these two mechanisms is summarized by the plot of richness as a function of *β* and node length of stay, *τ* (Fig 4A). The figure, obtained with the HOM model, shows two distinct regimes. In the low *β* regime, richness decreases as *τ* increases, because replacement of individuals becomes slower and injections less frequent. In the high *β* regime, instead, the average richness at fixed *β* does not depend monotonically on the node turnover but is maximized at intermediate *τ*. Interestingly, the optimal value of *τ* decreases as *β* increases. This behavior can be explained by looking at the balance between injection and extinction that determines the equilibrium value of richness. This reads [58]:

equilibrium richness = λ_{in} *p*_{s} *T*_{pers},

where λ_{in} *p*_{s} is the rate at which new strains are introduced and *T*_{pers} is the average persistence time of a strain. The trade-off between injection and extinction appears as the ratio between the two time scales, *T*_{pers} and *τ*. In the limit *τ* → 0 the spread plays no role, even for high *β*. As *τ* increases, newly introduced infectious seeds have a higher probability to spread, thus the average extinction time initially increases super-linearly with *τ* (see Fig S4 in the Supplementary Information), resulting in an increase of richness. However, past a certain value of *τ*, *T*_{pers} no longer grows super-linearly, so a further increase in *τ* is detrimental to pathogen diversity because it is associated with fewer introductions. This general behavior was not altered by accounting for introductions by transmissions from an external source, as shown in Fig S3.
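The balance between injection and extinction described above, in which equilibrium richness is the product of the strain introduction rate and the average persistence time, can be checked numerically on a toy process. The sketch below is only an illustration of that bookkeeping: it assumes exponentially distributed persistence times with a fixed mean, whereas in the model *T*_{pers} emerges from the epidemic dynamics and depends on *β* and *τ*. The function name and parameters are hypothetical.

```python
import random

def mean_richness(intro_rate, mean_persistence, horizon, rng):
    """Time-averaged number of strains present on [0, horizon].

    Strains are introduced as a Poisson process with rate intro_rate
    (playing the role of lambda_in * p_s) and each one persists for an
    exponentially distributed time with mean mean_persistence
    (playing the role of T_pers; the exponential form is an assumption).
    """
    t = 0.0
    strain_time = 0.0   # total strain-time accumulated inside the window
    while True:
        t += rng.expovariate(intro_rate)   # waiting time to the next introduction
        if t >= horizon:
            break
        lifetime = rng.expovariate(1.0 / mean_persistence)
        strain_time += min(lifetime, horizon - t)   # clip at the end of the window
    return strain_time / horizon
```

With an introduction rate of 0.5 per time step and a mean persistence of 20 time steps, the time-averaged richness over a long window approaches 0.5 × 20 = 10.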

We derived an approximate formula for *T*_{pers} by considering an emerging strain competing with a single effective strain formed by all other strains grouped together. This formulation, enabled by the neutral hypothesis, makes it possible to write the master equation describing the dynamics and to use the Fokker-Planck approximation to derive persistence times (see Materials and Methods). Analytical results reproduce well the behavior recovered by simulations and, in particular, the value of the length of stay maximizing richness for different *β*, as shown by the comparison between white stars and continuous line in Fig 4C. The quantitative match for other values of *p*_{s} is reported in Fig S5 in the Supplementary Information.

Unlike richness, the Berger-Parker index always increases monotonically with the length of stay (Fig 4B). This behavior is due to the correlation of this indicator with average abundance, similarly to what we discussed in the previous section.

### Spread of *S. aureus* in a hospital setting

We conclude by analyzing the real-case example of the *S. aureus* spread in a hospital setting [10, 59]. We used close-proximity-interaction (CPI) data recorded in a long-term health-care facility during 4 months by the i-Bird study [16, 28, 31]. These describe a high-resolution dynamical network, whose complex structure reflects the hospital organization, the subdivision in wards and the admission and discharge of patients [60]. Together with the measurements of contacts, weekly nasal swabs were done to monitor the *S. aureus* carriage status of the participants and identify the spa-type and the antibiotic resistance profile of the colonizing strains.

The modeling framework considered here applies well to this case. The SIS model is widely adopted for modeling *S. aureus* colonization [62,63], and the assumption of mutual exclusion is made by the majority of works to model the high level of cross-protection recognized by both epidemiological and microbiological studies [64,65]. The dynamic CPI network was previously shown to be associated with paths of strain propagation [16]. Consistently, we assumed that transmission is mediated by network links with transmissibility *β*. In addition, new strains are introduced in the population by incoming patients or through contacts with persons not taking part in the study.

Fig 5A shows weekly carriage and its breakdown into different strains. Prevalence and richness fluctuate around the average values 87.3 ± 6.3 cases and 39.8 ± 2 strains, respectively. Simulation results are reported in Fig 5B, which displays the impact of transmission and introduction rate on richness and prevalence. When the introduction rate is low we find a positive trend between richness and prevalence, consistent with the synthetic case. For higher injection rates, instead, the relation between richness and prevalence first becomes less pronounced and then turns negative, because as the transmission rate grows the injection is hampered by the depletion of susceptibles.

To quantify the effect of contact patterns on *S. aureus* population ecology we compared simulation results with those obtained on a network null model. Specifically, we built the RAND null model, which randomizes contacts while preserving only the first and the last contact of every individual. The randomization preserves node turnover and the number of active nodes and links, and destroys contact heterogeneities and community structure along with higher-order correlations. Fig 5C shows the comparison for different transmissibility values. The effect of the network is consistent with the theoretical results described for a heterogeneous network, i.e. at the same prevalence, richness is smaller in the real network than in the homogeneous one. We then quantified the level of dominance of the multi-strain distribution by means of the Berger-Parker index. For each network we chose the introduction and transmissibility rates that best reproduce empirical richness and prevalence and, interestingly, we found that, for the two cases, the same average richness and prevalence correspond to different Berger-Parker behaviors. The Berger-Parker index obtained with the real network is the highest and the one that best matches the empirical values, i.e. the empirical values are within one standard deviation of the mean for almost all weeks. Based on this result we argue that contact heterogeneities, along with the other properties of the contact network, contribute to the increased dominance of certain strains.

## Discussion

Multiple biological and environmental factors concur in shaping pathogen diversity. We focused here on the host contact network and we used a minimal transmission model to assess the impact of this ingredient on strain population ecology, quantifying the effects of three main network properties, i.e. heterogeneous activity potential, presence of communities and turnover of individuals. Results show that the structure and dynamics of contacts can alter profoundly strains’ co-circulation. Contact heterogeneities, by quickly driving low-abundance strains to extinction, reduce strain richness and favor strain dominance. Highly active nodes are known to play an important role in outbreak dynamics by acting as super-spreaders [33]. Here we showed that a similar mechanism could allow strains with no biological advantage to generate a large number of cases and outcompete other equally fit strains. This mechanism may potentially bias the interpretation of biological data. Dynamical models that do not properly account for contact structure could overestimate the difference in strains’ epidemiological traits in the attempt to explain observed fluctuations in strain abundance induced in reality by super-spreading events. Moreover, these models could provide biased assessment of transmission vs. introduction rates.

The presence of communities causes the separation of strains and mitigates the effect of competition, thus enhancing co-existence. A similar behavior was already pointed out for the spread of *S. pneumoniae*, as induced by age assortativity [66], for the case of *S. aureus* where distinct settings were considered [62], and for a population of antigenically distinct strains in the presence of cross-immunity [51]. We found that the impact of community structure is not strong, and it is likely minor when individuals of different communities have frequent contacts. Indeed, no appreciable variation was observed for *p*_{IN} = 0.78, chosen to match the inter-ward coupling of the hospital CPI network. Similar results can be expected for school classes or workplace departments presenting a similar level of community mixing. The effect on richness becomes appreciable for low community coupling (e.g. *p*_{IN} = 0.99 in Fig 3). This is consistent with a certain degree of diversity observed among strains belonging to separated communities, as is the case for different hospitals [15].

Finally, the analysis of turnover of individuals revealed major effects on strain diversity when this mechanism is also the main driver of the introduction of strains in the population. When transmissibility is low, richness decreases with host length of stay. When transmissibility is above the epidemic threshold, we showed the existence of an optimal value of the length of stay that maximizes strain richness as a result of the interplay between two competing timescales, namely the typical inter-introduction time and the average persistence time of a strain. This provides insights into the spread of bacterial infections in transmission settings, such as hospitals or farms, that are of particular relevance for the spread of antimicrobial resistance and that are characterized by a rapid host turnover [15, 31, 67]. For hospitals, for instance, these results suggest that variations in patients’ length of stay, as induced by a change of policy, could have appreciable effects on the population structure of nosocomial pathogens.

We adopted a neutral model to better disentangle the relative role of the different network properties. A wide disease-ecology literature has addressed the consequences of neutral hypotheses on multi-strain balance in order to provide a benchmark for interpreting the observed co-existence patterns and gauging the effect of selective forces potentially at play [11, 18, 68, 69]. Many of these works addressed, for instance, the co-existence between susceptible and resistant strains of *S. pneumoniae* [11, 68]. However, this assumption was rarely adopted in network models, which mostly consider strains with different epidemiological traits with the aim of describing pathogen selection and evolution [47–49, 70]. Strains were assumed to have the same infection parameters in [50, 51], where the role of community structure and clustering was analyzed in conjunction with cross-immunity. With respect to these works, the minimal transmission model used here enabled a transparent understanding of the role of the network. Multiple identical SIS processes can in fact be mapped onto a single SIS process, so that the wide literature on single SIS processes allows for a better understanding of the behavior recovered in the simulations [32, 33]. Strains can also be grouped into two macro-strains. This strategy allowed us to adopt the viewpoint of an emerging strain and study its competition with the others seen as a unique macro-strain. The associated Markov equation and Fokker-Planck approximation allow computing the average extinction time, capturing the key aspects of the dynamics. We focused here on three major properties of human contacts. Future work can build on a similar transmission model to address other properties known to alter spreading dynamics, such as heterogeneous inter-contact time distribution or topological and temporal correlations.

As a case study, we analyzed the spread of *S. aureus* in a hospital, taking advantage of the simultaneous availability of contact and carriage information [16]. The temporal and topological features of the network lead to a lower prevalence and richness with respect to homogeneous mixing (although the effect was quite small). In addition, similar prevalence and richness values are associated with different dominance levels in different networks (i.e. different values of the Berger-Parker index), with the real network leading to a higher dominance, as observed in reality. This behavior can be explained by the theoretical results and can be attributed essentially to the effect of contact heterogeneities, considering that the community structure does not have appreciable effects for this network, as discussed above. The importance of accounting for host contacts and hospital organization in the assessment of bacterial spread and the design of interventions has been recognized by several studies [16, 28–31,61]. Here we show that this element may also be critical for understanding the population ecology of the bacterium. It is important to note, however, that while the realistic network provides results that are closer to the data, this ingredient explains only part of the heterogeneity observed in the abundance. This shows that the contact network is a relevant factor, but other factors should be considered as well. The approach used here is intentionally simplified, as we focused on the main dynamical consequences of the contact network. Clearly, more detailed models can be designed to reproduce the data more closely. A certain degree of variation in the epidemiological traits could be at play, for example the fitness cost of resistance [8]. The role of hosts in the network (e.g. patients vs. health-care workers) and heterogeneities in health conditions, antibiotic treatment and hygiene practices are also known to affect duration of carriage and chance of transmission [16, 28, 31, 61]. Finally, we must consider that the comparison of model output with carriage data is also affected by the limitations of the dataset itself, already described in [16]. In particular, the weekly swabs may leave transient colonization undetected. Moreover, while the relevance of CPIs as proxies for epidemiological links has been demonstrated [16], transmission through the environment (e.g. in the form of fomites) is also possible.

The understanding provided here can be relevant for other population settings, temporal scales and geographical levels. In addition, the modeling framework could be applied to pathogens other than *S. aureus*, such as human papillomavirus, *S. pneumoniae* and *Neisseria meningitidis*, for which the strong interest in the study of strain ecology is justified by the public health need for understanding and anticipating trends in antibiotic resistance, or the long-term effect of vaccination [1, 2, 4, 5]. In this respect, while the simple framework introduced here increases our theoretical understanding of multi-strain dynamics, more tailored models may become necessary depending on the specific case. In particular, we have considered complete mutual exclusion as the only mechanism for competition. In reality, a secondary inoculation in a host that is already a carrier may give rise to alternative outcomes, such as co-infection or replacement [71]. In addition, infection or carriage may confer a certain level of long-lasting strain-specific protection and/or a short-duration transcendent immunity [11, 50]. Finally, mechanisms of mutation and/or recombination are at play, and their inclusion in the model can be important depending on the time scale of interest.

## Materials and methods

### Network models

For all the network models considered, the stochastic generative algorithms return a sequence of time-stamped networks and share a similar general scheme:

**Turnover dynamics:** new nodes arrive according to a Poisson process with rate λ_{in} and leave after a random time drawn from an exponential probability distribution with average *τ*. After a short initial transient, population size is Poisson distributed with average λ_{in}*τ*.

**Activation Pattern:** during each time step each node activates with a given probability that depends on the actual model considered. Each active node then creates a number of stubs which is drawn from a zero-truncated Poisson distribution. The active status lasts for just a single time step.

**Stub matching:** stubs are then matched according to the actual model considered.
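The three steps of the general scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: all function and parameter names are hypothetical, time steps have unit length, every node activates with the same probability `a`, and stubs are paired uniformly at random (the simplest matching rule, corresponding to the HOM model described below). An unpaired leftover stub is simply dropped.

```python
import math
import random

def poisson_draw(mean_param, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    k, p, threshold = 0, 1.0, math.exp(-mean_param)
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def zero_truncated_poisson(mean_param, rng):
    """Poisson draw conditioned on being at least 1 (rejection sampling)."""
    while True:
        k = poisson_draw(mean_param, rng)
        if k > 0:
            return k

def network_step(t, nodes, next_id, lam_in, tau, a, stub_rate, rng):
    """Generate the contact network of time step t.

    nodes maps node id -> departure time. Returns the updated nodes,
    the next free node id and the set of links (u, v) with u < v.
    """
    # Turnover: drop nodes whose stay has ended, then add Poisson arrivals,
    # each with an exponentially distributed length of stay of mean tau.
    nodes = {n: leave for n, leave in nodes.items() if leave > t}
    for _ in range(poisson_draw(lam_in, rng)):
        nodes[next_id] = t + rng.expovariate(1.0 / tau)
        next_id += 1
    # Activation: each active node creates a zero-truncated Poisson number of stubs.
    stubs = []
    for n in nodes:
        if rng.random() < a:
            stubs.extend([n] * zero_truncated_poisson(stub_rate, rng))
    # Stub matching: pair stubs uniformly at random, discarding self-links;
    # storing links in a set removes multiple links.
    rng.shuffle(stubs)
    links = set()
    for u, v in zip(stubs[0::2], stubs[1::2]):
        if u != v:
            links.add((min(u, v), max(u, v)))
    return nodes, next_id, links
```

Calling `network_step` repeatedly with increasing `t`, and feeding back the returned `nodes` and `next_id`, yields a sequence of time-stamped networks. With `lam_in = 2` and `tau = 50` the population settles around 100 nodes after the initial transient.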

The generative models considered in this work are:

**HOM:** in this model each node has the same probability *a*_{H} to be active during each time step. Stubs are matched completely at random in order to form links. We discard any self-links and multiple links that may occur during the matching procedure.

**HET:** here each node has its own activation probability *a*_{i}, drawn from a power-law distribution *P*(*a*) ∝ *a*^{−γ}, with *a* ∈ (∊, 1]. We tune the variance by varying γ (the lower γ, the higher the variance). We then set ∊ so that the average activity *ā* equals *a*_{H} in HOM. The stub-matching procedure is the same as in HOM.

**COM:** incoming nodes are assigned to one among *n*_{C} communities with equal probability, in such a way that communities have the same size on average. Each stub is matched within the respective community with probability *p*_{IN} or outside the community with complementary probability.

### Hospital network and null models

We use a dynamical contact network obtained from close proximity interaction (CPI) data collected during the i-Bird study in a French hospital. Details of the network are reported in [16]. We aggregated the CPIs daily, keeping the information about their cumulated duration within each day. We discarded CPIs from the first 2 weeks and the last 4 weeks of the dataset, corresponding to a period of adjustments in the measurements and to the progressive winding down of the experiment, respectively. Simulations conducted with the CPI network were compared with results obtained with a null model, which we refer to as RAND. In this randomization scheme the activity of each node is randomized while respecting the constraint that removal and addition of contacts must not alter the times of the first and the last contact of each node (*t*_{S} and *t*_{L}, respectively). Notice that RAND preserves the number of nodes that are present at any time in the network by preserving their first contact *t*_{S} and their length of stay *t*_{L} − *t*_{S}. Null models that randomize the latter properties lead to misleading results when node length of stay is heterogeneous and node turnover occurs [72]. RAND also sets all contact weights equal to the average weight value.

### Simulation details

Transmission dynamics is entirely stochastic and emerges from the combination of transmission, recovery and strain injection. During time step *t* each node is updated according to its state: each infected node transmits the strain it is carrying to a susceptible neighbor with probability *β*, and infected nodes turn susceptible with probability *μ*. For the *S. aureus* case study, transmission probability depends on the cumulated duration of the contact within the day (*w*_{ij}) according to the expression . Due to mutual exclusion, an individual can be infected by a single strain at a given time [73]. New unseen strains are injected with rate *ɩ*. At each time step incoming individuals can bring a new strain with probability *p*_{s}. In addition, susceptible individuals may turn infectious carrying a new strain with probability *q*_{s}. The two mechanisms mimic, respectively, incoming infectious individuals (e.g. admission of colonized patients) and transmission from an external source (in the hospital example this corresponds to contacts with individuals that were not participating in the study). Injection rate is thus given by , where is the number of susceptibles at equilibrium. In the theoretical analysis in the main paper we assumed *q*_{s} = 0 for simplicity, thus variations in *ɩ* were induced by variations in *s*_{in} and *p*_{s}. The case *q*_{s} > 0 was considered in the Supplementary Information. In the hospital case study *p*_{s} was set to 0.079 (directly estimated from the data), while *q*_{s} was explored.

In the theoretical analysis parameters were set, in the majority of cases, to match the hospital case study, e.g. the value of *p*_{s}, the average number of nodes, the average activation rate *a*, the number of communities (*n*_{C}), etc.

### Analysis of carriage data

Carriage data were obtained from weekly swabs in multiple body areas, including the nares. Swabs that tested positive for *S. aureus* were further examined. Spa-type and antibiotic resistance profiles (MSSA or MRSA) were then determined. In this work we regard two strains as different if they differ in spa-type and/or antibiotic resistance profile. We considered carriage data obtained from nasal swabs only, disregarding other body areas, since the anterior nares represent the most important niche for *S. aureus* [74].

### Ecological measures and other indicators

We described strain population diversity through standard ecological indicators. The abundance of a strain *i*, *N*_{i}, is the strain-associated prevalence. From this quantity we computed the abundance distribution, i.e. the frequency of strains with abundance *N*. The Berger-Parker index is the relative abundance of the dominant strain, i.e. $\max_i N_i / \sum_i N_i$.
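As a minimal sketch of how these indicators can be computed from a snapshot of the system, the Python function below takes one strain label per infected host. The function name `strain_indicators` and the input layout are assumptions made for illustration; the abundance distribution is returned as raw counts of strains per abundance value, which can be divided by the richness to obtain frequencies.

```python
from collections import Counter


def strain_indicators(carried):
    """carried: iterable holding one strain label per infected host."""
    abundance = Counter(carried)            # N_i for each strain i
    total = sum(abundance.values())         # total prevalence
    richness = len(abundance)               # number of distinct strains
    # Number of strains observed at each abundance value N.
    abundance_distribution = Counter(abundance.values())
    # Berger-Parker index: relative abundance of the dominant strain.
    berger_parker = max(abundance.values()) / total if total else 0.0
    return richness, dict(abundance_distribution), berger_parker
```

For example, `strain_indicators(["A", "A", "A", "B", "C"])` gives a richness of 3, one strain with abundance 3 and two strains with abundance 1, and a Berger-Parker index of 0.6.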

To analyze the distribution of strains across communities we use the Inverse Participation Ratio (IPR) [57]. Given a vector *v* with elements *v*_{i} (*i* = 1, …, *l*), all within [0, 1], the IPR is given by:

If all the components are of order *l*^{−1}, then the IPR is small. If instead one component *v*_{i} ~ 1, then IPR ~ 1 too, reflecting localization of *v*. The IPR for total prevalence is computed by setting *v*_{i} equal to the fraction of infected individuals belonging to community *i*, while the IPR for a single strain is computed by setting *v*_{i} equal to the fraction of individuals infected by that particular strain and belonging to community *i*. We can extend the IPR computation to the HOM case by assigning nodes to different groups as in COM, but without affecting the stub-matching scheme.

### Analytical results for the homogeneous network

In order to estimate the length of stay that maximizes the average richness for a given value of *β* when the contact structure is given by the HOM network, we consider a homogeneous mixing version of our system.

Due to Eq (1), the calculation of the average richness reduces to the calculation of the average persistence time. In order to estimate this quantity we focus on a particular strain, labelled “strain A”, which is injected at *t* = 0, and we group all other strains under the label “strain B”. We are allowed to do so because all strains have identical parameters. We therefore reduce our initial multi-strain problem to a two-strain problem. Since all new strains injected after *t* = 0 are labelled as strain B, A is doomed to extinction because there exists an infinite reservoir of B. The average persistence time is therefore the average time to extinction of strain A.
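The two-strain reduction above can also be explored numerically. The sketch below estimates the mean extinction time of strain A with a Gillespie-type simulation under homogeneous mixing. It is an illustration under stated assumptions, not the authors' procedure: the population size is held fixed at `n_hosts`, clearance lumps recovery (`mu`) and discharge (`gamma`) and returns a susceptible host, and injection turns a susceptible host into a B carrier at constant rate `iota`. All function and parameter names are hypothetical.

```python
import random


def persistence_time(n_hosts, beta, mu, gamma, iota, n_b0, rng, t_max=1e4):
    """Time until strain A (one initial carrier) goes extinct."""
    m, n, t = 1, n_b0, 0.0          # carriers of A, carriers of B, clock
    while m > 0 and t < t_max:
        s = n_hosts - m - n         # susceptible hosts
        rates = [
            beta * m * s / n_hosts,     # 0: A infects a susceptible
            beta * n * s / n_hosts,     # 1: B infects a susceptible
            iota if s > 0 else 0.0,     # 2: injection of a new (B) strain
            (mu + gamma) * n,           # 3: a B carrier is cleared
            (mu + gamma) * m,           # 4: an A carrier is cleared
        ]
        t += rng.expovariate(sum(rates))
        event = rng.choices(range(5), weights=rates)[0]
        if event == 0:
            m += 1
        elif event in (1, 2):
            n += 1
        elif event == 3:
            n -= 1
        else:
            m -= 1
    return t


def mean_persistence(runs, seed=0, **params):
    """Average extinction time of strain A over independent runs."""
    rng = random.Random(seed)
    return sum(persistence_time(rng=rng, **params) for _ in range(runs)) / runs
```

A call such as `mean_persistence(200, n_hosts=100, beta=0.5, mu=0.1, gamma=0.05, iota=0.2, n_b0=30)` gives a Monte Carlo estimate that can be compared against an analytical approximation. The parameter values here are arbitrary and not taken from the study.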

Since the HOM network realizes homogeneous mixing conditions quite well, we regard our system as homogeneously mixed. Within this framework it is sufficient to specify the numbers of hosts infected by strain A (*m*), hosts infected by strain B (*n*) and susceptible hosts (*s*). The master equation for the joint probability distribution *P*(*m*, *n*, *s*) is given by [75]:

where . The various terms represent contributions due to infection, recovery, admission and discharge of nodes. In order to obtain an approximate solution to this equation we assume that the average number of individuals *m* + *n* + *s* and the total prevalence *m* + *n* do not fluctuate in time and are therefore equal to and respectively, where *i*(∞) is given by:

After performing the van Kampen system-size expansion, we are left with a Fokker-Planck equation for the density *x* of strain A:
where *D*_{1} = *β*′ (1 − *i*(∞)) *x* − *μ* − *γ* and *D*_{2} = *β*′ (1 − *i*(∞)) *x* + *μ* + *γ* are the so-called drift and diffusion coefficients, respectively.

According to the theory of stochastic processes [75] the average extinction time *T*_{pers}(*x*_{0}) (where *x*_{0} represents the initial density of strain A) satisfies:
with boundary conditions *T*_{pers}(0) = 0 and . The solution is finally given by:
where *Ei*(*x*) is the exponential integral function and *γ*_{E} is the Euler-Mascheroni constant. When a new strain is introduced its prevalence is just 1, therefore we estimate the average extinction time using .
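For orientation, the textbook relation behind this last step can be stated in generic form. The symbols $A(x)$ and $B(x)$ below are generic drift and diffusion functions, not the paper's notation. How they map onto *D*_{1}, *D*_{2} and the system size depends on the normalization chosen in the expansion, which is left open here. The reflecting upper boundary at $x = b$ is likewise an assumption. For a one-dimensional Fokker-Planck equation, the mean exit time $T(x)$ from an initial density $x$, with an absorbing boundary at $x = 0$, satisfies an ordinary differential equation:

$$
\frac{\partial p(x,t)}{\partial t} = -\frac{\partial}{\partial x}\bigl[A(x)\,p(x,t)\bigr] + \frac{1}{2}\frac{\partial^{2}}{\partial x^{2}}\bigl[B(x)\,p(x,t)\bigr]
\quad\Longrightarrow\quad
A(x)\,\frac{dT}{dx} + \frac{1}{2}\,B(x)\,\frac{d^{2}T}{dx^{2}} = -1,
\qquad T(0) = 0,\quad \left.\frac{dT}{dx}\right|_{x=b} = 0 .
$$

When $A$ and $B$ are linear in $x$, integrating this equation twice typically produces exponential-integral terms.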

## Acknowledgments

The authors would like to thank Vittoria Colizza, Lulla Opatowski and Laura Temime for useful discussions.

## Footnotes

\* chiara.poletto{at}inserm.fr