Abstract
Humans and other group-living animals tend to distribute their social effort disproportionately. Individuals predominantly interact with their closest companions while maintaining weaker social bonds with less familiar group members. By incorporating this behaviour into a mathematical model, we find that a single parameter, which we refer to as social fluidity, controls the rate of social mixing within the group. We compare the social fluidity of 13 species by applying the model to empirical human and animal social interaction data. To investigate how social behavior influences the likelihood of an epidemic outbreak, we derive an analytical expression for the relationship between social fluidity and the basic reproductive number of an infectious disease. For highly fluid social behaviour, disease transmission is density-dependent. For species that form more stable social bonds, the model describes frequency-dependent transmission that is sensitive to changes in social fluidity.
Social behavior is fundamental to the survival of many species. It allows the formation of social groups providing fitness advantages from greater access to resources and better protection from predators [1]. Structure within these groups can be found in the way individuals communicate across space, cooperate in sexual or parental behavior, or clash in territorial or mating conflicts [2]. While animal societies are usually studied independently of each other, some questions about the nature of social living can only be answered by comparing behavior across a range of species [3, 4].
When social interaction requires shared physical space it can also be a conduit for the transmission of infectious disease [5]. For epidemic modellers it is vital to know what level of contact is necessary for host-to-host transmission as this determines how the density and structure of the population affect the rate at which the disease will spread [6, 7]. Typically, if the disease spreads through the environment then the transmission rate is assumed to scale proportionally to the local population density (density-dependence), whereas if transmission requires close proximity encounters that only occur between bonded individuals then we expect social connectivity to determine the outcome (frequency-dependence) [8].
In reality, however, animal-disease systems are not so easy to categorize [9]. For example, as social groups grow in size, new bonds must be created to maintain cohesiveness [10]. To manage their time and the increased cognitive effort required to maintain these bonds, individuals tend to interact mostly with their closest companions while weaker ties are maintained through infrequent contact [11–13]. This variability in the way social effort is distributed has been shown to affect contagion processes [14], and it leads us to the question motivating this study: can quantifying how group-living individuals choose to invest their social effort allow us to model the effects of population density on epidemic spread?
There is growing evidence for the disproportionate distribution of social effort in human communication [15–18]. Attempts to quantify this aspect of sociality in animal systems, however, are challenged by the fact that data on some individuals may be far richer than on others. These biases can be introduced in the data collection process, or result from behavioural differences across the sampled population [19]. Furthermore, while heterogeneous interaction frequencies and temporal dynamics such as circadian rhythms and bursty activity patterns have become common in social network models [20], little has been done to incorporate the way the individual chooses to distribute their social effort.
Here, we introduce a mathematical model founded on the concept of social fluidity which we define as variability in the amount of social effort the individual invests in each member of their social group. Using empirical data from previous studies, we estimate the social fluidity of 57 human and animal social systems. We use it in analytical and computational models of disease spread and show that the basic reproductive number defined on social fluidity is a better predictor of disease outcome compared to other social behavioral indicators. In addition, social fluidity emerges as a coherent mathematical framework providing the smooth connection between density-dependent and frequency-dependent disease systems, which have historically been studied in isolation.
Characterizing social behaviour
Our objective is to measure social behaviour in a range of human and animal populations. We start by introducing a model that captures a hidden element of social dynamics: how individual group members distribute their social effort. We mathematically describe the relationships between social variables that are routinely found in studies of animal behavior, the number of social ties and the number of interactions observed, and apply the model to empirical data to reveal behavioural differences between several species.
Social behavior model
Consider a closed system of N individuals and a set of interactions between pairs of individuals that were recorded during some observation period. These observations can be represented as a network: each individual, i, is a node; an edge exists between two nodes i and j if at least one interaction was observed between them; the edge weight, wi,j, denotes the number of times this interaction was observed. The total number of interactions of i is denoted strength, si = ∑j wi,j, and the number of nodes with whom i is observed interacting is its degree, ki [21].
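As a minimal sketch of this network representation, the following computes edge weights, strength, and degree from a list of observed interactions. The function name and the toy data are hypothetical and are not taken from the study's scripts.

```python
from collections import defaultdict

def strength_and_degree(interactions):
    """Return (strength, degree) dicts from an iterable of (i, j) interaction pairs."""
    weights = defaultdict(int)  # weights[{i, j}]: number of times the pair was observed
    for i, j in interactions:
        if i == j:
            continue  # ignore self-interactions
        weights[frozenset((i, j))] += 1
    strength = defaultdict(int)  # s_i: total number of interactions of i
    degree = defaultdict(int)    # k_i: number of distinct partners of i
    for pair, w in weights.items():
        for node in pair:
            strength[node] += w
            degree[node] += 1
    return dict(strength), dict(degree)

# Hypothetical toy data: a-b observed three times, a-c observed once.
toy = [("a", "b"), ("b", "a"), ("a", "b"), ("a", "c")]
s, k = strength_and_degree(toy)
print(s)
print(k)
```

Interactions are treated as undirected here, which matches the later choice to count an interaction in either direction as a contact.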
We define $x_{j|i}$ to be the probability that an interaction involving i will also involve node j. Therefore, if i interacts $s_i$ times, the probability that at least one of these interactions is with j is $1 - (1 - x_{j|i})^{s_i}$. The main assumption of the model is that the values of $x_{j|i}$ over all i, j pairs are distributed according to a probability distribution, ρ(x).¹ Thus, if a node interacts s times, the marginal probability that an edge exists between that node and any other given node in the network is

$$\Psi(s) = 1 - \int \rho(x)\,(1 - x)^{s}\,dx. \qquad (1)$$
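Our goal is to find a form of ρ that accurately reproduces network structure observed in real social systems. Motivated by our exploration of empirical interaction patterns from a variety of species (Fig. S1), we propose that ρ has a power-law form:

$$\rho(x) = \frac{\phi\,\epsilon^{\phi}}{1 - \epsilon^{\phi}}\,x^{-(1+\phi)} \quad \text{for } \epsilon \le x \le 1, \qquad (2)$$

where ϕ (> 0) controls the variability in the values of x, and ϵ simply truncates the distribution to avoid divergence. Combining (1) and (2) we find

$$\Psi(s, \phi, \epsilon) = 1 - \frac{\phi\,\epsilon^{\phi}}{1 - \epsilon^{\phi}}\,\frac{(1-\epsilon)^{s+1}}{s+1}\;{}_2F_1(s+1,\,1+\phi;\,s+2;\,1-\epsilon), \qquad (3)$$

where the notation ${}_2F_1$ refers to the Gauss hypergeometric function [22]. It follows from $\sum_j x_{j|i} = 1$ that

$$\frac{\phi\,\epsilon^{\phi}\,(1 - \epsilon^{1-\phi})}{(1-\phi)(1 - \epsilon^{\phi})} = \frac{1}{N-1}, \qquad (4)$$

which can be solved numerically to find ϵ for given values of N and ϕ. The expectation of the degree is κ(s, ϕ, N) = (N − 1)Ψ(s, ϕ, ϵ).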
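The numerical steps described here can be sketched with the standard library alone. The sketch assumes that ρ(x) is proportional to x^−(1+ϕ) on [ϵ, 1]; this explicit form is an assumption of the example. Bisection and Simpson's rule are swapped in for the `fsolve` and `hyp2f1` routines mentioned in Materials & Methods, and all function names are hypothetical.

```python
import math

def mean_x(eps, phi):
    """Mean of x under rho(x) ~ x^-(1+phi) truncated to [eps, 1] (assumed form)."""
    if abs(phi - 1.0) < 1e-12:
        return eps * math.log(1.0 / eps) / (1.0 - eps)
    return phi * (eps**phi - eps) / ((1.0 - phi) * (1.0 - eps**phi))

def solve_epsilon(n, phi, iterations=200):
    """Bisect on log(eps) so that (n - 1) * mean_x(eps, phi) = 1."""
    if n < 3 or phi <= 0:
        raise ValueError("need n >= 3 and phi > 0")
    target = 1.0 / (n - 1)
    lo, hi = math.log(1e-300), math.log(1.0 - 1e-9)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_x(math.exp(mid), phi) < target:
            lo = mid  # mean too small: the cutoff must be larger
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def psi(s, phi, eps, steps=2000):
    """Edge probability 1 - E[(1 - x)^s], by Simpson's rule in u = log(x)."""
    log_eps = math.log(eps)
    h = -log_eps / steps
    norm = phi / (1.0 - eps**phi)

    def integrand(u):
        # rho(x) dx rewritten in terms of u = log(x), times (1 - x)^s
        return norm * math.exp(-phi * (u - log_eps)) * (1.0 - math.exp(u)) ** s

    total = integrand(log_eps) + integrand(0.0)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * integrand(log_eps + m * h)
    return 1.0 - total * h / 3.0

def expected_degree(s, phi, n):
    """kappa(s, phi, N) = (N - 1) * Psi(s, phi, eps)."""
    return (n - 1) * psi(s, phi, solve_epsilon(n, phi))

eps = solve_epsilon(50, 0.5)
print(eps, expected_degree(20, 0.5, 50))
```

Under the assumed form, ϕ = 0.5 gives the closed-form cutoff ϵ = 1/(N − 1)², which provides a quick check on the root finder. For very small ϕ combined with very large N the bracketing interval may need widening.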
Fig. 1 illustrates how the value of ϕ can produce different types of social behavior. As ϕ is the main determinant of social behaviour in our model, we use the term social fluidity to refer to this quantity. Low social fluidity (ϕ ≪ 1) produces what we might describe as “allegiant” behavior: interactions with the same partner are frequently repeated at the expense of interactions with unfamiliar individuals. As ϕ increases, the model produces more “gregarious” behavior: interactions are repeated less frequently and the number of partners is larger. While this phenomenon could be similarly described as “social strategy” or “loyalty” [23, 24], here we use a different measure as it is consistent with previously studied social drivers of epidemic spread [25] establishing a direct connection with disease risk at the population scale.
Estimating social fluidity in empirical networks
To understand the results of the model in the context of real systems we estimate ϕ in 57 networks from 20 studies of human and animal social behavior (further details in the supplement) [26–46], focusing our attention on those interactions which are capable of disease transmission (i.e. those that, at the least, require close spatial proximity).
Each dataset provides the number of interactions that were observed between pairs of individuals. We assume that the system is closed, and that the total network size (N) is equal to the number of individuals observed in at least one interaction. To estimate social fluidity we find the value of ϕ that minimizes $\sum_i [k_i - \kappa(s_i, \phi, N)]^2$ (the total squared error between the observed degrees and their expectation given by the model). Being estimated from the relationship between strength and degree, and not their absolute values, social fluidity is a good candidate for comparing social behavior across different systems as it is independent of the distributions of $s_i$ or $k_i$, and of the timescale of interactions.
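A least-squares fit of this kind can be sketched as a grid search over ϕ. The example again assumes ρ(x) proportional to x^−(1+ϕ) on [ϵ, 1] and evaluates κ by numerical quadrature. The grid, the function names, and the synthetic test data are all hypothetical; a continuous optimiser could replace the grid.

```python
import math

def solve_epsilon(n, phi):
    """Cutoff eps such that the mean of x equals 1 / (n - 1) (assumed power law)."""
    def mean_x(eps):
        if abs(phi - 1.0) < 1e-12:
            return eps * math.log(1.0 / eps) / (1.0 - eps)
        return phi * (eps**phi - eps) / ((1.0 - phi) * (1.0 - eps**phi))
    lo, hi = math.log(1e-300), math.log(1.0 - 1e-9)
    for _ in range(200):  # bisection on log(eps)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mean_x(math.exp(mid)) < 1.0 / (n - 1) else (lo, mid)
    return math.exp(0.5 * (lo + hi))

def kappa(s, phi, n, eps, steps=2000):
    """Expected degree (n - 1) * [1 - E((1 - x)^s)], Simpson's rule in log(x)."""
    log_eps = math.log(eps)
    h = -log_eps / steps
    norm = phi / (1.0 - eps**phi)
    def g(u):
        return norm * math.exp(-phi * (u - log_eps)) * (1.0 - math.exp(u)) ** s
    total = g(log_eps) + g(0.0)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * g(log_eps + m * h)
    return (n - 1) * (1.0 - total * h / 3.0)

def fit_phi(strengths, degrees, n, grid=None):
    """Return the grid value of phi minimising sum_i [k_i - kappa(s_i, phi, n)]^2."""
    if grid is None:
        grid = [0.05 * 1.1**i for i in range(60)]  # roughly 0.05 to 14
    def squared_error(phi):
        eps = solve_epsilon(n, phi)
        cache = {}  # kappa depends on i only through s_i
        total = 0.0
        for s, k in zip(strengths, degrees):
            if s not in cache:
                cache[s] = kappa(s, phi, n, eps)
            total += (k - cache[s]) ** 2
        return total
    return min(grid, key=squared_error)

# Synthetic check: degrees generated by the model itself at phi = 0.8.
n = 30
strengths = [5, 10, 20, 40]
degrees = [kappa(s, 0.8, n, solve_epsilon(n, 0.8)) for s in strengths]
phi_hat = fit_phi(strengths, degrees, n)
print(phi_hat)
```

Because only the mapping from strength to degree enters the objective, the fit does not depend on how the strengths themselves are distributed.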
Fig. 2 shows the estimated values of ϕ for all networks in our study. We organize the measurements of social fluidity by interaction type. Aggressive interactions have the highest fluidity (which implies that most interactions are rarely repeated between the same individuals), while grooming and other forms of social bonding have the lowest (which implies frequent repeated interactions between the same individuals). Social fluidity also appears to be related to species: ant systems cluster around ϕ = 1, monkeys around ϕ = 0.5, and humans take a range of values that depend on the social environment. Sociality type does not appear to affect ϕ; sheep, bison, and cattle have different social fluidity compared to kangaroos and bats, though they are all categorized as fission-fusion species [3].
There is no significant correlation between the mean number of interactions per individual and social fluidity (Pearson r² = 0.02, p = 0.26), which implies that sampling bias does not affect the estimation of social fluidity. Similarly, network size does not correlate with ϕ (Pearson r² = 0.02, p = 0.33). Larger values of ϕ correspond to higher mean degrees (Pearson r² = 0.27, p < 0.001) and lower variability in the distribution of edge weights (measured as the index of dispersion of $w_{i,j}$; Pearson r² = 0.26, p < 0.001). Weight variability and mean degree are uncorrelated in these data (Pearson r² = 0.01, p = 0.59), implying that ϕ combines these two entirely distinct features of social behavior.
Finally, the modularity of the network (computed by the Louvain method on the unweighted network [47]) is negatively correlated with ϕ (r² = 0.57, p < 0.001). This is expected as individuals tend to be loyal to those within the same module while maintaining weaker connections with the remaining network.
Characterizing disease spread with social fluidity
Our objective is to characterize how social behavior influences the susceptibility of the group to infectious disease in a range of human and animal social systems. We start by introducing an analytical transmission model that incorporates social fluidity. Using this model, we mathematically characterize the impact of social fluidity on density dependence, and apply the model to empirical networks to predict disease spread.
Disease transmission model
We consider the transmission of an infectious disease on the social behavior model introduced in the previous section. An infectious node i interacting with a susceptible node j will transmit the infection with probability β. An infectious node recovers at rate γ, so the length of the infectious period is exponentially distributed. The probability that the infection is transmitted from i to any given j, $T_{i \to j}$ (Eq. 5), is derived assuming that the $s_i$ interactions of i are distributed randomly across an observation period of duration τ.
By integrating Eq. (5) over all possible values of $x_{j|i}$ and all infectious period durations, and multiplying by the number of susceptible individuals (N − 1), we obtain the expected number of infections caused by individual i, denoted $r(s_i)$ (Eq. 6).
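As an illustrative sketch only, suppose that contacts between i and j arrive as a Poisson process of rate $s_i x_{j|i}/\tau$ and that each contact transmits independently with probability β. Under that assumption, which may differ in detail from Eq. (5), the averaging over the infectious period t and over x can be written as follows. The final line recovers the approximation $r(s_i) \approx s_i\beta/(\gamma\tau)$ used in Materials & Methods.

```latex
% Assumed (Poisson) form of the transmission probability for an infectious period of length t
T_{i \to j}(t) = 1 - \exp\!\left(-\frac{\beta\, s_i\, x_{j|i}\, t}{\tau}\right)

% Average over t ~ Exponential(\gamma)
\int_0^{\infty} \gamma e^{-\gamma t}\, T_{i \to j}(t)\, dt
  = \frac{\beta\, s_i\, x_{j|i}}{\beta\, s_i\, x_{j|i} + \gamma\tau}

% Average over x ~ \rho and multiply by the N - 1 potential contacts
r(s_i) = (N-1) \int \rho(x)\, \frac{\beta\, s_i\, x}{\beta\, s_i\, x + \gamma\tau}\, dx

% When \beta s_i x \ll \gamma\tau, using (N-1)\int x\,\rho(x)\,dx = 1
r(s_i) \approx \frac{\beta\, s_i}{\gamma\tau}\,(N-1)\int x\,\rho(x)\,dx = \frac{s_i\,\beta}{\gamma\tau}
```

The saturating factor in the third line is what penalises repeated contacts: a large $x_{j|i}$ adds interactions with the same partner but cannot raise the transmission probability above one.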
The basic reproductive number (usually denoted R0) is defined as the mean number of secondary infections caused by a typical infectious individual in an otherwise susceptible population [48]. We refer to the analogue of R0 derived from our social behaviour model as the social fluidity reproductive number.
We assess the relation of the reproductive number with the population density by focusing on a special case where every node has the same strength, i.e. $s_i = s$ for all i, so that the social fluidity reproductive number equals r(s). Furthermore, we choose β so that the reproductive number takes a fixed value in the limit ϕ → ∞, i.e. a constant that represents what the basic reproductive number would be if every new interaction occurred between a pair of individuals who have not previously interacted with each other.
Fig. 3 shows the effect of social fluidity on the density dependence of the disease. At small population sizes, the social fluidity reproductive number increases with N, and it converges as N → ∞ (Fig. 3A). The rate of this convergence increases with ϕ, and the limit it converges to is higher, meaning that ϕ determines the extent to which density affects the spread of disease. As N → ∞, we find that the reproductive number approaches its ϕ → ∞ value for ϕ > 1. When ϕ < 1, it converges to a lower limit. At these values of ϕ the disease is constrained by individuals choosing to repeat interactions despite having the choice of infinitely many potential interaction partners (Fig. 3B).
Estimating infection spread in empirical networks with heterogeneous connectivity
To apply this analogue of a reproductive number to an animal-disease system, we need to account for heterogeneous levels of social connectivity in the given population and thus the tendency for infected individuals to be those with a greater number of social partners [49]. For the basic reproductive number, this is often done using the mean excess degree, i.e. the degree of an individual selected with probability proportional to their degree [50]. Following a similar reasoning, we define the social fluidity reproductive number as the expected number of infections ($r(s_i)$) caused by an individual that has been selected with probability proportional to their degree ($k_i$):

$$\frac{\sum_i k_i\, r(s_i)}{\sum_i k_i}. \qquad (7)$$
Given the degree and strength of each individual in a network, the duration over which those interactions occurred, and the transmission and recovery rates of the disease, we are able to estimate ϕ, compute Eq. (6) for each individual, and finally use Eq. (7) to derive a statistic that provides a measure of the risk of the host population to disease outbreak.
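The degree-weighted average described here is short to compute. In this sketch the per-individual values r(s_i) are taken as given inputs, and the function name and numbers are hypothetical.

```python
def degree_weighted_mean(degrees, infections):
    """Mean of r(s_i) when individual i is selected with probability proportional to k_i."""
    total_degree = sum(degrees)
    if total_degree == 0:
        raise ValueError("at least one individual must have a positive degree")
    return sum(k * r for k, r in zip(degrees, infections)) / total_degree

# Hypothetical values: the better-connected individual dominates the average.
print(degree_weighted_mean([1, 3], [2.0, 4.0]))  # (1*2 + 3*4) / 4 = 3.5
```

Weighting by degree rather than averaging uniformly reflects that an individual with more partners is more likely to be infected in the first place.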
Numerical validation using empirical networks
We simulated the spread of disease through the interactions that occurred in the empirical data (materials and methods). We compute the simulated reproductive number at generation g, defined as the ratio of the number of individuals infected at the (g + 1)-th generation to the number infected at the g-th generation over 10³ simulated outbreaks, for g = 0, 1, 2 (g = 0 refers to the initial seed of the outbreak).
Table 1 shows the Pearson correlation coefficient between the simulated reproductive number and its corresponding value obtained from Eq. (7). For comparison, the correlation is shown for other indicators and network statistics. The results correspond to one set of simulation conditions, and are robust across a wide range of parameter combinations (see supplementary tables). Note that a different value of β was chosen for each network to control for the varying interaction rates between networks while keeping the upper bound constant (materials and methods). Thus, the mean strength does not have a significant effect on the simulated reproductive number.
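A minimal sketch of this generation ratio follows. It assumes that the counts are pooled across outbreaks before the ratio is taken, rather than averaging per-outbreak ratios. The function name and toy counts are hypothetical.

```python
def generation_ratio(outbreaks, g):
    """Pooled ratio of infections at generation g + 1 to infections at generation g.

    outbreaks: list of per-outbreak lists, where entry m is the number of
    individuals infected at generation m (entry 0 is the seed).
    """
    numerator = sum(counts[g + 1] for counts in outbreaks)
    denominator = sum(counts[g] for counts in outbreaks)
    if denominator == 0:
        raise ValueError("no infections at generation %d" % g)
    return numerator / denominator

# Hypothetical counts for two outbreaks: one spreads, one dies out at the seed.
toy_outbreaks = [[1, 2, 4, 3], [1, 0, 0, 0]]
print([generation_ratio(toy_outbreaks, g) for g in (0, 1, 2)])
```

Pooling keeps outbreaks that die out early in the denominator at g = 0 without producing undefined ratios at later generations.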
These correlations support a known result regarding repeat contacts in network models of disease spread: that indicators of disease risk that are derived solely from the degree distribution are unreliable and the role of edge weights should not be neglected [51,52]. After transmission has occurred from one individual to another, repeating the same interaction serves no advantage for disease (most directly-transmitted microparasites are not dose-dependent). Since a large edge weight implies a high frequency of repeated interactions, networks with a higher mean weight tend to have lower basic reproductive numbers. Furthermore, variability in the distribution of weights concentrates a yet larger proportion of interactions onto a small number of edges, further increasing the number of repeat interactions and reducing the reproductive number.
Correlation between modularity and the simulated reproductive number is partly due to the strong negative correlation between modularity and social fluidity. Consistent with other evidence [53], this suggests that transmission events occur mostly within the module of the seed node, with weaker social ties facilitating transmission to other modules. Clustering (a measure of the number of connected triples in the network [54]) correlates with a smaller simulated reproductive number, consistent with other theoretical work [51, 55].
Finally, we find the model estimate of the social fluidity reproductive number to be, on average, within 10% of the simulated value at g = 1. At g = 2 the error is larger (up to 29% for some parameter choices). Prediction accuracy at this generation is negatively correlated with the mean clustering coefficient. This is not surprising as the model estimate does not account for the accelerated depletion of susceptible neighbours that is known to occur in clustered networks [51, 55]. No other properties of the network affect the accuracy of the estimate consistently across all parameter combinations (see supplementary tables).
Discussion
We proposed a measure of fluidity in social behavior which quantifies how much mixing exists within the social relationships of a population. While social networks can be measured with a variety of metrics including size, connectivity, contact heterogeneity and frequency, our methodology reduces all such factors to a single quantity allowing comparisons across a range of human and animal social systems. Social fluidity correlates with both the density of social ties (mean degree) and the variability in the weight of those ties, though these quantities do not correlate with each other. Social fluidity is thus able to combine these two aspects seamlessly in one quantity.
By measuring social fluidity across a range of human and animal systems we are able to rank social behaviors. We identify aggressive interactions as the most socially fluid; this indicates a possible learning effect whereby each aggressive encounter is followed by a period during which individuals avoid further aggression with each other [56]. At the opposite end of the scale, we find interactions that strengthen bonds (and thus require repeated interactions) such as grooming in monkeys [57] and food-sharing in bats [33]. The fact that food-sharing ants are far more fluid than bats, despite performing the same kind of interaction, reflects their eusocial nature and the absence of any need to consistently reinforce bonds with their kin [58].
Most studies that aim to describe and quantify social structure are met with a number of challenges, including ours. First, the degree of an individual, for example, is known to scale with the length of the observation period [59]. By focusing not on the absolute value of degree, but instead on how degree scales with the number of observations, our analysis controls for this bias. Second, observed interactions have been assumed to persist over time [60]. In our model, only the distribution of edge weights remains constant through time, an assumption consistent with growing evidence [24, 61]. Third, duration of contacts is known to be important for disease spread [52]. We did not include explicitly the duration of each contact in our model, since this information was only available in a fraction of the datasets [62]. There is therefore potential to improve the applicability of this model as more high resolution data becomes openly available.
Our estimate of the reproductive number derived from social fluidity provides a better predictor for the epidemic risk of a host population, going beyond predictors based on density or degree only. To illustrate this point, the social network of individuals at a conference (conference_0, supplementary document) is predicted to be at higher risk compared to the social network at a school (highschool_0), despite having a smaller size (N = 93 vs. N = 312) and lower connectivity. The discrepancy in the risk prediction comes from the lower frequency of repeated contacts between individuals in the conference, compared to the school. Interactions between infectious individuals and those they have previously infected are redundant in terms of transmission. This dynamic is nicely captured by the social fluidity, with ϕ = 0.66 for the conference and ϕ = 0.40 for the high school.
Unlike previous work that explores the disease consequences of population mixing [25, 63], our analysis allows us to investigate this relation across a range of social systems. We see, for example, how the relationship between mixing and disease risk scales with population density. For social systems that have high values of social fluidity, the reproductive number is highly sensitive to changes in N, whereas this sensitivity is not present at low values of ϕ. This corroborates past work on the scaling of transmission being associated with heterogeneity in contact [64, 65]. Going beyond previous work, our model captures in a coherent theoretical framework both density-dependence and frequency-dependence, and social fluidity is the measure to tune from one to the other in a continuous way. Since many empirical studies support a transmission function that is somewhere between these two modeling paradigms [7, 66–68], the modeling approaches applied in this paper can be carried forward to inform transmission relationships in future disease studies.
Materials & Methods
A. Python libraries
Mean clustering coefficients were computed using the networkx Python library. To evaluate the hypergeometric function in Eq. (3) we used the hyp2f1 function from the scipy.special Python library. Numerical solutions to Eq. (4) were obtained using the fsolve function from the scipy.optimize Python library. All scripts, data, and documentation used in this study are available through https://github.com/EwanColman/Social-Fluidity.
B. Data handling
Only freely available downloadable sources of data have been used for this study. Details of the experimentation and data collection can be found through their respective publications. Here we note some additional processes we have applied for our study.
Each human contact dataset lists the identities of the people in contact, as well as the 20-second interval of detection [26–29, 32]. To exclude contacts detected while participants momentarily walked past one another, only contacts detected in at least two consecutive intervals are considered interactions. Data were then separated into 24 hour subsets.
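The filtering step can be sketched as follows. The sketch assumes that each maximal run of consecutive 20-second detections for a pair counts as one interaction, time-stamped at the start of the run. The function name and toy detections are hypothetical.

```python
from collections import defaultdict

def interactions_from_detections(detections, step=20):
    """Keep only runs of at least two consecutive detection intervals per pair.

    detections: iterable of (t, i, j), where t is the start of a detection interval.
    Returns a sorted list of (run_start, i, j), one entry per qualifying run.
    """
    times = defaultdict(set)
    for t, i, j in detections:
        times[frozenset((i, j))].add(t)

    result = []
    for pair, ts in times.items():
        a, b = sorted(pair)
        run_start = prev = None
        for t in sorted(ts):
            if prev is None or t - prev != step:
                # The previous run (if any) has ended; keep it if it spans 2+ intervals.
                if run_start is not None and prev - run_start >= step:
                    result.append((run_start, a, b))
                run_start = t
            prev = t
        if run_start is not None and prev - run_start >= step:
            result.append((run_start, a, b))
    return sorted(result)

toy_detections = [
    (0, "a", "b"), (20, "a", "b"), (40, "a", "b"),  # one run of three intervals
    (100, "a", "b"),                                # isolated detection, dropped
    (200, "b", "a"), (220, "a", "b"),               # one run of two intervals
    (0, "a", "c"),                                  # isolated detection, dropped
]
filtered = interactions_from_detections(toy_detections)
print(filtered)
```

Splitting the filtered interactions into 24-hour subsets would then be a matter of grouping by `run_start // 86400`.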
The bee trophallaxis study provided experimental data for 5 unrelated colonies under continuous observation. We use the first hour of recorded data for each colony [46]. The ant trophallaxis study provided 6 networks: 3 unrelated colonies continuously observed under 2 different experimental conditions [30]. The ant antennation study provided 6 networks: 3 colonies, each observed in 2 sessions separated by a two-week period. The bat study collected individual data at different times and under different experimental conditions [33]. For bats that were studied on more than one occasion we use only the first day they were observed.
Some data sets provided data for group membership collected through intermittent, rather than continuous, observation [34–38]. We construct networks from these data by recording an interaction when two individuals were seen to be in the same group during one round of observation. The shark data was divided into 6 datasets, each one constructed from 10 consecutive observations, and spread out through the full time period over which the data was collected.
For the grooming data [39, 40], if one animal was grooming another during one round of observations then this would be recorded as a directed interaction. Similarly for aggressive interactions [41–45, 56]. When an animal was determined to be the winner of a dominance encounter then this would be recorded as a directed interaction between the winner and the loser. We consider interaction in either direction to be a contact in the network.
We considered including two rodent datasets in which interaction is defined as being observed within the same territorial space [66, 68]. We did not find this suitable for our analysis since the network we obtain, and the consequent results, are sensitive to the setting of arbitrary threshold values regarding what should, or should not, be considered sufficient contact for an interaction.
For data that did not contain the time of each interaction, contact time series were generated synthetically. For those datasets, the interactions between each pair were given synthetic timestamps in three different ways. Poisson: the time of each interaction is chosen uniformly at random from {0, 1, …, 10⁴} seconds. Circadian: the time is chosen uniformly at random from {0, 1, …, 3333, 6666, …, 10⁴}. Bursty: interaction times occur with power-law distributed inter-event times adjusted to give an expected total duration of 10⁴ seconds.
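The three timestamp schemes can be sketched as below. The Poisson and Circadian samplers follow the description directly. For the Bursty sampler, the Pareto exponent `alpha` and the rescaling rule are assumptions made for illustration, since the text specifies only power-law inter-event times with an expected total duration of 10⁴ seconds. All function names are hypothetical.

```python
import random

T = 10**4  # length of the synthetic observation window, in seconds

def poisson_times(n, rng):
    """n timestamps drawn uniformly at random from {0, 1, ..., T}."""
    return sorted(rng.randint(0, T) for _ in range(n))

def circadian_times(n, rng):
    """n timestamps drawn uniformly from {0..3333} and {6666..T}, skipping the middle third."""
    active = list(range(0, 3334)) + list(range(6666, T + 1))
    return sorted(rng.choice(active) for _ in range(n))

def bursty_times(n, rng, alpha=2.5):
    """n timestamps with Pareto-distributed gaps (alpha is an assumed exponent).

    paretovariate(alpha) has mean alpha / (alpha - 1) for alpha > 1, so the
    gaps are rescaled to make the expected time of the last event equal to T.
    """
    scale = T * (alpha - 1.0) / (n * alpha)
    times, t = [], 0.0
    for _ in range(n):
        t += scale * rng.paretovariate(alpha)
        times.append(t)
    return times

rng = random.Random(1)
p = poisson_times(50, rng)
c = circadian_times(50, rng)
b = bursty_times(50, rng)
print(p[:3], c[:3], b[:3])
```

Because only the expected total duration is fixed, individual bursty sequences can end before or after 10⁴ seconds.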
C. Disease simulation
Simulations of disease spread were executed using the contacts provided by the datasets. The bat network was omitted from this part since these data were collected over a series of independent experiments carried out at different times and under different experimental treatments. In one run of the simulation, one seed node is randomly chosen from the network and, at a randomly selected point in time during the duration of the data, transitions to the infectious state. The duration for which they remain infectious is a random variable drawn from an exponential distribution with mean 1/γ. During this time any contact they have with other individuals who have not previously been infected will cause an infection with probability β.
The simulation runs until all individuals who were infected at the second generation of the disease, i.e. those infected by those infected by the seed, have recovered. The datasets are ‘looped’ to ensure that the timeframe of the data collection does not influence the outcome. In other words, immediately after the latest interaction, the interactions are repeated exactly as they were originally. This continues to happen until the termination criterion is met.
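One run of this procedure can be sketched as follows. The sketch is not the study's script. It assumes that the check for termination is made at the end of each pass through the data, and that the run stops once every individual of generation 2 or earlier has recovered. The function name and toy contacts are hypothetical.

```python
import random

def simulate_outbreak(contacts, duration, beta, gamma, rng):
    """Simulate one outbreak on looped contact data.

    contacts: list of (t, i, j) with 0 <= t < duration, sorted by t.
    Returns the number of individuals infected at generations 0, 1, 2, 3.
    """
    nodes = sorted({n for _, i, j in contacts for n in (i, j)})
    seed = rng.choice(nodes)
    start = rng.uniform(0, duration)                  # time at which the seed becomes infectious
    generation = {seed: 0}                            # generation of every infected node
    recovery = {seed: start + rng.expovariate(gamma)} # exponential period, mean 1 / gamma

    loop = 0
    while True:
        offset = loop * duration                      # replay the data back to back
        for t, i, j in contacts:
            now = offset + t
            if now < start:
                continue
            for src, dst in ((i, j), (j, i)):
                infectious = src in generation and now < recovery[src]
                if infectious and dst not in generation and rng.random() < beta:
                    generation[dst] = generation[src] + 1
                    recovery[dst] = now + rng.expovariate(gamma)
        loop += 1
        # Assumed stopping rule: everyone of generation <= 2 has recovered.
        if all(recovery[n] <= loop * duration
               for n, g in generation.items() if g <= 2):
            break

    counts = [0, 0, 0, 0]
    for g in generation.values():
        if g <= 3:
            counts[g] += 1
    return counts

toy_contacts = [(1.0, "a", "b"), (2.0, "b", "c"), (3.0, "c", "d")]
no_spread = simulate_outbreak(toy_contacts, 10.0, 0.0, 1.0, random.Random(0))
some_spread = simulate_outbreak(toy_contacts, 10.0, 1.0, 0.5, random.Random(1))
print(no_spread, some_spread)
```

Infections beyond the third generation are still simulated, because they deplete the pool of susceptible individuals, but they are not counted in the returned list.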
We set the parameters to normalise for the variation in contact rates between networks. To achieve this we consider a hypothetical counterpart to each network in which the strength of every node is unchanged, but each interaction occurs between a pair of individuals who have not previously interacted. This is equivalent to ϕ → ∞. Under these conditions $x_{j|i} = 1/(N - 1)$ for all pairs i, j. It follows that Eq. (5) becomes $T_{i \to j} \approx s_i\beta/[\gamma\tau(N - 1)]$, then $r(s_i) \approx s_i\beta/(\gamma\tau)$, and, since $k_i = s_i$ for all nodes i, Eq. (7) gives a reproductive number of

$$\frac{\beta \sum_i s_i^2}{\gamma\tau \sum_i s_i}. \qquad (8)$$
The value of the reproductive number in Eq. (8) can be chosen arbitrarily. Then, by setting γ = 1/τ and choosing β accordingly, we guarantee that Eq. (8) holds for every network. To test that our results hold over a range of disease scenarios we repeat our analysis with several values of this quantity, including 3 and 4.
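The normalisation can be sketched from the relations stated above: with r(s_i) ≈ s_i β/(γτ) and k_i = s_i, the degree-weighted mean is β Σ s_i² / (γτ Σ s_i). Setting γ = 1/τ and solving for β gives the function below. Its name, the argument `r_target`, and the strengths are hypothetical.

```python
def normalised_parameters(strengths, tau, r_target):
    """Return (beta, gamma) such that the phi -> infinity reproductive number equals r_target.

    Uses gamma = 1 / tau, so that gamma * tau = 1, and solves
    r_target = beta * sum(s^2) / sum(s) for beta.
    """
    gamma = 1.0 / tau
    beta = r_target * sum(strengths) / sum(s * s for s in strengths)
    if beta > 1.0:
        raise ValueError("r_target too large: beta would exceed 1")
    return beta, gamma

strengths = [10, 20, 30]
beta, gamma = normalised_parameters(strengths, tau=3600.0, r_target=2.0)
# Substituting back recovers the target value.
check = beta * sum(s * s for s in strengths) / (gamma * 3600.0 * sum(strengths))
print(beta, gamma, check)
```

Since β is a probability, the target value is bounded above by Σ s_i² / Σ s_i for each network, which the guard makes explicit.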
Acknowledgments
This work was supported by NSF grant number 1414296. We are grateful for insightful feedback from Pratha Sah. We also thank all the researchers who have made their behavioral data openly accessible, making this study possible.
Footnotes
Changes to document structure. New data analysed. Old datasets removed since not appropriate. Substantial development of the model.
1. The $x_{j|i}$ are subject to network interdependencies. Specifically, $AX = X^{T}A$ and $X\mathbf{1} = \mathbf{0}$, where X is a matrix whose i, j entry is −1 if i = j and $x_{j|i}$ otherwise, A is any diagonal matrix with positive entries, and $\mathbf{0}$ and $\mathbf{1}$ are column vectors of length N containing only 0 and 1, respectively. Thus, ρ(x) is the distribution of marginal $x_{j|i}$ values of the joint distribution P(X).
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].