## Abstract

This paper introduces a novel method to quantify sociality in human and animal populations and explores the connection between social behaviour and the spread of infectious disease. Individuals living in groups tend to distribute their social effort heterogeneously, with some group members receiving more attention than others. By incorporating this heterogeneity into a mathematical model, we find that a single parameter, which we name Social Fluidity, controls the level of social mixing in the population. We estimate the social fluidity of 51 empirical human and animal social systems using maximum likelihood techniques. An analytical formula that connects social fluidity to both the population size and the basic reproductive number of an infectious disease is derived and simulations of the spread of disease are performed. We find that social fluidity outperforms other network-based metrics in predicting the basic reproductive number of an infectious disease and that the effect of population size on disease transmission is insignificant compared to the effect of social fluidity.

## 1 Background

Socialization is fundamental to the well-being of humans and the survival of many animal species [1–3]. To receive the fitness benefits of group membership, individuals must engage in social activities such as grooming, food-sharing, and conversation, in order to maintain a healthy and stable society [4,5]. In addition to many factors including time, energy, resource availability, and cognition [6,7], it is commonly thought that the threat of infectious disease imposes a limit on the social capacity of animal groups [8,9].

Larger populations are assumed to provide a greater number of opportunities for an infectious disease to transmit. For this to be true, however, the number of potential pathways for disease transmission must scale accordingly [10,11]. Moreover, some empirical results contradict the hypothesis that disease risk increases with group size [12,13]. Since the contact structure of the host population and the mode of transmission both play a role, the relationship between group size and epidemic risk is complex [14–18].

Networks provide some insight into the role of within-group contact structure [19]. It is recognized that the degree of an individual in a social network, i.e. the number of other individuals with whom they interact, has epidemiological significance [20]. While an increase in population size may cause an increase in the degree of an individual, some of these ties may weaken as a result. In humans, for example, social effort is invested mostly in close friends and family members, less is invested in the wider friendship circle, and as the circle extends to a wider group of people the frequency of interaction decreases [21–23].

It has been proposed that a better understanding of this aspect of social behaviour may lead to a quantitative approach to comparing sociality across species [24]. We further suggest that it will lead to a better understanding of disease spread [25]. While there is a significant epidemiology literature that challenges the assumption of homogeneous mixing [26–29], no effort has previously been made to quantify the level of mixing in a way that allows comparison across a range of animal disease systems.

In this paper we develop a mathematical model of human and animal social behaviour in which one parameter controls the heterogeneity in the way individuals choose to distribute their social effort. In the context of disease transmission, the same parameter, which we call “social fluidity”, can be interpreted as the amount of mixing within the population. By estimating the social fluidity in a number of human and animal social systems we find evidence that opposes the hypothesis that the threat of infectious disease imposes limitations on group size.

## 2 Methods

### 2.1 Social behavior model

Our analysis concerns a closed system of *N* individuals and a process of interaction that may occur between pairs of individuals. The system can be thought of as a network: an individual human or animal, which we call *i*, is a node; the weight *w*_{i,j} is the number of times two nodes *i* and *j* were observed interacting together; if *w*_{i,j} > 0 then we say that an edge exists between *i* and *j*; the number of times that *i* was observed interacting is its strength, *s*_{i} (*s*_{i} = ∑_{j} *w*_{i,j}); and the number of partners with whom *i* was observed interacting is its degree, *d*_{i}.

We consider the interactions of one node, *i*, which we call the focal node. The relationship between *s*_{i} and *d*_{i} is analogous to the relationship between the number of animals observed and the number of species observed in wildlife surveys [30–32]. Just as each new animal observation can either return a previously unobserved species or one that has been observed before, each observed interaction of *i* can either return a new interaction partner or one that *i* has interacted with before. The likelihood of sampling a given species, in this case, is replaced with the likelihood that *i* will interact with *j*, where *j* is any other member of the population. Formally, we define *x*_{j|i} to be the probability that *j* is the interaction partner of *i* given that *i* is observed in an interaction.

Applying a similar approach to early work in species accumulation curves [33], the probability that *i* has interacted with *j* at least once over the course of *s*_{i} observations is

$$1 - \left(1 - x_{j|i}\right)^{s_i}. \tag{1}$$

In our model we assume that the values of *x*_{j|i} over all *i*, *j* pairs are distributed heterogeneously according to a probability distribution, *ρ*(*x*). Thus, the probability that an edge exists between any chosen focal individual *i* and any given member of the population is no longer specific to the particular choice of *i* and *j*. The general formula for the probability that an edge exists between the focal individual and any other individual in the population, after *s* observations, is

$$\Psi(s) = \int \rho(x)\left[1 - (1 - x)^{s}\right] dx. \tag{2}$$

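Our goal is to find a form of *ρ* that accurately reproduces the degree distributions seen in real social systems. This can be achieved with the following truncated power law,

$$\rho(x) = \begin{cases} \dfrac{\phi\,\epsilon^{\phi}}{1 - \epsilon^{\phi}}\; x^{-(1+\phi)} & \text{if } \epsilon < x < 1, \\[1ex] 0 & \text{otherwise.} \end{cases} \tag{3}$$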

The terms in this expression are explained as follows: the quotient on the left ensures that *ρ*(*x*) meets the requirement of a probability density function that *∫ ρ*(*x*)*dx* = 1; *ϵ* truncates the distribution to avoid an asymptote at *x* = 0; *ρ*(*x*) is also truncated at 1 to ensure that all values of *x*_{j|i}, which are probabilities, are less than 1; and finally, *ϕ* is a parameter that controls the heterogeneity in the values of *x*_{j|i}.
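As a minimal sketch of the density described above, the following Python assumes that the truncated power law has the form *ρ*(*x*) ∝ *x*^{−(1+*ϕ*)} on (*ϵ*, 1). The function names `rho`, `integrate_rho` and `sample_x`, and the parameter values, are illustrative and not taken from the paper. The sketch checks the normalisation numerically and draws values of *x* by inverting the cumulative distribution.

```python
import math
import random


def rho(x, phi, eps):
    """Assumed truncated power-law density on (eps, 1); zero elsewhere."""
    if not eps < x < 1.0:
        return 0.0
    norm = phi * eps**phi / (1.0 - eps**phi)  # makes the density integrate to 1
    return norm * x ** (-(1.0 + phi))


def integrate_rho(phi, eps, steps=20000):
    """Midpoint rule in log-space, which resolves the steep region near eps."""
    a, b = math.log(eps), 0.0
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = math.exp(a + (k + 0.5) * h)
        total += rho(x, phi, eps) * x * h  # dx = x dt with t = log x
    return total


def sample_x(phi, eps, rng):
    """Draw one value from rho by inverting its cumulative distribution."""
    u = rng.random()
    low = eps ** (-phi)
    return (low - u * (low - 1.0)) ** (-1.0 / phi)


phi, eps = 0.6, 1e-3  # illustrative values only
rng = random.Random(0)
draws = [sample_x(phi, eps, rng) for _ in range(10000)]
print(integrate_rho(phi, eps))  # should be very close to 1
print(min(draws), max(draws))   # all draws fall between eps and 1
```

Most of the sampled values sit close to *ϵ*, with occasional large values: this is the pattern of many weak relationships and a few strong ones that the model is built to capture.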

While the model contains two parameters, *ϵ* and *ϕ*, the requirement that Σ_{j} *x*_{j|i} = 1 can only be met if *ϵ* = *ϵ*(*N*, *ϕ*) is chosen to be a specific value (a detailed explanation of how to compute *ϵ* is included in the online supplement S1.2). Larger values of *N* correspond to smaller values of *ϵ* and so larger populations see a higher frequency of weak relationships, i.e. small values of *x*_{j|i}. Since *ϕ* is the only free parameter in the model, and is therefore the only determinant of social behaviour, we use the term “social fluidity” to refer to this quantity.
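The procedure used in the paper to compute *ϵ* is given in supplement S1.2 and is not reproduced here. As a hedged alternative, the sketch below assumes the power-law form *ρ*(*x*) ∝ *x*^{−(1+*ϕ*)} on (*ϵ*, 1), for which the mean of *x* has a closed form, and uses bisection (a generic root-finder, not necessarily the authors' method) to find the *ϵ* satisfying (*N* – 1)·E[*x*] = 1. The names `mean_x` and `solve_epsilon` are hypothetical.

```python
import math


def mean_x(eps, phi):
    """Mean of x under the assumed density rho(x) ~ x^-(1+phi) on (eps, 1)."""
    if abs(phi - 1.0) < 1e-9:  # limiting form at phi = 1
        return -eps * math.log(eps) / (1.0 - eps)
    return phi * (eps**phi - eps) / ((1.0 - phi) * (1.0 - eps**phi))


def solve_epsilon(n, phi, iterations=200):
    """Bisect on log(eps) until (n - 1) * mean_x(eps, phi) equals 1."""
    if n < 3:
        raise ValueError("need at least 3 individuals")
    lo, hi = math.log(1e-300), math.log(0.999999)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (n - 1) * mean_x(math.exp(mid), phi) > 1.0:
            hi = mid  # mean too large: lower the truncation point
        else:
            lo = mid
    return math.exp(0.5 * (lo + hi))


for n in (20, 50, 200):
    eps = solve_epsilon(n, 0.6)
    # The last column should be 1 up to rounding error.
    print(n, eps, (n - 1) * mean_x(eps, 0.6))
```

Running the loop shows *ϵ* shrinking as *N* grows, in line with the statement that larger populations contain a higher frequency of weak relationships.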

Figure 1 illustrates how the value of *ϕ* can create different types of social behaviour. Low values of *ϕ* produce a type of social behaviour that we might describe as “allegiant”; interactions with the same partner are frequently repeated at the expense of interactions with unfamiliar individuals. As *ϕ* increases, the model produces more “gregarious” behaviour; interactions are repeated less frequently and the number of partners is larger. High social fluidity corresponds to a high level of mixing within the population or social group.

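Combining Eq.(2) and Eq.(3) we find that, for an individual that has been observed *s* times,

$$\Psi(s) = 1 - \frac{\phi\,\epsilon^{\phi}}{1 - \epsilon^{\phi}}\,\frac{(1-\epsilon)^{s+1}}{s+1}\;{}_2F_1\!\left(s+1,\,1+\phi;\,s+2;\,1-\epsilon\right), \tag{4}$$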
where the notation _{2}*F*_{1} refers to the Gauss hypergeometric function [34]. The probability that the observed degree of a node is equal to *d* is determined by *N* – 1 independent Bernoulli trials, each with success probability Ψ(*s*). The degree distribution is therefore binomial, *d*(*s*) ~ *B*(*N* – 1, Ψ(*s*)). However, since this distribution gives non-zero probabilities for cases where *d* > *s*, which are invalid, we instead use *d*(*s*) ~ *B*(*s*, (*N* – 1)Ψ(*s*)/*s*) when 0 < *s* < *N*.
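The edge probability Ψ(*s*) can also be evaluated by integrating numerically instead of through the hypergeometric function. The sketch below does this under the assumption that *ρ*(*x*) ∝ *x*^{−(1+*ϕ*)} on (*ϵ*, 1), and then builds the adjusted binomial degree distribution described above. The names `psi` and `degree_pmf` are hypothetical, and the value of *ϵ* is an illustrative one that only roughly satisfies the normalisation for *N* = 50 and *ϕ* = 0.6.

```python
import math


def psi(s, phi, eps, steps=4000):
    """Edge probability after s observations: integral of rho(x) * (1 - (1 - x)**s).

    Assumes rho(x) = C * x**-(1 + phi) on (eps, 1); midpoint rule in log-space.
    """
    c = phi * eps**phi / (1.0 - eps**phi)
    a = math.log(eps)
    h = -a / steps
    total = 0.0
    for k in range(steps):
        x = math.exp(a + (k + 0.5) * h)
        # rho(x) dx becomes c * x**-phi dt after substituting t = log x
        total += c * x ** (-phi) * (1.0 - (1.0 - x) ** s) * h
    return total


def degree_pmf(d, s, n, phi, eps):
    """P(degree = d | s observations), using the adjusted binomial when 0 < s < n."""
    p_edge = psi(s, phi, eps)
    if 0 < s < n:
        trials, p = s, min(1.0, (n - 1) * p_edge / s)
    else:
        trials, p = n - 1, p_edge
    if d < 0 or d > trials:
        return 0.0
    return math.comb(trials, d) * p**d * (1.0 - p) ** (trials - d)


n, phi, eps = 50, 0.6, 9e-4  # eps is an illustrative, roughly calibrated value
pmf = [degree_pmf(d, 20, n, phi, eps) for d in range(21)]
print(sum(pmf))                              # probabilities sum to 1
print(max(range(21), key=lambda d: pmf[d]))  # most likely degree for s = 20
```

Because the adjusted distribution has *s* trials, it assigns zero probability to any degree larger than the number of observations, which is the reason for replacing *B*(*N* – 1, Ψ(*s*)).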

We use maximum likelihood estimation to fit this model to the empirical data and return the social fluidity *ϕ* (details are contained in the online supplement S1.4). The goodness-of-fit is calculated by comparing its likelihood to the likelihood of a null model in which the degree of each individual is a uniformly distributed random integer within the range of feasible values (details are contained in the online supplement S1.4).
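The fitting procedure itself is described in supplement S1.4. As a rough illustration of the idea only, the sketch below estimates *ϕ* by a grid search that maximises the adjusted-binomial likelihood of observed (*s*_{i}, *d*_{i}) pairs. It assumes *ρ*(*x*) ∝ *x*^{−(1+*ϕ*)} on (*ϵ*, 1), replaces the hypergeometric function with numerical integration, and uses made-up data; all function names are hypothetical.

```python
import math


def mean_x(eps, phi):
    """Mean of x under the assumed density on (eps, 1)."""
    if abs(phi - 1.0) < 1e-9:
        return -eps * math.log(eps) / (1.0 - eps)
    return phi * (eps**phi - eps) / ((1.0 - phi) * (1.0 - eps**phi))


def solve_epsilon(n, phi):
    """Bisection on log(eps) so that (n - 1) * mean_x equals 1."""
    lo, hi = math.log(1e-300), math.log(0.999999)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (n - 1) * mean_x(math.exp(mid), phi) > 1.0:
            hi = mid
        else:
            lo = mid
    return math.exp(0.5 * (lo + hi))


def psi(s, phi, eps, steps=2000):
    """Edge probability after s observations (midpoint rule in log-space)."""
    c = phi * eps**phi / (1.0 - eps**phi)
    a = math.log(eps)
    h = -a / steps
    return sum(
        c * x ** (-phi) * (1.0 - (1.0 - x) ** s) * h
        for x in (math.exp(a + (k + 0.5) * h) for k in range(steps))
    )


def log_likelihood(phi, data, n):
    """Sum of log P(d_i | s_i) under the adjusted binomial degree model."""
    eps = solve_epsilon(n, phi)
    cache = {}
    total = 0.0
    for s, d in data:
        if s not in cache:
            cache[s] = psi(s, phi, eps)
        if 0 < s < n:
            trials, p = s, (n - 1) * cache[s] / s
        else:
            trials, p = n - 1, cache[s]
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
        total += (
            math.log(math.comb(trials, d))
            + d * math.log(p)
            + (trials - d) * math.log(1.0 - p)
        )
    return total


# Hypothetical (strength, degree) pairs for a population of n = 20.
data = [(10, 6), (25, 11), (5, 4), (40, 14), (15, 8), (8, 5), (30, 12), (12, 7)]
n = 20
grid = [0.1 * k for k in range(1, 21)]
phi_hat = max(grid, key=lambda phi: log_likelihood(phi, data, n))
print(phi_hat, log_likelihood(phi_hat, data, n))
```

A grid search is used here for transparency; a continuous optimiser would give a finer estimate at the cost of an extra dependency.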

### 2.2 Disease transmission

#### 2.2.1 Analytical model

Our goal is to analyse how social fluidity and population size influence the likelihood of epidemic outbreaks. We do this by deriving analytically the rate of infection for a model of disease transmission in a population whose contact dynamics follow the model of the previous section. This results in a formula that predicts *R*_{0}, which is defined as the number of secondary infections caused by a single infectious individual in an otherwise fully susceptible population. We calibrate the disease parameters in a way that controls for the fact that rates of activity may vary between different populations. This results in a formula that quantifies the effect size of social fluidity that can be applied across a range of social systems.

The disease model is described as follows: an individual becomes infectious at some random point in time and may recover at any subsequent point in time; the probability of recovery during a 1-second interval is γ. Interactions that occur during this infectious period result in the interaction partner becoming infected with probability *β*. This simple disease model disregards the fact that interactions vary in duration, intimacy, and contact type (for which we often do not have data); *β* here represents a probability of infection that combines all of these factors.

We first predict the reproductive number, *R*_{0}(*s*_{i}), of an individual who was observed interacting *s*_{i} times during an observation period of duration Δ*t*. The rate of activity, in this case, is estimated to be *s*_{i}/Δ*t* interactions per unit of time. By assuming that these interactions occur according to a Poisson process, transmission of the disease from the infected individual, *i*, to another individual in the population, *j*, also occurs as a Poisson process with rate *s*_{i}*x*_{j|i}*β*/Δ*t*. The length of time for which *i* remains infectious is exponentially distributed with rate parameter γ. For an infectious period of length *τ*, the probability that the infection transmits from *i* to any given *j* is

$$1 - \exp\!\left(-\frac{s_i\, x_{j|i}\, \beta\, \tau}{\Delta t}\right). \tag{5}$$

The reproductive number for *i*, *R*_{0}(*s*_{i}), is found by integrating Eq.(5) over all possible values of *τ* and *x*_{j|i} then multiplying by the number of susceptible individuals, *N* – 1. The result is given in section…of the online supplement.
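The integration over *τ* can be carried out in closed form. As a worked step, writing λ for the pairwise transmission rate (a shorthand introduced here, not in the original), averaging the transmission probability over an exponentially distributed infectious period gives:

```latex
\begin{aligned}
\int_0^\infty \gamma e^{-\gamma\tau}\left(1 - e^{-\lambda\tau}\right) d\tau
  &= 1 - \frac{\gamma}{\gamma + \lambda}
   = \frac{\lambda}{\gamma + \lambda},
\qquad \lambda = \frac{s_i\, x_{j|i}\,\beta}{\Delta t}.
\end{aligned}
```

The remaining step, integrating over *x*_{j|i} and multiplying by *N* – 1, depends on the form chosen for *ρ*(*x*).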

The range of human and animal systems is diverse, and social activity can happen on extremely different timescales. Additionally, the types of disease that affect one species are unlikely to affect another. Instead of choosing parameter values that relate to some specific disease, it is more informative to select parameter values for each system separately in a way that exposes the effects of population size and social fluidity. To achieve this, the recovery rate, γ, is chosen in such a way that *R*_{0} would always be the same value if, hypothetically, the effects of social fluidity and population size were not present.

We define *R*^{∗} to be the value of *R*_{0} in a large population with homogeneous mixing. Calibration is achieved when γ is chosen to be

$$\gamma = \frac{\beta\,\langle s \rangle}{R^{*}\,\Delta t}, \tag{6}$$
where Δ*t* is the duration of the time-frame of the data and 〈*s*〉 is the mean of *s*_{i} over the whole population. The effect of this calibration is that the recovery rate, γ, is proportional to the mean rate of activity. Consequently, a population with a higher frequency of social interaction will be coupled with a disease that has a shorter mean infectious period.

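After performing the calibration we arrive at the following result for the basic reproductive number of an individual that was observed interacting *s* times,

$$R_0(s) = (N-1)\int \rho(x)\,\frac{R^{*}\, s\, x}{\langle s \rangle + R^{*}\, s\, x}\, dx. \tag{7}$$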

Note that no temporal information appears in this equation. In all the analysis presented we arbitrarily choose *R*^{∗} = 2.
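The cancellation of the temporal quantities can be checked numerically. The sketch below is a rough illustration under stated assumptions: *ρ*(*x*) ∝ *x*^{−(1+*ϕ*)} on (*ϵ*, 1), with *ϵ* found by bisection, and a recovery rate calibrated as γ = *β*〈*s*〉/(*R*^{∗}Δ*t*), which is the value that yields *R*_{0} = *R*^{∗} in a large, homogeneously mixed population. The function names and parameter values are hypothetical.

```python
import math


def mean_x(eps, phi):
    """Mean of x under the assumed density on (eps, 1)."""
    if abs(phi - 1.0) < 1e-9:
        return -eps * math.log(eps) / (1.0 - eps)
    return phi * (eps**phi - eps) / ((1.0 - phi) * (1.0 - eps**phi))


def solve_epsilon(n, phi):
    """Bisection on log(eps) so that (n - 1) * mean_x equals 1."""
    lo, hi = math.log(1e-300), math.log(0.999999)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (n - 1) * mean_x(math.exp(mid), phi) > 1.0:
            hi = mid
        else:
            lo = mid
    return math.exp(0.5 * (lo + hi))


def r0_individual(s, mean_s, n, phi, beta, dt, r_star=2.0, steps=20000):
    """Expected secondary infections from an individual observed s times."""
    eps = solve_epsilon(n, phi)
    gamma = beta * mean_s / (r_star * dt)  # assumed calibration of the recovery rate
    c = phi * eps**phi / (1.0 - eps**phi)
    a = math.log(eps)
    h = -a / steps
    total = 0.0
    for k in range(steps):
        x = math.exp(a + (k + 0.5) * h)
        lam = s * x * beta / dt  # transmission rate to a partner with weight x
        total += c * x ** (-phi) * lam / (gamma + lam) * h
    return (n - 1) * total


# Same social system, two very different choices of beta and time-frame.
slow = r0_individual(s=30, mean_s=30, n=50, phi=0.6, beta=0.25, dt=100.0)
fast = r0_individual(s=30, mean_s=30, n=50, phi=0.6, beta=0.5, dt=3600.0)
print(slow, fast)  # the two values agree: beta and dt cancel after calibration

for phi in (0.3, 0.6, 1.5):
    print(phi, r0_individual(30, 30, 50, phi, 0.25, 100.0))
```

In this sketch the result stays below *R*^{∗} = 2 and rises with *ϕ*, which matches the qualitative behaviour described in the text.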

#### 2.2.2 Disease simulation

Because the fidelity of the social behavior model, i.e. the extent to which it agrees with the data, varies across the different social settings, we expect the accuracy of Eq.(18) to vary. To test this we simulated the transmission for each individual (reported results are the mean of 10^{3} simulations). For the simulations we arbitrarily chose a transmission probability of *β* = 1/4. The mean absolute error |*e*| measures the mean difference between the individual reproductive number, *r*_{i}, calculated from Eq.(18) and that computed by the simulation. A full description of the disease simulation can be found in Section S2.4 of the online supplement.
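The simulation used in the paper is specified in the supplement and is not reproduced here. As a toy stand-in, the sketch below simulates the idealised process for a single focal individual with a small, hypothetical set of partner probabilities and compares the mean number of secondary infections with the closed-form expectation Σ_{j} λ_{j}/(γ + λ_{j}), where λ_{j} = *s x*_{j|i} *β*/Δ*t*. The function name and all parameter values are assumptions made for illustration.

```python
import random


def simulate_secondary_infections(x, s, dt, beta, gamma, rng):
    """One outbreak seeded at the focal individual; returns the number infected."""
    tau = rng.expovariate(gamma)  # length of the infectious period
    rate = s / dt                 # interaction rate of the focal individual
    infected = set()
    t = rng.expovariate(rate)     # time of the first interaction
    while t < tau:
        partner = rng.choices(range(len(x)), weights=x)[0]
        if rng.random() < beta:
            infected.add(partner)
        t += rng.expovariate(rate)
    return len(infected)


x = [0.4, 0.3, 0.15, 0.1, 0.05]  # hypothetical partner probabilities, sum to 1
s, dt, beta = 20, 1.0, 0.25
gamma = 2.5                      # equals beta * s / (2 * dt), i.e. calibrated to R* = 2
rng = random.Random(1)
runs = 20000
simulated = sum(
    simulate_secondary_infections(x, s, dt, beta, gamma, rng) for _ in range(runs)
) / runs
predicted = sum(
    (s * xj * beta / dt) / (gamma + s * xj * beta / dt) for xj in x
)
print(simulated, predicted)
```

With these values the closed form gives roughly 1.31 secondary infections, well below *R*^{∗} = 2, because repeated contacts with the same partners cannot create new infections.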

## 3 Results

### 3.1 Measuring social fluidity

Social fluidity, *ϕ*, quantifies heterogeneity in the way individuals distribute their social effort among the other members of the population. We estimated *ϕ* in 51 datasets taken from 18 studies of human and animal social behaviour [14,16,35–50]. Details of each data source are included in the online supplement S3. Figure S1 shows the data and the distribution fitted using maximum likelihood estimation. We find that the model provides a good fit to every data-set; model fidelity is positive in every case, which implies that the empirical data follows the model better than synthetic data generated from a mixture of 92% from the model distribution and 8% from random noise (see online supplement S1.4).

Social fluidity does not appear to be affected by the sample size; there is no significant correlation between the mean number of observations per individual, *s̄*, and *ϕ* (Pearson *R*^{2} = 0.004, *p* = 0.663). Larger populations tend to have smaller social fluidity values (Pearson *R*^{2} = 0.223, *p* < 0.001). This correlation is dependent on the presence of a few large populations in our data (*N* > 200) which may be, to some degree, subdivided into smaller groups. As a consequence, the social effort of one individual becomes concentrated on a relatively small proportion of the whole population, which causes heterogeneity to increase, and *ϕ* to decrease.

Populations with higher values of *ϕ* tend to have higher mean degree (Pearson *R*^{2} = 0.332, *p* = 0.001) and less heterogeneity in the distribution of edge weights (measured as the variance of the edge weights divided by their mean, 〈*w*〉) (Pearson *R*^{2} = 0.332, *p* = 0.001). Incidentally, weight heterogeneity and mean degree are uncorrelated (Pearson *R*^{2} < 0.001, *p* = 0.984). The fact that social fluidity correlates with both the weight heterogeneity and the mean degree, yet they do not correlate with each other, illustrates that by measuring *ϕ* we are combining elements from two distinct features of sociality.

Figure 2 shows the estimated values of *ϕ*. Social fluidity appears to depend on the type of interaction observed. Aggressive interactions have the highest fluidity; this is expected since it is unusual for an aggressive encounter, such as a display of dominance, to be caused by an underlying bond between the pair of animals. In fact, it may be more likely that the animals will avoid each other following the interaction. Social fluidity also appears to be related to species; ant systems cluster around *ϕ* = 1, mice and voles around *ϕ* = 0.7, and humans around *ϕ* = 0.6 (with the exception of all five days of high school data and the last day of a conference).

### 3.2 Effect of social fluidity on disease transmission

Figure 3**A** shows the value of *R*_{0} predicted from Eq.(7) with parameter values, i.e. *N*, *ϕ*, *s*_{i} and *d*_{i} for all individuals in the population, taken from the empirical data. Since *R*^{∗} = 2 represents the expectation of *R*_{0} in a homogeneously mixed population of infinite size, the values displayed in the figure illustrate the magnitude of the effects of both *ϕ* and *N*. When *ϕ* < 1, social fluidity determines *R*_{0} more so than population size.

At small population sizes, *R*_{0} increases with *N* and converges as *N* goes to ∞ (Figure 3**B**). The rate of this convergence increases with *ϕ*. When *ϕ* < 1, the limit of *R*_{0} is a function of *ϕ* (Figure 3**C**). At these values of *ϕ*, the individual will choose to repeat interactions despite having the choice of infinitely many potential interaction partners. When *ϕ* > 1 and the population is large the probability of a repeated interaction falls to zero. In this case, *R*_{0} = *R*^{∗} since, under the current circumstances, having no repeated interactions is effectively equivalent to homogeneous mixing.
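The two regimes can be illustrated with a short calculation, under the assumptions that *ρ*(*x*) ∝ *x*^{−(1+*ϕ*)} on (*ϵ*, 1) and that *ϵ* is fixed by (*N* – 1)·E[*x*] = 1. The probability that two consecutive interactions of the focal individual involve the same partner is Σ_{j} *x*_{j|i}^{2}, which is approximately (*N* – 1)·E[*x*^{2}]. For large *N* this sketch gives:

```latex
\begin{aligned}
(N-1)\,\mathbb{E}[x] = 1 \;&\Rightarrow\;
\epsilon^{\phi} \approx \frac{1-\phi}{\phi\,(N-1)} \quad (\phi < 1), \qquad
\epsilon \approx \frac{\phi-1}{\phi\,(N-1)} \quad (\phi > 1), \\[1ex]
(N-1)\,\mathbb{E}[x^{2}] \;&\longrightarrow\;
\begin{cases}
\dfrac{1-\phi}{2-\phi} & \phi < 1, \\[1ex]
0 & \phi > 1,
\end{cases}
\qquad N \to \infty .
\end{aligned}
```

Under these assumptions the repeat probability stays positive in the limit when *ϕ* < 1 and vanishes when *ϕ* > 1, which is consistent with the threshold at *ϕ* = 1 described above.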

A small amount of error is observed between the predicted *R*_{0} and the simulated *R*_{0} (between 0.1 and 0.258 with one outlier at 0.344) with the predictions consistently overestimating *R*_{0} in human systems, possibly because of the bursty nature of human contact (see Table S2). Despite this, our overall conclusions from the disease simulation are consistent with the predicted results. Figure 4 shows that *ϕ* correlates with the simulated *R*_{0} better than other network metrics. Since this correlation is very strong, the relationship between each of these metrics and *R*_{0} is qualitatively the same as their relationship with *ϕ*.

Consequently, as Figure 4**B** shows, *R*_{0} tends to be smaller in the largest populations (*N* > 200), possibly due to subdivision within the population. The mean number of interactions observed shows no correlation with *R*_{0}, shown in Figure 4**C**, which we expect since we deliberately controlled this variable. Finally, neither mean degree (Figure 4**D**) nor weight heterogeneity (Figure 4**E**) correlates with *R*_{0} as well as social fluidity.

## 4 Discussion

Social fluidity quantifies how much mixing exists within the social relationships in a population. While structure within a population can take many forms, the methodology introduced here succeeds in reducing all such factors to a single number allowing comparisons to be made across various human and animal social systems.

Most studies that aim to describe and quantify social structure use a network representation of their system as part of their analysis. One questionable assumption that is often made is that social relationships exist; that because interaction was observed between two individuals, there is some underlying bond present that will persist into the future [51]. The methodology presented in this paper assumes only that the distribution of relationships remains constant through time, an assumption that is consistent with a growing amount of evidence [52,53].

Another criticism of network-based analysis is that it is highly sensitive to biases in the way data is collected. The degree of an individual, for example, is likely to be high if that individual was, by chance, observed a large number of times during the duration of the experiment. By focusing not on the absolute value of degree, and instead focusing on the scaling relationship between the number of observations and the degree, the analysis presented in this paper is free of this bias. Our analysis, however, does depend on a reliable estimate of the population size.

The estimated values of *ϕ* predict the basic reproductive number *R*_{0} better than the alternative network-based metrics we tested. It is, however, incorrect to then deduce that social fluidity outperforms network-based approaches in predicting the number of infections that occur at the second generation of infection (or third or fourth etc.). It is likely that additional structural factors will influence the way the disease propagates through the population. Until these factors are properly understood, the basic reproductive number remains an important epidemiological quantity. In particular, identifying whether the *R*_{0} of a disease system is greater or less than 1, the epidemic threshold, is considered a useful indicator of whether the disease will die out quickly or reach an epidemic state.

By considering the epidemic threshold and how it is affected by both population size and social fluidity we are able to conclude that population size is a relatively insignificant factor in determining the risk of epidemic outbreak. To demonstrate: compare the human face-to-face data from a hospital ward (`conference_1` in supplementary table 1), where *N* = 49, *ϕ* = 0.521, and *R*_{0} = 1.239, to data from the conference (`hospital_1` in supplementary table 1), for which *N* = 92, *ϕ* = 0.592, and *R*_{0} = 1.234. These two sets were chosen for the example because they have similar *R*_{0} results. Our analysis shows that *R*_{0} depends on both *N* and *ϕ*, yet, while *N* is remarkably different between the two systems, the value of *ϕ* needed only to change by 0.071 to compensate (neither of these *ϕ* values is an outlier in the distribution of human face-to-face systems).
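The comparison can be made concrete with a few lines of arithmetic on the values quoted above. The dictionary keys `smaller` and `larger` are labels chosen here for the two systems and are not identifiers from the supplementary table.

```python
# Values quoted in the text for the two face-to-face systems.
systems = {
    "smaller": {"N": 49, "phi": 0.521, "R0": 1.239},
    "larger": {"N": 92, "phi": 0.592, "R0": 1.234},
}

size_ratio = systems["larger"]["N"] / systems["smaller"]["N"]
phi_shift = systems["larger"]["phi"] - systems["smaller"]["phi"]
r0_shift = systems["larger"]["R0"] - systems["smaller"]["R0"]

print(f"N ratio: {size_ratio:.2f}")
print(f"phi difference: {phi_shift:.3f}")
print(f"R0 difference: {r0_shift:.3f}")
```

The population nearly doubles between the two systems while *ϕ* differs by less than a tenth and *R*_{0} is almost unchanged.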

In conclusion: if the human and animal contact data analyzed in this study is representative of the pathways that may be taken by an infectious disease, then the risks associated with population size (or group size) are easily offset by minor changes in the way individuals distribute their social effort among members of the population (e.g. by avoiding homogeneously mixing with every other group member). If the survival of a group depends on its size then it seems that this social adaptation can be made with few significant consequences.

## Footnotes

∗ ec975{at}georgetown.edu