Reliable hypotheses testing in animal social network analyses: global index, index of interactions and residual regression

Animal social network analyses (ASNA) have led to a foundational shift in our understanding of animal sociality that transcends the disciplinary boundaries of genetics, spatial movements, epidemiology, information transmission, evolution, species assemblages and conservation. However, some analytical protocols (i.e., permutation tests) used in ASNA have recently been called into question due to the unacceptable rates of false positives (type I error) and false negatives (type II error) they generate in statistical hypothesis testing. Here, we show that these rates are related to the way in which observation heterogeneity is accounted for in association indices. To solve this issue, we propose a method termed the “global index” (GI) that consists of computing the average of individual association indices per unit of time. In addition, we developed an “index of interactions” (II) that allows the use of the GI approach for directed behaviours. Our simulations show that GI: 1) returns more reasonable rates of false negatives and false positives, with or without observational biases in the collected data, 2) can be applied to both directed and undirected behaviours, 3) can be applied to focal sampling, scan sampling or “gambit of the group” data collection protocols, and 4) can be applied to first- and second-order social network measures. Finally, we provide a method to control for non-social biological confounding factors using linear regression residuals. By providing a reliable approach for a wide range of scenarios, we propose a novel methodology in ASNA with the aim of better understanding social interactions from a mechanistic, ecological and evolutionary perspective.


INTRODUCTION
[...] permutation analytical protocols, such as inflated rates of false negatives (i.e., non-rejection of a false null hypothesis) and false positives (i.e., rejection of a true null hypothesis) for both permutation approaches 25,26. For example, Puga-Gonzalez et al. 25 used simulations for a case with data biases arising from the data collection protocol (e.g., oversampling of specific individual categories), and found that pre-network permutations showed rates of false positives of 35.6%, while network permutations showed rates of false negatives of 60.8% and rates of false positives of 36.6%. Yet, very few biological data collected in natura are immune to biases related to the system under study or to necessarily limited sampling. Given [...] negatives and false positives are likely to be a common problem and a pressing issue to resolve for ensuring the reliability of hypothesis testing in ASNA.
Whereas important statistical advances have helped improve the reliability of statistical hypothesis testing 20,22 in ASNA, limitations remain. Franks et al. 20 proposed the use of linear regression for testing, while adding control factor(s) to account for potential biases. However, this approach was only validated for undirected association data using pre-network permutations, and the inclusion of additional variables in the model reduces the degrees of freedom. Farine & Carter 2021 22 suggested an approach using both permutation processes to estimate the deviation from randomness, but this approach still returns high rates of false positives and can only be used for association data. Finally, Hart et al. 27 recently demonstrated that parametric tests without permutations show rates of false positives and false negatives similar to those of network permutation tests, thereby calling into question the use of permutations themselves.
Moreover, although these authors argue that permutations do not control for data dependency, the purpose of permutations is to provide an alternative way of computing a test statistic against null models and to avoid reducing the degrees of freedom of parametric tests. As a result, it currently remains difficult to identify a standard analytical protocol in ASNA according to the type of data collected and the data collection protocol.
Here, we propose an approach for addressing the issue of high rates of false positives and false negatives. The main idea behind our approach is that indices of associations measure the proportion of time that a dyad spends in association. However, while the computation of these indices considers the sampling effort, indices of associations are calculated using absolute time (e.g., an SRI of 0.5 means that a dyad spends 50% of the time associated, regardless of whether individuals were observed 10 times or 100 times). Thus, in order to account for sampling effects, indices of associations need to be weighted to obtain a value relative to the sampling effort. We term this the "global index" (GI) approach. In addition, as indices of associations have been designed only for undirected behaviours, we also developed an "index of interactions" (II) that estimates the proportion of interactions of a dyad, while accounting for both received and given behaviours and allowing the use of GI for directed behaviours. Sampling biases are only one aspect of the problem and, as described by Farine & Carter [...] protocol or to individuals' specificities (e.g., cryptic individuals are more challenging to observe). The second potential confounding factor that may influence individual associations refers to non-social biological influences on sociality such as cycle synchrony across individuals, space use or kinship, among others.
Considering potential confounding non-social biological influences on sociality is of major interest in order to correctly evaluate the effect of sociality. While Farine & Carter 2021 22 proposed some solutions to control for such confounding factors, these show the same limitations previously discussed (high rates of false positives and applicability only to association data). Moreover, a methodology that goes beyond controlling for putative biological confounding factors, by also assessing their magnitude or importance for social interactions, remains to be developed. In the third part of our study, we propose a method to control for non-social biological confounding influences (e.g., sex, age, body condition). Our approach uses linear regressions to estimate the relationship between an assumed non-social biological confounding factor and a social measure. If the relationship is significant, we can then consider that the non-social biological confounding factor affects the individual social measure. We can "control" for the factor by computing the residuals from the linear regression and using them as a relative social measure. This approach, defined as the "residual correction" (RC), has the advantage of being usable after accounting for sampling biases (after using GI), using permutation approaches to compute significant relationships, estimating whether one or several non-social biological confounding factor(s) exist, and statistically controlling for these factors. Furthermore, the use of generalized linear mixed models allows accounting for the structure of the data (e.g., repeated measurements and non-Gaussian distributions such as Poisson or zero-inflated distributions).
Finally, it is possible to test for non-linear relationships between the social measure and the potential non-social biological confounding factor(s) through polynomial regressions.
We perform computer simulations and first demonstrate that the GI approach (which consists in considering individuals' sampling effort within the indices of associations) is reliable for the study of undirected behaviours. In a second step, we show that the index of interactions (II) can be reliably used in combination with the GI to study directed behaviours. Finally, we show that the RC approach accurately estimates and controls for non-social biological confounding [...] sampling or "gambit of the group", described below) and recording protocols 29 (continuous and timed sampling) commonly used to collect ASNA data, highlighting that our methods [...] protocol, or the type of behaviour studied (directed/interactions or undirected/associations).
While indices of associations accurately estimate differences in associations among the individuals from different dyads, numerous confounding factors may affect these associations. Several studies have attempted to control for such confounding factors. For example, as highly gregarious individuals associate more frequently with other highly gregarious [...] a specific social index (the half-weight index, a modification of the SRI), but the logic can be [...] Here we propose a similar method (global index: GI) to control for differences in individual [...] of the dyad. This will weight each proportion of time that two individuals spend together by the inverse of each individual's total time of observation: [...] Note that this formula can only be used with weighted social network measures and not with binary social network measures, such as the degree (i.e., number of social partners), which only consider the presence or absence of links without considering their weights.
Therefore, for [...] computed and those of its social partners. Once the GI is used to construct the social network, the same index can be used as part of a pre-network permutation process in order to test hypotheses about the network. We hypothesize that this correction will solve many of the problems with sampling biases described above, and test our hypothesis below using [...] Concretely, if we consider a discrete time sampling rule 29 (instantaneous or one-zero sampling), such as spatial associations collected with the "gambit of the group" (GoG) sampling protocol (i.e., considering spatially clustered individuals as associated), or behavioural events without duration [...] recording sampling rule of behavioural state with (e.g., time of grooming) or without (e.g., [...] simulations mimic both cases of behavioural recording: continuous behavioural frequencies collected through focal sampling and discrete behavioural sampling (e.g., spatial associations collected through GoG) to highlight how the GI performed under these sampling and recording rules for undirected behaviours using the simple ratio index (i.e., [...]

Simulations to validate the GI method for undirected behaviours
To demonstrate the reliability of the Global Index (GI) approach in multiple scenarios, we [...]. This simulation, inspired by Farine [...] differences in sociality, whether statistically significant or not. In addition, it is possible to mimic a specific amount of sampling protocol bias by simulating oversampling for males. With such a simulated dataset, it is possible to test for differences in sociality between sexes [...] with the 'lhs' R library 33 to sample the parameter space (variables a-d in Table 1). [...] producing a total of 2,000 simulations. In addition, we made two modifications to the original simulation. The first consisted of using [...] second was to compute individuals' degree and eigenvector to validate our method also with [...] regarding the evaluation of GI reliability for focal sampling (simulation 1, Appendix 1).

However, in order to evaluate the reliability of the GI for GoG, we made another modification by [...] (simulation 2, Appendix 2).

Results for simulation 1 (focal sampling) showed an important improvement for network permutations using the GI approach, with rates of false negatives below 5% with or without observation biases, and rates of false positives below 5% with observation biases and below 10% without observation biases, for all social network measures (Table 2). As expected, parametric tests showed high rates of false positives related to over-inflation of the degrees of freedom used to calculate significance 35. Finally, the GI approach did not resolve the problem inherent to pre-network permutations (i.e., they do not address the null hypothesis that X is distributed randomly with respect to Y, or that the effect of X on Y is zero); as expected, rates of false positives therefore remained high with the pre-network approach.
Results for simulation 2 (scan sampling/GoG) showed similar values for reliability (Table 3), with rates of false positives below 5% with or without observation biases for all social network measures. Finally, whereas rates of false negatives were under 5% with [...]

Extension of GI approach to directed behaviours
While individuals' associations are mostly used in behavioural ecology, in which entire populations are followed over large areas, the study of individuals' social interactions is also an important part of ASNA research. Social interactions are usually studied in smaller, well-identified groups. In order to enable reliable testing of questions about social interactions, we developed an index of interactions (II) and evaluated the reliability of the combination of II with GI through a third simulation.
The appropriate form of an index of social interactions (the II approach) for directed behaviours depends on the recording rule used to collect data, which depends in part on the nature of the target interaction. Here we showed that for interaction data collected in discrete sampling periods (instantaneous or one-zero sampling), a modified version of the SRI is appropriate, but that if the interaction data were collected using continuous recording, then a simple ratio is more appropriate.
The first possibility is that the target interaction is a behavioural state of a meaningful duration, e.g., bouts of grooming. The researcher might then wish to estimate the proportion of time that each dyad (ab, consisting of individuals a and b) spends engaged in the target interaction. In such cases, instantaneous sampling may be used 29, with the data specifying whether the target interaction was occurring for each dyad at uniform time points (e.g., every 5 min). In this case the II approach is directly analogous to the collection of standard association data; therefore the SRI (Eqn. 1) can be used, except that [...] $x_{a \to b}$ is the number of sampling points at which a was directing an interaction towards b (e.g., a was grooming b) and $x_{b \to a}$ is the number of sampling points at which b was directing an interaction towards a (e.g., b was grooming a). Therefore, for such data the SRI, or the II approach (Eqn. 2), can be used under the same assumptions as for association data: e.g., failing to observe individuals a and b while they are interacting is as likely as failing to observe them both when they are not interacting together (see Hoppitt & Farine 2018).
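As a rough illustration of the instantaneous-sampling case, the sketch below computes the standard simple ratio index, x / (x + y_ab + y_a + y_b), from counts of sampling points. Pooling the two directed counts into the numerator is an assumption made here for illustration, not the exact form of Eqn. 2, and the function and variable names are hypothetical. The example also shows that the index takes the same value for a dyad sampled 10 times and one sampled 100 times, which is the sampling-effort issue that motivates the GI.

```python
def simple_ratio_index(x, y_ab, y_a, y_b):
    """Simple ratio index: x / (x + y_ab + y_a + y_b).

    x    : sampling points at which the dyad was observed together/interacting
    y_ab : points at which both were observed but not together
    y_a  : points at which only a was observed
    y_b  : points at which only b was observed
    """
    denominator = x + y_ab + y_a + y_b
    if denominator == 0:
        return 0.0  # dyad never sampled; returning 0 is a convention chosen here
    return x / denominator


def pooled_directed_sri(x_a_to_b, x_b_to_a, y_ab, y_a=0, y_b=0):
    # Assumption: the directed counts are summed to form the SRI numerator.
    return simple_ratio_index(x_a_to_b + x_b_to_a, y_ab, y_a, y_b)


# Same index value, tenfold difference in sampling effort.
sparse_dyad = pooled_directed_sri(x_a_to_b=3, x_b_to_a=2, y_ab=5)     # 10 points
dense_dyad = pooled_directed_sri(x_a_to_b=30, x_b_to_a=20, y_ab=50)   # 100 points
print(sparse_dyad, dense_dyad)  # 0.5 0.5
```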
Alternatively, the target interaction may be a behavioural event of short duration (e.g., one bird directs a peck at another bird), or the researcher may simply be interested in the rate at which a initiated interactions with b, rather than the proportion of time engaged in that interaction. In such cases one-zero sampling is traditionally used 29 [...] 1). For each scenario, we ran 500 different combinations of input parameter values for a total of 2,000 simulations. In this simulation, as well as in the following one, we did not perform pre-network permutations for two reasons: first, because data stream permutations for directed behaviours do not exist, and second, because the GI approach does not solve the pre-network permutation reliability issues related to hypothesis testing, as discussed earlier.
Results of the simulation showed that the combination of II and GI returns low rates of false positives and false negatives, with or without sampling biases, for node label permutations and parametric tests (Table 3). The low rates of false positives and false negatives of parametric tests in these simulations for directed behaviours are explained by the fact that, as these data are independent (an emitted interaction is counted once), over-inflation of the degrees of freedom should not occur even when testing hypotheses in scenarios with observational biases. This is why, in such scenarios, parametric tests should be preferred. However, it is quite common for researchers to use several directed social network measures (total emitted behaviours and total received behaviours) in a linear regression to evaluate their effect. In such cases, data independence is violated, as an individual's emitted behaviours are the received behaviours of its congeners and thus a single behaviour is counted twice. In this scenario, network permutations should be preferred.
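The double counting described above can be seen in a small sketch. The interaction matrix is invented for illustration: every behaviour emitted by one individual is also a behaviour received by another, so the emitted and received totals are not independent of each other.

```python
# Hypothetical directed interaction counts: rows are emitters, columns receivers.
interactions = [
    [0, 3, 1],
    [2, 0, 0],
    [4, 1, 0],
]

# Total emitted behaviours per individual (row sums).
out_strength = [sum(row) for row in interactions]
# Total received behaviours per individual (column sums).
in_strength = [sum(column) for column in zip(*interactions)]

# Each interaction appears once in a row sum and once in a column sum,
# so the two grand totals are necessarily identical.
print(out_strength, in_strength, sum(out_strength), sum(in_strength))
# [4, 2, 5] [6, 4, 1] 11 11
```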

Simulation on directed behaviours: Scan sampling and discrete time sampling
In order to simulate datasets of directed interactions collected through discrete time sampling, we created a simulation that mimics scan sampling (simulation 4, Appendix 4). This model creates a population of size N with a predefined number of subgroups (g, 4 subgroups in each simulation) within the population. By creating subgroups, we created cliques of individuals having a higher probability of being observed together, yet seldom interacting, thereby shaping the y term of the II in Eqn. 5. A number of scans (x) was predefined at initialization of the simulation. For each observation, a within- and a between-subgroup interaction process were defined. In the within-subgroup interaction process, a subgroup (g) was selected randomly and the number of individuals (n) observed within the scan (i) was drawn from a Poisson distribution (with alpha of 6). Once the size of the scan was defined, n/2 individuals belonging to the subgroup g were selected, and these individuals (defined as emitters) emitted an interaction based on a fixed probability P [...] Table 5. Once the simulation was done, we calculated the GI (see Eqn. 4) using the II (see Eqn. 5) as social index, and the directed versions of the previous social measures were computed for each individual through II and GI: outstrength, outdegree and outeigenvector. To assess the reliability of hypothesis testing, a random continuous variable (x) following a normal distribution was created. This variable, which represents an individual trait, was either ordered and assigned according to individuals' social measures, to create a relationship between x and the social measures (scenario 1), or randomly assigned to individuals, to create no relationship between x and the social measures (scenario 2).
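The two scenarios can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of Appendix 4: the function name, the standard normal parameters and the example outstrength values are all hypothetical.

```python
import random


def assign_trait(social_measure, related, rng):
    """Draw a normally distributed trait and assign it to individuals.

    related=True  (scenario 1): trait values are ordered so that their ranks
                                match the ranks of the social measure.
    related=False (scenario 2): trait values are assigned at random.
    """
    n = len(social_measure)
    trait = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    if not related:
        rng.shuffle(trait)
        return trait
    # Individuals ordered from lowest to highest social measure.
    by_measure = sorted(range(n), key=lambda i: social_measure[i])
    assigned = [0.0] * n
    for rank, individual in enumerate(by_measure):
        assigned[individual] = trait[rank]
    return assigned


rng = random.Random(1)
outstrength = [0.8, 0.1, 0.5, 0.3, 0.9]  # hypothetical values
trait_related = assign_trait(outstrength, True, rng)
trait_unrelated = assign_trait(outstrength, False, rng)
```

In scenario 1 the individual with the highest outstrength receives the highest trait value, so any rank-based or linear test should detect the relationship; in scenario 2 a significant result would count as a false positive.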
In order to mimic observational biases (z) proportional to the relationship between the variable x and the social measures, a maximum observational bias was defined (e.g., 20%) as an input parameter of the simulations (Table 5) [...] %, and so on. This allowed us to create scenarios with or without biases in combination with scenarios 1 and 2 explained above.
To evaluate the testing reliability of II combined with the GI approach, we performed 500 simulations for scenario 1 and 500 simulations for scenario 2. This was done with and without observational biases, for a total of 2,000 simulations. We sampled the input parameter space of the simulations (variables a-e in Table 5) by Latin hypercube sampling with the 'lhs' R library. For each simulation, we assessed the rates of false negatives and false positives of network permutations and parametric tests.
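For readers unfamiliar with Latin hypercube sampling, the sketch below shows the general idea in plain Python; it is not the 'lhs' R library, and the three parameter ranges are hypothetical placeholders rather than the variables of Table 5. Each parameter range is cut into as many equal strata as there are simulations, and every stratum is used exactly once per parameter.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Return n_samples parameter combinations as a list of tuples.

    bounds is a list of (low, high) pairs, one per input parameter.
    """
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        strata = list(range(n_samples))
        rng.shuffle(strata)  # pair strata randomly across parameters
        # One uniform draw inside each stratum.
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


# Hypothetical ranges, e.g. population size, interaction probability, maximum bias.
bounds = [(10.0, 100.0), (0.0, 1.0), (0.0, 0.2)]
design = latin_hypercube(500, bounds, random.Random(42))
```

Compared with independent uniform draws, this design guarantees that 500 simulations cover the full range of every input parameter evenly.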
Results of the simulation showed that the combination of II and GI returned low rates of false positives and false negatives, with or without sampling biases, for node label permutations and parametric tests (Table 6). When observation biases were simulated, parametric tests still returned neither false positives nor false negatives, whereas network permutations started to return some false negatives and false positives, although the rates remained within acceptable levels (Table 6).
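A node label permutation test of the kind evaluated here can be sketched as follows. This is a generic illustration, assuming the regression slope as test statistic and a two-sided p-value; the function names and the toy data are hypothetical, and the social measure would in practice be computed from a network built with II and GI.

```python
import random


def ols_slope(x, y):
    """Slope of the ordinary least squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def node_label_permutation_test(trait, measure, n_perm, rng):
    """Shuffle trait labels among nodes while keeping the network measure fixed."""
    observed = ols_slope(trait, measure)
    labels = list(trait)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(ols_slope(labels, measure)) >= abs(observed):
            as_extreme += 1
    # The observed statistic is counted as one of the permutations.
    p_value = (as_extreme + 1) / (n_perm + 1)
    return observed, p_value


rng = random.Random(7)
trait = [float(i) for i in range(20)]
strength = [2.0 * t + 1.0 for t in trait]  # toy data with a perfect relationship
slope, p_value = node_label_permutation_test(trait, strength, 999, rng)
```

Repeating such a test over many simulated datasets with and without a true relationship gives the rates of false negatives and false positives reported in the tables.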

Estimating and controlling for non-social biological confounding factors
Finally, to be able to estimate and control for non-social biological confounding factors, we used the residuals of the regression between the social measure and the potential non-social biological confounding factors. As residuals represent the difference between the observed value of the predicted variable (the social measure) and the value predicted by the linear regression model from the predictive variable(s) (the potential non-social biological confounding factor(s)), they allowed us to adjust the social measure according to the potential non-social biological confounding factor(s), and to use the result as a relative social measure. As discussed in the introduction, the residual correction (RC) method has the advantage of being usable after accounting for sampling biases (after using GI), of using permutation approaches to compute significant relationships, and of estimating whether one or several non-social biological confounding factor(s) exist. Furthermore, it is possible to resort to generalized linear mixed models in order to account for the structure of the data (e.g., repeated measurements, non-Gaussian distributions such as Poisson, or zero-inflated distributions). Finally, non-linear relationships between the social measure and the potential non-social biological confounding factor(s) can be tested with polynomial regressions.
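The core of the residual correction can be sketched for the simplest case of one confounding factor and an ordinary least squares regression. This is an illustrative sketch only: it omits the significance test, mixed models and polynomial terms mentioned above, and the function name and the age and strength values are hypothetical.

```python
def residual_correction(confounder, measure):
    """Return the residuals of the linear regression of measure on confounder."""
    n = len(measure)
    mean_c = sum(confounder) / n
    mean_m = sum(measure) / n
    sxx = sum((c - mean_c) ** 2 for c in confounder)
    sxy = sum((c - mean_c) * (m - mean_m) for c, m in zip(confounder, measure))
    slope = sxy / sxx
    intercept = mean_m - slope * mean_c
    # Residual = observed social measure minus the value predicted from the confounder.
    return [m - (intercept + slope * c) for c, m in zip(confounder, measure)]


age = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # hypothetical confounding factor
strength = [0.9, 2.2, 2.8, 4.1, 5.3, 5.7]   # hypothetical social measure
relative_strength = residual_correction(age, strength)
```

By construction the residuals have zero mean and no linear relationship with the confounding factor, so a positive value identifies an individual that is more social than expected for its age.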
To assess the reliability of RC, we used the Farine & Carter 2021 22 simulation (simulation 5, Appendix 5), which mimics association data collected on a population using GoG sampling and a discrete time recording rule. Each individual had a trait value T drawn from a normal distribution. By assigning individuals with the highest trait values to the largest observed groups X (ranging from 1 to 10) or by assigning individuals to observed groups randomly, the simulation created, respectively, scenarios where individuals' traits impacted their spatial associations or scenarios where individuals' traits had no impact on their spatial associations.
In addition, a conceptual modification to the simulation was done to mimic the effect of a

DISCUSSION
In the present study, we developed a Global Index (GI), an approach that weights dyadic association/interaction indices according to their respective sampling effort. Simulations show that this method returns acceptable rates of false negatives and false positives, with or without observation biases. The GI approach can be used for both directed and undirected behaviours using focal sampling, scan sampling or gambit of the group data collection protocols, and can be used for first-order (degree and strength) and second-order (eigenvector) social network measures. Our simulations show that pre-network permutations as well as parametric tests for ASNA return unacceptable rates of false negatives and false positives, even using the GI approach, and suggest that these should be avoided in ASNA research to ensure reliable hypothesis testing. Finally, we also provide a method to estimate and control for non-social biological confounding factors using the residuals of individuals' social measure values regressed on the confounding factors, which showed reliable results. One major asset of this approach is that it can be combined with the GI to account for multiple confounding factors at once, and that it takes into account the data structure (e.g., repeated measurements, spatial or phylogenetic observation clustering, non-Gaussian distributions). Together with the growing interest in and use of graph theory for research on social complexity, variance analysis (e.g., the intraclass correlation coefficient for the study of repeatability) is starting to be used in ASNA 37. To date, however, the reliability of hypothesis testing with those approaches has not been tested, and their results should thus be interpreted cautiously. Similarly, temporal analyses of individuals' sociality are an important part of ASNA for understanding sociality dynamics arising from demographic variation 38,39, environment 40 and ontogeny 41.
However, as for variance analyses, these temporal analyses require further testing of the reliability of the mixed models used to conduct them. Nonetheless, our results show that high rates of false negatives and false positives are not related to the permutations themselves but rather to insufficient control of observation time heterogeneity. We expect that "node label" permutations combined with the GI approach we propose here are also reliable for variance analysis and other more complex hypothesis testing approaches, although further tests are needed. By providing a reliable approach for a wide range of scenarios, we propose a novel methodology in ASNA with the aim of better understanding animal sociality and animals' societies from a mechanistic, ecological and evolutionary perspective.

ACKNOWLEDGMENT