Comparative architectures of direct and social genetic effects from the genome-wide association study of 170 phenotypes in outbred laboratory mice

Social genetic effects (SGE, also called indirect genetic effects) are associations between genotypes of one individual and phenotype of another. SGE can arise when two individuals interact and heritable traits of one individual influence the phenotype of the other. To better understand the architecture of SGE, we re-analysed an existing dataset comprising 170 behavioural, physiological and morphological phenotypes measured in outbred laboratory mice. For all phenotypes and in order to compare SGE with better-known direct genetic effects (DGE, associations between an individual’s genotypes and their own phenotype), we analysed polygenic models with random terms for SGE and DGE and performed the genome-wide association study of both SGE (sgeGWAS) and DGE (dgeGWAS). Our analyses yielded two main insights: first, SGE and DGE acting on the same phenotype generally arise from partially different loci and/or loci with different effect sizes; secondly, individual SGE associations typically explain less phenotypic variance than DGE associations. Our results shed light on the architecture of SGE and have important implications for the design of future studies. Importantly, we detail and validate methods that can be used for sgeGWAS in outbred populations with any levels of relatedness and group sizes, and provide software to perform these analyses.

SGE can be used as a tool to identify traits of social partners affecting a 53 phenotype of interest. For example, a candidate gene study of SGE on plumage 54 condition in laying hens 13 found an indirect association with the serotonin receptor 2C 55 gene. As the serotoninergic system is known to control behaviour, this SGE 56 association is consistent with observations of cage mates influencing the plumage of 57 a focal hen through feather pecking. When traits of social partners affecting the 58 phenotype of interest are unknown, the genome-wide association study of SGE 59 (sgeGWAS) may be a promising avenue. Indeed, similarly to how GWAS of 60 "traditional" direct genetic effects (DGE, effects of an individual's genotypes on its own 61 phenotype) has provided valuable insights into the "within-body" pathways affecting 62 disease and quantitative phenotypes 14 , sgeGWAS could help dissect the "between- Results 109 110 Genome-wide genotypes and 200 behavioural, physiological and morphological 111 phenotypes for 2,073 commercially-available, outbred CFW mice were available from 112 Nicod et al. 26 and Davies et al. 27 . Males were always housed with males and females 113 with females, and mice were left undisturbed in their cages for at least nine weeks 114 before phenotyping started (Figure 1c). We only kept mice that had the same two cage 115 mates over the course of the experiment (1,869 total). Furthermore, we excluded 57 116 mice that formed genetic substructures (see Methods) so that the remaining 1,812 117 mice were as equally related as possible while retaining as large a sample size as 118  Figure 1), so we limited the analysis of r to 28 145 phenotypes for which both aggregate SGE and aggregate DGE explained more than 146 5% of phenotypic variation. The value of r varied between -0.28 (+/-0.35) and 1(+/-147 0.09) across these traits, with an average of 0.42 (Figure 2 and Supplementary Table  148 1). For 10 out of the 28 phenotypes where r could be more precisely estimated, r was 149 significantly different from zero (nominal P < 0.05), suggesting that loci affecting a 150 phenotype directly also sometimes influence the same phenotype of cage mates. The strongest evidence for shared SGE and DGE loci (r ¹ 0 at Bonferroni-corrected P < 152 0.05) was for healing from an ear punch, weight of the adrenal glands, serum LDL 153 cholesterol levels, and mean platelet volume. 154 We also evaluated evidence that |r| was different from one (i.e. r different from 155 one and minus one) in order to empirically evaluate the widely-influential model of 156 "phenotypic contagion". Phenotypic contagion or "spread" is a model for social effects 157 whereby the phenotype of interest of a focal individual is affected by the same 158 phenotype of their social partners. In humans, cognitive susceptibility to depression, 159 alcohol consumption, stress, obesity and educational attainment, for example, have 160 been shown to be "contagious" or "spread" across college roommates, spouses, 161 friends, or parent/offspring 10-12,28-32 . As a result, phenotypic contagion has shaped the 162 way we think about social effects: for example, phenotypes unlikely to spread have 163 been used to cast doubt on social network effects 33 . Here we leveraged the parameter 164 r to test whether phenotypic contagion was sufficient to account for SGE: under a 165 model of pure phenotypic contagion, |r| is expected to be equal to one; on the contrary, 166 if traits of social partners other than the phenotype of interest mediate social effects, 167 |r| is expected to be different from one. We found that |r| was significantly different 168 from one (nominal P < 0.05) for 10 out of 28 phenotypes. The most significant P value 169 (0.00066, significant after Bonferroni correction) was found for immobility in the first 170 two minutes of the Porsolt swim test, a measure of helplessness that is relevant to 171 depression. This latter result suggests that phenotypes that spread may additionally 172 be affected by other traits of social partners. These results motivate the use of 173 sgeGWAS as a tool to more broadly understand social effects. 174 as a covariate. We hereafter refer to this strategy as "conditioning". We found that 192

Figure 2 Correlation r between SGE and DGE random effects (see Methods
conditioning was necessary to avoid spurious associations in the sgeGWAS due to  Table 3). 220 There was no overlap between genome-wide significant SGE and DGE loci 221 acting on the same phenotype. However, variants at genome-wide significant SGE 222 loci were enriched in small P values in the corresponding dgeGWAS (Supplementary 223 Figure 4). Together these results suggest a partially distinct basis for SGE and DGE 224 acting on the same phenotype (i.e. partially different loci and/or effect sizes), which is 225 consistent with the results from the analysis of the correlation parameter r.   point to yet unknown traits of cage mates that influence the five phenotypes above. 257 Of these candidate genes, one, Epha4, has previously been associated with 258 the phenotype of interest. Epha4 expression in the hippocampus was found to be 259 affected by chronic mild stress in mice and responsive to antidepressant treatment 37 . 260 We also found suggestive DGE of Epha4 on helplessness (Supplementary Figure 5), 261 confirming that some level of phenotypic contagion was likely for that phenotype. The 262 other candidate genes did not immediately permit to generate hypotheses on the traits 263 of cage mates mediating the social effects. To gain such insights from the results of 264 sgeGWAS, it is likely that other data types (e.g. gene expression) will need to be 265 integrated. Alternatively, larger sample sizes would permit identification of additional 266 SGE loci, some of which might immediately provide insights into the traits of partners 267 that mediate social effects. SgeGWAS, in that respect, is similar to dgeGWAS 14,38,39 . 268 269

Architecture of SGE and comparison with that of DGE 270
Despite being carried out on the same individuals and phenotypes, and in a perfectly 271 analogous manner, sgeGWAS identified fewer genome-wide significant associations 272 than dgeGWAS (24 associations for 17 phenotypes and 121 associations for 63 273 phenotypes respectively). As the determinants of power for SGE have not been 274 investigated, it is not clear whether we had more or less power to detect SGE 275 associations compared to DGE associations. In order to get a better understanding of 276 this issue, we simulated local SGE or DGE arising from a single causal variant and random groups of two or three mice per cage, and simulated phenotypes arising from 279 the sum of local genetic effects (DGE or SGE), polygenic effects (DGE and SGE), and 280 non-genetic effects. We simulated local SGE according to two alternative generative 281 models, both consistent with the analysis model used for sgeGWAS: an "additive" and MAF of the causal variants, but also by the way SGE arise across cage mates 302 (additively or not) and the number of cage mates. In the real data, the way SGE arose 303 across cage mates is not known, so it is not possible to determine the primary cause 304 for the smaller number of genome-wide significant SGE associations compared to 305

DGE associations. 306
Comparing genome-wide significant SGE and DGE associations in terms of 307 proportion of phenotypic variance explained yielded two main results: firstly, individual 308 genome-wide significant SGE associations explained a maximum of 2.5% of 309 phenotypic variance, while eleven genome-wide significant DGE associations 310 explained more than 5% of phenotypic variance and up to 40% (Figure 3b, SGE and 2.7% for DGE associations. As these results are born from the analysis of 313 170 phenotypes, it suggests that SGE associations will generally be more difficult to 314 detect than DGE associations. Secondly, for each phenotype we compared the 315 variance explained jointly by all genome-wide SGE (respectively DGE) associations 316 to the variance explained by SGE (respectively DGE) in aggregate. We found that 317 genome-wide significant associations explained a large proportion of the 318 corresponding genetic variance for both SGE and DGE (Figure 3c). More precisely, 319 across 5 phenotypes with aggregate contribution of SGE greater than 5% and at least 320 one genome-wide significant SGE association, we found that an average of 32.5% of 321 the aggregate variance was explained by genome-wide significant associations. Variance explained by all genome−wide significant associations for a given phenotype in outbred laboratory mice, using polygenic models and GWAS. Our results provided 353

associations. (c) Comparison, for each phenotype, of the variance explained by social 343 (red) and direct (black) genetic effects in aggregate (x axis) and the total variance
two key insights into the architecture of these complex traits: first, SGE and DGE 354 acting on the same phenotype typically arise from partially different loci and/or loci with 355 different effect sizes; secondly, SGE associations tend to explain less phenotypic 356 variation than DGE associations. As we analysed a broad range of phenotypes, the 357 insights we gained are likely to generalize to other populations and phenotypes. 358 For 10 phenotypes we uncovered evidence that SGE and DGE were 359 significantly correlated. For example, r was significantly different from zero for the two 360 measures of helplessness included in this dataset. This result is consistent with prior 361 evidence that mood spreads across social partners 28,43 . It is also consistent with the 362 observation that, in this study, two out of the three genome-wide significant SGE loci 363 for helplessness have suggestive direct effects on helplessness -direct effect that are 364 further supported by prior reports that Epha4, the candidate gene at one of the loci, is 365 associated with depression and responds to antidepressant treatment 37 . The 366 pathways that mediate non-zero correlations between SGE and DGE for other 367 phenotypes were not always obvious (e.g. healing from an ear punch, serum LDL 368

cholesterol levels) but warrant further investigation of SGE and DGE. 369
A key result from our study is empirical evidence that phenotypic contagion is 370 often not sufficient to account for social effects, even when it does play a role. Indeed, 371 for 10 out of 28 tested phenotypes we found that |r| was significantly different from efforts to discover other traits of social partners that mediate social effects, and points 374 to sgeGWAS as a way to do so. It is important to bear in mind, however, that SGE 375 only capture the genetic component of the traits of partners that mediate social effects. 376 Hence, traits that are mostly non-genetically determined will be missed by SGE 377

studies. 378
Our results on the variance explained by individual SGE loci are an important 379 contribution towards understanding the architecture of SGE and will help design future 380 experiments such as sgeGWAS. In particular, the fact that SGE loci never explained 381 a large fraction of phenotypic variance (max 2.5%), while in comparison 11 DGE loci 382 explained more than 5% of phenotypic variation, shows that sgeGWAS will require 383 larger sample sizes than dgeGWAS to be equally powered.  Mice were four to seven weeks old when they arrived at the phenotyping facility. They 416 were grouped with their cage mates and then spent nine to twelve weeks undisturbed 417 in quarantine. They spent a further four weeks together during phenotyping. Males 418 were always housed with males and females with females. 419 Cage assignments were not included in the publicly available dataset but were 420 provided by the authors upon request and are now provided in Supplementary Table  421 4. Cage assignments were recorded at eleven time points throughout the study and 422 showed that a few mice were taken out of their original cages and singly housed, 423 presumably because they were too aggressive to their cage mates. When this 424 happened, we excluded all the mice in that cage from the analysis. We also excluded 425 cages where some of the mice were "genetically close" (as defined below) to many 426 other mice. Finally, we only retained cages with exactly three mice per cage. Although 427 from the sleep test on all mice were singly housed, we still investigated "persistent" 428 SGE on sleep and tissue phenotypes (persistence over one day for sleep phenotypes 429 and over a few days for tissue measures). where A is the GRM and I the identity matrix. 468 The phenotypic covariance is: ( 1 , C ) 470 = 9 : ; 1,C + 9 := + 9 = ; ( > ) 1,C + ? : ; 1,C + ? := { ( > ) 1,C 471 + ( > ) 1,C } + ? = ; ( > ) 1,C + A ; ( > ) 1,C 472 The variances explained by DGE and SGE were calculated respectively as 473 Significance of variance components was assessed using a two-degree of freedom 485 log likelihood ratio (LLR) test (i.e., the test statistics was assumed to follow a two-486 degree of freedom chi2 distribution under the null). Note that this testing procedure is 487

conservative. 488
The Q value for the aggregate contribution of SGE was calculated for each phenotype 489 using the R package qvalue 49  The correlation between ) andwas calculated as: 495 = 9 := 9 : × 9 = 496 497 r reflects the correlation between SGE and DGE acting on the same phenotype, 498 similarly to how "traditional" genetic correlations measure the correlation between 499 DGE on two traits; r can actually be interpreted as the correlation between DGE on the traits of cage mates mediating social effects and DGE on the phenotype of interest 501

Identification of genome-wide significant associations 549
Because we wanted to compare the architecture of DGE and SGE for each phenotype 550 independently, we adopted the per-phenotype FDR approach used by Nicod et al. 26 . 551 Had we used a study-wide FDR approach instead, the comparison of SGE and DGE 552 loci for a given phenotype would have depended on the SGE and DGE loci identified 553 for the other phenotypes in the dataset. 554 The procedure we used to control the FDR accounts for the fact that we report 555 loci rather than individual variants 53 , where a locus is defined as the 1.5 Mb-wide 556 window around a SNP (this window size is the average 95% confidence interval for 557 DGE QTLs in 26 ). More precisely, for each phenotype and for each type of genetic 558 effect (social and direct), we ran 100 "permuted GWAS" by permuting the rows of the 559 matrix of social (respectively direct) genotypes, and testing each variant at a time using 560 the permuted genotypes together with the un-permuted phenotypes, covariates, GRM 561 and matrix of direct (respectively social) genotypes (for conditioning). See 52,54 for 562 references on this permutation approach. For each permutation we then compiled a 563 list of loci that would be significant at a nominal P value of 0.01. Using the un-permuted 564 data, we similarly compiled a list of loci that would be significantly associated at a 565 nominal P value of 0.01. Ordering the latter in order of decreasing significance and going down the list, we calculated for each locus an associated FDR until the FDR 567 was above 10%. For a given P value x, the FDR was calculated as: 568 We report only those loci whose P value corresponds to an FDR < 10%. 571 572 Definition of candidate genes at associated loci (Table 2) 573 574 At each significantly associated locus we defined a 1.5Mb window centred on the lead 575 variant, identified all the variants that segregate in this window based on the full set of 576 7M variants, and reran the sgeGWAS locally with all the variants at the locus. We 577 highlighted those genes that are located within the most significantly associated 578 segments and whose MGI symbol does not start by 'Gm', 'Rik', 'Mir', 'Fam', or 'Tmem' 579 in order to enrich the reported sets in genes with known function. Phenotypes were simulated based on the real genotypes but random cages for a 629 random subset of 1,800 mice (in order to be able to draw full cages of 2 or 3 mice). 630 Phenotypes were simulated as the sum of random effects, local DGE and local SGE 631 We simulated local SGE and DGE at variants where direct and social 635 genotypes were either lowly correlated (Spearman correlation negative log P value < 636 0.05) or more highly correlated (Spearman correlation negative log P value > 0.2), and 637 had with low MAF (MAF < 0.05), medium MAF (0.225<MAF<0.275) or high MAF 638 (MAF>0.45). We simulated local DGE with an allelic effect of 0 or 1 (1 corresponds to 639 a large effect in the real data). We simulated local SGE under two alternative 640 generative models: an "additive" model by using as in model (2)   Power was calculated at a genome-wide significance threshold of negative log P 5, 651 which is similar to the significance of associations detected at FDR < 10%. 652 The results we show in Figure 3a are based on a subset of simulations with 653 group size 2 and 3, no local DGE, and averaged across high and low genotypic 654 correlations. Power was also calculated at a genome-wide significance threshold of 655 negative log P 5. 656