Testing implications of the polygenic model for the genetic analysis of loci identified through genome-wide association

It has been proposed that many loci with no significant association in GWA studies can nonetheless contribute to the phenotype through modifier interactions with the core genes, implying a polygenic determination of quantitative traits. We have tested this hypothesis by using Drosophila pupal phenotypes. We identified candidate genes for pupal length determination in a GWA and show for disrupted versions of the genes that most are indeed involved in the phenotype, presumably forming a core pathway. We then randomly chose genes below the GWA threshold and found that three quarters of them had also an effect on the trait. We further tested the effects of these knockout lines on an independent behavioral pupal trait (pupation site choice) and found that a similar, but non-correlated fraction of them had a significant effect as well. Our data thus confirm the prediction that a large number of genes can influence independent quantitative traits. Impact statement Quantitative traits are similarly likely influenced by randomly picked loci as by loci identified in a genome-wide association study.


Introduction 40
Organismal phenotypes, i.e., traits measured in whole organisms usually have a quantitative distribution and their 41 genetic architecture can be studied by genome-wide association (GWA) approaches. In the past years, these approaches based phenotyping pipeline (Reeves and Tautz, 2017) for the high-throughput measurement of pupal case length. The 97 same approach allows also to score the independent phenotype of pupation site choice (pupation height) and for both 98 we have shown that a very large number of phenotypes can be measured with high precision (Reeves and Tautz, 2017; 99 Zhang et al., 2020). attached in the wall. A plastic sheet that lines the wall is included inside the vial, and is ready to be taken out for being photographed

178
The DGRP lines were generated in a way that population structure impacts should be minimized, but some genetic 179 relatedness leading to cryptic population structure might still exist (Mackay and Huang, 2018). To examine whether 180 any cryptic population structure could contribute to the observed pupal case length variation of DGRP inbred stocks,

186
In order to correct any potential influence from the cryptic population structure, we decided to use a linear mixed model

188
Approximately 45% of the DGRP strains harbour at least one type of major genomic inversion (Huang et al., 2014) 189 (Supplementary Table S1b , 2020). The mapping revealed a number of associations above the chosen threshold ( Figure 3A) and the qq-plot 199 shows that there is substantial genetic variation above the random expectation ( Figure 3B).

212
Candidate genes from the GWA analysis

213
We found 50 significant SNPs to be associated with pupal case length in the DGRP strains, corresponding to 67 214 associating genes that locate within 5kb up/down-stream (default setting in SnpEff (Cingolani et al., 2012)) of these variation (Table 2-figure supplement 1).

217
To identify possible additional candidate genes associated with the variants, we examined the long-range linkage 218 disequilibrium (LD) between pairs of detected candidate variants and with other genetic variants found in the DGRP 219 strains. LD blocks were then calculated for each significant genetic variant with a commonly used threshold r 2 = 0.8 220 (Pallares et al., 2014), and 17 significant LD blocks were found with average block size of 19.3 kb (Table 2-table   221 supplement 1). This finding is in line with the observation of a rapid decay of LD in the panel strains reported in the 222 original DGRP resource reference (Huang et al., 2014). Combining the additional genes identified in the above LD 9

279
The panel (A) shows the gene expression enrichment fold changes (i.e., the ratios between the fractions of expressed GWA candidate 280 genes and total coding genes, in log2 scale) between GWA candidate genes and total coding genes, and the panel (

310
It should also be noted that these strains may 1) have different co-isogenic backgrounds, 2) have disruption in different

319
Phenotypic effects on an independent phenotype 320 If most genes have an effect on one quantitative trait, one would expect that the same is true for a second trait. As

358
The Drosophila system allows to systematically using both, a GWA approach, as well as the classic knock-out 359 approach to assess the effect of loci on given phenotypes. Furthermore, the availability of automated phenotyping 360 pipelines allows to score a large number of individuals under highly controlled conditions and thus get a statistically 361 sound resolution even for small effects. We have explored this here for two pupal phenotypes that are particularly 362 amenable to analysis through automatic image processing.

363
Using the DGRP lines that represent segregating wildtype alleles and the commonly used corrections for confounding 364 factors, we were able to identify a set of 90 candidate genes within 50 associated genetic loci for the pupal size 365 phenotype (Table 2 and Table 2-table supplement  showed it for the pupation site choice phenotype (Zhang et al., 2020).
are expressed in the relevant developmental stage and organ(s). We thus set out to use gene disruption experimental 380 tests on random genes with expression in the relevant developmental stages. The GWA p-value for these 45 randomly 381 chosen genes is below any threshold that one would normally consider using. Accordingly, none of them would have 382 been identified as GWA candidate genes. However, our results showed that approximately three quarters of them 383 actually had significant effects (p-value 0.05) on the pupal case length phenotype. Intriguingly, a similar fraction, 384 but only partially overlapping loci, showed also an effect on the behavioural pupation site choice phenotype.

385
One can ask why these randomly chosen genes, especially the ones showing significant phenotypic effects on pupal 386 phenotypes, were not picked up by GWA. Actually, there is no observed significantly lower genetic density or allele 387 frequency of random chosen genes with significant pupal case length phenotypic effects compared with GWA 388 candidate genes ( Figure 6-figure supplement 2). Therefore, the reason why they have not shown up in the GWA has 389 to be that they do not include segregating variants of sufficient effect size in the population from which the DGRP was 390 derived. In fact, one can expect that most genetic variants present in a natural Drosophila population are unlikely to 391 represent gene disabling mutations. And the variants that are gene disabling should be rare because they should be 392 selected against, i.e., they should also only seldomly occur as homozygotes. Hence, to test these genes as homozygous 393 gene disruptions is rather unnatural. Still, it is an indication that these genes are involved in some form in the phenotype.

394
This is a general issue when comparing gene effects from a GWA analysis with those from classic genetic analyses.

395
The former trace the effects of naturally occurring variants, the latter the ones of gene disruptions. These converge to control this factor.

415
In total, 14 wild type stocks, 198 DGRP inbred lines, 58 transposon insertion mutagenesis stocks (disruption for 55 416 annotated protein coding genes) and 3 corresponding progenitor stocks were assayed in this study. The detailed 417 description and measurement data of these stocks can be found in Supplementary Table S1.

419
Phenotyping of pupal case length 420 We conducted the measurement of pupal case length by adopting an earlier established image-analysis based automated 421 phenotyping pipeline as described in (Reeves and Tautz, 2017;Zhang et al., 2020). The entire phenotyping pipeline is

429
2) Mating flies. Approximately ten 2-5 days old healthy female flies (fifteen for inbred stocks to compensate the 430 reduced fertility) and five similar-aged male flies were introduced into each vial, under the incubation condition as 431 mentioned above. These adult flies were cleared from the vials after 1-2 days introduction and vials were kept in the 432 same incubation condition for another 8-9 days to allow them to reach pupation stage. In case of observing that most 433 of offspring in the vials were present as pupae attached to the transparent film, the film was then gently taken out from 434 each vial, and the food from the lower part was scrapped away and any viable larvae were also removed.

435
3) Film Photographing. Once removed from the vial, the film (with the barcode label affixed) was then placed into a 436 custom plastic frame, which holds the film flat for further photographing. Frames were then photographed in a light 437 tight box with a sliding door, which provided illumination only from underneath the frame, effectively silhouetting the 438 pupae while minimizing tangential shadows. Every photograph included a 1 euro cent coin (16.25 mm diameter), for

4) Image analysis.
We applied the open-source image analysis software CellProfiler (v2.1.0) (Carpenter et al., 2006) 442 for the recognition of pupae and measurements of a variety of attributes, with a customized pipeline adopted from computed the statistical significance on the pupal case length difference between these two groups via a Wilcoxon 481 rank-sum test.

482
Secondly, we conducted a direct experimental test to compare the changes of pupal case length measurement between

483
Wolbachia infected stocks and Wolbachia-free stocks, which were created though two generations of tetracycline 484 treatment on the infected stocks (by adding an appropriate volume of 100 mg/ml of tetracycline suspended in 99% 485 ethanol to the surface of the solid prepared food) and then reared for at least another two generations with standard 486 food to avoid any detrimental parental effects (Zeh et al., 2012). We firstly ruled out the possibility that tetracycline with <20% missing values and ≥5% minor allele frequency (MAF) were kept (corresponding to 1,903,028 genetic 506 variants) for further analysis. We assessed the possible influence of population structure on the pupal case length 507 measurement in DGRP inbred lines, by using the PCA module from PLINK v1.90 (Purcell et al., 2007) to identify 508 principal components (PCs) from the above filtered genetic variant data. We firstly checked the clustering status on two PCs, and then examined the correlation between pupal case length values and the top 20 PCs by using Pearson's 512 correlation tests.

513
To explore the potential contribution of major genomic inversions to the population structure within DGRP strains, we

525
with pupal case length measurement as the dependent variable and DGRP stock names as a random factor. Meanwhile,

526
we compared the estimates of H 2 computed with the same methodology, but from different Drosophila melanogaster 527 strains datasets (Reeves and Tautz, 2017). The estimates on the narrow sense heritability (h 2 ) of pupal case length 528 based on mid-parent regression were directly retrieved from (Reeves and Tautz, 2017).

529
We partitioned the fraction of pupal case length phenotypic variance explained by the genetic variants from each Where Gv is the variance of the genotype at the focal genetic variant, and Pv is the total variance of the pupal case 550 length phenotype. β is reported as "SNPWeight" for each genetic variant in the FastLMM output.

551
We predicted the effects of GWA significant genetic variants by using SnpEff (Cingolani et al., 2012) with default 552 parameters, and taken all the corresponding affected protein-coding genes as associating genes. In short, all the protein-553 coding genes within 5 kb up/down-stream of focal genetic variant were taken as its associating genes. We also tested 554 the genotypic linkage disequilibrium (LD) for each GWA significant genetic variant and other genetic variants by 555 calculating the squared correlation estimator r 2 with PLINK v1.90 (Purcell et al., 2007

571
We trimmed and filtered the low-quality raw Illumina RNA-Seq reads sequenced from the sample of each in (Zhang et al., 2020). In brief, the pupation height was defined as the distance from the vertical coordinate of pupation 589 site (pupal centre) to the food surface in the vial in millimetre (mm). As the pupation height measurements are sensitive 590 to the pupal density in the vial, the raw measurements for pupation height were further adjusted according to the 591 equations described in (Zhang et al., 2020).

621
Additionally, we also experimentally tested the phenotypic effects of 45 selected genes, on the basis of the following We compared the genetic variant density (# of genetic variants per kb region) pattern between GWA candidate genes 644 and randomly chosen genes (the subset which have significant phenotypic effects on pupal case length, p-value 0.05),