Abstract
Artificial selection has, for decades, provided a powerful approach to study the genetics of adaptation. Using selective-sweep mapping, it is possible to identify genomic regions in populations where the allele-frequencies have diverged during selection. To avoid misleading signatures of selection, it is necessary to show that a sweep has an effect on the selected trait before it can be considered adaptive. Here, we confirm candidate selective-sweeps on a genome-wide scale in one of the longest, on-going bi-directional selection experiments in vertebrates, the Virginia high and low body-weight selected chicken lines. The candidate selective-sweeps represent standing genetic variants originating from the common base-population. Using a deep-intercross between the selected lines, 16 of 99 evaluated regions were confirmed to contain adaptive selective-sweeps based on their association with the selected trait, 56-day body-weight. Although individual additive effects were small, the fixation for alternative alleles in the high and low body-weight lines across these loci contributed at least 40% of the divergence between them and about half of the additive genetic variance present within and between the lines after 40 generations of selection. The genetic variance contributed by the sweeps corresponds to about 85% of the additive genetic variance of the base-population, illustrating that these loci were major contributors to the realised selection-response. Thus, the gradual, continued, long-term selection response in the Virginia lines was likely due to a considerable standing genetic variation in a highly polygenic genetic architecture in the base-population with contributions from a steady release of selectable genetic variation from new mutations and epistasis throughout the course of selection.
Background
Adaptation is a dynamic evolutionary process where populations improve their fitness by accumulating beneficial alleles at loci controlling adaptive phenotypes. The polymorphisms contributing to adaptation can either be present as standing genetic variation at the onset of selection or emerge through mutations. A long-standing challenge in quantitative and evolutionary genetics has been quantification of the relative contributions from standing and emerging variation to long-term selection response [1, 2]. Although, such results are very difficult to obtain in studies of natural populations. Artificial selection provides an approach to study the origin and fate of beneficial mutations during adaptation [2].
Subjecting populations to artificial selection provides an accelerated evolutionary process that may result in extreme phenotypes with accompanying changes across the genome [2-4]. Using such experiments, the contribution by mutational variance to the evolution of quantitative traits can be quantified by, for example, measuring the release of genetic variance during selection from an inbred founder population [5]. Estimating the contribution from standing genetic variation to long-term selection response is, however, more complex. Whereas short-term contributions can be estimated based on the immediate selection response in a selection-experiment starting from outbred founders, long-term contributions of standing variation are more difficult to estimate because of confounding with effects of mutations that emerge over time. Other approaches are therefore needed to disentangle the contributions of standing variation from other sources of selectable additive variation such as new mutations or selection induced additive variation [6-9].
Here, we estimate the contribution of standing genetic variation to long-term selection-response in an experimental population developed by bi-directional selection from a common, segregating founder-population. This experimental design allows for the separation of contributions from standing genetic variation and new mutations even when molecular data on the base-population is missing [3, 4]. The key is to identify “soft sweeps” between the selected populations that result from selection on standing variation [10-13]. Here, the genetic effects of a large collection of such divergently fixed “soft” selective sweeps [3] were estimated to predict their contribution to selection response in the Virginia chicken lines. They provided insights into the dynamic processes involved in shaping the genetics of complex traits during adaptive evolution.
The base-population for the Virginia lines was established in 1957 by intercrossing 7 partially inbred lines of White Plymouth Rock chickens. The genetic variation entering the population thus represents a sample of the polymorphisms present when these partially inbred lines were founded. Since the founder population was established, the high (HWS) and low (LWS) body-weight lines have been bred with one new generation per year by single-trait, bi-directional selection for 56-day body-weight [14-16]. The response to selection has progressed steadily throughout the experiment, resulting in an 8-fold difference in 56-day body-weight after 40 generations of selection and currently, in the 57th generation, there is a 16-fold difference between the lines (Figure 1). Genome-wide comparisons between the divergently selected lines have identified more than 100 candidate “soft” adaptive sweeps between them [3, 4]. The contribution to selection response of these candidate selective-sweep regions that originate from the standing genetic variation is still unknown. Here, we identified which of these candidate selective-sweeps contributed to adaptation and estimated their individual and joint contributions to the adaptive trait. Further, because the confirmed “soft” sweeps originated from the standing genetic variation in the base-population, inferred will also be the amount of new selectable variation released during selection and then contributed to the long-term selection response.
This study describes a genome-wide approach to explore the contributions from a large number of selective-sweeps [3] to selection-response and adaptation in the Virginia chicken lines. A deep intercross population from HWS and LWS from generation S40, and genotyping and phenotyping a large F15 Advanced Intercross Line (AIL) generation for the selective-sweep regions, allowed us to estimate the contribution of these sweeps to the adaptive trait: 56-day body-weight. We found that the standing genetic variation at a large number of loci in the base-population was the major contributor to the selectable additive variance during adaptation. The highly polygenic genetic architecture for the selected trait in the base-population, together with a steady release of selectable genetic variation from new mutations and epistasis, provided a likely explanation for the gradual, continued, long-term response to selection in the Virginia lines.
Results
Evaluated were the contribution by 106 selective-sweeps, where the high and low weight selected Virginia lines were fixed for alternative alleles, to selection response [3, 4]. Using genotypes for 252 markers in these sweeps, 99 clusters of markers that segregated independently were identified in the F15 generation of the AIL. In total, 38 of these regions were covered by a single marker and 61 regions by multiple markers. The physical length of the regions covered by multiple markers varied from 0.02 to 6.7 Mb and were distributed across most autosomes in the chicken genome (Figure 2; Additional file 1).
Many adaptive selective-sweeps have contributed to selection-response
To estimate how many of the independently segregating regions contributed to 56-day body-weight in the AIL F15 generation, we initially selected one representative marker from each of the 99 targeted regions via a within-region, backward-elimination analysis. Then, an across-region analysis was performed to identify the set of regions that jointly contributed to the adaptive trait. A multi-locus, backward-elimination analysis was used where the final set of loci were selected at 5 and 20% False Discovery Rate (FDR) thresholds [17, 18]. The potential influences of population-structure were controlled using bootstrapping [19]. Table 1 summarizes the loci associated with 56-day body-weight at 5 and 20% FDR significance thresholds. Many of the 99 evaluated sweeps were associated with 56-day body-weight in the F15 AIL, 8 at a 5% and 16 at a 20% FDR. Thus, we confirmed that a large number of the candidate selective-sweeps identified previously [3] were adaptive selective-sweeps. Further, the standing genetic variation for 56-day weight in the base-population for the Virginia lines was highly polygenic, and considerable genetic variation originating from the standing variation has been exhausted after 40 generations of selection via the fixation within the divergent lines across these loci.
The individual adaptive selective-sweeps have small allele-substitution effects
To estimate the contribution by the individual adaptive sweeps to the selected trait, additive, allele-substitution effects were estimated using a multi-locus association analysis. We found that the additive effects of the individual loci were generally small, and no individual locus had an allele-substitution effect greater than 29 g (or 0.2 σP) for the selected trait (Table 1; Figure 3). The effects were similar for most of the loci when estimated in the F15 population (Figure 3A). Most of the HWS derived alleles increased weight, however, two regions also had transgressive effects on the trait - i.e. that an allele inherited from the LWS increased weight. The standing genetic variation in the base-population for the Virginia chicken lines thus has contributed with alleles of small effect across a highly polygenic genetic architecture.
Large contribution by standing genetic variants to long-term selection response
When the joint contribution of the 16 adaptive selective-sweeps to the realised selection-response was estimated, none of the intercross chickens were HWS/HWS or LWS/LWS homozygous across all 16 regions. This provides a likely explanation for why the phenotypic range in the F15 intercross does not reach the high and low phenotypes of the HWS and LWS lines (Table 2). Therefore, we predict the joint contributions of the adaptive selective-sweeps to the divergence between the lines based on the average allele-substitution effect for HWS alleles across the selective-sweep regions shown to contribute to 56-day body-weight. The average estimates for the sweeps that were significant at 5/20% FDR were 15.7±2.5 (p = 6.7 × 10-10) /16.9±1.6 (p < 1 × 10-16) g, respectively. The predicted total contribution of these regions to the 1242 g line-difference between HWS40 and LWS40 (Table 2) are then 16 HWS alleles x 15.7 g = 251 g (20.2%) for the 5% set, and 32 HWS alleles × 16.9 g = 542 g (43.6%) for the 20% FDR set. Both of these estimates are biased downwards due to the inclusion of the two transgressive selective-sweep regions in the analysis. Thus, the total contribution by the adaptive selective-sweeps from the standing variation in the base-population to adaptation is likely to be at least 40% of the total realised response at generation 40.
Contribution by standing genetic variation to the selectable additive genetic variance
The next step was to estimate the contribution of the associated sweeps to the genetic variance in the F15 generation of the AIL. The narrow-sense heritability for 56-day weight in the F15, estimated based on the pedigree kinship [20], was . The 8 regions associated at the more stringent 5% FDR threshold together explained nearly one fifth of the residual phenotypic variance in 56-day body-weight (19.1% of residual ). When we included the 8 additional regions selected at a 20% FDR, the estimate increased to 23.4% of . Thus, the associated selective-sweep regions contributed to approximately half of the additive genetic variance in this population (41.5 and 50.9% of for regions significant at 5/20% FDR, respectively). Accordingly, the confirmed selective-sweeps have been major contributors to the selectable additive genetic variance during this selection experiment. Further, this result also illustrates that fixation at these adaptive selective-sweep regions depleted about half of the total additive genetic variance for 56-day body-weight during first the 40 generations of divergent selection.
Relative contributions by standing genetic variation and other sources of variation to selection response
We estimated the amount of additive genetic variance present at onset of selection that was captured by the confirmed adaptive selective-sweeps by comparing the total additive genetic variation in the base-population obtained shortly after onset of selection [21] with the estimate of the additive genetic variance contributed by the adaptive selective-sweeps in the AIL F15 generation (Table 3). In total, the additive variance of the sweeps in the AIL corresponded to 70 to 86% of the total additive variance in the base-population. This result suggests that the adaptive selective-sweeps identified and confirmed here represented a major part of the standing genetic variation in the base-population. It also implies that other sources of variation, such as novel mutations and selection induced genetic variation, have contributed to the gradual long-term selection response in the Virginia chicken lines.
Discussion
Many selective-sweeps associated with 56-day body-weight
In total, 8/16 candidate selective-sweep regions were associated with 56-day body-weight at 5/20% FDR. Most of the loci selected in the 5% FDR set should be true positive associations as < 1 false positive is expected at this significance level, whereas a few false-positive associations could be present among the loci selected at 20% FDR.
In the original selective-sweep mapping study [3], simulations predicted that no less than 40%, but most likely considerably more than half of the candidate selective-sweeps originating from the base-population would result from selection rather than drift. In light of results reported here, this theoretical prediction still appears realistic. First, we confirmed that the genetic-architecture of 56-day weight in the base-population was highly polygenic and that the contributions by individual loci were small. Also, quantitative genetics estimates were in agreement with this because they also predicted that the confirmed sweeps did not represent all of the standing additive genetic variance in the base-population. As the power to detect loci with small individual effects was limited in the association analysis of only 800 F15 individuals, it is likely several loci with similar or smaller effects than those confirmed here contributed to the trait but remained undetected. Second, although the current study covered 99 candidate selective-sweeps, it was only a sample of all candidate selective-sweeps present in the Virginia-lines. Because some candidate sweeps would have been missed in the study [3], which was based on medium-density genotyping from a 60k SNP chip rather than whole-genome re-sequencing. Furthermore, some of the candidate selective-sweep regions previously reported [3] were missing from our dataset due to failed genotyping, whereas the effects of others could not be separated due to the resolution in the AIL F15 generation. To obtain a complete understanding of the genetic architecture of selection-response due to standing variation in the base-population of the Virginia chicken lines will require additional candidate selective-sweeps.
Base-population alleles contribute to selection-response via small individual genetic effects
The individual allele-substitution effects for the confirmed adaptive selective-sweep regions were small (Figure 3). No individual locus had an allele-substitution effect greater than 29 g (0.2 residual σP ; Figure 3; Table 1) corresponding to a contribution of less than 3% of the residual despite the average minor-allele frequencies being rather high (average MAF = 0.21).
The base-population for the Virginia selection experiment involved intercrossing seven moderately inbred lines of White Plymouth Rock chickens [14-16]. During inbreeding, alleles with negative effects on fitness are likely to be removed from a population. From earlier work, it is known that several of the major growth-promoting alleles that have emerged during domestic animal improvement programs also have negative pleiotropic effects on fitness. Extreme examples include the susceptibility to Malignant hypothermia by the Ryanodine receptor mutation that promotes growth in pigs [22] and the calving problems due to muscle-growth promoting Myostatin mutations in cattle [23]. Our finding that most alleles from the base-population have small effects on growth is consistent with the presence of a negative correlation between enhanced or reduced growth and fitness.
That most alleles originating from the standing genetic variation had small effects also follows the expectation from theoretical studies involving soft sweeps. The probability of fixation for alleles with small effects is higher when selection acts on standing genetic variation than on a new mutation as there is a higher likelihood that a weakly selected new mutation will be lost [12].
The alleles originating from the standing genetic variation in the base-population make a large joint contribution to the additive variance and selection-response
The confirmed adaptive sweeps contributed about 40% of the realised selection-response and half of the total additive genetic variance in the F15 population. A significant portion of the phenotypic divergence between the lines was thus due to alleles originating from the standing genetic variation in the base-population and for which the lines are now fixed for alternative alleles. This clearly illustrates the importance of the standing genetic variation in the base-population for the gradual, long-term response to selection observed throughout this experiment. Some of this can be explained by additional loci originating from the base-population that either had individual effects that were too small to be detected in the population-size available for this study, were located on candidate selective-sweeps that were fixed at generation 40 but not included in this study, or were in regions not yet fixed for alternative alleles at generation S40.
Two of the confirmed selective-sweeps were transgressive, i.e. the allele inherited from the low-weight selected line increased weight in the F15 generation. This confirms the theoretical work by Robertson, who in 1960 showed that alleles of small effect can be fixed by chance in the opposite direction to that of selection [24]. Similar findings have also been reported in studies of other artificially selected populations (see e.g. [25, 26]).
Overall, our findings are consistent with those reported for another long-term selection experiment: the Illinois corn selection lines [25, 27], in particular the highly polygenic genetic architectures with many loci of small individual effects as well as the presence of transgressive loci. This suggests that the selection on highly polygenic genetic architectures, where many loci make minor contributions rather than on emerging mutations with large effects, has been important for the long-term success of these selection experiments. Such studies will continue to deliver valuable insights not only to the general features of polygenic architectures contributing to morphological traits in vertebrates, but also the basic processes involved in accelerated evolution and adaptation.
Long-term selection response from the joint contributions of standing genetic variation, novel mutations and epistasis
The adaptive selective sweeps in the F15 generation of the AIL explained from 42 to 51% of the total additive genetic variance in this intercross, and 70 to 86% of the total additive genetic variance present for 56-day body-weight in the base-population (Table 3). Much of the standing genetic variation has been exhausted while producing an 8-fold difference in the selected trait during the first 40 generations of selection. This supports the theoretical expectation that standing variation will be an important contributor to the initial selection response for a population subjected to a novel selection pressure [28]. However, because standing variation was not exhausted even after an intense artificial long-term, single-trait selection, this suggests an adaptive value for standing variation over longer periods of time. This may be especially relevant in natural populations where selection is not as intense.
Further, we estimated that approximately 40% of the total selectable additive genetic variation that contributed to the long-term selection response emerged throughout the experiment from other sources of variation. Some contributions are expected from new mutations [28], however, contributions may involve other sources such as selection induced genetic variation [6]. Although we have not explicitly explored the contributions by these here, previous reports involving the Virginia lines have provided insights on this topic. For example, a major contribution has been reported from a network of interacting loci that via a capacitating epistatic mechanism was likely to have induced considerable selectable additive variation in response to selection [7-9, 29]. Moreover, a recent study of the within-line response to selection has also identified a novel allele that due to its rapid fixation within the high-weight selected line was likely a newly arisen major allele [4]. Further work is, however, needed to quantify these sources of variation in relation to that of the standing variation.
Conclusions
Here, we empirically confirm that the gradual response to long-term, bi-directional, single-trait selection in the Virginia chicken lines was, to a great extent, due to standing genetic variation across a highly polygenic genetic architecture in the common base-population. A large number of loci, each having small allele-substitution effects, were major contributors to the selectable additive variance during adaptation. Much of the variation from the base-population was fixed in the divergent lines after 40 generations of selection. The considerably larger total additive genetic variance present within and between the lines after 40 generations of selection suggests that important contributions have also been made by a steady release of selectable genetic variation from new mutations and epistasis. In summary, these results provide not only novel insights to the genetics contributing to the gradual, continued, long-term response to selection in the Virginia lines, but also to the fundamental genetic mechanisms contributing to selection and adaptation.
Methods
Ethics statement
All procedures involving animals used in this experiment were carried out in accordance with the Virginia Tech Animal Care and Use Committee protocols.
Animals and phenotyping
The animals used in this study were from an Advanced Intercross Line (AIL) bred from generation S40 parents from two lines of chickens divergently selected for juvenile body-weight: the Virginia high (HWS) and low (LWS) lines. The HWS and LWS lines were founded in 1957 from a base-population obtained by crossing seven partially inbred lines of White Plymouth Rock chickens. Since then, they have been subjected to bi-directional selection for a single trait, high or low 56-day body-weight (BW56), respectively, and currently the lines have reached generation 58 (Figure 1). More detailed information on the selected populations is available [14-16].
A complete description of the development of the Advanced Intercross population can be found in the publications describing the analysis of data from the F2-F8 generations of the AIL [29, 30]. The AIL (Figure 1) was founded by F0 parents from generation S40 of the HWS and LWS lines, whose sex-average BW56 at that generation were 1412 g (SE: ± 36 g) and 170 g (SE: ± 5 g), respectively. In total, 907 individuals were hatched in generation F15. Out of these, 852 survived until 56 days of age, when their body-weight was measured. More details on the phenotypes of the founder lines and the AIL are provided in Table 2.
DNA extraction
Blood samples were collected using a sterile needle and syringe, and then transferred to a tube containing disodium EDTA. DNA was extracted from whole blood samples using a Qiagen DNeasy kit.
Marker selection and genotyping
In an earlier study, Johansson et al. [3] identified a large number of SNPs that were fixed for alternative alleles across the genome of the Virginia lines based on 60K SNP chip genotypes obtained for individuals from generations 40 and 50 of the HWS and LWS lines. These SNPs were clustered into selective-sweep regions: 116 clusters containing 998 SNPs in generation 40 and 163 clusters with 1746 SNPs in generation 50. We selected 316 SNP markers to cover the 134 autosomal selective-sweep regions. As some markers were selected for fixation in generation 50, all were not fixed in the founders for the intercross obtained from generation 40. They were, however, all highly informative with an allele frequency difference ≥ 0.9 between LWS and HWS. Samples from all F15 birds with recorded 56-day weights (n=852) were sent for genotyping at the 316 selected SNPs using the GoldenGate assay (Illumina, CA) at the SNP technology platform in Uppsala (Sweden).
In total, 27 individuals and 64 SNPs failed to fulfill one or more of the following quality control criteria: call rate of individuals >0.9, call frequency of SNPs >0.9, minor allele frequency >0.05. They were removed from the subsequent analysis, resulting in a final dataset consisting of genotypes for 252 SNPs in 825 birds. In total, the failed genotyping resulted in a loss of 26 initially targeted regions, resulting in a final coverage for 106 regions [3], 72 of which were present in both generation 40 and 50, and 34 emerged at generation 50.
Reclustering and summaries of targeted selective sweep regions
After ordering the genotyped markers based on their physical locations on the galGal4 assembly, we redefined the clusters based on their genetic linkage in the F15 generation. This was to define clusters that segregated independently in the F15 generation in order to identify the number of independent, targeted selective-sweeps that contribute to 56-day body-weight in the lines. The criteria used for clustering markers were to assign adjacent markers that were no more than 50cM apart into the same cluster and limit the range of each cluster to cover no more than 100cM. According to these criteria, 99 independent regions were defined for use in the association analyses. The linkage between the markers in cM (Haldane mapping function) was estimated using the function est.map from the R/qtl package in R [31] where the function for an inbred intercross population could be used as the markers in this study are fixed, or nearly fixed, in the analysed population. Although a small number of the genotyped markers were not fixed for alternative alleles in the F0 founders, we decided that the obtained estimates of linkage were sufficiently precise for re-clustering of the markers.
In total, we then screened for associations between 252 markers that passed quality control and 56-day body-weight in 825 chickens from the F15 generation of the Advanced Intercross Line (AIL). To estimate how many of the independently segregating regions contributed to 56-day body-weight in the AIL F15 generation, we initially selected one representative marker from each of the 99 independently segregating clusters of markers, as defined above, in the F15 generation (Figure 2; Additional file 1) via a within-region, backward-elimination analysis. As mentioned above, these markers were located in 106 of the selective-sweeps regions detected in the genome in two previous studies [3, 4].
Heritability estimation
To estimate the heritability in the narrow sense for 56-day body-weight, we used a linear mixed model: where y is the phenotype of 56-day body-weight, β0 is an intercept term, β1 is the sex effect and w the associated indicator vector. Furthermore, the random polygenic effects g are normally distributed with correlation matrix given by the pedigree-kinship matrix A, i.e. , Z is the associated incidence matrix, and ε is a normally distributed residual error . The pedigree kinship matrix A was constructed using the R/pedigree package [32]. The linear mixed model was fitted using the R/hglm package [33] and the heritability estimated as .
Association analyses
Defining line-origin for the genotyped marker-alleles
The GoldenGate genotyping assay report F15 genotypes in an “ATCG” basis without information about the line-origin of the respective alleles. Therefore, we first transformed the genotypes into a “-1 0 1” basis by comparing the genotypes in the F15 to that of the HWS/LWS F0 founders. “-1” was used to represent the case when both alleles were of LWS line origin, “0” that the individual was heterozygous, and “1” that both alleles were of HWS line-origin. In this way, positive estimates of the additive effects in the association analysis indicate that the HWS derived allele increased weight, and negative additive effects that the LWS derived allele increased weight (i.e. that it is transgressive).
Selection of markers to represent the independently segregating selective-sweep regions
The dataset was not sufficiently large to separate the effects of the linked markers within the 99 selective-sweep regions. Therefore, the first step in the association analysis was to simplify the subsequent analyses by selecting one marker in each of these regions to represent their joint effects in the subsequent multi-locus analysis. To select these markers, we used a within-locus backward-elimination approach in a linear model framework. The additive genetic effects of all markers in the selective-sweep region to be evaluated were then included together with the fixed effects of the sex for the bird. The analysis was thus based on the following model: where y is the phenotype, β0 is an intercept term, β1 is the sex effect and w the associated indicator vector, β2 is the set of additive sweep effects modeled as fixed effects and X is the associated design matrix coded as -1, 0, 1 for the line origin of the marker genotypes, and ε is a normally distributed residual error. The number of markers within each region was limited and no problem of confounding between the fixed and random effects was detected.
Using backward-elimination from the full model, we then identified the individual marker within each region that had the most significant effect, without requiring a particular significance for it at this stage of the analysis. This analysis was performed using custom written scripts in R.
A multi-locus, backward-elimination analysis to identify adaptive selective-sweep regions
The objective of the confirmation-study was to identify the set of selective-sweep regions that jointly contribute to 56-day body-weight in the F15 generation of the AIL. The statistical analysis was chosen with the background knowledge that the genetic architecture of body-weight in this population was highly polygenic [3, 4, 30, 34] and that potentially as many as half of the genotyped selective-sweep regions contributed to weight [3]. As all individuals in the AIL were progeny of dams of the same age, hatched on the same date and reared separate from their parents, environmental contributions to between-family means in the F15 population may be considered minimal. Thus, a large portion of the difference in family means should be due to the joint effects of the many selective-sweep regions studied. When a large multi-locus mixed model was fitted to the data and fixed effects of markers across multiple selective-sweep regions were included together with a polygenic random effect in the model to account for family-effects, there was a strong confounding between the fixed and random effects. This confounding may be explained by an assumption of linear mixed models being violated. A linear mixed model y = Xβ + Zu + e assumes that there is no correlation between a column in X, or a linear combination of several columns, with the true random effect u. This might occur where the number of columns in X is large, and thus we deemed this analysis as unsuitable for use in the multi-locus analysis. However, as population-structure might be of concern [35, 36], we chose to validate our results using a bootstrap-based approach developed for the same purpose for general deep-intercross populations, including Advanced Intercross Lines, by Valdar et al. [19]. These bootstrap analyses were implemented in custom scripts in the statistical software R [37].
The bootstrap based approach was implemented in a backward-elimination model-selection framework across the genotyped selective-sweeps. Before doing the multi-locus analysis, we evaluated whether it was statistically appropriate to perform this analysis across all the 99 regions that segregated independently in the linkage analysis. For this, a standard measure to identify potential high-order collinearity, the “variance inflation factor” or VIF, was used. Consistent with the linkage analysis, there was no large pair-wise correlation between the 99 markers. However, some marker genotypes could (almost) be written as a linear combination of the genotypes at all the other markers which would lead to collinearity problems in a multi-locus statistical analysis. Therefore the markers for the affected selective-sweeps (21, 22, 80, 89, 90) were removed from the subsequent analyses.
The true size of the model (i.e. the number of contributing regions) is unknown in advance and could range substantially from nearly half of the tested regions to a few individual associations. To compare models with such a wide range of variables to include, we opted to use an adaptive model selection criterion controlling the False Discovery Rate [17, 18] developed for this purpose. The multi-locus models used during backward-elimination were implemented in a standard linear model framework, starting with a full model including the fixed effects of sex and the additive effects of the 99 selected markers from the independently segregating regions. Convergence was based on two alternative FDR levels: 5% and 20%. The analysis was performed both in the original data, and using bootstrapping with 1000 resamples. For the bootstrap analyses, the RMIP (Resample Model Inclusion Probability) was calculated for all regions in backward-elimination analyses with 5 and 20% FDR among the selected loci as termination criteria. A final model was decided for each FDR level by selecting the regions with RMIP > 0.46 as suggested for an AIL generation F18 [19]. The additive genetic effects for each locus was estimated using the multi-locus genetic model described below including regions selected at 5 and 20% FDR.
To produce Figure 3, the allele substitution effects were estimated for all markers selected at 20% FDR. This was done in a multi-locus linear-model framework, where the genotypes of the markers in other significant regions selected in the same analysis were included as co-factors. The residual variances explained by the selected regions at 5 and 20% FDR were computed as the proportion of the residual variance of the null-model including only the sex of the individual.
Predicting the fraction of the HWS40-LWS40 line-difference explained by the associated selective-sweep regions
Two alternative approaches were used to predict how much of the population difference of 1242 g between the parental HWS and LWS from generation 40 (Table 2) could be explained by the selected regions at 5% or 20% FDR. This was done by regressing the individual’s phenotype to the total number of alleles of HWS origin carried across the evaluated regions. Using this approach, the standard-error of the estimate for the average allele-substitution effect was lower than for the individual estimates of the loci. This analysis was performed across the regions selected at 5 and 20% FDR. The obtained estimates were then multiplied by the total number of allele-substitutions between the HWS and LWS lines for this set of loci to obtain an estimate of the contribution to the line-difference.
List of abbreviations
HWS: Virginia High Body-Weight Selected line; LWS Virginia Low Body-Weight Selected line; BW56: Chicken Body-Weight at 56 days of age; AIL: Advanced Intercross Line; GGA: Gallus Gallus Autosome
Authors’ contributions
ÖC and PBS initiated the study and designed the project with MEP; PBS developed the Virginia chicken lines; PBS designed, planned and bred the Virginia Advanced Intercross Line; PBS and CFH designed, planned, bred, bled, phenotyped, and extracted DNA from the F15 Virginia Advanced Intercross Line population; MEP was responsible for selecting polymorphisms in the selective-sweep regions, developing primers for these and sending samples for genotyping; ZS performed the quality control of the genotype data; ÖC designed the statistical analyses; ZS and ÖC contributed analysis scripts; ÖC and ZS performed the data analyses, summarized the results, and wrote the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We thank Leif Andersson for initiating the intercross experiment with PBS and sharing the QTL mapping data from the F2 intercross. Stefan Marklund and Andreas Lundberg are acknowledged for their valuable contributions during selection of polymorphisms in the selective-sweep regions for genotyping, development of primers for these, and for interacting with the genotyping centre. We also thank Lars RÖnnegård for useful input on the statistical analysis and the manuscript. Genotyping was performed by the SNP&SEQ Technology Platform in Uppsala, which is part of Science for Life Laboratory at Uppsala University and is supported as a national infrastructure by the Swedish Research Council (VR-RFI).
Footnotes
Author emails: ZS: Zheya.Sheng{at}slu.se, MEP: Mats.Pettersson{at}imbim.uu.se, CFH: chonaker{at}vt.edu, PBS: pbsiegel{at}vt.edu