Abstract
Genetic dissection of highly polygenic traits is a challenge, in part due to the power necessary to confidently identify loci with minor effects. Experimental crosses are valuable resources for mapping such traits. Traditionally, genome-wide analyses of experimental crosses have targeted major loci using data from a single generation, often the F2, with additional, later generation individuals being generated for replication and fine-mapping. Here, we aim to confidently identify minor-effect loci contributing to the highly polygenic basis of the long-term, divergent bi-directional selection responses for 56-day body weight in the Virginia chicken lines. To achieve this, a powerful strategy was developed to make use of data from all generations (F2-F18) of an advanced intercross line, developed by crossing the low and high selected lines after 40 generations of selection. A cost-efficient low-coverage sequencing based approach was used to obtain high-confidence genotypes in 1Mb bins across 99.3% of the chicken genome for >3,300 intercross individuals. In total, 12 genome-wide significant and 10 additional suggestive QTL for 56-day body weight were mapped, with only two of these QTL reaching genome-wide, and one suggestive, significance in analyses of the F2 generation. Five of the significant, and four of the suggestive, QTL were among the 20 loci reaching a 20% FDR-threshold in previous analyses of data from generation F15. The novel, minor-effect QTL mapped here were generally mapped due to an overall increase in power by integrating data across generations, with minor contributions from increased genome-coverage and improved marker information content. Significant and suggestive QTL now explain >60% of the difference between the parental lines, three times more than the previously reported significant QTL. 
Making integrated use of all available samples from multiple generations in experimental crosses is now economically feasible using the low-cost, sequencing-based genotyping strategies outlined here. Our empirical results illustrate the value of this strategy for mapping novel minor-effect loci contributing to complex traits to provide a more confident, comprehensive view of the individual loci that form the genetic basis of the highly polygenic, long-term selection responses for 56-day body weight in the Virginia chicken lines.
Introduction
Quantitative traits remain difficult to analyse and break down into their component loci (Flint and Mott 2001). Effect sizes of individual loci often explain a very small fraction of the phenotypic variance (Boyle, Li, and Pritchard 2017 and references within) - often much smaller than environmental effects - and are regularly dependent on the genetic background (Pettersson et al. 2011; Mackay 2014; Forsberg et al. 2017; Zan and Carlborg 2020). Experimental populations are valuable resources for studying quantitative traits, and by reducing confounding factors such as environmental noise, they have provided a clearer view on the genetic architecture of a wide range of complex traits (Flint and Mott 2001; Andersson 2001; Andersson and Georges 2004). Examples include shank-length in mice (Castro et al. 2019), longevity in Drosophila melanogaster (Curtsinger and Khazaeli 2002), and oil content in corn (Hopkins 1899; Dudley 2007).
Although QTL (Quantitative trait loci) studies in experimental crosses have high power, resolution is limited due to the extensive LD (Linkage disequilibrium) introduced by the crossing design. Historically, sparse or very sparse marker maps have therefore been used, resulting in regions of the genome with less coverage and, in some cases, missing data on small chromosomes and/or chromosomal ends (Mackay 2001). Similarly, it is not uncommon to have regions lacking markers informative for line origin when using experimental crosses between outbred founders (Andersson 2001). To increase resolution and facilitate fine-mapping of detected QTL, follow-up studies in additional crosses have been performed. Generally, these have excluded regions outside of previously observed QTL, therefore leaving much of the genome without further study. As a result, these studies are underpowered or lack the resolution to make inferences on the genetic architecture of the studied traits beyond a few large-effect loci (Flint and Mott 2001).
More complete dissection of highly polygenic complex traits requires large and powerful studies. In natural populations such as humans, hundreds of thousands of individuals have been used to study highly polygenic model traits such as height (Lango Allen et al. 2010; Yang et al. 2010; Wood et al. 2014). In experimental populations, smaller populations are required to detect even minor-effect loci due to the higher power achieved from, for example, segregation of alleles at intermediate frequencies and greater control over environmental influences. New genotyping and imputation approaches based on low-coverage whole genome sequencing (WGS; Altshuler et al. 2000; Andolfatto et al. 2011; Zhang et al. 2015; Pértille et al. 2016; Whalen et al. 2018; Zan et al. 2019), provide opportunities to reanalyse existing individuals generated for different purposes to perform integrated analyses, thus enabling greater insights into the contribution of loci with minor effects on the genetic basis of complex traits in existing experimental populations. The increased genome-wide marker coverage provided by WGS-based genotyping technologies also provides an extended coverage of regions outside of the current consensus linkage maps in species such as the chicken, where, for example, the major focus has been on the large chromosomes, leaving the microchromosomes largely unexplored (Groenen et al. 2000; 2009).
The Virginia body weight lines of chickens were developed by long-term, bi-directional selection for a single trait, body weight at 56 days of age, resulting in a nine-fold difference between the low (LWS) and high (HWS) lines after 40 generations of selection (Dunnington and Siegel 1996; Dunnington et al. 2013; Márquez, Siegel, and Lewis 2010). Genome-wide comparisons showed that the footprints of selection between the LWS and HWS cover hundreds of loci across the genome (Johansson et al. 2010; Lillie et al. 2018; Lillie et al. 2019). Efforts to identify which of these loci contribute to the observed responses include a series of experiments utilizing an intercross developed from individuals of generation S41 (nHWS=29, nLWS=30). These efforts include genome-wide mapping (Jacobsson et al. 2005; Carlborg et al. 2006; Wahlberg et al. 2009), as well as replication and fine-mapping studies (M. Pettersson et al. 2011; Besnier et al. 2011; Sheng et al. 2015; M. E. Pettersson et al. 2013; Brandt et al. 2017; Zan et al. 2017) on different generations in this population. Although these studies agree that the long-term responses are primarily from selection on a highly polygenic genetic architecture where most loci have small effects, the statistical support for individual loci is low. This study aims to overcome this lack of statistical power and to map contributing minor-effect loci with confidence by re-genotyping and performing an integrated analysis of >3,300 individuals from generations F2-F18 of the Virginia lines intercross. In doing so, we identify and map new QTL, confirm earlier reported loci, and explain more of the selection responses with individually significant loci than previously reported, thus illustrating the value of utilizing new and affordable WGS-based genotyping strategies.
Methods
Deep-Stripes: a pipeline for founder-line genotype estimation in deep intercross populations
Stripes (Zan et al. 2019) is a pipeline for founder-line genotype estimation using low-coverage sequencing data, extending TIGER (Rowan et al. 2015) for use in outbred intercross populations. In deep intercross populations from outbred founders, such as the advanced intercross line (AIL) studied here, there is a generally lower and more variable density of founder-line informative markers. Here, we have further extended the Stripes pipeline to deep intercross populations, including updates to enhance stability, and improve genotype calling quality in later generations.
The deep-Stripes updates are: (a) reverting to hardcoded genotype emission thresholds in cases where the original nonlinear minimisation procedure for determining these (Rowan et al. 2015) failed due to uniform ancestry across an entire chromosome, (b) a modified nonlinear minimisation procedure that improves convergence and defaults to hardcoded parameters after 20 unsuccessful attempts to determine the genotype emission thresholds, (c) a modified logic for comparing highly similar beta distributions to make results stable across computing platforms, and (d) automation of multiple rounds of genotype estimation for each individual (forward and reverse on each chromosome with an arbitrary number of window sizes; here 50 and 200 markers).
Genotype quality control and filtering
Deep-Stripes implemented genotype estimation in both directions on the chromosome. This facilitated detection of incoherently called genotypes in low-information areas due to a delay of inferred crossovers to the end of such regions. Genotype estimation with two window-sizes was used to reduce the number of false positive crossovers in marker-dense areas resulting from the flat, per-window genotype estimation error rate.
Final genotype estimation was done by transforming the output from each of the four runs described above (forward and reverse, each with two window sizes) into a genotype matrix. Each matrix contained estimated founder-line genotypes for each individual in even-sized (1 Mb) bins across the genome. The four matrices were processed as follows. First, in bins where no recombination event was inferred, the genotypes were coded as numerical values (1, 0, −1) corresponding to the homozygote for founder-line 1, the heterozygote, and the homozygote for founder-line 2, respectively. Recombination breakpoints were estimated with bp resolution; if one or more recombination events were detected in a bin, so that multiple genotypes were called within the 1 Mb window, the genotype was scored on a continuous scale from 1 to −1 by averaging the founder genotype scores across the base pairs in the bin. Second, the genotype matrices from the forward and reverse runs were filtered by considering bins where estimates differed by more than one recombination event as uninformative and setting them to missing. This procedure was performed separately for the two window sizes (50 and 200 markers). Third, the forward- and reverse-filtered genotype matrices obtained using the 50 and 200 marker window sizes were combined. This was done by using the genotype derived using the 50/200 marker windows, and bins with ambiguous genotypes were set to missing. Ambiguous bins were defined as those with genotype scores in the ranges [0.2, 0.8] and [−0.8, −0.2]. Finally, all bins genotyped in fewer than 100 individuals were set to missing.
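The bin-level scoring and filtering steps can be illustrated with a minimal sketch. This is not the deep-Stripes implementation: the segment representation, the function names and the `min_individuals` parameter name are hypothetical and chosen only for illustration, and the forward/reverse comparison and the merging of the two window sizes are deliberately left out because their exact rules are not spelled out here.

```python
from typing import List, Optional, Tuple

BIN_SIZE = 1_000_000  # 1 Mb bins, as described in the text

# Assumed representation: (start_bp, end_bp, genotype), end exclusive,
# genotype in {1, 0, -1} for founder-line 1 homozygote, heterozygote,
# founder-line 2 homozygote.
Segment = Tuple[int, int, int]


def bin_score(segments: List[Segment], bin_index: int) -> Optional[float]:
    """Length-weighted mean genotype score of one bin (None if uncovered)."""
    lo, hi = bin_index * BIN_SIZE, (bin_index + 1) * BIN_SIZE
    covered = 0
    total = 0.0
    for start, end, genotype in segments:
        overlap = min(end, hi) - max(start, lo)
        if overlap > 0:
            covered += overlap
            total += overlap * genotype
    return total / covered if covered else None


def mask_ambiguous(score: Optional[float]) -> Optional[float]:
    """Set scores in [0.2, 0.8] or [-0.8, -0.2] to missing (None)."""
    if score is None or 0.2 <= abs(score) <= 0.8:
        return None
    return score


def drop_sparse_bins(matrix: List[List[Optional[float]]],
                     min_individuals: int = 100) -> List[List[Optional[float]]]:
    """Rows are individuals, columns are bins. Bins genotyped in fewer
    than `min_individuals` individuals are set to missing everywhere."""
    n_bins = len(matrix[0]) if matrix else 0
    keep = [sum(row[j] is not None for row in matrix) >= min_individuals
            for j in range(n_bins)]
    return [[v if keep[j] else None for j, v in enumerate(row)]
            for row in matrix]
```

For example, an individual with a crossover at 600 kb in the first bin (founder-line 1 homozygote before, heterozygote after) receives a score of 0.6 for that bin, which falls in the ambiguous range and is therefore set to missing.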
Genotype estimation in the Virginia lines AIL using deep-Stripes
The deep-Stripes pipeline described above was used to call founder-line origin genotypes in 1 Mb bins across the genome of the F2-F18 generations of the AIL. For this, high-coverage sequence data from the outbred founders of the population and low-coverage sequence data from the intercross individuals, generated as described below, were used.
Founder-line sequencing and variant calling
All 59 high (HWS) and low (LWS) founders of the AIL (nHWS = 29 and nLWS = 30) were whole-genome re-sequenced to ~30X coverage (Guo et al. 2019). The obtained reads were then mapped to the newest reference genome (GalGal6a, Genome Reference Consortium 2018) using BWA (version 0.7.17, Li 2013). Variants were called and filtered using GATK (McKenna et al. 2010) according to best-practices recommendations (DePristo et al. 2011; Auwera et al. 2013) modified to accommodate non-model organisms. The code and parameters used for this analysis are provided in Supplementary File 1 and the associated GitHub repository (github.com/CarlborgGenomics/AIL-scan).
Low-coverage sequencing of advanced intercross line individuals
All chickens from generations F2-F18 of the AIL (nF2-F18 = 3,327, Table S2) were sequenced to ~0.4X coverage (Zan et al. 2019). The obtained reads were mapped to the GalGal6 reference genome using BWA (version 0.7.17, Li 2013). Variants were next called using a pipeline implemented using bcftools (1.9, Li 2011), samtools (1.9, Li et al. 2009), biopython (1.70, Cock et al. 2009), and cyvcf2 (0.10.0, Pedersen and Quinlan 2017). The code and parameters used are provided in Supplementary File 2 and the project GitHub repository. Obtained variants were merged for each generation and filtered to only include polymorphisms present in the filtered set of founder genotypes described in the section above. Next, for each individual, only variants informative about the founder-line origin (HWS/LWS) were kept and formatted for use as input to the Stripes genotyping pipeline (Zan et al. 2019).
QTL mapping
QTL mapping was performed for body weight at 8 weeks of age. To correct for generational effects, 8-week body weights were standardized within each generation. This was done by subtracting the generation mean from each observation and then dividing the result by the within-generation standard deviation. The genome scan was performed using the ‘scanone’ function from the package ‘qtl’ (Broman et al. 2003) in R (R Core Team 2013) using Haley-Knott regression (Knott and Haley 1992) with sex as a covariate. The significance threshold was obtained using permutations (n=10,000), resulting in a 5% genome-wide significance threshold of LOD = 4.01. For suggestive significance, the 5% chromosome-wide significance threshold on Chromosome 4 was used to be consistent with previous studies (LOD = 2.86; Jacobsson et al. 2005; Wahlberg et al. 2009).
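The within-generation standardisation can be sketched as follows. This is a minimal illustration rather than the analysis code: the function name and the `(generation, weight)` record layout are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Hashable, List, Tuple


def standardize_within_generation(
    records: List[Tuple[Hashable, float]]
) -> List[float]:
    """Return z-scores in input order, computed separately per generation.

    records: (generation label, 56-day body weight in grams).
    Assumes at least two individuals per generation and uses the sample
    standard deviation (an assumption, see lead-in).
    """
    weights_by_generation = defaultdict(list)
    for generation, weight in records:
        weights_by_generation[generation].append(weight)

    # Per-generation mean and standard deviation
    stats = {
        generation: (mean(weights), stdev(weights))
        for generation, weights in weights_by_generation.items()
    }

    # Subtract the generation mean, then divide by the generation SD
    return [
        (weight - stats[generation][0]) / stats[generation][1]
        for generation, weight in records
    ]
```

After this step every generation has mean 0 and standard deviation 1, so differences in mean weight between generations no longer contribute to the phenotypic variance in the scan.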
Selection of an extended marker set using an FDR approach
To obtain a larger, more lenient set of markers, LOD scores were transformed into p-values (Peirce et al. 2006). These were then evaluated against significance thresholds adjusted for multiple testing using the Benjamini-Hochberg procedure with a false discovery rate of 10%, as implemented in statsmodels (Benjamini and Hochberg 1995; Seabold and Perktold 2010).
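The two steps can be sketched as below. The analysis itself relied on statsmodels; this dependency-free version is only an illustration of the same procedure, with hypothetical function names. The LOD-to-p-value conversion assumes that 2·ln(10)·LOD is asymptotically chi-square distributed; the default of two degrees of freedom (one additive and one dominance term) is an assumption and not taken from the text.

```python
import math
from typing import List


def lod_to_pvalue(lod: float, df: int = 2) -> float:
    """Convert a LOD score to a p-value via the likelihood-ratio statistic.

    LRT = 2 * ln(10) * LOD is compared to a chi-square distribution.
    Closed-form survival functions are used for df = 1 and df = 2.
    """
    lrt = 2.0 * math.log(10.0) * lod
    if df == 1:
        return math.erfc(math.sqrt(lrt / 2.0))
    if df == 2:
        return math.exp(-lrt / 2.0)  # equals 10 ** (-lod)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def benjamini_hochberg(pvalues: List[float], fdr: float = 0.10) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns, in input order, whether each test is declared significant
    at the given false discovery rate.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest rank k such that p_(k) <= (k / m) * fdr
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank / m * fdr:
            largest_rank = rank

    # All tests ranked at or below that k are rejected
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected
```

Under the two-degrees-of-freedom assumption the conversion reduces to p = 10^(−LOD), so a LOD of 2 corresponds to p = 0.01. Tightening `fdr` from 0.10 to 0.05 or 0.01 shrinks the set of bins passing the threshold.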
Estimation of the genetic effect and residual variance explained by the mapped QTL
To estimate the residual variance explained by the QTL, we corrected for the sex effect using a linear model and fitted either i) all significant or ii) all suggestive and significant QTL using the fitqtl function in R/qtl on the residuals. Estimates for the residual variance explained by each QTL were obtained by fitting all significant and suggestive QTL jointly and using the SSv3 drop-one-term anova in the fitqtl function. Estimates for the effect on body weight in grams for each QTL were determined by fitting each QTL individually with sex as a covariate. Because the phenotypes were standardised, the estimates were multiplied by the population standard deviation to obtain an estimate in grams. The sum of these estimates was then expressed as a fraction of the between-line difference.
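The back-transformation to grams and the fraction of the between-line difference can be written compactly. The symbols are introduced here for illustration only and are not part of the original notation: $\hat{\beta}_q$ is the effect estimate of QTL $q$ on the standardised scale, $\sigma$ the population standard deviation of 56-day body weight in grams, $Q$ the set of QTL considered, and $\Delta$ the difference in mean 56-day body weight between the founder lines.

```latex
\hat{\beta}_q^{(\mathrm{g})} = \hat{\beta}_q \,\sigma ,
\qquad
\text{fraction of line difference explained}
  = \frac{\sum_{q \in Q} \hat{\beta}_q^{(\mathrm{g})}}{\Delta}
```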
Estimating the effect of increased genotyping information content on statistical power in QTL analyses
The information content (IC) was calculated across the genome in the set of F2 individuals that were common to this study and that of Wahlberg et al. (2009). The measure used was defined as (Knott et al. 1998):
This was calculated at individual marker locations, as well as every cM, across the genome using the a and d indicator regression variables from Wahlberg et al. (2009). For the dataset from this study, the a and d indicator variables were calculated from the genotype estimates in each 1Mb bin as (Knott and Haley 1992):
The information content was compared at the physical locations (Mb, Genome Reference Consortium 2018) of the genotyped markers in Wahlberg et al. (2009) and across all tested locations in the two studies (every cM/Mb; Wahlberg et al. 2009/this study).
Results
Increased coverage in the genome-wide scan via genotyping by low-coverage sequencing
After sequencing and variant calling, genotypes were estimated for nF2-F18 = 3,327 AIL individuals that passed quality control. The average density of informative markers across the genome was 102 markers/Mb, though since the founder lines are outbred and the individual AIL offspring descend from different founders, the number of informative SNPs varies among individuals and decreases over generations, as more ancestors contributed to the genotype of each individual. Founder-line genotypes were obtained for all of the 1,058 1Mb bins defined on the 33 largest chromosomes, identifying an average of 74 recombination breakpoints per Mb across all individuals, before filtering. Compared to the previous genome-wide scan performed in the F2 population by Wahlberg et al. (2009), an additional 100 Mb were covered (+27%) with markers, encompassing two small chromosomes (Chr31, Chr33), several previously uncovered scaffolds/unplaced segments and chromosome ends (Figure 1). Further, the information content at the tested locations across the genome also improved from an average of 0.77 (Wahlberg et al. 2009) to an average of 0.90 (Figure 3, panel F).
More individuals and increased genome-coverage facilitate detection of new QTL
Comparisons to earlier mapped and fine-mapped QTL in the Virginia lines AIL
As illustrated in Figure 2, the 2+1 genome-wide significant and suggestive QTL in the most recent genome-scan of the F2 population (Table 1; Wahlberg et al. 2009) were detected. In addition, nine further QTL for this trait were mapped with genome-wide significance (Figure 2; Table 1). Using the FDR-adjusted threshold for significance, a total of 42, 33, and 21 QTL were identified at a false discovery rate of 10%, 5%, and 1%, respectively (Fig. S3, Tab. S3). This resulted in 20 QTL in addition to the QTL reaching genome- or chromosome-wide significance using the permutation approach.
Overlap with suggestive regions
Previously, fine-mapping had been performed in significant and suggestive QTL regions and in identified selective sweep regions (Besnier et al. 2011; Sheng et al. 2015; Zan et al. 2017). In contrast to the original F2 genome-scan (Wahlberg et al. 2009), two of the fine-mapped loci (10/9 and 23/6 Chr/Mb) located outside of the 2+1 originally significant and suggestive QTL reached genome-wide significance when analysed across the entire AIL (Figure 2, Table 1), with four more fine-mapped loci overlapping QTL that reach suggestive significance in this study (4/70, 2/113, 3/35, 10/11 Chr/Mb). When comparing the extended set of 42 QTL to the markers from Sheng et al. (2015), which tag a set of 99 regions under selection that were identified by Johansson et al. (2010), 41 of the regions overlapped the extended list of QTL with at least one tagging marker.
More individuals and increased marker density dissects a QTL on Chromosome 4 into multiple, independent associated regions
Previously, two regions on Chromosome 4 were implicated for body weight, with one of them, Growth6, reaching genome wide significance for 56-day body weight in the latest genome scan (Wahlberg et al. 2009). The population used here provided sufficient power to replicate the previously reported QTL and confirm the association of another region, Growth7, that was previously reported as associated with body weight and growth traits (Figure 3, panel B, compares F2 individuals from Wahlberg et al. (dashed lines) to the AIL population here (solid lines)).
In addition, the approach used here provided sufficient resolution to partition the latter QTL into three independent peaks (4/23, 4/36, 4/70 Chr/Mb; Figure 3, panel B). Comparing the full AIL population (solid lines) genotyped with deep-Stripes markers (empty squares, green) to the marker panel used in Wahlberg et al. (2009) (full circles, pink) shows how the added resolution helped separate 4/23 and 4/36.
Large contributions by mapped QTL to the selection response
The AIL was produced by intercrossing chickens from HWS and LWS after 40 generations of selection, and the founders for the intercross differed more than eight-fold (1,341g) in 56-day weights (Table 1). Estimated from the AIL F2 - F18, the 12 QTL reaching genome wide significance together explain 501.4g (37.4%) of the difference between the parental lines (Table 1), improving upon the 159g (12%) explained by the two significant QTL in Wahlberg et al. (2009). When also considering suggestive QTL (Table S1), the 25 QTL detected here explain 729g (54.4%) of the parental difference compared to the 227.4g (17%) for the 2+1 significant and suggestive QTL in Wahlberg et al. (2009). Together, the significant (suggestive) QTL mapped here/by Wahlberg et al. (2009) explained 8.3/5.2% (11.1/7.1%) of the residual phenotypic variance in the AIL-F2-F18 population (Table 1), whereas the 42 peak markers derived from the FDR-approach explain 14.6% of the residual phenotypic variance, or 1,130g, representing 84.3% of the difference between the founding lines.
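The percentages of the parental difference quoted above follow directly from the gram estimates and the 1,341 g line difference. The short check below recomputes them from the values given in the text; the variable and function names are hypothetical and the script is only an arithmetic illustration, not part of the analysis.

```python
# Difference in 56-day body weight between the founder lines (grams),
# as reported in the text
LINE_DIFFERENCE_G = 1341.0

# Summed QTL effects in grams, as reported in the text
reported_effects_g = {
    "12 genome-wide significant QTL (this study)": 501.4,
    "significant + suggestive QTL (this study)": 729.0,
    "42 FDR peak markers (this study)": 1130.0,
    "2 significant QTL (Wahlberg et al. 2009)": 159.0,
    "2+1 significant and suggestive QTL (Wahlberg et al. 2009)": 227.4,
}


def percent_of_line_difference(effect_g: float,
                               line_difference_g: float = LINE_DIFFERENCE_G) -> float:
    """Express a summed effect in grams as a percentage of the line difference."""
    return 100.0 * effect_g / line_difference_g


for label, effect_g in reported_effects_g.items():
    print(f"{label}: {percent_of_line_difference(effect_g):.1f}%")
```

The recomputed values agree with the reported 37.4%, 54.4% and 84.3% for this study, and with the rounded 12% and 17% for Wahlberg et al. (2009).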
Concordance with previous estimates of effect size
The difference between the effect size estimates for Growth1 on Chromosome 1 in Wahlberg et al. (2009) and in this study was within the range of the standard error (34.2 ± 9.2g and 31.6 ± 4.1g, respectively). The effect size estimated here for Growth6 (20.7 ± 4.4g) on Chromosome 4 was lower than the previous estimate (36.3 ± 8.3g). However, the current approach found a total of 3+1 significant and suggestive QTL on Chromosome 4, of which two significant and one suggestive QTL were within a QTL region (Growth7) that was reported in the first analysis of the F2 generation (Jacobsson et al. 2005), but that later showed no significant association with 56-day body weight in the extended analysis of the same individuals with a more comprehensive marker set by Wahlberg et al. (2009). Taken together, these QTL explained a total of 81.2g. Growth9 on Chromosome 7 also had a reduced effect size estimate (25.6 ± 4.0g) compared to the 43.2 ± 9g estimated by Wahlberg et al. (2009).
Additional power in the analyses facilitates detection of new QTL
A novel QTL on chromosome 1 revealed by combining data across generations
A genome-wide significant QTL was mapped to 27 Mb on Chromosome 1 (Figure 3a; Table 1). This region was covered by multiple SNP and microsatellite markers in the earlier genome-scans (Wahlberg et al. 2009; Figure 1). However, as these markers segregated in the founder lines, the estimation of founder-line QTL genotypes was less precise than with the current approach utilizing low-coverage sequencing data (Figure 3c). To evaluate the potential contributions of improved marker density and of an increased number of individuals to the gain in statistical power, the genome scan was performed on different subsets of the data. The first subset consisted of the F2 individuals from Wahlberg et al. (2009), and the second of the bins containing SNP and microsatellite markers from the same study. Third, all combinations of these subsets with the individuals and markers from this study were used. The results indicated (Figure 3) that the discovery of this QTL was driven by the integration of data across the AIL generations and not by an increase in marker density.
New QTL revealed in regions with poor marker coverage in the genome
The low-coverage sequencing approach implemented here improved the genome-wide marker coverage in this population. In particular, coverage now extends further towards the chromosome ends on almost all chromosomes, most notably on Chromosomes 5, 6, and 8 (Figure 1). As a result, a QTL for 56-day body weight was revealed on Chromosome 8, where an additional 7 Mb (0-7 Mb) was covered on its distal end (Figure 4b). The QTL peak is located at the end of the chromosome and does not extend into the part of the chromosome covered in the earlier studies (Wahlberg et al. 2009). Its effect is relatively small, and hence it explains a modest amount of the variance in 56-day body weight (Table S1). Its peak was located approximately 20 cM outside of the most distal marker in the linkage map used by Wahlberg et al. (2009). Figure 4 illustrates how the combination of better regional coverage and the gain in power from merging individuals from multiple generations in the AIL resulted in its detection.
Discussion
QTL mapping in experimental intercrosses is a valuable strategy for the detection of loci contributing to differences in complex traits between founder populations. Various F2 populations have been developed and analysed (e.g. Wright et al. 2006; Kukekova et al. 2011; Solberg Woods 2013; Ying Guo et al. 2016), and in some cases deep intercross populations were bred to increase resolution of the mapped regions. In crosses between segregating, outbred populations, power is often limited by a shortage of between-population informative markers. As a result, low information regions were often relatively poorly explored. Examples include chromosome ends, microchromosomes, and even lowly differentiated intrachromosomal regions. Unfortunately, few populations have been re-genotyped with high density markers due to the associated high costs. Here, we have implemented and evaluated a cost- and time-efficient genotyping strategy utilizing low-coverage sequencing to increase the genome-wide coverage in QTL scans. A software implementation is provided for genotype estimation in F2 and deep intercrosses between outbred founder populations. A large advanced intercross chicken line was analysed to empirically illustrate the value of increasing genotype density and quality, as well as making use of available samples across generations, to improve the power in QTL mapping.
By doubling the number of genome-wide significant QTL identified, three times as much of the founder-line difference for the selected trait is now explained by confidently identified loci
In this study, the intercross population analysed was bred from the Virginia body weight lines. These two pedigreed populations were divergently selected long-term for a single trait, 56-day body weight, for 41 generations before the intercross was formed. Earlier whole-genome analyses of this intercross were challenged to detect genome-wide significant QTL. Only one QTL affecting the selected trait reached genome wide significance in the first analyses that covered ~80% of the genome with 145 markers (Jacobsson et al. 2005). In a subsequent study, utilizing a denser genetic map covering ~93% of the genome with 434 markers (Wahlberg et al. 2009), two genome-wide significant QTL associated with the same trait were found. Fine-mapping analyses in later generations of the intercross focusing first on significant and suggestive QTL regions (Besnier et al. 2011; Pettersson et al. 2011), and later also on selective-sweep regions detected in comparisons between the divergent founder lines (Johansson et al. 2010; Sheng et al. 2015; Pettersson et al. 2013; Zan et al. 2017), all suggested that the genetic architecture of 56-day body weight in this population was highly polygenic and that individual loci contributed small marginal effects.
The increased power in our study obtained by increasing quality and coverage of markers, as well as increasing population size by integrating data across generations, facilitated detection of nine additional QTL associated with 56-day body weight at a genome-wide significance level, which had not been identified as significant or suggestive QTL by Wahlberg et al. (2009). The residual phenotypic variance explained by the mapped QTL increased from 5.3% by the two genome-wide significant QTL mapped by Wahlberg et al. (2009) to 8.3% by the 12 genome-wide significant QTL detected here. The effect sizes of the QTL estimated in the population analysed here were often lower than earlier estimates, with the exception of Growth1. The reduced effect size for Growth9 was not surprising, given that Chromosome 7, and Growth9 in particular, were previously found to be involved in many epistatic interactions (M. Pettersson et al. 2011), which suggests that this is a complex region. As such, the effect size estimate in the AIL was likely affected by fluctuations in allele frequency not present in the F2 population. For Growth6, while the estimate is lower than that in Wahlberg et al. (2009), when taking into account all significant and suggestive QTL found on Chromosome 4, the sum of the effect sizes was much larger and almost identical (81.2g compared to 79.1g) to the estimate for all regions on Chromosome 4, namely Growth6 and Growth7, made by Jacobsson et al. (2005). This similarity suggests that our study captures and confirms the same effect on body weight where only suggestive or circumstantial evidence from related phenotypes was previously available. Additionally, this study enabled mapping this effect more precisely onto multiple independent regions and confirms the existence of multiple regions previously associated with 56-day body weight in the wake of selection-scans (Zan et al. 2017).
In total, three of the selection markers outside of previously identified QTL regions overlap with the significant or suggestive QTL in this study. Additionally, out of eleven total markers significant in Zan et al. (2017), all but two were correlated with elevated LOD scores. This is consistent with the 20% FDR threshold employed by Zan et al. (2017). Evaluating these regions under a more lenient threshold corroborates this, as most, if not all, of these elevated regions retain significance when accounting for multiple testing leniently. Further, the considerable overlap between these QTL and the previously identified sweep regions from Johansson et al. (2010) not only supports the thesis that these are real, marginal QTL, but also lends credence to previous studies.
Overall, the effects of the significant QTL contributed 37.4% of the founder-line difference (Table S2), which is 3.1 times the founder-line difference explained by the significant QTL in Wahlberg et al. (2009). While power was sufficient to triple the founder-line difference explained, the significant and suggestive QTL taken together only explain 54.4% of the founder-line difference, indicating that power is likely still too low to capture the full genetic architecture of the trait studied. Relaxing the association threshold further, the QTL peak markers significant at a 10% FDR-threshold explain more than 84% of the difference between the founder lines. While this may suggest that we are getting closer to identifying the majority of loci contributing to the weight difference, given the large fraction of the genome covered by these markers, the approach used here to estimate the effect size of individual loci likely leads to a slight overestimation from residual linkage between QTL and tagging of linked loci.
Similarly, 30.3% of the bins on the covered autosomal chromosomes are statistically associated with the phenotype after lenient adjustment for multiple testing. This means that any overlap between the extended QTL and the selective sweep regions has to be interpreted with caution. Still, this is equally a testament to the polygenicity of the investigated trait, and it is reasonable to believe that these regions were undergoing true and detectable selection, given the strong single-trait artificial selection regime in the selected lines sustained over 40 generations (Siegel 1962).
Added value by an increased genome-wide marker coverage
Genotyping by sequencing has provided opportunities to both increase marker density and decrease the cost of genotyping. Strategies based on low-coverage sequencing approaches open cost-efficient opportunities to re-analyse existing experimental populations. Here, we use one such approach targeted to deep outbred intercrosses. This approach increased the overall marker coverage of the chicken genome from 93% in the latest study by Wahlberg et al. (2009), estimated as coverage of the autosomal genetic map, to 1,058 Mb (99.3% of 1,065 Mb, Genome Reference Consortium 2018) with a mean density of 102 line-informative markers per Mb.
Outbred founder lines are beneficial for the density of line-informative markers, because markers that are still segregating can be line-informative for specific ancestor combinations. However, as the number of these markers shrinks with the increase in unique ancestors contributing to each individual in later generations, the number of generations that can be investigated in this way may be limited by the number of founding individuals as well as the size of each generation. Thus, it is important that the founder lines are sufficiently divergent to provide a minimum resolution via fixed markers. However, because the non-physical window size of the imputation process does not sacrifice higher resolution in divergent regions, it is likely that any cross between lines with a significant genetic component to their phenotypic divergence will have sufficient coverage with informative markers in regions of interest. For the AIL used in this study, it is likely that the resolution is not limited by the marker coverage, but rather by both the 1 Mb binning approach and the 50/200 marker sliding window imputation of founder genotypes. This is because the recombination events accumulated in the later generations were numerous enough that some were lost through these approaches. While these parameters were chosen for robustness and provided an appropriate resolution/power trade-off across all individuals, modifying them could provide added opportunities for fine mapping.
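To illustrate why fixed physical bins cap the mapping resolution, the following minimal sketch averages per-marker founder-line genotype estimates within 1 Mb bins. It is not the Stripes pipeline used in this study: the function name bin_genotypes, the tuple layout (chromosome, position in bp, genotype dosage) and the use of a simple mean are assumptions made purely for illustration. Any recombination breakpoint falling inside a bin is averaged into a single value, so breakpoints closer together than the bin size cannot be resolved.

```python
from collections import defaultdict

BIN_SIZE = 1_000_000  # 1 Mb bins, matching the bin size discussed in the text


def bin_genotypes(markers, bin_size=BIN_SIZE):
    """Average per-marker genotype dosages within fixed physical bins.

    markers: iterable of (chrom, pos_bp, dosage) tuples (hypothetical layout).
    Returns a dict mapping (chrom, bin_index) to the mean dosage in that bin.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of dosages, marker count]
    for chrom, pos, dosage in markers:
        key = (chrom, pos // bin_size)  # integer division assigns the bin index
        totals[key][0] += dosage
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Hypothetical example: a breakpoint inside bin 0 is averaged away.
example = [("1", 100, 0.0), ("1", 999_999, 2.0), ("1", 1_000_000, 1.0)]
binned = bin_genotypes(example)
```

In this toy input, the two markers in the first bin carry different dosages, yet the bin reports only their mean; reducing the bin size is one way such within-bin information could be recovered for fine mapping.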
The increased marker density also increased the information content throughout the genome, further increasing power in the QTL analyses. In particular, coverage was extended at the ends of the larger chromosomes. This led to the detection of a novel suggestive QTL on Chromosome 8, primarily owing to the increase in coverage, though the increase in power helped elevate and define the peak of the QTL. The QTL was also seen in the full marker set using only the F2 population, but without a definitive peak. This is likely due to a lack of recombination events distal to the peak in the F2 population, though the LOD scores for the outermost 3 Mb were above the 5% permutation threshold for chromosome-wide significance.
Two additional small linkage groups were also covered. The minor regions of the genome that remain uncovered are present in scaffolds containing too few markers for reliable genotype estimation using the Stripes pipeline. Additional work beyond this study is needed to estimate the genotypes in these regions using alternative genotyping or bioinformatics approaches before they can be included in QTL mapping studies.
Integrating across-population data
Integrating data across an AIL population provides value to the QTL scan by helping to map new QTL and to refine the resolution and explanatory power of existing ones. This is particularly so if intermediate generations already exist due to previous attempts at fine mapping. At less than 1 EUR/sample (Zan et al. 2019), the approach demonstrated here is a cost-effective way to enhance the statistical power to dissect complex traits in potentially any experimental population or selection experiment.
For AIL populations with smaller intermediate generations and multiple full or half siblings, correcting for generation with a mixed or fixed effect model will likely result in overcorrection due to the correlation between generation and kinship. Standardisation using z-scores provides an acceptable trade-off between overcorrecting and confidence in accounting for generational batch effects.
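The within-generation standardisation described above can be sketched as follows. This is a minimal illustration, not the code used in this study: the function name standardise_within_generation, the (generation, weight) record layout and the use of the sample standard deviation are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardise_within_generation(records):
    """Convert phenotypes to z-scores within each generation.

    records: list of (generation, phenotype) tuples (hypothetical layout);
    each generation needs at least two individuals for stdev to be defined.
    Returns z-scores in the same order as the input records.
    """
    by_generation = defaultdict(list)
    for generation, phenotype in records:
        by_generation[generation].append(phenotype)

    # Per-generation mean and sample standard deviation.
    stats = {g: (mean(v), stdev(v)) for g, v in by_generation.items()}

    z_scores = []
    for generation, phenotype in records:
        m, s = stats[generation]
        # A generation with no phenotypic variation carries no information.
        z_scores.append((phenotype - m) / s if s > 0 else 0.0)
    return z_scores


# Hypothetical weights (g) for two generations with different means and spreads.
example = [("F2", 100.0), ("F2", 200.0), ("F3", 300.0), ("F3", 500.0)]
z = standardise_within_generation(example)
```

After standardisation, every generation has mean zero and unit variance, so differences in generation means and variances no longer enter the pooled analysis, while no generation term competes with kinship in the association model.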
In conclusion, this study represents the most comprehensive study of the individual loci forming the genetic basis of the highly polygenic, long-term selection responses on 56-day body weight in the Virginia chicken lines to date. It contributes not only to our current understanding of the genetic basis of body weight in chickens, but also provides a solid methodological foundation to further investigate the genetic architecture of complex traits in populations with similar design.
Acknowledgements
We thank Lars Rönnegård for the helpful comments and discussions regarding the stability of the minimisation algorithm. The computations and data handling were enabled by resources in projects SNIC 2017/7-53, SNIC 2018-3-170 and SNIC 2020-5-14 provided by the Swedish National Infrastructure for Computing (SNIC) at UPPMAX, partially funded by the Swedish Research Council through grant agreement no. 2018-05973. The work was supported by the Swedish Research Council (grants 349-2005-8628, 621-2012-4634, 2017-3726 and 2018-5991) and FORMAS (grants 2013-450 and 2017-415).