Integrating genome-wide association mapping of additive and dominance genetic effects to improve genomic prediction accuracy in Eucalyptus

Genome-wide association studies (GWAS) are a powerful and widely used approach to decipher the genetic control of complex traits. A major challenge for dissecting quantitative traits in forest trees is statistical power. In this study, we use a population consisting of 1123 samples from two successive generations that have been phenotyped for growth and wood property traits and genotyped using the EuChip60K chip, yielding 37,832 informative SNPs. We use multi-locus GWAS models to assess both additive and dominance effects to identify markers associated with growth and wood property traits in the eucalypt hybrids. Additive and dominance association models identified 78 and 82 significant SNPs across all traits, respectively, which captured between 39 and 86% of the genomic-based heritability. We also used SNPs identified from the GWAS and SNPs selected using less stringent significance thresholds to evaluate predictive abilities in a genomic selection framework. Genomic selection models based on the top 1% SNPs captured a substantially greater proportion of the genetic variance of traits compared to when all SNPs were used for model training. The prediction ability of estimated breeding values was significantly improved for all traits using either the top 1% SNPs or SNPs identified using a relaxed p-value threshold (p < 10⁻³). This study highlights the added value of also considering dominance effects for identifying genomic regions controlling growth traits in trees. Moreover, integrating GWAS results into genomic selection methods provides enhanced power relative to discrete associations for identifying genomic variation potentially useful in tree breeding.


[...] and disease resistance using a regional heritability mapping method that helps increase the genomic heritability to 5-15% from 4-6% when using SNPs [...]

GWAS can also provide tools for accelerating the long breeding cycles in tree breeding (reviewed in Neale and Kremer, 2011). For example, although many species of Eucalyptus display unusually fast growth, breeding cycles aimed at developing elite commercial genotypes still take between 12 and 16 years to complete, since identification of elite genotypes requires progeny trials followed by two or more sequential clonal trials (Rezende et al., 2014). However, genomic selection based on genome-wide molecular markers is expected to reduce the time required for completing a cycle of developing elite clones to only 9 years, mainly due to the shorter time needed for progeny tests when phenotypes can be predicted from the genomic selection models.

The rapid development in genomics has opened up opportunities to identify molecular markers that are associated with traits of interest and use these marker-trait associations to complement and extend traditional breeding programs. Despite the efforts to discover polymorphisms associated with economically relevant traits, much of the genetic contribution to complex traits in forest trees remains unexplained. One of the main reasons is that GWAS methods normally conduct tests on one marker at a time, for instance using a generalized linear model (GLM) or a mixed linear model (MLM). When dealing with complex traits such as growth and wood quality, where the effect size of individual loci is likely small to moderate, these methods suffer from limited statistical power to detect loci of small effect (Muller et al., 2017). One potential approach to increase the power and to accurately identify more causal variants is the so-called 'multi-locus mixed model' (MLMM), which simultaneously tests multiple markers by including them as covariates in a stepwise MLM to partially remove confounding between tested markers and kinship (Segura et al., 2012). One such method is the 'fixed and random model circulating probability unification' (FarmCPU) that performs marker tests using other associated markers as covariates in a fixed effect model (Liu et al., 2016). Optimization across the associated covariate markers using a random effect model is then performed separately. This approach has been reported to simultaneously reduce computational complexity, remove confounding between population structure, kinship and quantitative trait loci, prevent model over-fitting and control the number of false positives (Liu et al., 2016).
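The idea of testing a marker while other associated markers are held as fixed-effect covariates can be illustrated with a small simulation. The sketch below is not FarmCPU itself: it covers only an ordinary least-squares marker test and omits the separate random-effect optimization step. The function name `marker_t_statistic`, the simulated genotypes and the effect size are all assumptions made for illustration.

```python
import numpy as np

def marker_t_statistic(y, marker, covariates=()):
    """t-statistic of one SNP in a least-squares model that also contains
    an intercept and any previously associated SNPs as fixed covariates."""
    n = len(y)
    # Design matrix: intercept, covariate SNPs, tested SNP (last column).
    columns = [np.ones(n)] + list(covariates) + [marker]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = residuals @ residuals / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov_beta[-1, -1])

# Simulated data (hypothetical): one causal SNP and one SNP in linkage
# disequilibrium with it that has no effect of its own.
rng = np.random.default_rng(1)
n = 300
causal = rng.integers(0, 3, size=n)          # genotypes coded 0/1/2
linked = causal.copy()
swap = rng.random(n) < 0.2                   # redraw about 20% of genotypes
linked[swap] = rng.integers(0, 3, size=swap.sum())
y = 1.0 * causal + rng.normal(size=n)

t_without = marker_t_statistic(y, linked)
t_with = marker_t_statistic(y, linked, covariates=[causal])
print(f"linked SNP tested alone: t = {t_without:.2f}")
print(f"linked SNP with causal SNP as covariate: t = {t_with:.2f}")
```

In this toy setting the linked SNP looks strongly associated when tested alone, and the signal largely disappears once the causal SNP is included as a covariate, which is the confounding that multi-locus models aim to remove.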

Most GWAS analyses to date have been undertaken by implicitly assuming a genetic architecture consisting of additive effects. However, non-additive effects, including dominance (Bruce, 1910), over-dominance (Crow, 1948) and epistasis (Hill, 1982), are known to also play important roles in controlling some traits. One trait where non-additive effects are likely to be pronounced is heterosis, or hybrid vigor, which is the near universally observed phenomenon of phenotypic superiority of hybrid progeny relative to their parents (Charlesworth and Willis, 2009). Not surprisingly, heterosis has been and continues to be of great importance in most plant breeding schemes (Duvick, 2001
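The distinction between additive, dominance and over-dominance effects at a single biallelic locus can be made concrete with the classical textbook parameterization, in which the two homozygotes have genotypic values -a and +a and the heterozygote has value d. The sketch below assumes this convention; the text above does not state which genotype coding the study used, and the function names and numbers are hypothetical.

```python
def additive_code(genotype):
    """Number of copies of the counted allele: 0, 1 or 2."""
    return genotype

def dominance_code(genotype):
    """1 for heterozygotes, 0 for either homozygote."""
    return 1 if genotype == 1 else 0

def genotypic_value(genotype, a, d):
    """Genotypic value relative to the midpoint of the two homozygotes."""
    return a * (additive_code(genotype) - 1) + d * dominance_code(genotype)

# Purely additive locus: the heterozygote sits exactly at the midpoint.
additive_only = [genotypic_value(g, a=2, d=0) for g in (0, 1, 2)]
# Partial dominance: the heterozygote is shifted towards one homozygote.
partial_dominance = [genotypic_value(g, a=2, d=1) for g in (0, 1, 2)]
# Over-dominance (d > a): the heterozygote exceeds both homozygotes.
over_dominance = [genotypic_value(g, a=2, d=3) for g in (0, 1, 2)]

print(additive_only, partial_dominance, over_dominance)
```

An association model with only the additive code cannot represent the second and third cases, which is why a separate dominance term is needed to detect such loci.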

Another genomic-based approach that has become widely used in plant and animal breeding in recent years is genomic selection (GS), also known as genomic prediction. Unlike GWAS, GS refers to marker-based selection where total genetic variance is captured using genome-wide markers without a prior step of identifying trait-associated markers. GS aims to predict the genetic potential (e.g. [...] and by also considering non-additive effects. In this paper we assess methods for improving genomic prediction accuracy by integrating results from GWAS into GS to predict the genetic potential of breeding targets. It is well known that using only associated SNPs identified from a GWAS is usually

All growth traits were moderately variable at the different assessment ages (Table 1). We observed a lower phenotypic variation for height at 3 years of age, as judged by the coefficient of variation (Table 1). The F1 population underwent selection based on height in order to identify trees to use for genotyping and this selection process likely contributed to the lower phenotypic variation we see in height at 3 years of age. We also observed low phenotypic variation for basic density and pulp yield, which is commonly observed in many wood quality traits.
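The coefficient of variation used above to compare traits measured on different scales is the standard deviation expressed as a percentage of the mean. A minimal sketch follows; the measurements are invented for illustration and are not data from Table 1.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical measurements, not taken from the study.
heights_m = [14.0, 15.0, 16.0, 15.0, 15.0]
cbh_cm = [30.0, 45.0, 50.0, 35.0, 40.0]

cv_height = coefficient_of_variation(heights_m)
cv_cbh = coefficient_of_variation(cbh_cm)
print(f"CV height: {cv_height:.1f}%  CV CBH: {cv_cbh:.1f}%")
```

Because the statistic is unit-free, a trait recorded in metres can be compared directly with one recorded in centimetres.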

Generally, variation in CBH was greater than in height, but both mean and variance for both traits increased as the trees aged. Growth traits generally had low heritabilities (h² < 0.2) whereas wood quality traits showed moderate heritabilities (Table 1)

[...] (Table S2). Comparing significant SNPs identified from the additive and dominance effects models, a total of 10 SNPs overlap between the two models for different traits. This result suggests that the two genetic effects are not completely independent. Nine out of ten SNPs that overlap between additive and dominance effects were identified for growth traits, with the remaining SNP observed for pulp yield.

[...] (Table S5) [...] Eucalyptus hybrid breeding population and we collectively show that up to 10% of the genomic-based heritability can be explained by associated SNPs that were identified using a dominance model (Table S4).

Even if we capture a substantially larger number of associated SNPs by considering both additive and dominance effects, a large fraction of the genomic-based heritabilities (14%-62%) cannot be explained by only considering significantly associated SNPs (Table S4)

In order to assess if the GWAS results could be used to enhance genomic prediction in our breeding population, we also tried to identify possible 'candidate' SNPs that were not detected as significant using the stringent significance threshold we applied in our GWAS. The rationale here is that, as outlined above, most GWAS methods fail to detect loci of small effect but that the GWAS would nevertheless serve as a useful 'filter' for ranking SNPs for their possible effects on the traits of interest. We therefore selected two categories of SNPs using two different criteria of relaxed significance and used these to estimate genomic heritabilities and perform genomic prediction. The first category, 'putative' SNPs, includes all SNPs that were found to be associated with the traits of interest based on a more relaxed p-value [...] (Table S4). When we performed genomic prediction using these two categories of SNPs we also observed a substantial increase in the prediction ability for all traits compared to predictions based on all available SNPs.
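The two relaxed selection criteria can be sketched as simple filters over GWAS p-values. The threshold of 10⁻³ and the top 1% cut-off are the values given in the summary; ranking the top 1% by p-value is an assumption here, since the definition of the second category is not recoverable from the text above. The function names, SNP identifiers and p-values are hypothetical.

```python
def putative_snps(p_values, threshold=1e-3):
    """SNP ids whose GWAS p-value falls below a relaxed threshold."""
    return [snp for snp, p in p_values.items() if p < threshold]

def top_fraction_snps(p_values, fraction=0.01):
    """SNP ids with the smallest p-values, keeping the given fraction
    of all SNPs (at least one). Ranking by p-value is an assumption."""
    n_keep = max(1, int(len(p_values) * fraction))
    ranked = sorted(p_values, key=p_values.get)
    return ranked[:n_keep]

# Hypothetical p-values for 300 SNPs; three of them are made small.
p_values = {f"snp{i:03d}": (i + 1) / 300 for i in range(300)}
p_values["snp007"] = 2e-6
p_values["snp123"] = 4e-4
p_values["snp250"] = 9e-4

putative = putative_snps(p_values)
top = top_fraction_snps(p_values)
print("putative:", putative)
print("top 1%:", top)
```

Either list can then be used to subset the genotype matrix before the relationship matrices for genomic prediction are built.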

The higher prediction ability obtained with these SNP subsets suggests that using all available SNPs introduces noise in the prediction models that negatively affects our prediction ability. Our method for analysing genomic selection and increasing prediction accuracy clearly benefited from integrating results from the GWAS analyses, but the number of associated SNPs that needs to be incorporated depends on the trait in question.

[...] (Table S6) (Hamamoto et al., 2015) and provides a key mechanism for protecting leaves from Na+ over-accumulation and salt stress (Berthomieu et al., 2003; Maser et al., 2002

The phenotypic and genotypic data utilized in this study have been previously [...] Weinberg equilibrium (>1e-7). Any missing data remaining in the 37,832 SNPs were subsequently imputed using BEAGLE 4.1 (Browning and Browning, 2007

Two separate GBLUP models were evaluated that included either i) only additive (A) or ii) both additive and dominance (AD) genetic effects, using the four SNP categories described above to create marker-based relationship matrices. The A and AD models have been described previously by Tan et al. (2018).
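As an illustration of what a marker-based relationship matrix looks like, the sketch below builds an additive matrix in the style commonly attributed to VanRaden and a dominance matrix using the genotype coding of Vitezica et al. These constructions are assumptions chosen because they are widely used; they are not confirmed to be the ones applied in Tan et al. (2018). The genotype matrix, the function names and the SNP subset are hypothetical, and the code stops short of fitting the GBLUP model itself.

```python
import numpy as np

def additive_relationship(M):
    """Additive genomic relationship matrix.
    M: individuals x SNPs, coded as 0/1/2 copies of the counted allele."""
    p = M.mean(axis=0) / 2.0                 # counted-allele frequency per SNP
    Z = M - 2.0 * p                          # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def dominance_relationship(M):
    """Dominance relationship matrix from the same 0/1/2 genotypes."""
    p = M.mean(axis=0) / 2.0
    q = 1.0 - p
    # Codes per genotype: 2 copies -> -2q^2, 1 copy -> 2pq, 0 copies -> -2p^2.
    W = np.where(M == 2, -2.0 * q**2,
                 np.where(M == 1, 2.0 * p * q, -2.0 * p**2))
    return W @ W.T / np.sum((2.0 * p * q) ** 2)

# Hypothetical genotypes: 4 individuals x 5 SNPs.
M = np.array([[0, 1, 2, 1, 0],
              [1, 1, 0, 2, 1],
              [2, 0, 1, 1, 2],
              [1, 2, 1, 0, 1]], dtype=float)

G = additive_relationship(M)
D = dominance_relationship(M)

# Restricting the columns to one SNP category (hypothetical indices)
# gives the relationship matrix for that category.
selected = [0, 2, 4]
G_selected = additive_relationship(M[:, selected])
print(np.round(G, 2))
```

An A model would use only the additive matrix as the covariance structure of the genetic effects, whereas an AD model would use both matrices, one for each effect.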