Incorporating genome-wide association study results into genomic prediction enhances the 1 predictive ability of growth and phenology traits in Norway spruce 2

: 13 Genome-wide association (GWA) has been used to detect quantitative trait loci (QTLs) in plant and animal species. 14 Genomic prediction is to predict an accumulative effect of all QTL effects by capturing the linkage disequilibrium 15 between markers and QTLs. Thus, marker preselection is considered a promising method to capture Mendelian 16 segregation effects, especially for an oligogenic trait. Using QTLs detected in the GWA study could improve genomic 17 prediction (GP), including informative marker selection and adding a QTL with the largest effect size as a fixed effect. 18 In this study, we performed GWA and GP studies in a population with 904 clones from 32 full-sib families planted in 19 four trials using a newly developed 50k SNP Norway spruce array. In total, GWAS identified 41 SNPs associated 20 with the budburst stage (BB) and the SNP with the largest effect size explained 5.1% of the phenotypic variation 21 (PVE). For other traits like height at tree ages six and 12, diameter at breast height, frost damage and Pilodyn 22 penetration, only 2 – 13 SNP associations were detected and the PVE of the strongest effects ranged from 1.2% to 23 2.0%. GP with approximately 100 preselected SNPs based on the smallest p -values from GWAS showed the largest 24 predictive ability (PA) for the oligogenic trait BB. But for the other polygenic traits, approximate 2000-4000 25 preselected SNPs, indicated by the smallest Akaike information criterion (AIC) to offer the best model fit, still resulted 26 in PA being similar to that of GP models using all markers. Analyses on both real-life and simulated data also showed 27 that the inclusion of a large QTL SNP in the model as a fixed effect could improve the PA and accuracy of GP provided 28 that the PVE of the QTL was ≥2.5%.


30
Genomic prediction (GP) using genome-wide dense markers has been widely adopted in animal breeding and 31 extensively studied in crops (Hickey et al, 2017) and tree plant species (Grattapaglia, 2022)   48 Thumma et al, 2022). It has also been reported that marker preselection could slightly to moderately improve genomic 49 prediction accuracy in tree species (Cappa et al, 2022;Resende et al, 2012;Tan and Ingvarsson, 2022). In theory, GP 50 would like to capture all QTL effects by LD between markers and causative loci. Provided that one or two of several 51 strongly linked markers for each QTL is selected, the model should almost capture the QTL effects. Meanwhile, if 52 one can select the tightly linked markers or mutants themselves through GWAS for GP, it is possible to reduce the 53 cost of genotyping while maintain or improve the accuracy of GP. Recently, following the development of high 54 throughput genotyping techniques and tools, more than several hundred thousand markers have been commonly and 55 easily produced by several genotyping platforms, such as SNP array, exome capture, and genotyping-by-sequencing 56 (GBS). Thus, marker preselection could become a very useful and common pre-step for GP. four sites in this breeding population were presented in the published paper . Generally, the field 88 site series were established using a randomized incomplete block design with single-tree plots. Meanwhile, 98% of 89 clones were replicated among the four sites. In this study, we selected all available clones from ten families (ca. 50 90 clones per family) and 20 clones from each of the remaining 22 families (a total of 904 clones) from a single site 91 S1389 (Rössjöholm) for genotyping (Table 1).

93
Tree height was measured at field age six (HT6) at all four sites and at twelve years (HT12) at one of the sites. Diameter 94 at breast height (DBH) was measured at field age twelve at all four sites. Budburst stage (BB) was scored at field age 95 six at three sites based on eight categories (Krutzsch, 1975). Two further traits were measured only in a single site.

96
Pilodyn penetration (PILO) as a proxy of wood density was measured by Pilodyn 6J Forest (PROCEQ, Zurich, 97 Switzerland) at field age ten. Frost damage (FD) was quantified after the site was exposed to a severe frost event at 98 field age six. FD was scored as a categorical variable from zero (without frost damage) to three (the most severe 99 damage). The detailed descriptions of traits measured in each of the four sites are shown in Table 1.

103
Assay Kit (Qiagen, Oregon, USA). Genotypic data were generated using the Norway spruce Piab50K SNP array chip 104 (Bernhardsson et al, 2021). Genotype calling of the 50K Axiom array was performed using the Axiom analysis suite Ga and the number of families also increased from the original 32 to 56 (Table S1). Finally, the number of clones per 116 family after correction varied from 1 to 56.

Spatial analysis
118 Spatial analysis based on a two-dimensional separable autoregressive (AR1) model was used to fit the row and column 119 directions for phenotypic data from each site using ASReml v4.1 (Gilmour et al, 2015). Block effects were estimated 120 simultaneously and all significant block and spatial effects were removed from the raw data. The spatially adjusted 121 phenotypic data were used for downstream analysis.

182
Similarly, the dominance variance to the total phenotypic variance ratio was calculated as 2 = 2 2 ⁄ , the first-order 183 epistatic variance to the total phenotypic variance ratio was calculated as 2 = 2 2 ⁄ , the residual genotypic variance 184 to the total phenotypic variance ratio was calculated as 2 = 2 2 ⁄ , and the broad-sense heritability was estimated 185 as 2 = 2 2 ⁄ , where 2 = 2 + 2 is based on equation    remainder 4178 could not be correctly mapped to any particular chromosome, therefore they were instead grouped 205 together in an assumed "chromosome 13" (Table S2). LD values for Pair-wise SNPs within each chromosome were

208
Due to the negligible effects of dominance, dominance-by-site, and also first-order epistatic effects for all traits (Table   209 S3), clone means calculated across sites were used as phenotype values to perform cross-validations. Ten sets of 10-210 fold cross-validations were performed. When a major-effect SNP was included as a fixed effect, the GWAS-analyses 211 underlying the SNP-selection was performed on the training-portion of the cross-validation performance as well. In 212 summary, we employed a model as below: Where y' is the vector (904, 1) of the clonal means across sites, is the vector of fixed effects, including a grand mean 215 and a single-locus effect for the SNP showing the most significant association (smallest p-value) to the trait (when 216 included). Furthermore, is the vector of additive effects, is the vector of random residual effects. X and Z1 are the 217 matrices related to the and a. The random additive effects ( ) were assumed to follow ~(0, 2 ) with 2 the 218 additive variance in PBLUP-A whereas the matrix A will be replaced by Ga in the GBLUP-A model.

219
The prediction ability (PA) was defined as the Pearson correlation between predicted breeding values (EBVs) and

249
In addition to the four models mentioned above, we also performed several extra strategies.

258
To test if an increase in family size could improve the PA of PBLUP-C and GBLUP, we randomly selected five to 30 259 clones per family as a training set for the largest ten families with 48-56 clones per family (Fig. S1), using the 260 remaining clones in the ten families as a validation set. After pedigree correction, the number of clones for each of the 261 ten families varied from 32 to 56 (Table S1). Thus, the cross-validation was performed based on the corrected pedigree.

262
We also performed GP of within-family phenotypic values by using the clonal means as y' in equation

302
For the study of conventional genomic selection, 1500 loci, randomly selected among the total 15,000 loci, were 303 "genotyped" and were subsequently subjected to the same type of GBLUP-analysis and cross-validation procedure as 304 described previously for the real-life data (see equation [5]). However, for the simulated data, phenotypic data was 305 used directly for training and as a validation benchmark because the simulations did not include clonal replicates.

306
The evaluation of genomic prediction PA was done as previously described for the real-life data whereas prediction

337
Using all these genomic architecture setups, with the PVE of the major locus varying from 0% (fake) to 5% (true 338 major QTL), conventional genomic prediction procedure was first simulated. As the major locus was not assigned as

366
( 2 ) were substantial in size relative to the additive genetic variance estimates ( 2 > 0.2 2 ) under the best-fit model found that the additive variance under GBLUP-ADR was comparable to that estimated under GBLUP-AR (Table S3).  SNPs (Table S2. The physical extent of LD (r 2 > 0.2) within each chromosome varied from 33.7 kb in chromosome 2 382 to 54.6 kb in chromosome 9, with an average of 42.9 kb for the whole genome based on the SNP array in the studied full-sib family population (Fig. S2). The family clusters was clearly separated by the first two principal components 384 (Fig. S3)

391
To confirm the previous observations that GBLUP models in general offered a better fit to the data than PBLUP, and 392 to evaluate the efficacy of genomic prediction in general, we initially compared predictive abilities (PAs) estimated 393 from cross-validation under PBLUP and GBLUP models with the square root of the clone mean narrow-sense 394 heritability (hc, Table 3 and Table S5). Estimates of hc are interpreted as the theoretical PA and accuracy when applying     with the smallest p-values obtained a higher PA than that using all other numbers of SNPs (Fig. 3a). To check if GP 412 with a marker preselection by GWAS captured more Mendelian segregation effects, we also calculated the correlation 413 between estimated breeding values (EBVs) and within-family phenotypic variation (clone mean-family mean). We 414 found that GBLUP-S using 100-200 informative markers captured more of the phenotypic variation and exhibited the 415 lowest AIC value (Fig. 3c). This suggests that using all available SNPs for a trait indicated to be oligogenic may 416 introduce misleading noise in the prediction models.

417
Second, for the other five traits (DBH, FD, HT12, HT6 and PILO), we found that GP using a preselected set of SNPs 418 of the smallest p-value was better than GP with random marker selection if the number of pre-selected SNPs ranged 419 from 25 to a few hundred. However, when more SNPs were preselected, GP did not outperform random marker 420 selection. Furthermore, preselecting more markers beyond a few hundred still increased the PA of GP for these five 421 polygenic traits with PA being the highest when all 46,482 markers were used (Fig. 3a). When evaluating GP for

426
AIC values for all traits except BB showed a similar trend (Fig. 3c) where the lowest AIC value occurred at 2000-427 4000 p-value preselected markers.

428
Predictive ability comparisons between advanced GBLUP models

429
As already indicated in Fig. 3, only BB showed appreciable increases in overall PA from 0.61 for GBLUP to 0.72 for 430 GBLUP-S when the 100 SNPs exhibiting the smallest p-values in GWAS were included in the Ga-matrix (Table 3).

437
GBLUP divided by hc. We found that the overall accuracy of GP for all traits ranged from 0.58 for HT6 to 0.78 for

438
BB which was the highest observed accuracy.

440
To verify whether including a major gene locus as a fixed effect in the GBLUP model (GBLUP-F) would increase PA 441 of GP, we also performed 10 repeats for 10-fold cross-validation on simulated data where single SNPs exerting a 442 range of effect sizes (in terms of PVE) were included (

449
For the within-family accuracy (

457
To test the effect of the number of clones per family available to the training dataset on the PA using PBLUP and 458 GBLUP (Fig. 4), we sampled 5, 10, 15, 20, 25, or 30 clones from each of the ten largest families as a training data set 459 and the rest of clones in those families as a validation set. We found that the PA of both PBLUP and GBLUP

466
To test further whether combining GWAS p-value based marker preselection and the number of clones per family 467 would affect the predictive ability (PA) of within-family variation, we performed 10-fold cross-validation using

468
GBLUP based on clonal means as training phenotypic value (y') within the largest ten families (Fig. 5). We found that 469 the PA within-families of the ten largest families using GBLUP with GWAS p-value based preselection of a number 470 of markers produced similar trends (Fig. 5a) as the corresponding PA values estimated for the population as a whole 471 (Fig. 3a). In similarity to the results of the whole population, GBLUP of BB with 100-200 preselected SNPs produced 472 higher predictive ability and lower AIC than using a lesser or greater number of SNP for Ga-matrix calculation (

482
Based on the marker preselection results for within-family prediction (Fig. 5c), we also found that PA increased  regression effect in the model (GBLUP-F). We found that marker preselection increased pAs for some traits (HT12,

493
PILO, BB, and FD), but not for others (HT6 and DBH). The PA-estimates across families were very low for most 494 traits (0.03 to 0.11) but for BB the GBLUP PA was 0.19 and increased to 0.25 when using GBLUP-S (Table 5). To 495 fit the most powerful SNP as a fixed effect (GBLUP-F) was generally not helpful except for BB where the PA was 496 increased to 0.24 for GBLUP-F.

498
Genomic-based BLUP model could enhance the accuracy of the estimated additive and non-additive genetic 499 variances 500 In tree or conifer breeding programs, a control-pollinated clonal test provides an opportunity to simultaneously 501 estimate the additive, non-additive, dominance, and epistatic variances using pedigree. However, most heritability   . We also observed two to 13 SNPs significantly associated to the other five traits (aside from BB), which 533 makes the possibility to test if including the most significant SNP in GP models could improve the PA.

535
For budburst stage, we found that a Ga-matrix built from 100 preselected markers resulted in GBLUP-models having 536 lower AIC values and being better at predicting the genetic value in the absence of phenotypic data than did a Ga-537 matrix model using all available markers. Such preselected SNPs also were observed to capture a considerable amount 538 of within-family variation/mendelian segregation effects. However, for other more polygenic traits, genomic models 539 using ca. 4000 preselected markers showed a comparable PA compared to the model using all markers, even though 540 such models exhibited lower AIC values. This indicates that the preselection of influential markers is more likely to 541 be successful when applied to a limited set of markers showing highly significant associations to an oligogenic trait 542 where a relatively limited number of QTL are likely to exhibit major individual effects (PVE > 1%).

543
In this study, we also investigated a realized GP model for the budburst stage where a large-size QTLwith a PVE of 544 ca. 5% was explicitly included as a fixed regression effect and we observed a 4.4% improvement in overall PA (Table   545 3) in comparison to a GP model without such a modification. Such enhancement was also observed in several empirical  (Table 4). However, if the QTL effect size 551 was less than ca. 1.25% of the phenotypic variation, the simulated data analyses did not indicate any advantage for 552 the model including the locus as a fixed effect in terms of PA or accuracy. Based on more than dozens GWAS results

553
in tree species (Hall, et al. 2016), SNPs detected with a PVE >1.25% is not uncommon, indicating that genomic 554 prediction adding a QTL of large size should be a very useful approach to improve PA and accuracy. This may indicate 555 that genomic prediction using GWAS results would be more efficient for oligogenic traits and maybe less effective 556 for polygenic traits in which GWAS only detected a few SNP-trait associations with a small effect size.

557
Family size matters for the efficiency of genomic prediction

558
A larger family size (number of clones per family) is usually expected to capture more Mendelian segregation effects 559 (Arenas et al, 2021) and improves the PA and accuracy of genetic parameter estimates (Perron et al, 2013). In this 560 study, we observed that increasing the family size is important to improve the PA, both with respect to overall 561 phenotypic prediction (Fig. 3) but also to the prediction of within-family variation (Fig. 5). This is especially important 562 when the family size is small (less than 10 or 15 clones per family). For example, in Fig 5, within-family PA appeared 563 to be even less than 0 for DBH when the number of clones per family was less than 15. We again observed that GP 564 using marker preselection was generally better than that using all markers for capturing within-family variation based 565 on Mendelian segregation.

566
Relationship between training and validation datasets highly important for genomic prediction 567 Forward-selection tree breeding usually entails the selection of a few unrelated progenies within a segregation test 568 population (F1). A few new controlled crosses among the selected elite trees produce a new batch of progenies (F2).

569
Thus, the GP using existing F1 as a training dataset to predict F2, the relationship between F1 to F2 will decrease 570 compared with relationships within the same F1 generation as was used for 10-fold entirely random cross-validation 571 in this study. We therefore also investigated the situation where training and validation datasets contained individuals 572 from separate families and we could produce a 32-fold cross-validation scheme by removing phenotypic data for one 573 particular family at the time. The relationships between the validation family and the training dataset were thus 574 restricted to the level of half-sibs or even weaker. Thus, the PA for predicting a particular validation family was 575 considerably lower than for conventional random cross-validation (

599
On the other hand, LD matters when genomic prediction is conducted using a constant number of markers and for the 600 estimation of marker-based narrow-sense heritability. A few hundred thousand random sampled markers usually only 601 capture a limited proportion of the heritability when estimated using a population with unrelated individuals (the 602 missing heritability phenomenon), especially for polygenic traits in conifer species . The extent of LD (based on the threshold r 2 ≥ 0.2) was observed to be 42.9kb when based on analyses using the 50k SNP array in 604 this studied population (Fig. 1). However, based on such an extent of LDs between markers and QTLs, the 50k SNP 605 array only covered ca. 25% of the total genomic size of Norway spruce (20G) (Nystedt et al, 2013). This could be a 606 reason why the PA of GP did not reach the standard value of the square root of additive clone mean heritability (Table   607 3) and why within-family and across-family PA was very low for most of the studied traits (Table 5 and Fig. 5). In 608 order to capture more Mendelian segregation between QTL and makers, we would then need more markers for marker    -1355.0 2 , 2 , 2 , 2 , ̅ 2 represents variances of additive, additive-by-site, residual of genetic, residual of genetic-by-site, and the average of residual effects, respectively. ℎ 2 and 2 represent the narrow-sense and broad-sense heritabilities, respectively. AIC represents Akaike information criterion. * represents that the model showed the smallest AIC value compared to the same other four GBLUP or PBLUP models. Table 3 Predictive ability (PA) of different types of pedigree-and genome-based prediction models following a 10fold cross-validation procedure.       If the parents only cross once, then dark goldenrod means male and dark skyblue means female. Otherwise, the color of parents could be any of both colors.