Multi-trait ensemble genomic prediction and simulations of recurrent selection highlight importance of complex trait genetic architecture in long-term genetic gains in wheat

Cereal crop breeders have achieved considerable genetic gain in genetically complex traits, such as grain yield, while maintaining genetic diversity. However, focus on selection for yield has negatively impacted other important traits. To better understand selection within a breeding context, and how it might be optimised, we analysed genotypic and phenotypic data from a diverse, 16-founder wheat multi-parent advanced generation inter-cross (MAGIC) population. Compared to single-trait models, multi-trait ensemble genomic prediction models increased prediction accuracy for almost 90% of traits, improving grain yield prediction accuracy by 3-52%. For complex traits, non-parametric models (Random Forest) also outperformed simplified, additive models (LASSO), increasing grain yield prediction accuracy by 10-36%. Simulations of recurrent genomic selection then showed that sustained greater forward prediction accuracy optimised long-term genetic gains. Simulations of selection on grain yield found indirect responses in related traits, which involved optimisation of antagonistic trait relationships. We found multi-trait selection indices could be used to optimise undesirable relationships, such as the trade-off between grain yield and protein content, or combine traits of interest, such as yield and weed competitive ability. Simulations of phenotypic selection found that including Random Forest rather than LASSO genetic models, and multi-trait rather than single-trait models as the true genetic model, accelerated and extended long-term genetic gain whilst maintaining genetic diversity. These results suggest important roles of pleiotropy and epistasis in the wider context of wheat breeding programmes and provide insights into mechanisms for continued genetic gain in a limited genepool and optimisation of multiple traits for crop improvement.

. Abbreviations of traits phenotyped in the NDM, as described by Scott et al. (2021). All data are from 147 field trials, except where noted. Nursery = data collected from 1×1m unreplicated plots. Field = data collected 148 from 2×6m replicated plots. Trait groups indicate groups of strongly positively or negatively correlated traits that 149 grouped together through hierarchical clustering as shown in Figure 1. Some traits were phenotyped at multiple 150 time points (GLA) and in both trail years so that a total of 72 traits were included. 151 These included generalised linear models including the Lasso penalty (LASSO) implemented in the 'glmnet' R 162 package (Friedman et al., 2010) where the majority of SNP effects are shrunk to zero. For comparison, a non-163 linear, statistical learning approach was also used which generally included much larger numbers of SNPs with 164 non-additive effects in each model: Random Forest (RF), implemented in the 'randomForest' R package 165 (Breiman, 2001). For LASSO models, the value of lambda (a shrinkage penalty) used for each prediction was 166 optimised using 8-fold cross validation. For RF models, 300 trees were run per model and default parameters of 167 one third of variables randomly sampled at each split, and a minimum of five observations in terminal leaf nodes 168 were used. Previous work by Scott et al. (2021) found that ridge regression models that include all marker effects 169 with a small, but non-zero effect, did not have as high prediction accuracy as LASSO in the MAGIC population, 170

Abbreviation
and so were not further tested here. For this, a two-step model was used where each trait was first predicted from genomic data with the same genomic 179 models as for ST predictions, and then all trait predictions were used as explanatory variables (features) in a 180 second multi-trait ensemble model to again predict each response trait. Both first and second stage predictions 181 were fitted only on data from the training fraction and predictions were independently made for test lines with 182 only genetic marker data. Either LASSO or RF models were used to fit first stage ST models, but only RF models 183 were used to flexibly include non-linear multi-trait relationships for the second stage ensemble models. 184 Information from related traits is therefore used in a model, trained only on the training fraction to adjust single-185 trait predictions made directly from genomic data. Unlike trait-assisted genomic prediction, such as used by 186 Fernandes et al. (2018), no observed trait data are used in the tested cross-validation fraction. As both MT 187 prediction approaches can be applied with any genomic prediction model for each ST prediction or for SVD 188 vectors, we were able to compare LASSO and RF genomic models for both ST and MT approaches. 189 among all lines in the dataset and averaging the three Pearson's correlation coefficients between observed and 191 predicted trait values across all cross-validation folds. Valid comparisons of prediction accuracy were ensured by 192 testing all prediction models using the same cross-validation fold assignments. After model cross-validation, full 193 prediction models were fitted using the entire dataset for combinations of both ST and MT models with both 194 LASSO and RF genomic models. Variable importance scores for each SNP marker in RF genomic prediction 195 models and for each trait covariate in MT ensemble models were derived from the full models as the mean 196 decrease in mean square error using the 'importance' function in the 'randomForest' R package. Effect sizes for 197 each SNP marker were also derived from full LASSO models where the majority of SNP effects were shrunk to 198 zero. 199 200 We first simulated a recurrent genomic selection programme within the NDM to assess the performance of 201 different prediction models to achieve long-term genetic gain in grain yield. This was done within a framework 202 of assuming a true inherited genetic model based on the MT ensemble RF genetic model as outlined above and 203 trained on the observed genetic and phenotype data. True phenotypes were derived from predictions from this 204 model for genotypes at each cycle of simulations and the different genomic prediction models outlined above 205 were trained on the individuals in the first cycle. New cycles of genotypes derived from crossing selected fractions 206 of lines were simulated using a genetic map of ~55,000 SNPs. 207

Simulations of recurrent genomic selection
The genetic map (Supplementary Table S1 and Supplementary Figure S1) was constructed using the 208 'qtl2' R package (Broman et al., 2019) with the marker data ordered by physical map position (RefSeq v1,0, 209 IWGSC, 2018). The genetic map distance was then re-estimated using the 'est_map' function with 1,000 210 maximum iterations and an assumed genotyping error probability of 0.001. The cross object was considered as a 211 16-way multi-parent recombinant inbred line population, so the differing local recombination effects for each 212 founder haplotypes were preserved for subsequent simulations. 23 markers were removed from the full set which 213 caused genetic map distortion. 214 Selection of lines at each generation were made based on predicted phenotypes from the genomic 215 prediction model. To reduce excessive inbreeding and loss of genetic variance, 15 lines from different 16-way or 216 bi-parental families with the highest selection index values were selected. 30 offspring inbred line genotypes were 217 then simulated for each of 105 possible pairwise cross combinations among the selected lines so that the following 218 generation comprised of 3,150 lines from 105 biparental families. The phenotypes of these were again predicted 219 from the genomic prediction models trained on the true phenotypes of the first generation and the process repeated 220 for 20 cycles of recurrent selection. 20 iteration repeats of the simulations were run for each genomic prediction 221 model. Genomic prediction models were fitted as detailed above for ST and MT, LASSO and RF models, but 222 additionally RF models were run that were restricted to a tree depth of one (RF1) to completely limit the degree 223 of marker interaction effects. 2,000 trees were used for RF1 models. 224 Genetic gain for each trait over the selection simulations were determined by comparing the mean true 225 trait value of all lines at each generation to the mean true trait value in the first generation. The divergence from 226 this mean among different traits was standardised to the standard deviation of trait values in the first generation. 227 the true and predicted trait values among genotypes at each simulation cycle. 229

230
In addition to simulations of different genomic selection procedures within a simulated true genetic model, we 231 also compared simulations with different true genetic models to assess both phenotypic and genomic response to 232 selection with different genetic model assumptions of trait genetic architecture. Simulations were run as above 233 but selections of individuals were based on the true phenotypes derived from different genetic models so that it 234 was assumed that the simulated breeder could make perfect estimates of trait values from phenotypic selection. 235 Different simulations were run for ST and MT as well as RF and LASSO models as outlined above and for 236 different selection indices as outlined below. Genetic response to selection was also characterised as the change 237 in allele frequency for all ~55,000 SNPs at each generation, and the genetic diversity was calculated as the number 238 of polymorphic SNPs at each generation. For each set of simulations, traits or SNP markers were considered under 239 selection rather than drift if their response to selection was significantly different to 0 considering all 20 simulation 240 repeats using a t-test. 241  264 We analysed data from a genetically diverse and highly recombined 16-founder wheat MAGIC population for a 265 wide range of agronomically important traits over multiple trial years (72 traityear combinations) to investigate 266 complex trait-trait relationships and their implications for breeding. Correlation analysis across all traits and years 267 revealed complex trait relationships and substantial differences in patterns between the two trial years (Figure 1  Differential relationships between yield and other developmental stage and plant architecture traits 276 between the two trial years were also found. In year 1, taller and later flowering genotypes were generally higher 277 yielding (yieldheight to ear tip correlation = 0.20; yieldheading date correlation = 0.32), whereas in year 2, 278 the correlation between height and yield was negative (correlation = -0.11) and between yield and heading date 279 was non-significant. Therefore, strong genotype-by-environment interaction (G×E) effects on yield means that 280 selection for yield, or related adaptive traits, in any single environment may have limited potential to increase 281 yield in another environment. However, contrasting patterns of rainfall and temperature between the two yield 282 trial years ( Other plant architecture traits, such as green leaf area (GLA) in the development phase, juvenile growth 302 habit and flag leaf morphology had weak but positive correlations with yield, suggesting potential relevance of 303 these traits as mechanisms to increase yield, or as valuable traits themselves to select for in combination with yield 304 to increase crop competitive ability with weeds. However, the strong positive correlations between GLA traits 305 and plant height traits mean that increasing these traits without increasing lodging risk may be problematic. 306

Grain yield is correlated with multiple traits in the observed population
Optimising combinations of important traits therefore requires consideration of correlated responses due to 307 pleiotropy and linkage. 308 309 We tested the accuracy of several genomic prediction approaches to determine the genetic architecture of the 310 multiple related traits, using both single-trait (ST) and multi-trait (MT) models that take into account relationships 311 among correlated traits. LASSO represents models including trait genetic architecture controlled by a minimal 312 number of additive genetic effects across the genome, while Random Forest (RF) represents models including a 313 much greater number of interacting genetic effects. RF outperformed LASSO for most traits in ST models and 314 was particularly advantageous for traits with generally low genomic prediction accuracy, such as GLA and grain 315 yield in both years (Figure 3a). Prediction accuracy was increased from 0.34 and 0.20 in ST LASSO models to 316  Forest multi-trait ensemble models for prediction of grain yield in each year (GY_1 and GY_2). All trait 367

Genomic prediction of complex traits
abbreviations are as listed in Table 1 and colour coded according to trait group, as in Figure 1. The _1 and _2 368 designations used after trait abbreviations refer to trial year 1 and trial year 2, respectively. 369 370 371 simulation of recurrent genomic selection 372 We then investigated the potential for different genomic prediction models to achieve genetic gain in yield through 373 simulation of a recurrent genomic selection programme within the NDM. MT ensemble RF models were found 374 to be the most accurate model from cross validation within the observed population (Figure 3) and so were used 375 as the true genetic model to define the true phenotypes of simulated lines. The genomic prediction models were 376 then trained on the simulated true phenotype data of lines in the first generation and genomic predictions of 377 phenotypes were used to make selections for subsequent cycles of lines. 378

Accurate genomic prediction models increase long-term genetic gain in
The accuracy of genomic prediction models over the course of the selection simulations generally 379 reflected those in cross validated of the observed data. Prediction accuracy of all models decreased in later cycles 380 of the simulations, but models that were more accurate in the observed data and maintained accuracy for longer, 381 such as MT RF and ST RF (Figure 5a), achieved greater long-term genetic gain in yield (Figure 5b). RF models 382 that included restricted trees with an interaction depth of one so that genetic marker interaction effects could not 383 be included were much less accurate and led to less genetic gain, particularly for grain yield in year 1 (Figure 5). 384 This suggests an important role for prediction of non-additive genetic effects in RF models for continued accuracy 385 of genomic prediction models, particularly as breeding cycles become more distantly related to the training set.  year was achieved in simulated selection for yield in either year. This G×E effect for yield was reflected by how 425 the yield component traits were co-selected with yield between the two years. Grains per ear (GPE) and grains 426 per spikelet (GPS) were increased when selection was for grain yield in year 2 but remained mostly neutral for 427 grain yield in year 1 (Figure 6). Further to the differential importance of traits in the multi-trait ensemble models 428 outlined above, differential selection responses of yield component traits according to yield in differing 429 environments highlights the capacity for G×E interactions to buffer response to selection for grain yield. However, 430 when G×E is predictable, in certain target environments, contrasting yield component strategies could be used to 431 adapt the crop to the environment. For example, it may be supposed that a genotype that can be high yielding by 432 producing many grain sites per ear throughout an extended development phase due to being later flowering would 433 be better adapted to environments without terminal drought stress. in Figure 1, while grey points indicate trait pairs from different groups. 492 493 genetic gain in grain yield for each of the selection indices: mean grain yield across both trial years increased 495 when using both selection indices, but at a slower rate, particularly for grain yield in year 1 under GYPD selection, 496 compared to when grain yield was selected for per se (Figure 8). However, in comparison to gain in grain yield 497 in either one of the two years when selection was for grain yield in the other year, both GYPD and the yield + 498 weed competition selection indices achieved generally comparable gains for yield whilst also increasing other 499 favourable traits (Figure 8). These results show that antagonistic trait relationships are generally possible to 500 optimise through appropriate selection. However, while this may slow genetic gain to some extent in the primary 501 traits of interest, such as grain yield, this more realistically represents the balance of selection for multiple traits 502 that occurs in breeding programmes.

Different true genetic models affect long-term response to simulated
513 selection 514 After comparing simulated response to different selection indices, we then tested how using different true genetic 515 models that are based on the different genomic prediction models trained on the observed NDM population had comparable variances in prediction values to the observed grain yield data, indicating their realistic prediction 521 of phenotypic variation. While this would be expected from an overfitting model, cross validation with the 522 observed data showed an increased accuracy of these models. 523 Simulations with MT RF true genetic models tended to have the largest and longest increase of genetic 524 gain over the course of simulated selection of grain yield (Figure 9b). Genetic gain in grain yield plateaued, at a 525 relatively low level, after only around six cycles of selection using ST LASSO genetic model simulations but both 526 cycle time to plateaux and plateaux height (maximum genetic gain) were both extended by either using a RF or 527 MT model. This pattern of faster and higher genetic gain in RF or MT models was accompanied by the retention 528 of higher phenotypic (Figure 9c) and genetic (Figure 9d) variance, particularly over long-term selection in the 529 MT models. Almost all non-zero LASSO SNP effects were fixed after 8 cycles of selection in any simulation 530 repeat for selection for grain yield in either year, limiting further genetic gain (Figure 9). Continued loss of genetic 531 diversity once all genetic effects that affect phenotypic variance were fixed was down to genetic drift. Many of 532 the SNP with highest variable importance in RF models were in common with the largest LASSO SNP effect 533 coefficients, and the largest of these were fixed in the first few cycles of selection at a similar rate for both ST RF 534 and ST LASSO (Figure 9e), where almost all of the ten SNPs with the largest LASSO effect or RF variable 535 importance were fixed after five cycles of selection for both models. However, RF models included many more 536 SNPs with non-zero importance (~20,000) than non-zero LASSO effects (61 and 87 for grain yield in years 1 and 537 2, respectively) and many more of these small or non-additive genetic effects in RF models remained polymorphic 538 for longer (Figure 9e). For example, a significant proportion of these (14.3 and 13.8% for grain yield in years 1 539 and 2, respectively) remained polymorphic after 10 cycles of selection while genetic gain in yield still continued 540 to increase (Figure 9b). This suggests that accumulation of the SNP effects, that were too small to be included in 541 LASSO, or complex non-additive SNP by SNP epistatic genetic effects, made a large contribution to continued 542 long-term genetic gain in RF models even after large effect QTL are fixed. 543 Furthermore, simply adding a MT second step to LASSO models to include indirect pleiotropic effects 544 also increased and extended long-term genetic gain to a similar or greater extent to ST or MT RF models (Figure  545 9b). Using MT models, LASSO SNP effects were fixed at a much slower rate (Figure 9e) and phenotypic and 546 genetic variance was maintained for much longer (Figure 9c-d), where on average 12% of the ten largest LASSO 547 SNP effects for each single trait were polymorphic after 10 cycles of selection for across all simulation repeats 548 with selection for grain yield in both years. This also suggests that the greater degree of pleiotropy present in MT 549 models, which increased prediction accuracy for low accuracy LASSO models in particular (Figure 3), meant 550 that the number of small effect loci involved in each trait was greatly increased. However, the number of indirect 551 pleiotropic LASSO SNP effects across non-additive ensemble models could not be quantified. Selection could 552 therefore act on more complex trait relationships driven by pleiotropy and/or linkage. 553 Linkage among antagonistic genetic effects could be shown to partly limit genetic gain. On average, only 554 0.8 and 5% of non-zero ST LASSO model SNP coefficients were negatively fixed resulting in an average of 0.28 555 and 2.15% loss of the maximum yield after 20 cycles of selection for yield in each year respectively. However, 556 this was exacerbated in MT LASSO genetic models where 16.4 and 28.7% of ST LASSO SNP effects with 557 negative effects on the trait under selection, were incorrectly fixed resulting in 14.1 and 23.5% loss of genetic 558 gain. This further indicates insufficient recombination to completely decouple antagonistic linked QTL that were 559 not directly involved in ST LASSO models for grain yield directly but pleiotropically linked through MT models. 560 proportional to SNP effect size in LASSO models and variable importance score for RF models. 573 574

575
A complex structure of trait relationships that interact with environmental conditions were found to be involved 576 in prediction of grain yield. Through simulation of recurrent selection within a genetically diverse highly 577 recombined multi-founder wheat population, and based on observed genomic and phenotypic data, we tested 578 several contrasting genetic models and quantitative genetic approaches to recurrent selection. We found that, in 579 comparison to a simplifying LASSO genetic model where each trait was predicted directly from a minimal subset 580 of markers with additive effects, prediction accuracies were increased both by using a multi-trait ensemble 581 approach and Random Forest prediction models, which potentially incorporate pleiotropy and epistatic effects 582 respectively. This was particularly so for complex traits with low prediction accuracy, and in simulations of 583 recurrent selection these models also increased the rate and extent of long-term genetic gain, whilst maintaining 584 phenotypic and genetic variance. Thus, genomic prediction models that include more complex genetic effects 585 such as epistasis, and pleiotropy may better reflect how continued genetic gain is achieved through breeding. 586 587 We showed that modelling relationships among traits is valuable for increasing genomic prediction accuracy. 588

The value of multi-trait models
Traditional multi-trait genomic prediction models consider the covariance structure of related traits across 589 multiple environments and replicates and increase genomic prediction accuracy for cross-validation schemes 590 when test fractions include partially phenotyped individuals in the test environment (Jia and Jannink, 2012). 591 However, other studies often do not find an advantage to multi-trait models for untested genotypes in real datasets 592 consistently outperformed single trait models, while a contrasting approach using single vector decomposition of 595 the multi-trait matrix performed poorly and more variably across traits. Although the increase in prediction 596 accuracy was small for most traits, the advantage of multi-trait ensemble models was particularly great for traits 597 that were poorly predicted by LASSO models, suggesting that ensemble methods efficiently incorporate additional 598 information from large numbers of small pleiotropic genetic effects among related traits, which ST LASSO 599 models would otherwise overlook when each trait is considered independently. Traits such as grain yield are 600 polygenic and few genetic markers with large and consistent effects have been identified and applied in breeding 601 (Bernardo, 2016). However, predictions of component traits of yield, many of which have simpler genetic 602 architectures (Scott et al., 2021), can improve the ensemble prediction model for yield. We found that many highly 603 correlated traits were used as covariates with high importance in multi-trait models. Furthermore, the covariate 604 importance scores of traits in the ensemble models highlight physiological mechanisms for trait improvement and 605 enable optimisation of antagonistic trait relationships (Figure 4). Where yield components correlate negatively 606 with each other, the multi-trait ensemble model is able to optimise the interplay among these traits to increase the 607 prediction accuracy of yield as the primary trait of interest. Similar to the approach taken by Powell et al. (2022) 608 who modelled multiple systems biology development processes to bridge the gap between genotype to complex 609 phenotype, we used multiple physiological traits in more agnostic models without defined crop growth parameters 610 to aid prediction of the complex processes behind grain yield. 611 breeding, any single strategy to achieve high yield may be hampered by unpredictable year-to-year environmental 618 variation, and thus limit response to selection and reduction in genetic variance. While our simulations of future 619 genetic gain cannot account for unmeasured environments in the future, commercial wheat breeders often take 620 this into account and make selections of promising lines with a diversity of phenological or plant height traits to 621 ensure adaptive potential. 622

The potential to optimise trait trade-offs that conventional breeding has
623 neglected 624 Using multi-trait data from a MAGIC population that controls for confounding effects of population structure 625 (Scott et al., 2020), we found that pleiotropy and/or tight genetic linkage are significant causes of correlated trait 626 responses to selection. These data also shed light on the combination of traits that would be required to be co-627 selected or optimised to achieve continuous gains in grain yield as a primary trait under selection. Furthermore, 628 we find antagonistic trade-offs among traits that have been problematic for wheat crop improvement. We suggest 629 that historic enhancement of grain yield by breeders at the cost of key traits such as weed competitive ability, or 630 grain protein content, has been due to the over-riding value placed on grain yield as a primary selection criterion 631 during variety testing, as well as market pressures, leaving little scope for compromise with other traits. Integration 632 of novel trait variation to optimise these trade-offs within an elite wheat genepool, which has been under such 633 strong directional selection, would therefore be difficult. However, simulations presented here show that with 634 appropriate selection indices, genetic gain in both yield and other valuable but negatively correlated traits was 635 possible to some extent. 636 We propose that pleiotropic and epistatic genetic effects and G×E interactions have played a major role in 665 maintaining wheat genetic diversity despite strong selection and will be particularly important for applied genomic 666 selection of elite varieties in already highly selected breeding populations. suggested that multi-parent crossing schemes may be valuable for maintaining genetic diversity. However, the 671 diverse MAGIC wheat population described here is unlikely to generate commercially competitive varieties due 672 to the broad genetic basis and historic founders. Instead, this MAGIC population samples and recombines genetic 673 diversity across 70 years and can therefore be considered a microcosm of past and future selective breeding. In 674 this context, our simulations rerun alternate histories to test different selection models and approaches and reveal 675 physiological and genetic mechanisms for future breeding. We suggest that this approach, including multi-trait 676 ensembles, could be further integrated with environmental information to inform crop models (Cooper et al., 2021; 677

The value of complex genomic prediction models for continued genetic
Stöckle and Kemanian, 2020). Considering that traditional wheat breeding programme cycles generally extend 678 over at least five years, our simulations of twenty cycles of recurrent selection represent an equivalent of over one 679 hundred years of traditional wheat breeding (albeit it with no further input from genotypes outside of the 16 680 founders from which the MAGIC population was constructed). Current wheat breeding programmes are also 681 likely to be at a point towards the later stages of selection simulations presented here where the majority of large 682 effect QTL are either fixed or well accounted for. Further genetic gain in current breeding programmes will 683 therefore likely be achieved through optimisation of small and complex genetic effects (Gorjanc et al., 2018). 684

698
In summary, we demonstrated the value of multi-trait ensemble models for genomic prediction of complex traits 699 and simulated recurrent selection using these genetic models based empirically on an extensively genotyped and 700 phenotyped NDM population. We consider this a microcosm of wider wheat breeding programmes working with 701 the wider pool of wheat germplasm so that our results provide insights into the trends and mechanisms by which 702 the considerable progress and genetic gain has been made in modern wheat breeding without apparent genetic 703 diversity loss. These findings highlight the importance of models and approaches that take into account these 704 mechanisms to maximize further genetic gain in the future.