A hybrid optimal contribution approach to drive short-term gains while maintaining long-term sustainability in a modern plant breeding program

Plant breeding programs must adapt genomic selection to an already complex system. Inbred or hybrid plant breeding programs must make crosses, produce inbred individuals, and phenotype inbred lines or their hybrid test-crosses to select and validate superior material for product release. These products are few, and while it is clear that population improvement is necessary for continued genetic gain, it may not be sufficient to generate superior products. Rapid-cycle recurrent truncation genomic selection has been proposed to increase genetic gain by reducing generation time. This strategy has been shown to increase short-term gains, but can quickly lead to loss of genetic variance through inbreeding as relationships drive prediction. The optimal contribution of each individual can be determined to maximize gain in the following generation while limiting inbreeding. While optimal contribution strategies can maintain genetic variance in later generations, they suffer from a lack of short-term gains in doing so. We present a hybrid approach that branches out yearly to push the genetic value of potential varietal materials while maintaining genetic variance in the recurrent population, such that a breeding program can achieve short-term success without exhausting long-term potential. Because branching increases the genetic distance between the phenotyping pipeline and the recurrent population, this method requires sacrificing some trial plots to phenotype materials directly out of the recurrent population. We envision the phenotypic pipeline not only for selection and validation, but as an information generator to build predictive models and develop new products.

tion methods therefore do not necessarily select the top individuals, and the contribution of individuals is not typically equal, nor is the number of contributors constant across generations. With proper constraints, optimal contribution can either drive means for short-term gains while exhausting genetic variability, or achieve modest gains while maintaining genetic variation for long-term sustainability. Use of optimal contributions is widespread in animal breeding applications, and has recently been adopted for a few plant breeding applications

Ideally, the breeder would prefer to achieve superior products in the short term, without sacrificing the genetic variability necessary for long-term product development. In many crop species, inbreeding reduces V_g while having little to no measurable inbreeding depression, allowing for the creation of genetically uniform commercial products. Given high prediction accuracy, a recurrent population could be maintained to produce new meiotic events with lasting genetic variability, while branching out on a yearly basis to drive the genetic values of materials destined for the VDP. Inbreeding within the branch then has no effect on long-term potential. This strategy could allow for reduction in the size of the VDP, potentially recovering costs of genotyping materials.

Through simulation, we explore the potential of branching out to drive means for short-term gains while maintaining genetic variability for long-term sustainability with optimal contributions. We also investigate the interaction between these methods and the size of the VDP, both as a selection and validation pipeline, as well as an information generator.

sampling 100 sites per chromosome to serve as QTL, and 100 sites per chromosome to serve as markers. By chance, 108 markers were assigned to QTL loci across the genome. The founder population was then sampled to produce 100 individuals as the starting population for each of 100 simulation runs; these starting populations were identical across all selection schemes. To initiate the simulation, the sampled founder population was then phenotyped with a single plot observation of the trait, using a heritability of h^2 = 0.3.

(Table 1). Inbred lines formed by a doubled haploid process were fed into the first year trials in each VDP at the start of each year. Phenotypic performance was used to advance lines through four years of phenotype trials, allowing the top 10 lines that advanced through the fourth trial to be considered "varieties". Phenotypes, as opposed to estimated breeding values that include both phenotypic and marker information, were used for line advancement.

This allows for direct comparison to the traditional program, which does not use any marker information. The number of replicates increased at each stage of selection, corresponding to a single replicate in one location, two replicates in two locations and three replicates in five locations, with a final validation of three replicates in five locations. The mean of the 10 varieties at the end of each phenotype cycle was used to determine the merit of a given selection scheme, while also providing numerical stability. Trial sizes, replications and selection intensities for the small, medium and large VDPs are indicated in Table 1.

To simulate phenotypes for each VDP, the error variance was set to produce a plot-level heritability of h^2 = 0.3 (i.e. V_e = 7/3) for the founder population (V_g = 1), and was held constant such that the realized heritability would decrease as V_g decreased through time. For simplicity, no G×E variability was introduced, meaning the genetic correlation between locations and years is 1. Therefore, multiple environments (e.g. locations) are equivalent to replications within a single environment.
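As a worked illustration of the variance settings above, the following minimal Python sketch derives V_e from the plot-level heritability and shows how the heritability of a line mean would change with the number of plots per stage and with a declining V_g. It assumes that replicates are counted per location (giving 1, 4, 15 and 15 plots per line across the four stages), which the text does not state explicitly. The function names are illustrative and are not taken from the simulation code.

```python
from fractions import Fraction

def error_variance(h2, v_g=Fraction(1)):
    """Error variance implied by h2 = V_g / (V_g + V_e) at the plot level."""
    return v_g * (1 - h2) / h2

def entry_mean_heritability(v_g, v_e, n_plots):
    """Heritability of a line mean over n_plots plots, with no GxE variance."""
    return v_g / (v_g + v_e / n_plots)

# Founder population: V_g = 1 and h2 = 0.3 give V_e = 7/3, as in the text.
V_E = error_variance(Fraction(3, 10))

# ASSUMPTION: (replicates per location, locations) for the four trial stages.
STAGES = [(1, 1), (2, 2), (3, 5), (3, 5)]

def stage_heritabilities(v_g, v_e=V_E):
    """Line-mean heritability at each stage; locations act as extra replicates."""
    return [entry_mean_heritability(v_g, v_e, reps * locs) for reps, locs in STAGES]

if __name__ == "__main__":
    # With V_g = 1 the stage values are 3/10, 12/19, 45/52 and 45/52.
    print(stage_heritabilities(Fraction(1)))
    # V_e is held constant, so halving V_g lowers the heritability at every stage.
    print(stage_heritabilities(Fraction(1, 2)))
```

Because V_e is fixed, any loss of V_g lowers the realized heritability at every stage, which is the behaviour described for later years of the simulation.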

A ridge regression genomic prediction model (Whittaker, Thompson, and Denham 2000),

of the selection is the inbreeding coefficient in the following generation, and is calculated as 1/2 c'Ac, where A is the additive genetic covariance. As A and b are typically unknown, we substitute an additive genetic covariance estimate derived from genome-wide markers or a pedigree, Â, and breeding value estimates (EBVs or GEBVs), b̂, derived from a mixed model, for these parameters respectively. When centered genome-wide markers are used to calculate the additive genetic covariance (VanRaden 2008), it is the change in the average co-ancestry that is calculated by ∆f = 1/2 c'Âc.
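The quantities above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the marker relationship matrix follows the commonly used first VanRaden (2008) formulation for 0/1/2 allele counts, the expected gain is taken to be c'b̂ for a contribution vector c that sums to one, and the function names `vanraden_g` and `gain_and_coancestry` are hypothetical.

```python
import numpy as np

def vanraden_g(markers):
    """Marker-based relationship estimate from centered 0/1/2 allele counts.

    ASSUMPTION: G = ZZ' / (2 * sum(p * (1 - p))), with Z the allele counts
    centered by twice the observed allele frequency p.
    """
    M = np.asarray(markers, dtype=float)
    p = M.mean(axis=0) / 2.0
    scale = 2.0 * np.sum(p * (1.0 - p))
    if scale == 0.0:
        raise ValueError("all markers are fixed; G is undefined")
    Z = M - 2.0 * p
    return Z @ Z.T / scale

def gain_and_coancestry(c, b_hat, A_hat):
    """Return (c'b_hat, 1/2 c'A_hat c) for a contribution vector c.

    ASSUMPTION: contributions are non-negative and sum to one.
    """
    c = np.asarray(c, dtype=float)
    if np.any(c < 0) or not np.isclose(c.sum(), 1.0):
        raise ValueError("contributions must be non-negative and sum to one")
    b_hat = np.asarray(b_hat, dtype=float)
    A_hat = np.asarray(A_hat, dtype=float)
    return float(c @ b_hat), float(0.5 * c @ A_hat @ c)

if __name__ == "__main__":
    # Four unrelated candidates (identity relationship matrix) for illustration.
    A = np.eye(4)
    b = np.array([1.0, 2.0, 3.0, 4.0])
    print(gain_and_coancestry([0.25, 0.25, 0.25, 0.25], b, A))  # equal use of all four
    print(gain_and_coancestry([0.0, 0.0, 0.0, 1.0], b, A))      # only the top candidate
```

In this toy case, concentrating all contribution on the best candidate raises c'b̂ from 2.5 to 4 but raises 1/2 c'Âc from 0.125 to 0.5, which is the tension that optimal contribution selection is designed to manage.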

Given some desired genetic gain, ∆g, the increase in the inbreeding coefficient ∆f can be minimized, or conversely, ∆g can be maximized given some acceptable increase in inbreeding, ∆f (Meuwissen 1997). This problem can be formulated as a function, F, of c.
Given some value of λ such that 0 ≤ λ ≤ 1, this equation can readily be solved for

Here, the optimal contribution selection scheme was implemented to maximize genetic gain, ∆g, given a set level of increase in inbreeding, ∆f.

To achieve short-term success without sacrificing long-term gain, we modify the optimal contribution scheme by branching the mating scheme each year into two paths: one constant path that maintains genetic variability in the recurrent population, and yearly branches that maximize genetic gain while relaxing the limitations on inbreeding within the branch,

Branches were initiated in the year prior to when materials will be phenotyped, either 0, 1, or 2 cycles into the RCRS cycling for that year. A branch at cycle 3 is equivalent to the optimal contribution scheme, as no time remains to make crosses before the inbreeding step.

Branching increases the genetic distance between the recurrent population and the phenotypic information that is used to make decisions, and thus reduces prediction accuracy

The first four years were required to populate the VDP, causing some instability in the first few years of each selection program. Once the VDP is populated, the system stabilizes. We left these burn-in years for transparency, but focus on the effects of selection schemes after the first 4 or 5 years for discussion.

, optimal contribution (∆f = 0.005), and optimal contribution with branching (∆ fg = 0.1), and phenotyping 0.6fn RCRS inbred lines A) compared to the traditional selection scheme, and B) expressed as a proportion of the traditional selection scheme for three VDP sizes (f × n) across 30 years.

for variety means initially, but quickly exhausted genetic variability in later generations (Supplementary Figure S1). An intensity of ĩ = 0.52 (corresponding to a 30% selection intensity) appeared to balance short- and long-term gains and was used for all further breeding scheme comparisons.

The traditional breeding program produces accurate estimates of breeding values, but takes a relatively long time to recycle good material. Even in the expedited traditional scheme used here, good lines required at least two years of evaluation before they were deemed candidates for crossing. Despite the reduced accuracy of selection, the threefold increase in the number of cycles and sixfold decrease in cycle time allow the rapid-cycle RT scheme to dominate until V_g is exhausted. The aggressive turnover rate fixes many beneficial alleles quickly, but in doing so also fixes many deleterious alleles (Jannink 2010).

In this simulation, the products arising from rapid-cycling rarely outperformed the products from a traditional scheme by more than 10-20%. While the mean genetic value, μ, of the recurrent population may increase faster under a rapid cycling scheme, this did not translate directly to similar increases in the genetic value of the products released. The majority of the selection intensity occurs in the VDP (Figure 4B), emphasizing the role of the VDP as a selection and validation machine to mine the tails of the distribution.

The average genetic value of the recurrent population was often close to, if not better than, that of the varieties released during the same year. This suggests that the validation in the VDP is a hindrance to expedited product development. While reducing the number of years of performance trials before release may be feasible, that number is unlikely to fall below two or three. The cost of releasing a poor-performing product is so much greater than that of failing to release a good one that breeding programs are unlikely to adopt a strategy without extensive evaluation. However, this does present opportunities to restructure the VDP to maximize the rate of product development.

Figure S2). The traditional scheme maintained enough variability to outperform the optimal contribution scheme at the end of 30 years in the largest VDP. In practice, the optimal or acceptable level of ∆f is typically unknown, and will likely be trait dependent. We assumed an infinitesimal model for calculating b̂ and Â, which is likely safe for complex traits, but

Figure 3: Prediction accuracy of the recurrent population for optimal contribution (OC), optimal contribution with branching (OCB), and optimal contribution with branching while using 0.6fn of the first year trials to phenotype materials out of the recurrent population (OCBpR).

The naïve branching scheme failed spectacularly for all values of ∆f and ∆f_b tested (Supplementary Figure S3). Earlier branches resulted in lower prediction accuracy of the recurrent population due to a greater genetic distance (i.e. more meiotic events) between the phenotypic information source and the target decision materials (Supplementary Figure S4). The lower prediction accuracy in the recurrent population led to lower gain, eventually resulting in lower varietal means. Here, it is the failure to improve μ in the recurrent population that leads to poor performance, demonstrating that population improvement is necessary for long-term gain.

Sacrificing some first year trial plots to phenotype random inbred lines out of the RCRS drastically improved the performance of the branching scheme (Supplementary Figure S5).

This was due primarily to recovery of prediction accuracy by providing more useful phenotypic information for decision making within the RCRS (Figure 3). Therefore, we refer to these plots dedicated to obtaining useful phenotypic information as "information plots".

While family sizes were reduced to phenotype RCRS material without changing the total number of plots, this had no adverse effect on the varietal means. Because branching and increasing ∆f_b reduce genetic variability within each family, fewer lines per family must be phenotyped to find good ones. We discuss this in more detail in section 4.3.

Earlier branches were able to capitalize on genetic variation and multiple rounds of selection to push means higher, especially in earlier years (Supplementary Figure S6). Generally, sacrificing more plots to phenotype lines out of the RCRS resulted in better varieties, espe-

), but certainly warrants further investigation for rapid-cycle programs.

As a control, we also used some first year plots as information plots for the truncation and optimal contribution breeding schemes. Neither of these schemes benefited significantly

can be seen in the recurrent population of the truncation scheme, which had the highest mean value for the first 10 to 15 years (Figure 5), yet failed to produce the best varieties.

The branching scheme was superior in varietal production during this period, despite having a lower recurrent population mean, and continued to produce better varieties well after V_g was exhausted in the recurrent truncation scheme. Here, we chose to evaluate the success of a breeding program not on the genetic value of its germplasm, but on its ability to output superior products in both the short and long term. We believe most applied plant breeders would agree that this criterion is most appropriate. This emphasizes that population improvement may not be the best indicator of a breeding program's performance, a metric that has also been shown to be biased by environmental trends (Rutkoski 2019).

Figure S8), but did shift where selection occurs (Figure 4A). Increasing ∆f_b allowed more selection to occur in the branch, leading to a higher μ and lower σ for materials entering the VDP. When σ is small, less is gained from increasing i, thus providing the room to phenotype random inbred material out of the recurrent population. In smaller VDPs (i.e. smaller i) that cannot effectively mine the tail if σ is large, pushing μ high in the branch should be the most effective strategy.
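The trade-off between μ, σ and i described above can be illustrated with the standard result for truncation selection on a normally distributed trait, where the expected mean of the selected fraction is μ + i·r·σ. The sketch below is an illustration under that normality assumption only; the `accuracy` parameter r and the function names are hypothetical and do not come from the simulation.

```python
from statistics import NormalDist

def selection_intensity(p):
    """Standardized selection intensity i for keeping the top proportion p.

    Under normality, i = pdf(z) / p, where z is the truncation point
    that leaves a proportion p in the upper tail.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - p)
    return nd.pdf(z) / p

def expected_selected_mean(mu, sigma, p, accuracy=1.0):
    """Expected genetic mean of the selected lines: mu + i * accuracy * sigma."""
    return mu + selection_intensity(p) * accuracy * sigma

if __name__ == "__main__":
    # Tightening selection from the top 20% to the top 10% raises i by a fixed
    # amount, so the resulting gain scales directly with sigma.
    for sigma in (1.0, 0.5):
        extra = (expected_selected_mean(0.0, sigma, 0.10)
                 - expected_selected_mean(0.0, sigma, 0.20))
        print(f"sigma={sigma}: extra gain from higher i = {extra:.3f}")
```

Since the gain from a given increase in i is proportional to σ, a branch that delivers material with a high μ and a low σ leaves less to be gained from intense selection in the VDP, which is consistent with freeing plots for information gathering.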

In this study, we did a small grid search across ∆f and ∆f_b, but in reality, the threshold values that maximize product output will not be known, and may differ considerably depending on the trait architecture. Branching also requires additional genotypes to be collected and additional crosses to be made in the genomic selection portion of the program, so it is not without cost; however, this is a small fraction of the total genotyping budget (25%, 11%, 5% increase for the small, medium and large VDP respectively), as the lines that enter the VDP comprise the bulk of the genotyping cost. The ability to reduce the VDP size while maintaining high gain could help recover these costs, but may also reduce accuracy.

In the best branching scheme, most of the lines entering the VDP are never destined to become potential products. While we stopped short of increasing the number of information plots beyond 0.6fn, it may be that the vast majority of early VDP trials could be leveraged to generate information, rather than select lines. This strategy may also be useful for recovering genetic distances between genotypes and phenotypes introduced for other reasons. This could include the movement of unrelated materials into the breeding program, which presents a very similar problem: the genetic distance between newly introduced materials and the phenotypic information is large relative to the current breeding materials.

On the surface, dedicating most of the early VDP plots solely to information seems counter-intuitive, and we believe this type of strategy will be a hard sell to veteran breeders.

However, this highlights a potential future paradigm shift in how the VDP is constructed.

Instead of merely serving as a selection and validation tool, the VDP may be built to maxi-

Figure S3: Variety means of an optimal contribution with branching (OCB) selection scheme compared to the traditional (TR) selection scheme, with no plots sacrificed to phenotype lines directly out of the recurrent population.

Figure S4: A) Variety means, B) genetic value (line) and genetic standard deviation (shaded) of the recurrent population, and C) prediction accuracy of the recurrent population for three optimal contribution with branching (OCB) schemes, with ∆f = 0.005 and ∆f_b = 0.1, compared to optimal contribution (OC) and traditional selection schemes for three VDP sizes (f × n) across 30 years. Mean selection branches started at either 0, 1, or 2 cycles into the next year's recurrent program. Cycle 3 does not branch, as it has reached the next year, and is equivalent to the OC scheme.

Figure S5: Effects of phenotyping materials directly out of the recurrent population (RCRS) on variety means of optimal contribution with branching (OCB and OCBpR) at cycle 0, for three VDP sizes (f × n) across 30 years. Between 0.0fn and 0.6fn first year trial plots were sacrificed to phenotype random materials directly out of the RCRS population.

Figure S6: Variety means of three branching schemes, with ∆f = 0.005 and ∆f_b = 0.1, where 0.6fn plots were sacrificed to phenotype inbred lines pulled directly out of the recurrent population (RCRS), compared to optimal contribution and traditional selection schemes for three VDP sizes (f × n) across 30 years. Mean selection branches started at either 0, 1, or 2 cycles into the next year's recurrent program. Cycle 3 does not branch, as it has reached the next year, and is equivalent to the OC scheme.

Figure S7: Effects of phenotyping materials directly out of the recurrent population (RCRS) on variety means of two breeding schemes, A) recurrent truncation (RT) and B) optimal contribution (OC), for three VDP sizes (f × n) across 30 years. Between 0.0fn and 0.6fn first year trial plots were sacrificed to phenotype random materials directly out of the RCRS population.