A similarity matrix for preserving haplotype diversity among parents in genomic selection

Abstract


Introduction
…varieties of haplotypes in play over succeeding generations to avoid unintentional losses of genetic variability and to allow new favorable haplotypes to emerge through recombination in …

…where $w$ is the index of the segregation pattern, $p_w$ is the probability of segregation pattern $w$, and $g_{1w}$ is the additive value of the gamete transmitted from the first parent. Following the same procedure, the MSV for the second parent is calculated as …

Similarly, the probability distribution of segregation patterns allows us to compute the covariance between the additive genetic values of matching gametes from both parents. This …

Enumerating all segregation patterns and their probabilities of occurrence becomes difficult with hundreds to thousands of markers. However, using the previously mentioned equivalent …

In our example (Table 1), the first haplotypes of both parents give rise to parent-…

…is independent of haplotype order. Therefore, a unique similarity measure valid for any …

For a genome with several chromosomes, the similarity is calculated as the sum of the absolute contributions from all chromosomes: …

…the parent-specific additive marker effects vector for parent $i$ ($j$) on the same chromosome. An individual's similarity to itself (i.e., $i = j$) equals that individual's MSV. Therefore, it is possible to assemble a trait-specific similarity matrix $\mathbf{S}$ for any set of individuals, with the MSV of each parent on the diagonal and the pairwise similarities $s_{ij}$ as off-diagonal elements.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted June 5, 2023. doi: https://doi.org/10.1101/2023.06.01.543227 (bioRxiv preprint).
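The equations for the MSV and for $s_{ij}$ are not displayed in this version. The sketch below gives one hedged reading of them and is not the authors' code. It assumes the closed form $s_{ij} = \sum_c |\mathbf{m}_{ic}^\top \mathbf{R}_c \mathbf{m}_{jc}|$. Here $\mathbf{m}_{ic}$ is the vector of marker effects multiplied by the difference between parent $i$'s two haplotypes on chromosome $c$. $\mathbf{R}_c$ holds the covariances $(1 - 2r_{kl})/4$ of the haplotype-origin indicators, with recombination rates taken from Haldane's map function. The sketch cross-checks the closed-form MSV against brute-force enumeration of all $2^L$ segregation patterns, which is feasible only for a few markers. It also shows why taking absolute values per chromosome makes $\mathbf{S}$ independent of haplotype order. All function names and the toy data are hypothetical.

```python
import itertools
import math


def gamete_cov(pos):
    """R[k][l]: covariance of haplotype-origin indicators at markers k and l.
    Assumes Haldane's map (no interference): 1 - 2r = exp(-2d), d in Morgans."""
    return [[0.25 * math.exp(-2.0 * abs(a - b)) for b in pos] for a in pos]


def parent_effects(m, h1, h2):
    """Hypothetical parent-specific vector: effect times (h1 - h2); zero if homozygous."""
    return [mk * (a - b) for mk, a, b in zip(m, h1, h2)]


def quad(x, R, y):
    """Bilinear form x' R y."""
    return sum(x[k] * R[k][l] * y[l] for k in range(len(x)) for l in range(len(y)))


def msv_enumerated(m, h1, h2, pos):
    """Brute-force MSV: variance of the gamete value over all 2**L segregation
    patterns w (positions must be sorted). Cost grows exponentially with L."""
    r = [0.5 * (1.0 - math.exp(-2.0 * (b - a))) for a, b in zip(pos, pos[1:])]
    outcomes = []
    for w in itertools.product((0, 1), repeat=len(m)):
        p = 0.5  # the first marker comes from either haplotype with probability 1/2
        for k, rk in enumerate(r):
            p *= rk if w[k] != w[k + 1] else 1.0 - rk  # recombination between k, k+1
        g = sum(mk * (a if wk == 0 else b) for mk, a, b, wk in zip(m, h1, h2, w))
        outcomes.append((p, g))
    mean = sum(p * g for p, g in outcomes)
    return sum(p * (g - mean) ** 2 for p, g in outcomes)


def similarity_matrix(parents, effects, R):
    """parents[i][c] = (h1, h2) on chromosome c; effects[c] and R[c] per chromosome.
    s_ij = sum over chromosomes of |m_ic' R_c m_jc|; the diagonal holds the MSVs."""
    m = [[parent_effects(effects[c], *p[c]) for c in range(len(R))] for p in parents]
    return [[sum(abs(quad(mi[c], R[c], mj[c])) for c in range(len(R))) for mj in m]
            for mi in m]


# Toy data: two chromosomes, four markers each (positions in Morgans).
pos = [[0.0, 0.1, 0.35, 0.5], [0.0, 0.2, 0.4, 0.9]]
effects = [[0.3, -0.2, 0.5, 0.1], [0.4, 0.0, -0.3, 0.2]]
R = [gamete_cov(p) for p in pos]
parents = [
    [([1, 0, 1, 1], [0, 0, 0, 1]), ([1, 1, 0, 0], [0, 1, 1, 0])],
    [([0, 1, 1, 0], [1, 0, 1, 1]), ([1, 0, 0, 1], [1, 1, 1, 0])],
]
S = similarity_matrix(parents, effects, R)

# Reverse parent 0's haplotype order on chromosome 1: the signed contribution of
# that chromosome flips, but its absolute value, and hence S, does not change.
swapped = [parents[0][:1] + [parents[0][1][::-1]], parents[1]]
S_swapped = similarity_matrix(swapped, effects, R)
```

Reversing a parent's haplotypes on one chromosome negates $\mathbf{m}_{ic}$ and therefore the sign of that chromosome's term. The absolute value removes this arbitrary sign. For $i = j$ each term is a nonnegative variance, so the diagonal reduces to the MSV.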

We emphasize that the expectations are based on the univariate conditional distribution of segregation patterns rather than on the bivariate joint distribution of gametes from two parents. The latter would yield a covariance of zero because the Mendelian sampling processes in different individuals are independent.

When considering multiple traits, there is also a similarity between potential parents $i$ and $j$ with respect to the aggregate genotype. Each element $s_{ij}$ of the respective similarity matrix is given by the following equation: …

We propose computing the MSV of progeny produced by parents $i$ and $j$ in the same way as described in Eq. (4): …

…Mendelian covariance matrix, and MSCs between traits as off-diagonals. Eq. (12) can be extended to multiple chromosomes as follows: …

Since $\mathbf{V}_{ij}$ is independent of the chosen order of haplotypes, the MSV for the aggregate genotype is unequivocally given by the following quadratic form: …

Similarities between zygotes of two pairs of parents

A common breeding interest is determining the optimal parents for mating and the number of mates to assign to each parent, which can be determined using a gametic similarity matrix.
However, in some cases, such as multiple ovulation and embryo transfer, the interest lies in determining the optimal parent pairs or the optimal number of offspring that particular parent pairs should produce. Consequently, a measure of the similarity between the genetic values of zygotes produced by two parent pairs can be useful. As for gametes, this measure can be calculated from the probability distribution of segregation patterns (see B.1 in File S1 for the derivation). This similarity can also be expressed in matrix notation, which is much more convenient for …

…zygotes produced by parent pairs $ij$ and $uv$ in the case of a single chromosome is: …

…chromosome by chromosome and then summing them: …

…provided in File S1 (B.1).

Similarities for multiple traits and the aggregate genotype
For two pairs of parents $ij$ and $uv$, the similarity measure $s_{ij,uv}$ for the aggregate genotype, … and $v$, becomes: …

Then, the MSVs of the aggregate genotype define the diagonal elements, and the similarities $s_{ij,uv}$ define the off-diagonal elements of the similarity matrix $\mathbf{S}$ between pairs of parents.
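The quadratic form for the MSV of the aggregate genotype is not displayed in this version. Assume the aggregate genotype is $H = \mathbf{a}^\top\mathbf{g}$, where $\mathbf{a}$ is a vector of economic weights (our notation, not necessarily the authors'). $\mathbf{V}_{ij}$ carries the trait MSVs $\sigma^2_{k,ij}$ on its diagonal and the MSCs $\sigma_{kl,ij}$ off the diagonal. Under these assumptions, the standard variance of a linear combination gives each diagonal element as follows:

```latex
\begin{aligned}
\sigma^2_{H,ij} &= \mathbf{a}^\top \mathbf{V}_{ij}\,\mathbf{a}
  = \sum_{k} a_k^2\,\sigma^2_{k,ij} + 2\sum_{k<l} a_k a_l\,\sigma_{kl,ij},\\
\text{two traits:}\quad
\sigma^2_{H,ij} &= a_1^2\,\sigma^2_{1,ij} + 2a_1a_2\,\sigma_{12,ij} + a_2^2\,\sigma^2_{2,ij}.
\end{aligned}
```

This expression depends on the haplotypes only through $\mathbf{V}_{ij}$, so it inherits that matrix's invariance to haplotype order.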

See Appendix 3 for the extension of the similarity measure to monoecious species and for the order of parents in dioecious organisms. The similarity matrix $\mathbf{S}$ of either gametes or zygotes can be standardized by pre- and post-multiplying it by … diagonal elements: …

Potential application of the similarity matrices for optimum mate allocation

The similarity matrices may be used to optimize mate allocation (OMA) such that optimal …

To demonstrate the use of the similarity matrix for hedging haplotype diversity, we simulated a base population of 1,000 cattle (500 males and 500 females) with 10 chromosomes using …

Then, we selected and mated parents based on various selection schemes (Table 2) …

…offspring, sexes were systematically assigned (i.e., male, female, male, etc.).

For OMA, we derived $\mathbf{S}_t$ for all potential parents in each generation $t$ using the … varying constraints on average haplotype similarity $Q_{t+1}$ (see Table 2) by solving the … and LCC 2022). The constraints were selected based on the 60%, 40%, and 30% quantiles of …

All statistical analyses were performed using programs written by the authors and …
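The solver and the exact form of the constraint are elided above. The sketch below is therefore a hypothetical stand-in for OMA, not the authors' method, and it rests on three assumptions:

1. It standardizes a toy similarity matrix with the common correlation-type scaling $\mathbf{D}^{-1/2}\mathbf{S}\mathbf{D}^{-1/2}$, where $\mathbf{D} = \operatorname{diag}(\mathbf{S})$. This is an assumption about the standardization mentioned earlier.
2. It measures the average similarity of a set of parents as $\mathbf{c}^\top\mathbf{Q}\mathbf{c}$, where the contributions $\mathbf{c}$ sum to one. The scaling is also an assumption.
3. It adds sires greedily, in order of EBV and with equal contributions, until this value falls to a threshold that stands in for $Q_{t+1}$.

The toy example reproduces the qualitative pattern reported for OMA: a tighter threshold requires more sires. All names and numbers are illustrative.

```python
import math


def standardize(S):
    """D^{-1/2} S D^{-1/2} with D = diag(S). Assumption: a parent with MSV = 0
    (homozygous everywhere) gets a zero row and column instead of dividing by 0."""
    n = len(S)
    root = [math.sqrt(S[i][i]) if S[i][i] > 0 else 0.0 for i in range(n)]
    return [[S[i][j] / (root[i] * root[j]) if root[i] > 0 and root[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def average_similarity(c, Q):
    """Quadratic form c' Q c for a contribution vector c whose entries sum to 1."""
    n = len(c)
    return sum(c[i] * Q[i][j] * c[j] for i in range(n) for j in range(n))


def greedy_select(ebv, Q, threshold):
    """Hypothetical heuristic: add candidates in descending EBV order with equal
    contributions; return (selected, average similarity) for the first set that
    meets the threshold, or None if even using every candidate does not."""
    order = sorted(range(len(ebv)), key=lambda i: ebv[i], reverse=True)
    for k in range(1, len(order) + 1):
        c = [0.0] * len(ebv)
        for i in order[:k]:
            c[i] = 1.0 / k
        avg = average_similarity(c, Q)
        if avg <= threshold:
            return order[:k], avg
    return None


# Toy example: four candidate sires; sires 0 and 1 share much of their MSV.
S = [[4.0, 3.6, 0.4, 0.2],
     [3.6, 9.0, 0.6, 0.3],
     [0.4, 0.6, 1.0, 0.1],
     [0.2, 0.3, 0.1, 2.0]]
ebv = [2.0, 1.8, 1.1, 0.9]
Q = standardize(S)

loose = greedy_select(ebv, Q, 0.6)    # three sires suffice
strict = greedy_select(ebv, Q, 0.45)  # all four sires are needed
```

A real OMA would optimize the contributions jointly, for example with a quadratic programming solver, rather than fixing them to be equal. The greedy rule only illustrates how a similarity constraint trades genetic merit for a broader set of parents.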

We implemented a computationally faster approach for estimating MSV, building on the method described by Bonk et al. (2016). We benchmarked the time required to compute MSVs for 265 parents with 10,304 markers for a single trait (File S2) and found our approach …

Furthermore, the use of parent-specific $\mathbf{m}_i$ vectors was essential for measuring the similarity …

The diagonal elements of each matrix were the MSV of each parent, and the off-…

The effects of heterozygosity and trait genetic architecture were observed in …

…matrix significantly preserves genetic variability without negatively affecting genetic gain in the long term.

To better understand the effects of the similarity matrices on genetic gain and variance, we quantified favorable QTL allele loss, SNP allele loss, the frequency of favorable alleles, and the number of selected sires (Figures 5 and S8). The OMA schemes involving the standardized similarity matrix preserved significantly more favorable QTL alleles than TS …

…beyond those of their TS counterparts (Figure 5B). In addition, the OMA schemes retained significantly more SNPs than the TS schemes (Figure 5C), reflecting the selection and drift …

…alleles than TS in the index (Figure 8C). TS on BV resulted in a 1% higher rate of inbreeding than TS on the index, which in turn had an 8–26% higher rate of inbreeding than the OMA schemes (Figure 5D). Although less pronounced, the unstandardized matrix affected these quantities similarly (Figures S9 and S10 in File S3).

As the constraint on the average haplotype similarity of the parents increased, the number of sires selected for mating increased (Table S3 in File S3). Consequently, the diversifying effect on haplotypes within the group of selected parents also increased, preserving more favorable haplotypes. On average, OMA using the standardized similarity matrix selected significantly more sires than the unstandardized version.

…presented in Figure S6 in File S3. Results are reported for 100 simulation runs.

The novel similarity matrix described in this paper fills a crucial methodological gap by …

…the practical relevance of the approach. Notably, the derived similarity measure has a …

…similarities for all traits in the empirical data of the present study ranged across almost the entire possible parameter space (Figure 2). However, we expect the species and populations …

…sampling values should be investigated in future studies.

To demonstrate the application of the similarity matrices in hedging haplotype …

…the expense of lower short-term genetic gain (Figure 3 and Figure S6 in File S3), was also …

…frequency of favorable alleles, OCS may not efficiently maximize short- and mid-term genetic gain and could accumulate deleterious alleles in the genome, thereby reducing population fitness (de Cara et al. 2013). Consequently, when selection aims to maximize genetic gain while preserving genetic diversity, a similarity matrix that indicates the degree to which parents share heterozygous QTL segments may be a more appropriate similarity measure for OMA schemes.

To test this hypothesis, we compared our results with those of optimizations involving the GRM, henceforth referred to as OCS schemes. We developed additional selection schemes with the GRM as $\mathbf{Q}_t$ and constraints of 1% and 0.5% on the average rate of inbreeding (see Eq. …

…placed such that a minimum of 5 males would be selected for mating. As with the OMA schemes, only males were optimized in the OCS schemes, while females were selected based on TS on either BV or the index. We found that the OCS schemes generally resulted in greater long-term genetic gain and preserved more genetic variability than the OMA schemes at the …

…variance, we added a 0.5% constraint on the average rate of inbreeding to the OMA schemes, i.e., …

…and preserved more variability relative to the OMA schemes. It also preserved equal or greater long-term genetic variance relative to the OCS schemes (Figure S11C and D in File S3), …

…(Table S4 in File S3).
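The equation that links the inbreeding-rate constraints to the optimization is elided above. A widely used convention in OCS, assumed here rather than taken from the paper, translates a target rate of inbreeding $\Delta F$ into an upper bound on the average coancestry of the next generation: $C_{t+1} = C_t + \Delta F\,(1 - C_t)$. The sketch applies this recursion for the 1% and 0.5% rates mentioned in the text. The function names are hypothetical.

```python
def coancestry_limit(c_t, delta_f):
    """Upper bound C_{t+1} = C_t + delta_f * (1 - C_t) on average coancestry,
    for a target rate of inbreeding delta_f per generation."""
    if not 0.0 <= delta_f < 1.0:
        raise ValueError("delta_f must lie in [0, 1)")
    return c_t + delta_f * (1.0 - c_t)


def limits_over_generations(c_0, delta_f, generations):
    """Apply the recursion repeatedly. Because 1 - C_{t+1} = (1 - C_t)(1 - delta_f),
    the result equals 1 - (1 - c_0) * (1 - delta_f) ** t."""
    limits = [c_0]
    for _ in range(generations):
        limits.append(coancestry_limit(limits[-1], delta_f))
    return limits


# Bounds for the two inbreeding-rate constraints over five generations.
bounds = {rate: limits_over_generations(0.0, rate, 5) for rate in (0.01, 0.005)}
```

The bound rises every generation, but the unrelated fraction $1 - C_t$ is never allowed to shrink by more than $\Delta F$ per generation. This is what makes the constraint a rate rather than a fixed ceiling.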
