Impact of mixing between parallel year groups on genomic prediction in Atlantic salmon breeding programmes under random selection

A commercial breeding programme in Atlantic salmon uses a four-year generation interval with four parallel breeding populations. In this study, we develop a computer simulation of a salmon breeding programme and explore the impact of gene flow between the parallel year groups on the accuracy of genomic prediction within and between breeding lines. We simulated four parallel lines for 10 discrete generations with random selection and different mixing rates between parallel year groups. The genetic distance between fish (as a measure of diversity) and the accuracy of estimated genomic breeding values were used as criteria of comparison. With no mixing, the genetic distance between populations increased, the genetic variation within populations decreased, and combining data across populations gave no increase in accuracy. Even a low percentage of mixing decreased the genetic distance between populations and increased the genetic variation within populations. The higher the percentage of mixing, the faster the lines became similar. The accuracy of prediction increased with the percentage of mixing, and the gain in accuracy of the combined evaluation approach over the within evaluation approach was larger at higher mixing rates. In conclusion, if there is no gene flow between populations, the lines drift apart and there is no value in combining information across populations for genomic breeding value prediction. Even a small amount of mixing between lines brings the lines closer together and allows information from other lines to be used to improve breeding value prediction. Optimising gene flow between lines should be an integral part of salmon breeding programme design.

[…] (5 males and 5 females) from line 2 will be used as candidates to create the next breeding cycle of line 1. On each chromosome, 100 segregating SNPs were randomly sampled to be used as QTL controlling the trait, and 1000 SNPs were randomly selected to be used as […]

The simulated true breeding value (TBV) for an individual was calculated as:

$$TBV_i = \sum_{j=1}^{m} g_{ij}\, a_j$$

where $m$ is the total number of QTL and $g_{ij}$ is the genotype score of QTL $j$ for individual $i$, coded as 0, 1 and 2 when the genotype is AA, AB and BB, respectively.

The genotype score is the number of reference alleles (B) in the genotype, and $a_j$ is the additive effect for QTL $j$. The additive effects were sampled from a normal distribution. The phenotypic value, $y_i$, of individual $i$ was obtained by adding a normally distributed […] the genetic value (TBV) equal to: […] The year effect was sampled as a random variable, uniformly distributed between 0 and 10, which is equivalent to a range of 1 phenotypic standard deviation.

[…] GEBV:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

where $\mathbf{y}$ is the vector of observations, $\mathbf{X}$ the design matrix for the vector of fixed effects $\mathbf{b}$ (e.g., year cycle effect), and $\mathbf{Z}$ is the incidence matrix that relates animal effects […]. The matrix $\mathbf{G}$ is the genomic relationship matrix calculated using the VanRaden algorithm as follows:

$$\mathbf{G} = \frac{\mathbf{W}\mathbf{W}'}{2\sum_{i} p_i\,(1 - p_i)}$$

where $\mathbf{W}$ is a centralised genotype matrix with rows as individuals and columns as SNPs, and $p_i$ is the allele frequency of SNP $i$. […] new genomic relationship matrix was estimated, and the $p_i$ values were recalculated, rather than re-using elements from a big across-population matrix. This genomic relationship matrix is derived from allele frequencies as suggested by VanRaden (2008), and the evaluation was done with animals from the same cycle.
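The TBV, the VanRaden matrix and the GEBV model above can be tied together in a small NumPy sketch. It is a minimal illustration, not the authors' code, and rests on stated assumptions: one record per genotyped animal (so $\mathbf{Z} = \mathbf{I}$), an intercept as the only fixed effect (no year effect), a known heritability used to set the variance ratio, and QTL drawn from the marker SNPs (the study sampled QTL and markers separately). The function names and simulation settings are hypothetical. Accuracy is taken here as the correlation between GEBV and TBV, a common choice in simulation studies.

```python
import numpy as np


def true_breeding_values(genotypes, effects):
    """TBV_i = sum_j g_ij * a_j, with g_ij coded 0/1/2 (AA/AB/BB)."""
    g = np.asarray(genotypes, dtype=float)
    a = np.asarray(effects, dtype=float)
    if g.ndim != 2 or g.shape[1] != a.shape[0]:
        raise ValueError("genotypes must be (n, m) and effects must have length m")
    return g @ a


def vanraden_g(genotypes):
    """VanRaden (2008) G = WW' / (2 * sum p_i (1 - p_i)).

    Allele frequencies p_i are recomputed from the animals passed in, so calling
    this on a single line/cycle yields a new within-population G.
    """
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0                  # frequency of the counted (B) allele per SNP
    keep = (p > 0.0) & (p < 1.0)              # drop SNPs monomorphic in this sample
    m, p = m[:, keep], p[keep]
    w = m - 2.0 * p                           # centralised genotype matrix W
    return (w @ w.T) / (2.0 * np.sum(p * (1.0 - p)))


def gblup(y, x, g, h2):
    """Fixed effects b_hat and GEBVs u_hat for y = Xb + u + e (assumes Z = I).

    With Var(u) = G*s2u and Var(e) = I*s2e, V is proportional to G + lam*I,
    where lam = s2e / s2u = (1 - h2) / h2 for an assumed known h2.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    lam = (1.0 - h2) / h2
    v = g + lam * np.eye(len(y))
    v_inv_x = np.linalg.solve(v, x)
    b_hat = np.linalg.solve(x.T @ v_inv_x, v_inv_x.T @ y)   # GLS estimate of b
    u_hat = g @ np.linalg.solve(v, y - x @ b_hat)            # BLUP of animal effects
    return b_hat, u_hat


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n, n_snp, n_qtl, h2 = 200, 1000, 100, 0.3               # hypothetical settings
    geno = rng.binomial(2, rng.uniform(0.05, 0.95, n_snp), size=(n, n_snp))
    qtl = rng.choice(n_snp, size=n_qtl, replace=False)      # simplification: QTL among the SNPs
    tbv = true_breeding_values(geno[:, qtl], rng.normal(0.0, 1.0, n_qtl))
    y = tbv + rng.normal(0.0, np.sqrt(tbv.var() * (1.0 - h2) / h2), n)
    x = np.ones((n, 1))                                     # intercept only; no year effect here
    b_hat, u_hat = gblup(y, x, vanraden_g(geno), h2)
    print("accuracy r(GEBV, TBV) =", np.corrcoef(u_hat, tbv)[0, 1])
```

Solving with $\mathbf{G} + \lambda\mathbf{I}$ rather than inverting $\mathbf{G}$ avoids problems when $\mathbf{G}$ is singular, which it always is after centring (its rows sum to zero).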

In order to explore different scenarios, the above model was applied using data in the following three ways:
1. […] of that population (same line), referred to as GBLUP-W.
2. Using data from one population (e.g., line 1) to calculate the GEBVs for the […] (phenotypes), referred to as GBLUP-B.
3. […] population (e.g., line 1), referred to as GBLUP-C.

[…] rates were used in order to investigate how the genetic distance and the accuracy of […]

In order to explore the impact of different mixing rates between populations, we used two criteria of comparison between the scenarios: the genetic distance between individuals and the accuracy of estimated genomic breeding values. The results were based on 10 replicates for each tested mixing rate, and the average of the replicates was reported.

Genetic distance: To calculate the genetic distance within and between the populations, a genomic relationship matrix (GRM) including all individuals was calculated at each cycle. An eigendecomposition of the GRM was performed to obtain its eigenvalues and eigenvectors. Since the eigenvectors are mutually orthogonal, they can be used as coordinate axes to calculate the Euclidean distance between two individuals.
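$$d_{xy} = \sqrt{\sum_{i=1}^{n} \lambda_i \left(v_{ix} - v_{iy}\right)^2}$$

where $v_{ix}$ and $v_{iy}$ are the values of eigenvector $i$ for animals $x$ and $y$, respectively, $\lambda_i$ is eigenvalue $i$, and $n$ is the total number of eigenvectors with a non-zero associated eigenvalue.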
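The eigenvector-based distance can be computed directly from the GRM. The NumPy sketch below assumes the distance takes the form given above, with each squared eigenvector difference weighted by $\lambda_i$, which is equivalent to placing each animal at coordinates $\sqrt{\lambda_i}\,v_{ix}$. Under that form, and for a positive semi-definite GRM, $d_{xy}^2 = G_{xx} + G_{yy} - 2G_{xy}$, which gives a convenient check. The function name and the tolerance used to decide which eigenvalues count as non-zero are illustrative choices.

```python
import numpy as np


def eigen_distances(grm, tol=1e-10):
    """Pairwise d_xy = sqrt(sum_i lambda_i * (v_ix - v_iy)^2) over non-zero eigenvalues.

    Assumes the lambda-weighted form of the distance; `tol` is an illustrative
    cut-off for treating an eigenvalue as non-zero.
    """
    lam, vecs = np.linalg.eigh(np.asarray(grm, dtype=float))  # symmetric GRM -> real eigenpairs
    keep = lam > tol                                           # keep non-zero eigenvalues only
    coords = vecs[:, keep] * np.sqrt(lam[keep])                # row x holds sqrt(lambda_i) * v_ix
    diff = coords[:, None, :] - coords[None, :, :]             # all pairwise coordinate differences
    return np.sqrt((diff ** 2).sum(axis=-1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(5, 30))                               # hypothetical centred genotypes
    print(np.round(eigen_distances(w @ w.T / 30), 3))
```

Because eigenvalues of a computed GRM can come out as tiny negative numbers through rounding, the explicit cut-off keeps the square root well defined.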

The genetic distance is the average of all pairwise Euclidean distances between […] year difference and/or direct gene flow because of mixing) is referred to as "Consecutive" and uses only data from the lines in question, while the average between individuals that belong to non-consecutive lines (i.e. with two years' difference and no direct gene flow between them) is referred to as "Non-Consecutive" (Figure 3). […] values (one estimate of accuracy for each line) is referred to as GBLUP-C (Figure 3).
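Pooling pairwise distances into the three categories can be sketched as follows. The sketch assumes that the four lines form a yearly ring, so that line pairs (1,2), (2,3), (3,4) and (4,1) are consecutive and (1,3) and (2,4) are non-consecutive, matching the one- and two-year differences described above. It also assumes that each unordered pair of individuals is counted once, that self-pairs are excluded, and that averages are pooled over all lines. The helper names are hypothetical.

```python
import numpy as np


def classify_pair(line_a, line_b, n_lines=4):
    """'Within', 'Consecutive' or 'Non-Consecutive', assuming lines form a yearly ring."""
    if line_a == line_b:
        return "Within"
    gap = (line_a - line_b) % n_lines
    return "Consecutive" if gap in (1, n_lines - 1) else "Non-Consecutive"


def average_distances(dist, lines, n_lines=4):
    """Average distance per category over unordered pairs x < y (self-pairs excluded)."""
    dist = np.asarray(dist, dtype=float)
    sums, counts = {}, {}
    n = len(lines)
    for x in range(n):
        for y in range(x + 1, n):
            key = classify_pair(int(lines[x]), int(lines[y]), n_lines)
            sums[key] = sums.get(key, 0.0) + dist[x, y]
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    pts = rng.normal(size=(12, 4))                                   # hypothetical coordinates
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    lines = np.repeat([1, 2, 3, 4], 3)                               # three animals per line
    print(average_distances(dist, lines))
```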

The evaluation of the combined data (GBLUP-C) allows us to estimate the impact of […] it uses jointly the four lines) and to compare it with estimates obtained when using data from the […] between individuals that belong to non-consecutive lines is referred to as "Non-Consecutive". The accuracy of genomic prediction was estimated based on three different […] of individuals between the lines (Table A1, Appendix). […] and non-consecutive lines ("Non-Consecutive").

The average genetic distance between all pairs of individuals within a line ("Within") and between consecutive ("Consecutive") and non-consecutive lines ("Non-Consecutive") was calculated for 10 cycles with no mixing between the lines. […]

The average accuracy of genomic prediction over the 10 replicates was calculated under the three schemes described above (Materials and Methods, Figure 3). The accuracy of GBLUP-W and GBLUP-C increased through the breeding programme, and by cycle 10 it had increased by 8.05% (from 0.641 in C1 to 0.698 in C10) and 8.30% (from 0.644 in C1 to 0.702 in C10), respectively, but there was no significant difference in accuracy between the GBLUP-W and GBLUP-C schemes.

The accuracy of the GBLUP-B scheme remained very low, close to zero, throughout the 10 cycles (Figure 5).

Several rates of mixing individuals between the four lines (4, 10, 20, 30, 40 and 50%) were simulated and studied for 10 cycles, to test how different levels of gene flow affect genetic distance and accuracy of prediction. […] presented for all pairs of individuals within a line ("Within"), between consecutive lines ("Consecutive") and between non-consecutive lines ("Non-Consecutive").
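Why even low mixing rates keep lines together can be illustrated with a deliberately simplified model of drift plus gene flow. This is not the study's simulation. It is a hypothetical sketch at the allele-frequency level that assumes: discrete generations, random selection and mating, a fixed number of parents per line, and gene flow in a ring in which a proportion `mix` of line k's parents comes from line k+1 (line 4 feeds line 1). Between-line divergence is summarised by the variance of allele frequencies among lines, a proxy rather than the eigenvector-based distance used above. All parameter values are illustrative.

```python
import numpy as np


def simulate_lines(mix, n_lines=4, n_parents=20, n_loci=2000, n_gen=10, seed=0):
    """Allele frequencies (n_lines x n_loci) after n_gen generations of drift with ring gene flow.

    Hypothetical model: each generation a proportion `mix` of the parents of
    line k comes from line k+1 (wrapping round), then 2 * n_parents gametes are
    drawn binomially (random selection, random mating).
    """
    rng = np.random.default_rng(seed)
    p = np.tile(rng.uniform(0.1, 0.9, n_loci), (n_lines, 1))   # all lines start identical
    n_migrants = int(round(mix * n_parents))
    for _ in range(n_gen):
        donor = np.roll(p, -1, axis=0)                           # row k receives from line k+1
        parent_p = ((n_parents - n_migrants) * p + n_migrants * donor) / n_parents
        p = rng.binomial(2 * n_parents, parent_p) / (2 * n_parents)
    return p


def between_line_variance(p):
    """Mean over loci of the variance in allele frequency among lines."""
    return float(p.var(axis=0).mean())


if __name__ == "__main__":
    for mix in (0.0, 0.04, 0.10, 0.20, 0.30, 0.40, 0.50):
        p = simulate_lines(mix)
        print(f"mixing {mix:.0%}: between-line variance {between_line_variance(p):.4f}")
```

With no mixing, each line drifts independently and the among-line variance grows every generation; any migration pulls consecutive lines towards each other, and through them the non-consecutive lines as well.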

The average genetic distance was calculated for each mixing-rate scenario between individuals of the same line ("Within") and between individuals of different lines ("Consecutive" and "Non-Consecutive") (Figure 6). A low mixing rate (4%) increased […] wider with a lower than with a higher mixing rate after 10 cycles. […] The points indicate the overall average value and the vertical bars the range of […] (GBLUP-C).

As the percentage of mixing increased, the accuracy of GBLUP-B increased at a […] after 10 cycles.

The trend in accuracy of each scheme across the different mixing rates through the 10 cycles is determined by the data used by each model (Figure A1, Appendix). We […] for all mixing rates, and this scheme provides greater accuracy compared to GBLUP-[…]

In this study, we used simulation to investigate the impact of various mixing rates on the Atlantic salmon breeding scheme. The main objective was to investigate how much the accuracy of genomic prediction can be increased by using data from […]

Our results show that with no mixing, the genetic distance between the lines increased and the distance within the lines decreased throughout the programme (Figure 4).

Hence, the genetic diversity within the lines decreased and the differentiation between the lines increased. Therefore, the accuracy of genomic prediction between