
Epistasis and covariance: how gene interaction translates into genomic relationship


Abstract

Key message

Models based on additive marker effects and on epistatic interactions can be translated into genomic relationship models. This equivalence allows predictions to be based on complex gene interaction models and reduces computational effort significantly.

In the theory of genome-assisted prediction, the equivalence of a linear model based on independent and identically normally distributed marker effects and a model based on multivariate Gaussian distributed breeding values with genomic relationship as covariance matrix is well known. In this work, we demonstrate equivalences between marker effect models incorporating epistatic interactions and the corresponding mixed models based on relationship matrices, and show how to exploit these equivalences computationally for genome-assisted prediction. In particular, we show how models with epistatic interactions of higher order (e.g., three-factor interactions) translate into linear models with certain covariance matrices, and demonstrate how to construct epistatic relationship matrices for the linear mixed model if we restrict the model to interactions defined a priori. We illustrate the practical relevance of our results with a publicly available data set on grain yield of wheat lines grown in four different environments. For this purpose, we select important interactions in one environment and use this knowledge of the interaction network to increase predictive ability of grain yield under other environmental conditions. Our results provide a guide for building relationship matrices based on knowledge of the structure of trait-related gene networks.


References

  • Abdollahi-Arpanahi R, Pakdel A, Nejati-Javaremi A, Moradi Shahrbabak M, Morota G, Valente BD, Kranis A, Rosa GJM, Gianola D (2014) Dissection of additive genetic variability for quantitative traits in chickens using SNP markers. J Anim Breed Genet 131(3):183–193

  • Clifford D, McCullagh P (2006) The regress function. R News 6(2):10

  • Clifford D, McCullagh P (2014) The regress package. R package version 1.3-14

  • Cockerham CC (1954) An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39(6):859–882

  • Crossa J, de Los Campos G, Pérez P, Gianola D, Burgueño J, Araus JL, Makumbi D, Singh RP, Dreisigacker S, Yan J, Arief V, Banziger M, Braun HJ (2010) Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186(2):713–724

  • de los Campos G, Perez-Rodriguez P (2014) BGLR: Bayesian Generalized Linear Regression. R package version 1.0.3. http://CRAN.R-project.org/package=BGLR

  • Falconer DS, Mackay TFC (1996) Introduction to quantitative genetics, 4th edn. Benjamin Cummings, England

  • Gianola D, Rosa GJM (2015) One hundred years of statistical developments in animal breeding. Annu Rev Anim Biosci 3:19–56

  • Gianola D, Morota G, Crossa J (2014) Genome-enabled prediction of complex traits with kernel methods: what have we learned? In: Proceedings, 10th World Congress of Genetics Applied to Livestock Production

  • Habier D, Fernando RL, Dekkers JCM (2007) The impact of genetic relationship information on genome-assisted breeding values. Genetics 177(4):2389–2397

  • Hallgrímsdóttir IB, Yuster DS (2008) A complete classification of epistatic two-locus models. BMC Genet 9(1):17

  • Hayes BJ, Visscher PM, Goddard ME (2009) Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res 91(1):47–60

  • He D, Wang Z, Parida L (2015) Data-driven encoding for quantitative genetic trait prediction. BMC Bioinform 16(Suppl 1):S10

  • Henderson CR (1984) Application of linear models in animal breeding. University of Guelph, Guelph

  • Henderson CR (1985) Best linear unbiased prediction of nonadditive genetic merits in noninbred populations. J Anim Sci 60(1):111–117

  • Henderson CR, Quaas RL (1976) Multiple trait evaluation using relatives records. J Anim Sci 43:1188

  • Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31(2):423–447

  • Hill WG, Goddard ME, Visscher PM (2008) Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet 4(2):e1000008

  • Hu Z, Li Y, Song X, Han Y, Cai X, Xu S, Li W (2011) Genomic value prediction for quantitative traits under the epistatic model. BMC Genet 12:15

  • Jiang Y, Reif JC (2015) Modelling epistasis in genomic selection. Genetics 201:759–768. doi:10.1534/genetics.115.177907

  • Kempthorne O (1954) The correlation between relatives in a random mating population. Proc R Soc Lond B Biol Sci 143(910):103–113

  • Mackay TFC (2013) Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat Rev Genet 15:22–33. doi:10.1038/nrg3627

  • Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157(4):1819–1829

  • Morota G, Gianola D (2014) Kernel-based whole-genome prediction of complex traits: a review. Front Genet 5:363. doi:10.3389/fgene.2014.00363

  • Morota G, Koyama M, Rosa GJM, Weigel KA, Gianola D (2013) Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data. Genet Sel Evol 45:17

  • Muñoz PR, Resende MFR, Gezan SA, Resende MDV, de los Campos G, Kirst M, Huber D, Peter GF (2014) Unraveling additive from nonadditive effects using genomic relationship matrices. Genetics 198(4):1759–1768

  • Ober U, Huang W, Magwire M, Schlather M, Simianer H, Mackay TFC (2015) Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait. PLoS One 10(5):e0126880. doi:10.1371/journal.pone.0126880

  • Piepho HP, Möhring J, Melchinger AE, Büchse A (2008) BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 161:209–228

  • R Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

  • Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge

  • Strandén I, Christensen OF (2011) Allele coding in genomic evaluation. Genet Sel Evol 43:25

  • Su G, Christensen OF, Ostersen T, Henryon M, Lund MS (2012) Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PLoS One 7(9):e45293

  • Technow F, Schrag TA, Schipprack W, Bauer E, Simianer H, Melchinger AE (2014) Genome properties and prospects of genomic prediction of hybrid performance in a breeding program of maize. Genetics 197(4):1343–1355

  • VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91(11):4414–4423

  • Varona L, Vitezica ZG, Munilla S, Legarra A (2014) A general approach for calculation of genomic relationship matrices for epistatic effects. In: Proceedings, 10th World Congress of Genetics Applied to Livestock Production

  • Wang D, El-Basyoni IS, Baenziger PS, Crossa J, Eskridge KM, Dweikat I (2012) Prediction of genetic values of quantitative traits with epistatic effects in plant breeding populations. Heredity 109(5):313–319

  • Wittenburg D, Melzer N, Reinsch N (2011) Including non-additive genetic effects in Bayesian methods for the prediction of genetic values based on genome-wide markers. BMC Genet 12:74

  • Zeng Z, Wang T, Zou W (2005) Modeling quantitative trait loci and interpretation of models. Genetics 169(3):1711–1725

Acknowledgments

The authors thank Daniel Gianola and an anonymous reviewer for helpful suggestions; their comments improved the manuscript immensely. JWRM thanks KWS SAAT SE for financial support and Camila Fabre Sehnem for helpful discussions.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Johannes W. R. Martini.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical standards

This manuscript constitutes a first submission to a scientific journal and neither the entire manuscript nor any part of its content has been published or has been accepted by another journal.

Additional information

Communicated by J. Crossa.

Supporting information

Dominance in this marker effect model and in classical quantitative genetics

Different ways of incorporating dominance into effect models exist in the literature (see, e.g., Zeng et al. 2005). The best-known approach uses a linear regression on the three genotypic values of a diploid organism and adds dominance terms \(d_\mathrm{aa}, d_\mathrm{Aa}, d_\mathrm{AA}\), which describe the difference between the linear fit and the respective genotypic value (Falconer and Mackay 1996). Figure 4 compares this standard approach with the way dominance is modeled by the pair epistasis and dominance model (a fit by a polynomial of degree two). The scheme illustrates that the linear coefficient of the polynomial is not necessarily identical to the coefficient of the linear regression. Moreover, the term \(h_{k,k}\), which is the second coefficient of the polynomial regression, cannot be identified with \(d_{\bullet \bullet }\). The parameterizations of the two approaches differ. Analogously, the parameterization of the interactions in a two-locus model differs from the classical subdivision into additive by additive, additive by dominance and dominance by dominance effects. Regarding the effects of the pairs in a two-locus model, the framework used in this work does not give the freedom to choose an arbitrary effect for each allele combination of the two loci: the nine possible combinations of the states of two loci of a diploid organism (each with two alleles) are parameterized by fewer than nine parameters. Moreover, since the additive parameter of marker j is also present in every two-locus effect model of j with any other locus k, the individual two-locus models are not independent but connected. For more information on two-locus models and their different parameterizations, see Hallgrímsdóttir and Yuster (2008).
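To make the difference concrete, the following minimal R sketch (with made-up genotypic values; not part of the original supplement) fits both parameterizations to the three genotypes of a single biallelic locus:

```r
x  <- c(0, 1, 2)        # genotypes aa, Aa, AA coded by the count of allele A
gv <- c(0.0, 1.2, 1.5)  # hypothetical genotypic values showing partial dominance

# Classical approach: the linear regression defines the additive effect;
# the dominance deviations d_aa, d_Aa, d_AA are its residuals.
fit_lin <- lm(gv ~ x)
residuals(fit_lin)      # the three dominance terms

# Polynomial approach: a degree-two fit passes through all three points;
# its quadratic coefficient plays the role of h_{k,k}.
fit_quad <- lm(gv ~ x + I(x^2))
coef(fit_lin)           # slope of the linear regression
coef(fit_quad)          # linear coefficient differs from the slope above
```

Since x and \(x^2\) are correlated for the coding 0, 1, 2, the linear coefficients of the two fits do not coincide, illustrating the different parameterizations.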

Derivation of the statistical equivalence of Eq. (3) and Eqs. (6, 7)

We consider the distribution of \(\mathbf {y}\). The vector \(\mathbf {1}\mu\) is nonrandom (fixed effect); the second summand of Eq. (3) translates into \(\mathbf {g}\) of Eq. (6), which has a multivariate Gaussian distribution with mean zero and covariance matrix \(\sigma ^2_{\beta }\mathbf {MM}'\). The vector of errors is multivariate Gaussian, too. What has to be considered in more detail is the third summand of Eq. (3)

$$\begin{aligned} \sum \limits _{k=1}^{p}\sum \limits _{j=k+1}^{p} M_{i,j} M_{i,k} h_{j,k}. \end{aligned}$$

We rewrite this expression for all n genotypes simultaneously in matrix notation

$$\begin{aligned} \underbrace{\begin{pmatrix} M_{1,1} M_{1,2} & M_{1,1} M_{1,3} & \cdots & M_{1,1} M_{1,p} & M_{1,2} M_{1,3} & \cdots & M_{1,p-1} M_{1,p} \\ M_{2,1} M_{2,2} & M_{2,1} M_{2,3} & \cdots & M_{2,1} M_{2,p} & M_{2,2} M_{2,3} & \cdots & M_{2,p-1} M_{2,p} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ M_{n,1} M_{n,2} & M_{n,1} M_{n,3} & \cdots & M_{n,1} M_{n,p} & M_{n,2} M_{n,3} & \cdots & M_{n,p-1} M_{n,p}\\ \end{pmatrix}}_{=\mathbf {N}} \begin{pmatrix} h_{1,2} \\ h_{1,3} \\ \vdots \\ h_{1,p}\\ h_{2,3} \\ \vdots \\ h_{p-1,p} \\ \end{pmatrix}, \end{aligned}$$
(19)

and name the left-hand matrix \(\mathbf {N}\). This representation shows that the third summand of Eq. (3) is also multivariate Gaussian distributed with mean zero and covariance matrix \(\sigma ^2_h \mathbf {N N}'\) (by the definition of the multivariate Gaussian distribution, since the interactions are assumed to be i.i.d. Gaussian random variables). To see what structure this covariance has, we compare \(\mathbf {NN}'\) with \(\mathbf {MM}'\) by regarding the entries \((\mathbf{NN'})_{i,l}\) and \((\mathbf{MM'})_{i,l}\).

$$\begin{aligned} (\mathbf{NN'})_{i,l}=\sum \limits _{k=1,j>k}^p M_{i,k}M_{i,j}M_{l,k}M_{l,j}=\sum \limits _{k=1}^p \left( M_{i,k} M_{l,k} \sum \limits _{j>k}^p M_{i,j} M_{l,j} \right) \end{aligned}$$
(20)

and

$$\begin{aligned} (\mathbf{MM'})_{i,l}=\sum \limits _{k=1}^p M_{i,k}M_{l,k}. \end{aligned}$$
(21)

Eqs. (20) and (21) show that the structure of the additional covariance matrix \(\mathbf {NN}'\) of the epistasis model resembles that of \(\mathbf {G}\), but with the summands of each entry weighted by products of entries of \(\mathbf {M}\).
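As a numerical illustration (a minimal sketch, assuming a small simulated marker matrix \(\mathbf {M}\)), the following R code builds \(\mathbf {N}\) as in Eq. (19) and checks one entry of Eq. (20):

```r
set.seed(1)
n <- 5; p <- 4
M <- matrix(sample(0:2, n * p, replace = TRUE), n, p)

# Columns of N are the products M[, k] * M[, j] for all pairs j > k,
# in the column order of Eq. (19).
pairs <- combn(p, 2)
N <- apply(pairs, 2, function(idx) M[, idx[1]] * M[, idx[2]])

G  <- tcrossprod(M)  # MM' as in Eq. (21)
NN <- tcrossprod(N)  # NN' as in Eq. (20)

# Entry-wise check of Eq. (20) for one pair (i, l):
i <- 1; l <- 2
sum(sapply(1:(p - 1), function(k)
  M[i, k] * M[l, k] * sum(M[i, (k + 1):p] * M[l, (k + 1):p])))
NN[i, l]  # same value
```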

Derivation of the statistical equivalence of Eq. (4) and Eqs. (6, 8)

Analogously to the procedure applied to the first model, we regard the third term

$$\begin{aligned} \sum \limits _{j=1}^{p}\sum \limits _{k=1}^{p} M_{i,j}M_{i,k} h_{j,k}, \end{aligned}$$

and write

$$\begin{aligned} { \underbrace{\begin{pmatrix} M_{1,1}^2 & M_{1,1} M_{1,2} & \cdots & M_{1,1} M_{1,p} & M_{1,2} M_{1,1} & M_{1,2}^2 & \cdots & M_{1,p}^2 \\ M_{2,1}^2 & M_{2,1} M_{2,2} & \cdots & M_{2,1} M_{2,p} & M_{2,2} M_{2,1} & M_{2,2}^2 & \cdots & M_{2,p}^2 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ M_{n,1}^2 & M_{n,1} M_{n,2} & \cdots & M_{n,1} M_{n,p} & M_{n,2} M_{n,1} & M_{n,2}^2 & \cdots & M_{n,p}^2 \\ \end{pmatrix}}_{=\mathbf {Q}} \begin{pmatrix} h_{1,1} \\ h_{1,2} \\ \vdots \\ h_{1,p}\\ h_{2,1} \\ h_{2,2} \\ \vdots \\ h_{p,p} \\ \end{pmatrix}} \end{aligned}$$
(22)

where \(\mathbf {Q}\) denotes the left-hand matrix. Thus, we are interested in \((\mathbf{QQ'})_{i,l}\), which is given by

$$\begin{aligned} (\mathbf{QQ'})_{i,l}= \begin{pmatrix} M_{i,1}^2&M_{i,1} M_{i,2}&\cdots&M_{i,1} M_{i,p}&M_{i,2} M_{i,1}&M_{i,2}^2&\cdots&M_{i,p}^2 \end{pmatrix} \begin{pmatrix} M_{l,1}^2 \\ \\ M_{l,1} M_{l,2} \\ \\ \vdots \\ \\ M_{l,1} M_{l,p} \\ \\ M_{l,2} M_{l,1} \\ \\ M_{l,2}^2 \\ \\ \vdots \\ \\ M_{l,p}^2 \end{pmatrix} =\sum \limits _{k,j=1}^p M_{i,k}M_{i,j}M_{l,k}M_{l,j} = \left( \sum \limits _{k=1}^p M_{i,k} M_{l,k} \right) ^2 \end{aligned}$$
(23)

which means that \(\mathbf {QQ}'\) equals the Hadamard product \(\mathbf {MM}' \circ \mathbf {MM}'\).
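The identity of Eq. (23) can be verified numerically; a minimal R sketch with a small simulated \(\mathbf {M}\):

```r
set.seed(2)
n <- 5; p <- 4
M <- matrix(sample(0:2, n * p, replace = TRUE), n, p)

# Q has one column per ordered pair (j, k), including j = k, as in Eq. (22).
Q <- matrix(0, n, p^2)
idx <- 1
for (j in 1:p) for (k in 1:p) { Q[, idx] <- M[, j] * M[, k]; idx <- idx + 1 }

G <- tcrossprod(M)
max(abs(tcrossprod(Q) - G * G))  # ~ 0: QQ' = MM' o MM' (Hadamard product)
```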

Proof of Eq. (9)

From Eq. (20) we know that

$$\begin{aligned} 2 (\mathbf{NN'})_{i,l}=2 \sum \limits _{k=1}^{p} \left( M_{i,k} M_{l,k} \sum \limits _{j > k}^p M_{i,j} M_{l,j}\right) \mathop {=}\limits ^{*}\sum \limits _{k=1}^{p} \left( M_{i,k} M_{l,k} \sum \limits _{j \ne k}^p M_{i,j} M_{l,j}\right) = \left(\mathbf{ MM'} \circ \mathbf{MM'} \right) _{i,l} -\sum \limits _{k=1}^p {M_{i,k}M_{l,k}M_{i,k}M_{l,k}}=\left(\mathbf{ MM'} \circ \mathbf{MM'} \right) _{i,l} -\left( (\mathbf{M}\circ\mathbf{ M}) (\mathbf{M \circ M})'\right) _{i,l} \end{aligned}$$

To see equality \(*\), consider the left-hand side as the sum over the products defined by all tuples \(\{(k,j) : j>k\}\) with i and l fixed. Since j and k can be exchanged, this equals the sum over all products defined by \(\{(k,j) : j<k\}\). Thus, multiplying the sum defined by the tuples \(\{(k,j) : j>k\}\) by 2 is the same as adding the sum defined by \(\{(k,j) : j<k\}\).
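Again, a minimal R sketch (small simulated \(\mathbf {M}\)) confirming Eq. (9) numerically:

```r
set.seed(3)
n <- 5; p <- 4
M <- matrix(sample(0:2, n * p, replace = TRUE), n, p)

pairs <- combn(p, 2)
N <- apply(pairs, 2, function(idx) M[, idx[1]] * M[, idx[2]])

G   <- tcrossprod(M)
lhs <- 2 * tcrossprod(N)          # 2 NN'
rhs <- G * G - tcrossprod(M * M)  # (MM' o MM') - (M o M)(M o M)'
max(abs(lhs - rhs))               # ~ 0
```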

Derivation of Eq. (5)

We maximize

$$\begin{aligned} F(\mu ,\mathbf {g},\mathbf {y}_\mathrm{train}):{=}- \left( \left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \mathbf {y}_\mathrm{test} \end{pmatrix}- \ \mathbf {1} \mu -\mathbf {g}\right) ' \mathbf {I}_n \frac{1}{\sigma ^2_\epsilon } \left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \mathbf {y}_\mathrm{test} \end{pmatrix}- \ \mathbf {1} \mu -\mathbf {g} \right) + \mathbf {g}' \frac{1}{\sigma _\beta ^2} \mathbf {G}^{-1} \mathbf {g} \right) \end{aligned}$$

with respect to \(\mu\), \(\mathbf {g}\) and \(\mathbf {y}_\mathrm{test}\) by calculating the partial derivatives and setting them to zero. Let m denote the number of genotypes in the test set. Thus, we have to solve

$$\begin{aligned} (i) \qquad \frac{\partial F}{\partial \mu } \text{ gives: } \begin{pmatrix} \mathbf {1} _{n-m} \\ \mathbf {1} _{m} \end{pmatrix} '\left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \hat{\mathbf {y}}_\mathrm{test} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{n-m} \\ \mathbf {1} _{m} \end{pmatrix} \hat{\mu } - \begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix}\right) =\mathbf{0} \end{aligned}$$
$$\begin{aligned} (ii) \qquad \frac{\partial F}{\partial \mathbf {g}} \text{ gives: } \left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \hat{\mathbf {y}}_\mathrm{test} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{n-m} \\ \mathbf {1} _{m} \end{pmatrix} \hat{\mu } - \begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix} \right) - \sigma _\epsilon ^2 \left( \frac{1}{\sigma ^2_\beta } \mathbf {G}^{-1}\right) \begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix}=\mathbf{0} \end{aligned}$$
$$\begin{aligned} (iii) \qquad \frac{\partial F}{\partial \mathbf {y}_\mathrm{test}} \text{ gives: } \begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf {I}_m \end{pmatrix}\left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \hat{\mathbf {y}}_\mathrm{test} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{n-m} \\ \mathbf {1} _{m} \end{pmatrix} \hat{\mu } - \begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix} \right) =\mathbf{0} \end{aligned}$$

\(\mathbf {1}_i\) here denotes the vector of length i with every entry equal to 1, and \(\mathbf {I}_{m}\) the m-dimensional identity matrix. Eq. (iii) can be rewritten as \(\hat{\mathbf {y}}_\mathrm{test}=\hat{\mathbf {g}}_\mathrm{test} + \mathbf {1}_{m} \hat{\mu }\). Plugging this into Eq. (i) gives \(\hat{\mu }= \mathbf {1}_{n-m}'(\mathbf {y}_\mathrm{train}-\hat{\mathbf {g}}_\mathrm{train})(n-m)^{-1}\). Using the rewritten version of (iii) in (ii) gives

$$\begin{aligned} (iii) \text{ in } (ii): \qquad &\left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} + \mathbf {1}_m\hat{\mu } \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{n-m} \\ \mathbf {1} _{m} \end{pmatrix} \hat{\mu } - \begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix} \right) - \sigma _\epsilon ^2 \left( \frac{1}{\sigma ^2_\beta } \mathbf {G}^{-1}\right) \begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix}=\mathbf{0} \\ &\Longleftrightarrow \\ &\left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \mathbf{0} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{n-m} \\ \mathbf{0} \end{pmatrix} \hat{\mu } - \begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \mathbf{0} \end{pmatrix} \right) - \sigma _\epsilon ^2 \left( \frac{1}{\sigma ^2_\beta } \mathbf {G}^{-1}\right) \begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix}=\mathbf{0} \\ &\Longleftrightarrow \\ &\left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \mathbf{0} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{n-m} \\ \mathbf{0} \end{pmatrix} \hat{\mu } \right) - \left( \underbrace{ \mathbf {I}_n - \begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf {I}_m \end{pmatrix}}_{=\mathbf {T}_\mathrm{train}}+ \sigma _\epsilon ^2 \left( \frac{1}{\sigma ^2_\beta } \mathbf {G}^{-1}\right) \right) \begin{pmatrix}\hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix}=\mathbf{0} \end{aligned}$$

The latter equivalence uses the equality \(\begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \mathbf{0} \end{pmatrix}=\mathbf {T}_\mathrm{train}\begin{pmatrix}\hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix}\). The left summand can be rewritten as

$$\begin{aligned} \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \mathbf{0} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{s} \\ \mathbf{0} \end{pmatrix} \hat{\mu } = \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \mathbf{0} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{s} \\ \mathbf{0} \end{pmatrix} \mathbf {1}_{s}'(\mathbf {y}_\mathrm{train}-\hat{\mathbf {g}}_\mathrm{train})s^{-1} = \begin{pmatrix}\mathbf {y}_\mathrm{train} \\ \mathbf{0} \end{pmatrix} - \begin{pmatrix} \mathbf {J} _{s \times s} \\ \mathbf{0} \end{pmatrix} (\mathbf {y}_\mathrm{train}-\hat{\mathbf {g}}_\mathrm{train})s^{-1}=\begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \mathbf{0} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{s} \bar{y}_\mathrm{train}\\ \mathbf{0} \end{pmatrix} +\begin{pmatrix} \mathbf {J} _{s \times s} \\ \mathbf{0} \end{pmatrix} \hat{\mathbf {g}}_\mathrm{train}s^{-1}. \end{aligned}$$

by plugging the rewritten version of (i) into (ii), using \(s=n-m\) for the number of genotypes in the training set, and defining the empirical mean \(\bar{y}_\mathrm{train}=s^{-1}\sum \mathbf {y}_\mathrm{train}\). Moreover, writing \(\hat{\mathbf {g}}_\mathrm{train} = (\mathbf {I}_{s},\mathbf{0}) \begin{pmatrix} \hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix}\) gives

$$\begin{aligned} \left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \mathbf{0} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{s} \bar{y}_\mathrm{train}\\ \mathbf{0} \end{pmatrix} \right) - \left( \mathbf {T}_\mathrm{train} -s^{-1} \begin{pmatrix} \mathbf {J}_{s \times s } & \mathbf{0} \\ \mathbf{0}& \mathbf{0} \end{pmatrix} + \sigma _\epsilon ^2 \left( \frac{1}{\sigma ^2_\beta } \mathbf {G}^{-1}\right) \right) \begin{pmatrix}\hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix}=\mathbf{0} \end{aligned}$$

and thus

$$\begin{aligned} \begin{pmatrix}\hat{\mathbf {g}}_\mathrm{train} \\ \hat{\mathbf {g}}_\mathrm{test} \end{pmatrix}= \left( \mathbf {T}_\mathrm{train} - s^{-1} \begin{pmatrix} \mathbf {J}_{s \times s } & \mathbf{0} \\ \mathbf{0}& \mathbf{0} \end{pmatrix} + \sigma _\epsilon ^2 \left( \frac{1}{\sigma ^2_\beta } \mathbf {G}^{-1}\right) \right) ^{-1} \times \left( \begin{pmatrix} \mathbf {y}_\mathrm{train} \\ \mathbf{0} \end{pmatrix} - \begin{pmatrix} \mathbf {1} _{s} \bar{y}_\mathrm{train}\\ \mathbf{0} \end{pmatrix} \right) \end{aligned}$$

which represents Eq. (5).
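A minimal R sketch implementing Eq. (5) on simulated data (a demonstration only: the marker matrix, effect sizes and variance components are all assumed, and the variance components are treated as known):

```r
set.seed(4)
n <- 20; p <- 50; m <- 5  # n genotypes, m of them in the test set
s <- n - m                # s genotypes in the training set
M <- matrix(sample(0:2, n * p, replace = TRUE), n, p)
G <- tcrossprod(M)        # any valid relationship matrix can be used here
y <- 1 + as.vector(M %*% rnorm(p, 0, 0.1)) + rnorm(n, 0, 0.5)
y_train <- y[1:s]

sigma2_e <- 0.25  # assumed error variance
sigma2_b <- 0.01  # assumed effect variance

T_train <- diag(c(rep(1, s), rep(0, m)))           # I_n minus the test block
Jblock  <- matrix(0, n, n); Jblock[1:s, 1:s] <- 1  # J_{s x s}, zero-padded
rhs     <- c(y_train - mean(y_train), rep(0, m))

g_hat  <- solve(T_train - Jblock / s + (sigma2_e / sigma2_b) * solve(G), rhs)
mu_hat <- mean(y_train - g_hat[1:s])     # from equation (i)
y_hat_test <- g_hat[(s + 1):n] + mu_hat  # from equation (iii)
```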

Proof of the statement of “Equivalent linear models for higher order polynomial functions of the markers”

We know that the statement is true for \(D=2\). What has to be shown is that if it holds for \(D=j-1\), then it also holds for \(D=j\) (mathematical induction). Let j be the degree. Analogously to Eq. (22), we consider the matrix \(\mathbf {Q}^{(j)}\) whose entries in row i are all products of j factors taken from the respective row of matrix \(\mathbf {M}\). The i-th row \(\mathbf {Q}^{(j)}_{i,\bullet }\) of matrix \(\mathbf {Q}^{(j)}\) can be written as

$$\begin{aligned} \mathbf {Q}^{(j)}_{i,\bullet }= \mathbf {M}_{i,\bullet } \otimes \mathbf {Q}^{(j-1)}_{i,\bullet } \end{aligned}$$

if the effects \(\beta _{\varvec{\kappa },j}\) are ordered appropriately (\(\otimes\) denotes the Kronecker product). Then the \((i,l)\)-th entry of \(\mathbf {Q}^{(j)} {\mathbf {Q}^{(j)}}^\prime\) is the matrix product

$$\begin{aligned} \left( \mathbf {Q}^{(j)} {\mathbf {Q}^{(j)}}^\prime \right) _{i,l}= \left( \mathbf {M}_{i,\bullet } \otimes \mathbf {Q}^{(j-1)}_{i,\bullet } \right) \left( \mathbf {M}_{l,\bullet } \otimes \mathbf {Q}^{(j-1)}_{l,\bullet } \right) ^\prime \end{aligned}$$

which is equal to

$$\begin{aligned} \left( \mathbf {M}_{i,\bullet } \otimes \mathbf {Q}^{(j-1)}_{i,\bullet } \right) \left( \left( \mathbf {M}_{l,\bullet }\right) ^{\prime } \otimes {\left( \mathbf {Q}^{(j-1)}_{l,\bullet }\right) }^{\prime } \right) = \underbrace{\left( \mathbf {M}_{i,\bullet } \left( \mathbf {M}_{l,\bullet } \right) ^{\prime } \right) }_{=G_{i,l}} \otimes \underbrace{\left( \mathbf {Q}^{(j-1)}_{i,\bullet } {\left( \mathbf {Q}^{(j-1)}_{l,\bullet } \right) }^{\prime } \right) }_{=G_{i,l}^{j-1}}=G_{i,l}^j, \end{aligned}$$

according to the calculation rules for the Kronecker product and matrix multiplication and the induction hypothesis.
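The induction step can be mirrored numerically: a minimal R sketch (small simulated \(\mathbf {M}\)) constructing \(\mathbf {Q}^{(j)}\) by row-wise Kronecker products and checking that \(\mathbf {Q}^{(j)} {\mathbf {Q}^{(j)}}'\) equals the j-th Hadamard power of \(\mathbf {G}\):

```r
set.seed(5)
n <- 4; p <- 3
M <- matrix(sample(0:2, n * p, replace = TRUE), n, p)
G <- tcrossprod(M)

# Row-wise Kronecker product: row i of the result is kronecker(A[i, ], B[i, ]).
row_kron <- function(A, B)
  t(sapply(seq_len(nrow(A)), function(i) kronecker(A[i, ], B[i, ])))

Q2 <- row_kron(M, M)            # all products of two factors per genotype
Q3 <- row_kron(M, Q2)           # all products of three factors per genotype
max(abs(tcrossprod(Q2) - G^2))  # ~ 0: Hadamard square of G
max(abs(tcrossprod(Q3) - G^3))  # ~ 0: Hadamard cube of G
```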

Proof of Eq. (17)

Assuming one interaction between marker positions k and j leads to the equation

$$\begin{aligned} \mathbf{y}= \mathbf {1}\mu + \mathbf{M}\varvec{\beta} + \underbrace{\begin{pmatrix} M_{1,k}M_{1,j} \\ M_{2,k}M_{2,j} \\ \vdots \\ M_{n,k}M_{n,j} \end{pmatrix}}_{=\mathbf {R}} h_{k,j} + \varvec{\epsilon} , \end{aligned}$$

where we denote the vector by \(\mathbf {R}\). According to the calculation rules for multivariate normal distributions, the covariance matrix of the corresponding interaction effect \(\mathbf {g_2}\) is \(\sigma _h^2 \mathbf {RR}^\prime\). We have to show that this expression equals Eq. (17). As before, we consider the entry \((l,i)\) of \(\sigma _h^2 \mathbf {RR}^\prime\), which is given by \(M_{l,k}M_{l,j}M_{i,k}M_{i,j} \sigma _h^2\). Moreover, the \((l,i)\)-th entry of \(\mathbf {M}_{\bullet , k} \mathbf {M}^\prime _{\bullet , k}\) is \(M_{l,k} M_{i,k}\) and the \((l,i)\)-th entry of \(\mathbf {M}_{\bullet , j} \mathbf {M}_{\bullet , j}^\prime\) is \(M_{l,j} M_{i,j}\), which proves the statement.
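A minimal R sketch (small simulated \(\mathbf {M}\)) verifying Eq. (17) for one a priori chosen pair (k, j):

```r
set.seed(6)
n <- 5; p <- 4
M <- matrix(sample(0:2, n * p, replace = TRUE), n, p)
k <- 1; j <- 3                                  # one interaction defined a priori
R <- M[, k] * M[, j]                            # the vector R of Eq. (17)

lhs <- tcrossprod(R)                            # RR'
rhs <- tcrossprod(M[, k]) * tcrossprod(M[, j])  # (M_k M_k') o (M_j M_j')
max(abs(lhs - rhs))                             # ~ 0
```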

Fig. 1

Illustration of the differences between the pair epistasis model (left scheme) and the pair epistasis and dominance model (middle scheme). In the latter, the interactions are modeled as the sum of two variables and dominance is included by an interaction of each locus with itself. The right scheme illustrates a subnetwork which might fit the underlying biology of the trait best

Fig. 2

The \(0.01~\%\) most important (highest absolute values) pairwise interactions of the wheat data set, calculated with Eq. (15) for the four different environments (positions of the markers according to Crossa et al. 2010)

Fig. 3

Prediction accuracy (here Fisher’s z-transformed correlation: arctanh(r)) in the different environments of the wheat data set with relationship matrices determined by variable selection in the other environments (0–90 \(\%\) of the variables were removed in 5 \(\%\) steps). Black, red, green and blue dots reflect the respective environment 1, 2, 3, or 4, which was used to determine the relationship matrices. The initial “jump” at zero represents the difference between GBLUP and the full epistatic model

Fig. 4

Comparison of the classical way to incorporate dominance effects (Falconer and Mackay 1996) and the fitting by a polynomial of degree two. Black points: genotypic values of the genotypes aa, Aa, AA of a diploid organism with the two alleles a and A present in the population. Black line: linear regression, which defines the additive effect in the classical model. Blue lines: the dominance terms (one for each genotype), given by the difference between the genotypic values and the regression line. Red curve: fit by a polynomial of degree two

Table 1 Average correlation between predicted and measured phenotypes obtained by 200 independent draws of a test set consisting of 60 lines in the respective environment

About this article

Cite this article

Martini, J.W.R., Wimmer, V., Erbe, M. et al. Epistasis and covariance: how gene interaction translates into genomic relationship. Theor Appl Genet 129, 963–976 (2016). https://doi.org/10.1007/s00122-016-2675-5
