Abstract
In this article, we propose a covariance-based method for combining partial data sets in the genotype-to-phenotype spectrum. In particular, we introduce an expectation-maximization algorithm that can be used to combine partially overlapping relationship/covariance matrices. Combining data this way, based on relationship matrices, can be contrasted with a feature-imputation-based approach. We used several public genomic data sets to explore the accuracy of combining genomic relationship matrices. We also used the heterogeneous genotype/phenotype data sets available at https://triticeaetoolbox.org/ to illustrate how this new method can be used in genomic prediction, phenomics, and graphical modeling.
Key message
Several covariance matrices obtained from independent experiments can be combined as long as these matrices partially overlap. We demonstrate the usefulness of this methodology with examples that combine data from several partially linked genotypic and phenotypic experiments.
Footnotes
This research was supported by WheatSustain.
3 In certain instances, the union of the genotypes in the parts did not recover all of the $N_{\text{Total}}$ genotypes; therefore, this calculation was based on the recovered part of the full genomic relationship matrix.
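4 $\Sigma = \operatorname{diag}(b + 1) + 0.2\,\mathbf{1}_{n \times n}$, where $\mathbf{1}_{n \times n}$ is the $n \times n$ matrix of ones and $b_i$ for $i = 1, 2, \ldots, n$ are i.i.d. uniform between 0 and 1.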
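5 $\Sigma_0 = \operatorname{diag}(0.5\,b + 1) + 0.3\,b_0\,\mathbf{1}_{n \times n}$, where $\mathbf{1}_{n \times n}$ is the $n \times n$ matrix of ones and $b_i$ for $i = 0, 1, \ldots, n$ are i.i.d. uniform between 0 and 1.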
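The two covariance structures defined in footnotes 4 and 5 can be written out in a few lines. The sketch below is only an illustration under stated assumptions: it reads the constant term as a multiple of the n × n all-ones matrix, takes the index range in footnote 5 to be i = 0, 1, …, n, and uses hypothetical function names (`sigma_footnote4`, `sigma0_footnote5`) that do not come from the article. It uses only the Python standard library and is not the authors' simulation code.

```python
import random


def sigma_footnote4(n, rng):
    """Sigma = diag(b + 1) + 0.2 * 1_{n x n}, with b_1..b_n i.i.d. Uniform(0, 1)."""
    b = [rng.random() for _ in range(n)]
    # Every entry gets the constant 0.2; diagonal entries additionally get b_i + 1.
    return [[(b[i] + 1.0 if i == j else 0.0) + 0.2 for j in range(n)]
            for i in range(n)]


def sigma0_footnote5(n, rng):
    """Sigma_0 = diag(0.5 b + 1) + 0.3 * b_0 * 1_{n x n}, with b_0..b_n i.i.d. Uniform(0, 1)."""
    b0 = rng.random()                      # scalar that scales the shared covariance
    b = [rng.random() for _ in range(n)]   # per-variable diagonal perturbations
    shared = 0.3 * b0
    return [[(0.5 * b[i] + 1.0 if i == j else 0.0) + shared for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # the seed is arbitrary, chosen only for reproducibility
    for row in sigma_footnote4(4, rng):
        print(["%.3f" % v for v in row])
    for row in sigma0_footnote5(4, rng):
        print(["%.3f" % v for v in row])
```

Under this reading both matrices are positive definite, since each is a diagonal matrix with entries of at least 1 plus a nonnegative multiple of the all-ones matrix. In the first structure every pair of variables shares the fixed covariance 0.2, while in the second the shared covariance 0.3 b_0 is itself random.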