PT - JOURNAL ARTICLE
AU - S. Hong Lee
AU - Sam Clark
AU - Julius H.J. van der Werf
TI - Estimation of genomic prediction accuracy from reference populations with varying degrees of relationship
AID - 10.1101/119164
DP - 2017 Jan 01
TA - bioRxiv
PG - 119164
4099 - http://biorxiv.org/content/early/2017/03/22/119164.short
4100 - http://biorxiv.org/content/early/2017/03/22/119164.full
AB - We present a theoretical framework for genomic prediction accuracy when the reference data consist of information sources with varying degrees of relationship to the target individuals. A reference set can contain both close and distant relatives as well as ‘unrelated’ individuals from the wider population, assuming they all come from the same homogeneous population. The various sources of information were modeled as different populations with different effective population sizes (Ne). With a similar amount of data available for each source, we show that close relatives can have a substantially larger effect on genomic prediction accuracy than less closely related individuals. However, the number of individuals from the wider population can be far greater than that of close relatives. We validate our theory with analysis of real data, and illustrate that the variation in genomic relationships with the target, rather than the variation in genomic relationship as a deviation from the expected relationship, is a predictor of the information content of the reference set. Information from pedigree relationships is then naturally included in the prediction framework. Both the effective number of chromosome segments (Me) and Ne are considered to be a function of the data used for prediction rather than being population parameters. We illustrate that when prediction also relies on closer relatives, there is less improvement in prediction accuracy with an increase in training data or marker panel density.
We release software that can estimate the expected prediction accuracy and power when combining different reference sources with various degrees of relationship to the target, which is useful when planning genomic prediction (i.e. before collecting data) in animal, plant and human genetics.