Abstract
Uncovering the genes that govern host-parasite coevolution is important for disease management in agriculture and human medicine. The growing availability of full-genome data for hosts and parasites now makes it possible to perform cross-species genome-wide association studies based on genomic samples of infected hosts and their associated parasite strains. We aim to understand the statistical power of such approaches. We develop two indices, the cross-species association (CSA) and the cross-species prevalence (CSP), the latter additionally incorporating genomic data from uninfected hosts. For both indices, we derive genome-wide significance thresholds by computing their expected distribution over unlinked neutral loci, i.e. those not involved in determining the outcome of the interaction. Using a population genetics and an epidemiological coevolutionary model, we demonstrate that the statistical power of these indices to pinpoint the interacting loci in full-genome data varies over time, owing to the underlying GxG interactions and the coevolutionary dynamics. Under trench-warfare dynamics, CSA and CSP are very accurate in identifying the loci under coevolution, whereas under arms-race dynamics their power is limited, especially under a gene-for-gene interaction. Furthermore, we show that combining both indices across time samples can be used to estimate the asymmetry of the underlying infection matrix. Our results provide novel insights into the power and biological interpretation of cross-species association studies using samples from natural populations or controlled experiments.