Abstract
qpAdm is a statistical tool that is often used to test large sets of alternative admixture models for a target population. Despite its popularity, qpAdm remains untested on two-dimensional stepping-stone landscapes and in situations with low pre-study odds (a low ratio of true to false models). We tested high-throughput qpAdm protocols with typical properties, such as the number of source combinations per target, model complexity, and model feasibility criteria. These protocols were applied to admixture-graph-shaped and stepping-stone simulated histories, sampled randomly or systematically. We demonstrate that false discovery rates of high-throughput qpAdm protocols exceed 50% for many parameter combinations because: 1) pre-study odds are low and fall rapidly with increasing model complexity; 2) complex migration networks violate the assumptions of the method, so qpAdm p-values correlate poorly with model optimality, contributing to a low but non-zero false positive rate and to low power; 3) although admixture fraction estimates between 0 and 1 are largely restricted to symmetric configurations of sources around a target, a small fraction of asymmetric, highly non-optimal models also have estimates in this interval, contributing to the false positive rate. We also re-interpret large sets of qpAdm models from two studies in terms of source-target distance and symmetry, and suggest improvements to qpAdm protocols: 1) temporal stratification of targets and proxy sources in the case of admixture-graph-shaped histories; 2) focused exploration of a few models to increase pre-study odds; 3) dense landscape sampling to increase power; and 4) stringent conditions on estimated admixture fractions to decrease the false positive rate.
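The way low pre-study odds, the false positive rate and power combine into a false discovery rate can be illustrated with the standard relationship FDR = α / (α + (1 − β)·R), where R is the pre-study odds (true:false models tested), α is the false positive rate and 1 − β is the power. The sketch below is illustrative only: the scenario in which exactly one of the C(n, k) candidate source combinations is the true model, the function names, and the values α = 0.05 and power = 0.5 are hypothetical assumptions, not settings or results from the study.

```python
from math import comb


def false_discovery_rate(prestudy_odds: float, fpr: float, power: float) -> float:
    """Expected fraction of accepted ("fitting") models that are false.

    prestudy_odds: ratio of true to false models tested (R).
    fpr: probability that a false model passes the feasibility criteria (alpha).
    power: probability that a true model passes them (1 - beta).
    """
    # Per false model tested we expect `fpr` false positives and
    # `power * prestudy_odds` true positives.
    false_pos = fpr
    true_pos = power * prestudy_odds
    return false_pos / (false_pos + true_pos)


def prestudy_odds_one_true(n_sources: int, k: int) -> float:
    """Hypothetical scenario: exactly one of the C(n_sources, k) source
    combinations for a target is the true model (requires C(n_sources, k) > 1)."""
    n_models = comb(n_sources, k)
    return 1 / (n_models - 1)


if __name__ == "__main__":
    # Hypothetical: 10 candidate proxy sources, 1- to 3-source models.
    for k in (1, 2, 3):
        r = prestudy_odds_one_true(10, k)
        fdr = false_discovery_rate(r, fpr=0.05, power=0.5)
        print(f"k={k}  odds={r:.4f}  FDR={fdr:.2f}")
```

With these assumed values the script prints FDRs of 0.47, 0.81 and 0.92 for one-, two- and three-source models: even a modest false positive rate yields mostly false discoveries once the number of candidate source combinations per target grows.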
Article Summary
The proliferation in the archaeogenetic literature of protocols for detecting admixed groups that are based on the so-called qpAdm algorithm has become disconnected from performance testing: the only extensive study of qpAdm on simulated data showed that it performs well under an unrealistically simple demographic scenario. We found that false discoveries of gene flow by qpAdm are very common on a collection of random admixture-graph-shaped histories and on complex stepping-stone landscapes, and we provide guidelines for the design of qpAdm protocols in archaeogenetic studies.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Discussion of false discovery rates of high-throughput qpAdm protocols and of the variables influencing them (pre-study odds, false positive and false negative rates) now occupies a much more prominent place in the paper and unites all the sections and simulation types. The influence of the average distance between "left" and "right" deme sets in a model on qpAdm performance was also added. To walk the reader through our diverse results, we added a guide at the beginning of the Results chapter. Four figures were removed and one figure (Fig. 7) was added.