PT - JOURNAL ARTICLE
AU - Samuele Soraggi
AU - Carsten Wiuf
AU - Anders Albrechtsen
TI - Improved D-Statistic for Low-Coverage Data
AID - 10.1101/127852
DP - 2017 Jan 01
TA - bioRxiv
PG - 127852
4099 - http://biorxiv.org/content/early/2017/04/16/127852.short
4100 - http://biorxiv.org/content/early/2017/04/16/127852.full
AB - The detection of ancient gene flow between human populations is an important issue in population genetics. A commonly used tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, it is not always possible to call genotypes accurately. When genotype calling is not possible, the D-statistic that is currently used samples a single base from the reads of one chosen individual per population. This method has the drawback of ignoring much of the information in the data. These issues are especially striking in the case of ancient genomes, which are often characterized by low sequencing depth and high error rates for the sequenced bases. Here we provide a significant improvement that overcomes the problems of the present-day D-statistic by considering all reads from multiple individuals in each population. Moreover, we apply type-specific error correction to combat the problems of sequencing errors, and we show a way to correct for introgression from an external population that is not part of the supposed genetic relationship and how this method leads to an estimate of the admixture rate. We prove that the improved D-statistic, as well as the traditional one, is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures.
The power gain is most pronounced for low/medium sequencing depth (1-10X), and performance is as good as with perfectly called genotypes at a sequencing depth of 2X. We also show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to verify the correctness of the estimation of the admixture rates.
Author Summary
Inferring ancient gene flow between populations has been the basis for uncovering the demographic histories of many species, including humans. A recently used tool to detect the presence of gene flow is the D-statistic, which is based on individuals from four different populations. The test uses counts of how many times a pair of populations shares the same allele across the genome. We can determine whether there is an excess of shared genetic material between them, revealing gene flow, and then accept or reject the hypothesized relation. We developed an improved version of the D-statistic aimed at multiple individuals sequenced at low depth. In addition, we can accommodate genomes subject to external gene flow, as well as genomes with an excess of errors, such as ancient genomes. We show that the current method will reject a correct demographic history because of sequencing errors, but that error adjustment makes the result consistent with the theory. In general, our method has higher power than the current method, for example to detect Neandertal gene flow into human populations, and is less affected by data quality issues, such as low depth.
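
The allele-sharing counts described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the widely used frequency-based form of the classical D-statistic, and the function name `d_statistic`, the population labels H1 to H4, and the sign convention are assumptions made for this example. When each frequency is 0 or 1 (one sampled base per population), it reduces to a difference of site-pattern counts; fractional frequencies stand in for estimates pooled over several individuals. It does not include the error correction or introgression correction mentioned in the abstract.

```python
def d_statistic(sites):
    """Frequency-based D-statistic for four populations (illustrative sketch).

    sites: iterable of (p1, p2, p3, p4), the frequency of one chosen allele
    in populations H1, H2, H3 and the outgroup H4 at each site.
    Assumed sign convention: positive values indicate that H1 shares the
    allele with H3 more often than H2 does.
    """
    num = 0.0
    den = 0.0
    for p1, p2, p3, p4 in sites:
        # Signed excess of H1-H3 sharing over H2-H3 sharing at this site.
        num += (p1 - p2) * (p3 - p4)
        # Weight of the site: zero unless H1 vs H2 and H3 vs H4 both differ.
        den += (p1 + p2 - 2 * p1 * p2) * (p3 + p4 - 2 * p3 * p4)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


# One sampled base per population: frequencies are 0 or 1.
# Three sites where H1 and H3 share the allele, one where H2 and H3 share it.
single_base_sites = [(1, 0, 1, 0)] * 3 + [(0, 1, 1, 0)]
print(d_statistic(single_base_sites))  # → 0.5
```

With 0/1 frequencies the numerator is the difference between the two discordant site-pattern counts and the denominator is their sum, so the example gives (3 - 1) / (3 + 1). Assessing significance would additionally require a variance estimate, which this sketch leaves out.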