PT - JOURNAL ARTICLE
AU - Umberto Esposito
AU - Ranajit Das
AU - Mehdi Pirooznia
AU - Eran Elhaik
TI - Ancient ancestry informative markers for identifying fine-scale ancient population structure in Eurasians
AID - 10.1101/333690
DP - 2018 Jan 01
TA - bioRxiv
PG - 333690
4099 - http://biorxiv.org/content/early/2018/05/30/333690.short
4100 - http://biorxiv.org/content/early/2018/05/30/333690.full
AB - The rapid accumulation of ancient human genomes from various places and time periods, mainly from the past 15,000 years, allows us to probe the past with unparalleled accuracy and reconstruct trends in human biodiversity. Alongside providing novel insights into population history, population structure permits correcting for population stratification, a practical concern in gene mapping in association studies. However, it remains unclear which markers best capture ancient population structure, as not all markers are equally informative. Moreover, the high missingness rates of ancient DNA, which is oftentimes haploid, may distort the population structure and prohibit genomic comparisons. In past studies, ancestry informative markers (AIMs) were harnessed to address such problems, yet whether AIM-finding methods are applicable to ancient DNA (aDNA) remains unclear. Here, we define ancient AIMs (aAIMs) and develop a framework to evaluate established and novel AIM-finding methods. We show that a novel principal component analysis (PCA)-based method outperforms all other tested methods in capturing ancient population structure and identifying admixed individuals. Our results highlight important features of the genetic structure of ancient Eurasians and bear on the choice of strategies for identifying informative markers.
This work can inform the design and interpretation of population and medical studies employing ancient DNA.
Author summary: Ancient DNA studies aim to identify geographical origins, migration routes, and disease susceptibility genes by analyzing genetic markers such as single nucleotide polymorphisms (SNPs) in growing cohorts of ancient samples. In addition to sub-structure in the studied population (i.e., differences in ancestry), ancient DNA suffers from high missingness rates and is oftentimes haploid, both of which may distort the inferred population structure and lead to spurious results. It is therefore imperative to address this possible bias by inferring the population structure as accurately as possible. Given the success of past studies in addressing similar problems with ancestry informative markers (AIMs), we defined ancient ancestry informative markers (aAIMs), which, like AIMs, can be used to interrogate ancient population structure. To find aAIMs, we designed a framework to evaluate established and novel AIM-finding methods. We compiled a database of 150,278 autosomal SNPs from 302 ancient genomes belonging to 21 populations, recovered from Europe, the Middle East, and North Eurasia and dated to between 14,000 and 1,500 years ago. We then applied two existing and three novel AIM-finding methods and compared their performance against that of the complete dataset. We found that a novel principal component analysis (PCA)-based method captured the ancient population structure most accurately. Importantly, we introduce here the novel concept of aAIMs, a method that effectively identifies them, and a framework for comparing the performance of AIM sets. The outcome of our study can improve the accuracy of genetic studies employing ancient DNA.