TY  - JOUR
T1  - A hidden Markov model approach for simultaneously estimating local ancestry and admixture time using next generation sequence data in samples of arbitrary ploidy
JF  - bioRxiv
DO  - 10.1101/064238
SP  - 064238
AU  - Russell Corbett-Detig
AU  - Rasmus Nielsen
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/07/15/064238.abstract
N2  - Admixture, the mixing of genomes from divergent populations, is increasingly appreciated as a central process in evolution. To characterize and quantify patterns of admixture across the genome, a number of methods have been developed for local ancestry inference. However, existing approaches have a number of shortcomings. First, all local ancestry inference methods require some prior assumption about the expected ancestry tract lengths. Second, existing methods generally require diploid genotypes, which are not feasible to obtain in many sequencing projects. Third, many methods assume that samples are diploid; however, a wide variety of sequencing applications fail to meet this assumption. To address these issues, we introduce a novel hidden Markov model for estimating local ancestry that models the read pileup data rather than genotypes, is generalized to arbitrary ploidy, and can estimate the time since admixture during local ancestry inference. We demonstrate that our method can simultaneously estimate the time since admixture and local ancestry with good accuracy, and that it performs well on samples of high ploidy (i.e., 100 or more chromosomes). We apply our method to pooled sequencing data derived from populations of Drosophila melanogaster on an ancestry cline on the east coast of North America. We find that regions of low recombination show steeper clines than regions of high recombination, suggesting that selection against foreign ancestry has had the largest effect in these regions, presumably due to increased linkage between neutral and selected sites. We also identify numerous outlier loci associated with behavior, suggesting selection associated with prezygotic reproductive isolation. Finally, we identify candidate genes associated with reproductive isolation between ancestral subpopulations of D. melanogaster. Our results illustrate the potential of local ancestry inference for elucidating fundamental evolutionary processes. Author Summary: When divergent populations hybridize, their offspring obtain a portion of their genome from each parent population. Although the average ancestry proportion in each descendant is equal to the proportion of ancestors from each of the ancestral populations, the contribution of each ancestry type is variable across the genome. Estimating local ancestry within admixed individuals is a fundamental goal for evolutionary genetics, and here we develop a method for doing this that circumvents many of the problems associated with existing methods. Briefly, our method can use short-read data rather than genotypes and can be applied to samples with any number of chromosomes. Furthermore, our method simultaneously estimates local ancestry and the number of generations since admixture, that is, the time at which the two ancestral populations first encountered each other. Finally, in applying our method to data from an admixture zone between ancestral populations of Drosophila melanogaster, we find many lines of evidence consistent with natural selection operating against the introduction of foreign ancestry into populations of predominantly one ancestry type. Because of the generality of this method, we expect that it will be useful for a wide variety of existing and ongoing research projects.
ER  - 