TY  - JOUR
T1  - Detecting Long-term Balancing Selection using Allele Frequency Correlation
JF  - bioRxiv
DO  - 10.1101/112870
SP  - 112870
AU  - Katherine M. Siewert
AU  - Benjamin F. Voight
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/05/23/112870.abstract
N2  - Balancing selection occurs when it is beneficial for multiple alleles to be present in a population, which can result in the preservation of variants over long evolutionary time periods. A characteristic signature of this long-term balancing selection is an excess number of intermediate frequency polymorphisms near the balanced variant. However, the expected distribution of allele frequencies at these loci has not been extensively detailed, and therefore existing summary statistic methods do not explicitly take it into account. Using simulations, we show that new mutations which arise in close proximity to a site targeted by balancing selection accumulate at frequencies nearly identical to that of the balanced allele. In order to detect balancing selection, we propose a new summary statistic, β, which detects these clusters of alleles at similar frequencies. Simulation studies show that compared to existing summary statistics, our statistic has improved power to detect balancing selection, and is reasonably powered in non-equilibrium demographic models or when recombination or mutation rate varies. We compute β on 1000 Genomes Project data to identify loci potentially subjected to long-term balancing selection in humans. We report two balanced haplotypes - localized to the genes WFS1 and CADM2 - that are strongly linked to association signals for complex traits. Our approach is computationally efficient and applicable to species that lack appropriate outgroup sequences, allowing for well-powered analysis of selection in the wide variety of species for which population data is rapidly being generated.
ER  - 