Abstract
Trans-species polymorphism has been widely used as a key sign of long-term balancing selection across multiple species. However, such sites are often rare in the genome and could result from mutational processes or technical artifacts. Few methods are currently available to specifically detect footprints of trans-species balancing selection without relying on trans-species polymorphic sites. In this study, we develop summary- and model-based approaches, each specifically tailored to uncover regions of long-term balancing selection shared by a set of species, that use genomic patterns of intra-specific polymorphism and inter-specific fixed differences. We demonstrate that our trans-species statistics have substantially higher power than single-species approaches to detect footprints of trans-species balancing selection, and are robust to footprints of balancing selection that do not affect all tested species. We further apply our model-based methods to human and chimpanzee whole-genome sequencing data and identify the MHC locus and the malaria resistance-associated FREM3/GYPE region as the most outstanding candidates, consistent with previous findings. A number of other notable genomic regions are involved in barrier formation and innate immunity. Our findings echo the significance of pathogen defense in establishing balanced polymorphisms across the human and chimpanzee lineages, and suggest that non-coding regulatory regions may also play an important role. Additionally, we show that these trans-species statistics can be applied to more than two species and perform well in that setting, and we integrate them into open-source software packages for ease of use by the scientific community.