Abstract
Centromeres and their surrounding pericentric heterochromatic regions remain enigmatic and poorly understood despite critical roles in chromosome segregation 1,2 and disease 3,4. Their repetitive structure, vast size, low recombination rates and paucity of reliable markers and genes have impeded genetic and genomic interrogations. The potentially large selective impact of recurrent meiotic drive in female meiosis 5,6 has been proffered as the cause of evolutionarily rapid genomic turnover of centromere-associated satellite DNAs, rapid divergence of centromeric chromatin proteins 7, reduced polymorphisms in flanking regions 8 and high levels of aneuploidy 9. Addressing these challenges, we report here the identification of large-scale haplotypic variation in humans 10 (cenhaps) that spans the complete centromere, centromere-proximal regions (CPRs) of metacentric chromosomes, including the annotated 'CEN' modeled arrays composed of megabases of highly repeated (171 bp) α-satellites 11,12. The dynamics inferred from the apparent descent of cenhaps are complex and inconsistent with the model of recurrent fixation of newly arising, strongly favored variants. The surprisingly deep diversity includes introgressed Neanderthal centromeres in the Out-of-Africa (OoA) populations, as well as ancient lineages among Africans. The high resolution of cenhaps can provide great power for detecting associations with other structural and functional variants in the CPRs. We demonstrate this with two examples of strong associations of cenhaps with α-satellite DNA content 13 on chromosomes X and 11. The discovery of cenhaps offers a new opportunity to investigate phenotypic variation in meiosis and mitosis, as well as to develop more precise models of evolutionary dynamics in these unique and challenging genomic regions.








