Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has led to its incomplete assembly and systematic omission from genomic analyses. Here, we present long-read de novo assemblies of 43 diverse Y chromosomes spanning 180,000 years of human evolution, including two from deep-rooted African Y lineages, and report remarkable complexity and diversity in chromosome size and structure, in contrast with its low level of base substitution variation. The size of the Y chromosome assemblies varies extensively from 45.2 to 84.9 Mbp and include, on average, 81 kbp of novel sequence per Y chromosome. Half of the male-specific euchromatic region is subject to large inversions with a >2-fold higher recurrence rate compared to inversions in the rest of the human genome. Ampliconic sequences associated with these inversions further show differing mutation rates that are sequence context-dependent and some ampliconic genes show evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, the Yq12, is composed of alternating arrays of DYZ1 and DYZ2 repeat units that show extensive variation in the number, size and distribution of these arrays, but retain a 1:1 copy number ratio of the monomer repeats, consistent with the notion that functional or evolutionary forces are acting on this chromosomal region. Finally, our data suggests that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kbp distal to the currently established boundary. The availability of sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of specific traits with Y-chromosomal variants and garnering novel insights into the evolution and function of complex regions of the human genome.
Competing Interest Statement
E.E.E. is a scientific advisory board (SAB) member of Variant Bio. C.L. is an SAB member of Nabsys and Genome Insight. The following authors have previously disclosed a patent application (no. EP19169090) relevant to Strand-seq: J.O.K., T.M., and D.P.; the other authors declare no competing interests.
Footnotes
Additional analyses have been added to the manuscript and include changes to the main text, figures and supplementary.