The complete sequence of a human Y chromosome

Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure, including long palindromes, tandem repeats, and segmental duplications [1–3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence, and it remains the last human chromosome to be finished [4, 5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic structures of the TSPY, DAZ, and RBMY gene families; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the CHM13 genome [4] and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Competing Interest Statement
S.N. is now an employee of Oxford Nanopore Technologies; S.K. has received travel funds to speak at events hosted by Oxford Nanopore Technologies; A.F. is an employee of DNAnexus; C.-S.C. is an employee of GeneDX Holdings Corp.; N.-C.C. is an employee of Exai Bio; L.F.P. receives research support from Genentech; F.J.S. receives research support from Pacific Biosciences, Oxford Nanopore Technologies, Illumina, and Genentech; K.S. is an employee of Google LLC and owns Alphabet stock as part of the standard compensation package; W.T. has two patents (8,748,091 and 8,394,584) licensed to Oxford Nanopore Technologies; E.E.E. is a scientific advisory board member of Variant Bio, Inc. All other authors declare no competing interests.
Footnotes
# Retired
The manuscript has been updated to reflect updates in gene annotations and to make the manuscript more succinct.