The complete sequence of a human Y chromosome

Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure including long palindromes, tandem repeats, and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic structures of TSPY, DAZ, and RBMY; 42 additional protein-coding genes, mostly from the TSPY gene family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the CHM13 genome and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Competing Interest Statement
S.N. is an employee of Oxford Nanopore Technologies; A.F. is an employee of DNAnexus; C.-S.C. is an employee of Sema4 OpCo Inc.; N.-C.C. is an employee of Exai Bio; L.F.P. receives research support from Genentech; F.J.S. receives research support from Pacific Biosciences, Oxford Nanopore Technologies, Illumina, and Genentech; K.S. is an employee of Google LLC and owns Alphabet stock as part of the standard compensation package; W.T. has two patents (8,748,091 and 8,394,584) licensed to Oxford Nanopore Technologies; E.E.E. is a scientific advisory board member of Variant Bio, Inc.
Footnotes
# Retired