The complete sequence of a human Y chromosome

Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure, including long palindromes, tandem repeats, and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence, and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic structures of TSPY, DAZ, and RBMY; 42 additional protein-coding genes, mostly from the TSPY gene family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the CHM13 genome and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Competing Interest Statement
S.N. is an employee of Oxford Nanopore Technologies; A.F. is an employee of DNAnexus; C.-S.C. is an employee of Sema4 OpCo Inc.; N.-C.C. is an employee of Exai Bio; L.F.P. receives research support from Genentech; F.J.S. receives research support from Pacific Biosciences, Oxford Nanopore Technologies, Illumina, and Genentech; K.S. is an employee of Google LLC and owns Alphabet stock as part of the standard compensation package; W.T. has two patents (8,748,091 and 8,394,584) licensed to Oxford Nanopore Technologies; E.E.E. is a scientific advisory board member of Variant Bio, Inc.
Footnotes
# Retired