Multi-platform discovery of haplotype-resolved structural variation in human genomes
ABSTRACT
The incomplete identification of structural variants from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–child trios and define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,181 indel variants (<50 bp) and 31,599 structural variants (≥50 bp) per human genome, a sevenfold increase in structural variation compared to previous reports, including from the 1000 Genomes Project. We also discover 156 inversions per genome, most of which previously escaped detection, as well as large unbalanced chromosomal rearrangements. We provide near-complete, haplotype-resolved structural variation for three genomes that can now be used as a gold standard for the scientific community, and we make specific recommendations for maximizing structural variation sensitivity in future large-scale genome sequencing studies.