Multi-platform discovery of haplotype-resolved structural variation in human genomes
ABSTRACT
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per human genome. We also discover 156 inversions per genome, most of which previously escaped detection. Fifty-eight of the inversions we discovered intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The method and the dataset serve as a gold standard for the scientific community, and we make specific recommendations for maximizing structural variation sensitivity for future large-scale genome sequencing studies.
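The abstract separates indels from SVs purely by event size (<50 bp versus ≥50 bp). The snippet below is a minimal sketch of that size rule, assuming each variant is given as explicit reference and alternate allele sequences. The `Variant` class and the `classify` and `tally` functions are hypothetical names used only for illustration and are not taken from the authors' pipeline. It measures only the net length change, so balanced rearrangements such as inversions (which do not change length) would need a separate representation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Size cutoff from the text: indels are <50 bp, SVs are >=50 bp.
SV_MIN_LEN = 50


@dataclass
class Variant:
    # Hypothetical minimal record: explicit allele sequences.
    ref: str  # reference allele
    alt: str  # alternate allele


def event_length(v: Variant) -> int:
    """Net length change (bp) implied by an insertion or deletion."""
    return abs(len(v.alt) - len(v.ref))


def classify(v: Variant) -> str:
    """Assign a size class using only the net length change.

    Length-neutral events (SNVs, MNVs, and also inversions) return
    'SNV/MNV' here; real callsets need explicit SV types for inversions.
    """
    n = event_length(v)
    if n == 0:
        return "SNV/MNV"
    return "SV" if n >= SV_MIN_LEN else "indel"


def tally(variants: Iterable[Variant]) -> Counter:
    """Count variants per size class."""
    return Counter(classify(v) for v in variants)


if __name__ == "__main__":
    calls = [
        Variant("A", "AT"),             # 1 bp insertion -> indel
        Variant("A", "A" + "G" * 50),   # 50 bp insertion -> SV
        Variant("A" * 50, "A"),         # 49 bp deletion -> indel
        Variant("C", "T"),              # substitution -> SNV/MNV
    ]
    print(tally(calls))
```

Note that a 49 bp deletion falls on the indel side of the cutoff while a 50 bp insertion counts as an SV, matching the <50 bp and ≥50 bp boundaries stated above.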