Multi-platform discovery of haplotype-resolved structural variation in human genomes

ABSTRACT
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, and strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent–child trios and define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per human genome. We also discover 156 inversions per genome, most of which previously escaped detection. Fifty-eight of the inversions we discovered intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The method and the dataset serve as a gold standard for the scientific community, and we make specific recommendations for maximizing structural variation sensitivity for future large-scale genome sequencing studies.