Structural variation across 138,134 samples in the TOPMed consortium
Abstract
Ever-larger structural variant (SV) catalogs that capture diversity within and between populations help researchers better understand the links between SVs and disease. Identifying SVs from DNA sequence data is non-trivial and requires balancing comprehensiveness against precision. Here we present a catalog of 355,667 SVs (≥50 bp; 59.34% novel) across the autosomes and the X chromosome from 138,134 individuals in the diverse TOPMed consortium. We describe our SV inference methodology, which yields high variant quality and >90% allele concordance with long-read de novo assemblies of well-characterized control samples. We demonstrate the catalog's utility through significant associations between SVs and various cardiometabolic and hematologic traits. We identified 690 SV hotspots and deserts and highlight those that potentially affect the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable resource for understanding the impact of SVs on disease development and progression.
Competing Interest Statement
V.G.S. serves as an advisor to and/or has equity in Branch Biosciences, Ensoma, Novartis, Forma, and Cellarity, all unrelated to the present work. A.M.S. receives funding from Seven Bridges Genomics to develop tools for the NHLBI BioData Catalyst consortium. FJS receives funding from Pacific Biosciences, Illumina, Genentech and Oxford Nanopore. E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. WS, OK, HK and GA are employed by Regeneron. LMR is a consultant for the TOPMed Administrative Coordinating Center (through Westat). ES receives grant support from GSK and Bayer. JLS serves as a Scientific Advisor to Precion.