Abstract
Background Copy number variation (CNV) is a class of genomic structural variation (SV) that underlies genomic disorders and can have profound implications for health. Short-read genome sequencing (sr-GS) enables CNV calling for genomic intervals of variable size and across multiple phenotypes. However, unresolved challenges remain: an overwhelming number of false-positive calls caused by systematic biases from non-uniform read coverage, and collapsed calls caused by the abundance of paralogous segments and repetitive elements in the human genome.
Methods To address these interpretative challenges, we developed VizCNV, a computational tool for inspecting CNV calls. VizCNV draws on several signal sources from sr-GS data, including read depth and phased B-allele frequency, as well as benchmarking signals from other SV calling methods. Its interactive features and view modes are suited to analyzing chromosomal abnormalities (e.g., aneuploidy, segmental aneusomy, and chromosome translocations), exonic CNVs, and CNVs in non-coding gene regulatory regions. In addition, VizCNV includes a built-in filter schema for trio genomes that prioritizes the detection of impactful germline CNVs, such as de novo CNVs. After parameters were tuned to maximize sensitivity and specificity, VizCNV achieved approximately 83.8% recall and 77.2% precision on 1000 Genomes Project data with an average read depth of 30x.
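As an illustration of how a trio filter of this kind can assign inheritance, the following minimal Python sketch labels a proband CNV by checking it against parental calls. It is not VizCNV's implementation: the `Cnv` record, the `classify_inheritance` function, the 50% reciprocal-overlap threshold, and the coordinates are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cnv:
    """Hypothetical CNV record; coordinates are 0-based, end-exclusive."""
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Return the smaller of the two overlap fractions (0.0 if incompatible)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def classify_inheritance(proband, father_calls, mother_calls, min_overlap=0.5):
    """Label a proband CNV by which parent carries a matching call.

    min_overlap is an assumed threshold, not a value taken from VizCNV.
    """
    in_father = any(reciprocal_overlap(proband, c) >= min_overlap
                    for c in father_calls)
    in_mother = any(reciprocal_overlap(proband, c) >= min_overlap
                    for c in mother_calls)
    if in_father and in_mother:
        return "biparental"
    if in_father:
        return "paternal"
    if in_mother:
        return "maternal"
    return "de_novo_candidate"


# Made-up coordinates for illustration only.
proband_del = Cnv("chr1", 100_000, 112_800, "DEL")
father = [Cnv("chr1", 99_500, 113_000, "DEL")]
mother = []
label = classify_inheritance(proband_del, father, mother)
```

A call labeled `de_novo_candidate` here only means that no matching parental call was found; in practice such candidates would still need inspection of read depth and B-allele frequency in both parents, since a missed parental call looks identical to a true de novo event.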
Results We applied VizCNV to 39 families with primary immunodeficiency disease without a molecular diagnosis. Using the built-in filter, we identified two de novo CNVs and 90 inherited CNVs >10 kb per trio. Genotype-phenotype analyses revealed that a compound heterozygous combination of a paternal 12.8 kb deletion of exon 5 and a maternal missense variant allele of DOCK8 is the likely molecular cause of disease in one proband.
Conclusions VizCNV provides a robust platform for genome-wide discovery and visualization of relevant CNVs using sr-GS data.
Competing Interest Statement
J.R.L. serves on the Scientific Advisory Board of Baylor Genetics. J.R.L. has stock ownership in 23andMe, is a paid consultant for Genome International, and is a co-inventor on multiple United States and European patents related to molecular diagnostics for inherited neuropathies, eye diseases, genomic disorders, and bacterial genomic fingerprinting.
List of abbreviations
- aCGH: array Comparative Genomic Hybridization
- AAMR: Alu-Alu Mediated Rearrangement
- BAF: B-allele Frequency
- CNV: Copy Number Variation
- ES: Exome Sequencing
- FDR: False Discovery Rate
- GATK: Genome Analysis Toolkit
- GS: Genome Sequencing
- GREGoR: Genomics Research to Elucidate the Genetics of Rare Diseases
- IGV: Integrative Genomics Viewer
- PIDD: Primary Immunodeficiency Disease
- ROH: Runs of Homozygosity
- SD: Segmental Duplication
- SV: Structural Variation
- SNP: Single Nucleotide Polymorphism