Abstract
Motivation Structural variation (SV) is an important and diverse source of human genome variation. Over the past several years, much progress has been made in the area of SV detection, but predicting the functional impact of SVs discovered in whole genome sequencing (WGS) studies remains extremely challenging. Accurate SV impact prediction is especially important for WGS-based rare variant association studies and studies of rare disease.
Results Here we present SVScore, a computational tool for in silico SV impact prediction. SVScore aggregates existing per-base single nucleotide polymorphism pathogenicity scores across relevant genomic intervals for each SV in a manner that considers variant type, gene features, and uncertainty in breakpoint location. We show that in a Finnish cohort, the allele frequency spectrum of SVs with high impact scores is strongly skewed toward lower frequencies, suggesting that these variants are under purifying selection. We further show that SVScore identifies deleterious variants more effectively than naïve alternative methods. Finally, our results indicate that high-scoring tandem duplications may be under surprisingly strong selection relative to high-scoring deletions, suggesting that duplications may be more deleterious than previously thought. In conclusion, SVScore provides pathogenicity prediction for SVs that is both informative and meaningful for understanding their functional role in disease.
Availability SVScore is implemented in Perl and available freely at http://www.github.com/lganel/SVScore for use under the MIT license.
Contact ihall{at}wustl.edu
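Illustration The aggregation idea described under Results can be sketched as follows. This is a minimal Python sketch under stated assumptions, not SVScore's actual Perl implementation. The function names (`aggregate`, `score_sv`), the set of summary operations, the rule that only deletions and duplications receive a span score, and the toy per-base scores are all hypothetical choices made for illustration. Breakpoint uncertainty is modelled simply as a half-open confidence interval around each breakpoint.

```python
from statistics import mean


def aggregate(scores, start, end, op="max", top_n=10):
    """Summarize per-base scores over the half-open interval [start, end).

    `scores` maps a genomic position to a per-base pathogenicity score.
    Positions without a score are skipped. Returns None if no position
    in the interval has a score.
    """
    window = [scores[pos] for pos in range(start, end) if pos in scores]
    if not window:
        return None
    if op == "max":
        return max(window)
    if op == "mean":
        return mean(window)
    if op == "top_n_mean":
        # Mean of the top_n highest-scoring bases in the interval.
        return mean(sorted(window, reverse=True)[:top_n])
    raise ValueError(f"unknown operation: {op}")


def score_sv(scores, sv_type, left_ci, right_ci, op="max"):
    """Score one SV from the confidence intervals around its breakpoints.

    ASSUMPTION (illustrative only): every SV type is scored at both
    breakpoint intervals, and only deletions ("DEL") and duplications
    ("DUP") are additionally scored across the span between the
    outermost breakpoint coordinates.
    """
    result = {
        "left": aggregate(scores, left_ci[0], left_ci[1], op=op),
        "right": aggregate(scores, right_ci[0], right_ci[1], op=op),
    }
    if sv_type in ("DEL", "DUP"):
        result["span"] = aggregate(scores, left_ci[0], right_ci[1], op=op)
    return result


# Toy per-base scores (hypothetical values, not real pathogenicity data).
toy_scores = {100: 1.0, 101: 5.0, 102: 2.0, 103: 0.5, 104: 9.0}
deletion = score_sv(toy_scores, "DEL", left_ci=(100, 102), right_ci=(103, 105))
inversion = score_sv(toy_scores, "INV", left_ci=(100, 102), right_ci=(103, 105))
```

Keeping the summary operation as a parameter makes it easy to compare, for example, a maximum against a top-N mean on the same intervals; which operation is most informative is an empirical question that this sketch does not address.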