Abstract
The prioritization of structural variants (SVs), which is needed to rank and identify potentially pathogenic alleles, is still in its infancy. This is exemplified by gnomAD being able to annotate only 33.5% of GIAB SVs. To overcome this, we present the first long-read-based annotation resource for both the GRCh38 and CHM13-T2T references using STIX. In contrast to previous methods, STIX indexes the SV-informative long reads themselves; it can thus be easily extended and can accurately annotate all SV types, including insertions. STIX successfully annotated 95.9% of GIAB Tier1 SVs. STIX further improved cancer-based SV prioritization by highlighting 3,563 SVs from COSMIC that are common in the population. We further show that mosaic SVs can be independently gained and may be widespread throughout the population. This highlights the need for accurate SV population frequency annotation to further facilitate the adoption of SVs via long-read sequencing in medical research and clinical applications.
Competing Interest Statement
F.J.S. receives research support from Illumina, PacBio, and Oxford Nanopore. All other authors declare no competing interests.