Abstract
Transcripts are frequently modified by structural variations, which leads to a fused transcript of either multiple genes (known as a fusion gene) or a gene and a previously non-transcribing sequence. Detecting these modifications (called transcriptomic structural variations, or TSVs), especially in cancer tumor sequencing, is an important and challenging computational problem. We introduce SQUID, a novel algorithm to accurately predict both fusion-gene and non-fusion-gene TSVs from RNA-seq alignments. SQUID unifies both concordant and discordant read alignments into one model, and doubles the accuracy on simulation data compared to other approaches. With SQUID, we identified novel non-fusion-gene TSVs on TCGA samples.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.