PT - JOURNAL ARTICLE
AU - Yutong Qiu
AU - Cong Ma
AU - Han Xie
AU - Carl Kingsford
TI - Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem
AID - 10.1101/697367
DP - 2019 Jan 01
TA - bioRxiv
PG - 697367
4099 - http://biorxiv.org/content/early/2019/07/09/697367.short
4100 - http://biorxiv.org/content/early/2019/07/09/697367.full
AB - Transcriptomic structural variants (TSVs), that is, structural variants that affect expressed regions, are common, especially in cancer. Detecting TSVs is a challenging computational problem. Sample heterogeneity (including differences between alleles in diploid organisms) is a critical confounding factor when identifying TSVs. To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangements Problem (MCAP), which seeks k genome rearrangements that maximize the number of reads concordant with at least one rearrangement. This directly models a heterogeneous or diploid sample. We prove that MCAP is NP-hard and provide a -approximation algorithm for k = 1 and a -approximation algorithm for the diploid case (k = 2) assuming an oracle for k = 1. Combining these, we obtain a -approximation algorithm for MCAP when k = 2 (without an oracle). We also present an integer linear programming formulation for general k. We completely characterize the graph structures that require k > 1 to satisfy all edges and show that such structures are prevalent in cancer samples. We evaluate our algorithms on 381 TCGA samples and 2 cancer cell lines and show improved performance compared to the state-of-the-art TSV-calling tool, SQUID.