Abstract
Motivation We introduce an algorithm for selectively aligning high-throughput sequencing reads to a transcriptome, with the goal of improving transcript-level quantification. This algorithm attempts to bridge the gap between fast “mapping” algorithms and more traditional alignment procedures.
Results We adopt a hybrid approach that is able to increase mapping accuracy while still retaining much of the efficiency of fast mapping algorithms. To achieve this, we introduce a new approach that explores the candidate search space with high sensitivity as well as a collection of carefully-engineered heuristics to efficiently filter these candidates. Additionally, unlike the strategies adopted in most aligners which first align the ends of paired-end reads independently, we introduce a notion of co-mapping. This procedure exploits relevant information between the “hits” from the left and right ends of paired-end reads before full alignments or mappings for each are generated, which improves the efficiency of filtering likely-spurious alignments. Finally, we demonstrate the utility of selective alignment in improving the accuracy of efficient transcript-level quantification from RNA-seq reads. Specifically, we show that selective-alignment is able to resolve certain complex mapping scenarios that can confound existing fast mapping procedures, while simultaneously eliminating spurious alignments that fast mapping approaches can produce.
Availability Selective-alignment is implemented in C++11 as a part of Salmon, and is available as open source software, under GPL v3, at: https://github.com/COMBINE-lab/salmon/tree/selective-alignment
Contact rob.patro{at}cs.stonybrook.edu