TY  - JOUR
T1  - A tandem simulation framework for predicting mapping quality
JF  - bioRxiv
DO  - 10.1101/103952
SP  - 103952
AU  - Ben Langmead
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/01/30/103952.abstract
N2  - Read alignment is the first step in most sequencing data analyses. It is also a source of errors and interpretability problems. Repetitive genomes, algorithmic shortcuts, and genetic variation impede the aligner’s ability to find a read’s true point of origin. Aligners therefore report a mapping quality: the probability the reported point of origin for a read is incorrect. However, there is no established method for calculating mapping quality in a general way. We describe an accurate, aligner-agnostic framework for predicting mapping qualities that works by simulating a set of tandem reads, similar to the input reads in important ways, but for which the true point of origin is known. Alignments of tandem reads are used to build a model for predicting mapping quality, which is then applied to the input-read alignments. The model is automatically tailored to the alignment scenario at hand, allowing it to make accurate mapping-quality predictions across a range of scenarios. We implement this approach in a software tool called Qtip, which is accurate, low-overhead, and compatible with popular read aligners. Qtip is open source software available from https://github.com/BenLangmead/qtip.
ER  -