Abstract
Summary: TopHat is a popular spliced junction mapper for RNA sequencing data that writes its output in the BAM format, the binary version of the Sequence Alignment/Map (SAM) format. BAM is the standard exchange format for aligned sequencing reads, so a correct implementation of the format is paramount for software interoperability and correct analysis. However, TopHat writes its unmapped reads in a way that is not compatible with other software that implements the SAM/BAM format. We have developed TopHat-Recondition, a post-processor for TopHat unmapped reads that restores read information in the proper format. TopHat-Recondition thus enables downstream software to process the many BAM files written by TopHat.
Availability and implementation: TopHat-Recondition is implemented in Python using the Pysam library and is freely available under a 2-clause BSD license on GitHub: https://github.com/cbrueffer/tophat-recondition.
Contact: christian.brueffer@med.lu.se, lao.saal@med.lu.se