TY  - JOUR
T1  - Overlapping long sequence reads: Current innovations and challenges in developing sensitive, specific and scalable algorithms
JF  - bioRxiv
DO  - 10.1101/081596
SP  - 081596
AU  - Justin Chu
AU  - Hamid Mohamadi
AU  - René L Warren
AU  - Chen Yang
AU  - Inanc Birol
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/10/17/081596.abstract
N2  - Identifying overlaps between error-prone long reads, specifically those from Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PB), is essential for certain downstream applications, including error correction and de novo assembly. Though akin to the read-to-reference alignment problem, read-to-read overlap detection is a distinct problem that can benefit from specialized algorithms that perform efficiently and robustly on high error rate long reads. Here, we review the current state-of-the-art read-to-read overlap tools for error-prone long reads, including BLASR, DALIGNER, MHAP, GraphMap, and Minimap. These specialized bioinformatics tools differ not just in their algorithmic designs and methodology, but also in their robustness of performance on a variety of datasets, time and memory efficiency, and scalability. We highlight the algorithmic features of these tools, as well as their potential issues and biases when utilizing any particular method. We benchmarked these tools, tracking their resource needs and computational performance, and assessed the specificity and precision of each. The concepts surveyed may apply to future sequencing technologies, as scalability is becoming more relevant with increased sequencing throughput. Contact: cjustin{at}bcgsc.ca; ibirol{at}bcgsc.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
ER  - 