TY - JOUR
T1 - SNaReSim: Synthetic Nanopore Read Simulator
JF - bioRxiv
DO - 10.1101/133652
SP - 133652
AU - Philippe Faucon
AU - Parithi Balachandran
AU - Sharon Crook
Y1 - 2017/01/01
UR - http://biorxiv.org/content/early/2017/05/22/133652.abstract
N2 - Nanopores represent the first commercial technology in decades to present a significantly different technique for DNA sequencing, and one of the first technologies to propose direct RNA sequencing. Despite significant differences from previous sequencing technologies, read simulators to date make similar assumptions with respect to error profiles and their analysis. This is a great disservice to both nanopore sequencing and to computer scientists who seek to optimize their tools for the platform. Previous works have discussed the occurrence of some k-mer bias, but this discussion has focused on homopolymers, leaving unanswered the question of whether k-mer bias exists over general k-mers, how it occurs, and what can be done to reduce its effects. In this work, we demonstrate that current read simulators fail to accurately represent k-mer error distributions. We explore the sources of k-mer bias in nanopore basecalls, and we present a model for predicting k-mers that are difficult to identify. We also propose SNaReSim, a new state-of-the-art simulator, and demonstrate that it provides higher accuracy with respect to 6-mer accuracy biases.
ER -