Fast and accurate read mapping with approximate seeds and multiple backtracking

Nucleic Acids Res. 2013 Apr;41(7):e78. doi: 10.1093/nar/gkt005. Epub 2013 Jan 28.

Abstract

We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined together, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Caenorhabditis elegans / genetics
  • Chromosome Mapping / methods*
  • Drosophila melanogaster / genetics
  • Escherichia coli / genetics
  • Genetic Variation
  • Genomics / methods
  • Humans
  • Software*