A bioinformatician's guide to the forefront of suffix array construction algorithms

Brief Bioinform. 2014 Mar;15(2):138-54. doi: 10.1093/bib/bbt081. Epub 2014 Jan 10.

Abstract

The suffix array and its variants are text-indexing data structures that have become indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an accessible exposition of the SA-IS algorithm, which is the state of the art in suffix array construction. We also describe DisLex, a technique that allows standard suffix array construction algorithms to create modified suffix arrays designed to enable a simple form of inexact matching needed to support 'spaced seeds' and 'subset seeds' used in many biological applications.
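The abstract names SA-IS without showing it. The following compact Python sketch illustrates the usual SA-IS outline, which may help readers new to the topic:

1. Classify each suffix as S-type or L-type.
2. Induce-sort the LMS (left-most S-type) substrings.
3. Give each LMS substring a name based on its rank.
4. Recurse on the string of names only if some names repeat.
5. Run one final induced sort.

This is an illustrative sketch, not the authors' implementation. The function names `sais` and `suffix_array`, and the convention of appending a unique `0` sentinel, are assumptions made for this example.

```python
from typing import List


def sais(s: List[int], alphabet_size: int) -> List[int]:
    """Return the suffix array of s by induced sorting (SA-IS sketch).

    Assumes s ends with a unique sentinel 0, smaller than every other
    symbol, and that all symbols lie in range(alphabet_size).
    """
    n = len(s)
    if n == 1:
        return [0]

    # Classify suffixes: S-type if smaller than the next suffix, else L-type.
    is_s = [False] * n
    is_s[n - 1] = True  # the sentinel suffix is S-type by convention
    for i in range(n - 2, -1, -1):
        is_s[i] = s[i] < s[i + 1] or (s[i] == s[i + 1] and is_s[i + 1])

    def is_lms(i: int) -> bool:
        # Left-most S-type position: S-type with an L-type left neighbour.
        return i > 0 and is_s[i] and not is_s[i - 1]

    counts = [0] * alphabet_size
    for c in s:
        counts[c] += 1

    def bucket_bounds(tails: bool) -> List[int]:
        # First (or last) index of each symbol's bucket in the suffix array.
        bounds, total = [], 0
        for c in counts:
            total += c
            bounds.append(total - 1 if tails else total - c)
        return bounds

    def induce(lms_order: List[int]) -> List[int]:
        sa = [-1] * n
        tails = bucket_bounds(True)
        for i in reversed(lms_order):  # seed LMS suffixes at bucket ends
            sa[tails[s[i]]] = i
            tails[s[i]] -= 1
        heads = bucket_bounds(False)
        for k in range(n):  # left-to-right scan induces L-type suffixes
            j = sa[k] - 1
            if sa[k] > 0 and not is_s[j]:
                sa[heads[s[j]]] = j
                heads[s[j]] += 1
        tails = bucket_bounds(True)
        for k in range(n - 1, -1, -1):  # right-to-left scan induces S-type
            j = sa[k] - 1
            if sa[k] > 0 and is_s[j]:
                sa[tails[s[j]]] = j
                tails[s[j]] -= 1
        return sa

    def same_lms_substring(a: int, b: int) -> bool:
        # Equal symbols and equal types up to and including the next LMS.
        d = 0
        while True:
            if s[a + d] != s[b + d] or is_s[a + d] != is_s[b + d]:
                return False
            if d > 0 and (is_lms(a + d) or is_lms(b + d)):
                return True  # both substrings end at this offset
            d += 1

    # Step 1: induce from LMS positions in text order; this sorts the
    # LMS substrings (not yet the full LMS suffixes).
    lms_positions = [i for i in range(n) if is_lms(i)]
    sorted_lms = [i for i in induce(lms_positions) if is_lms(i)]

    # Step 2: name LMS substrings by rank; equal substrings share a name.
    names = [-1] * n
    name = 0
    names[sorted_lms[0]] = name
    for prev, cur in zip(sorted_lms, sorted_lms[1:]):
        if not same_lms_substring(prev, cur):
            name += 1
        names[cur] = name
    reduced = [names[i] for i in lms_positions]

    # Step 3: order the LMS suffixes, recursing only if some names repeat.
    if name + 1 < len(reduced):
        reduced_sa = sais(reduced, name + 1)
    else:
        reduced_sa = [0] * len(reduced)
        for idx, nm in enumerate(reduced):
            reduced_sa[nm] = idx

    # Step 4: one final induced sort from correctly ordered LMS suffixes.
    return induce([lms_positions[k] for k in reduced_sa])


def suffix_array(text: str) -> List[int]:
    """Suffix array of text; a 0 sentinel is appended, then dropped."""
    if not text:
        return []
    ranks = {c: r + 1 for r, c in enumerate(sorted(set(text)))}
    return sais([ranks[c] for c in text] + [0], len(ranks) + 1)[1:]


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Each induced-sorting pass is linear. The reduced string holds at most half as many symbols as its input, so the total work stays linear in the text length. A naive alternative, `sorted(range(len(text)), key=lambda i: text[i:])`, is handy for checking the output on small inputs. However, that approach is far slower on genome-scale sequences. The Python sketch above is meant to show the logic, not to compete with optimized C implementations.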

Keywords: linear-time algorithm; spaced seeds; subset seeds; suffix array construction; text index.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Databases, Nucleic Acid / statistics & numerical data
  • Humans
  • Pattern Recognition, Automated / statistics & numerical data
  • Software