A bioinformatician's guide to the forefront of suffix array construction algorithms

Anish Man Singh Shrestha; Martin C Frith; Paul Horton

doi:10.1093/bib/bbt081

Brief Bioinform. 2014 Mar;15(2):138-54. doi: 10.1093/bib/bbt081. Epub 2014 Jan 10.

Authors

Anish Man Singh Shrestha¹, Martin C Frith, Paul Horton

Affiliation

¹ Computational Biology Research Center, AIST, Tokyo, Japan. computome@gmail.com.

Abstract

The suffix array and its variants are text-indexing data structures that have become indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an accessible exposition of the SA-IS algorithm, which is the state of the art in suffix array construction. We also describe DisLex, a technique that allows standard suffix array construction algorithms to create modified suffix arrays designed to enable a simple form of inexact matching needed to support 'spaced seeds' and 'subset seeds' used in many biological applications.
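SA-IS is named above as the state of the art, but the abstract does not describe its steps. The code below is a compact, unoptimized Python sketch of the general SA-IS scheme, written from the algorithm's standard description rather than taken from this paper. It classifies positions as S- or L-type, induced-sorts the LMS (leftmost-S) substrings, names them, recurses on the reduced string only when names repeat, and then induces the final order. All function names are illustrative. The sketch assumes the input text contains no NUL character, because code point 0 is used as the unique, smallest sentinel.

```python
def classify(s):
    # True marks S-type positions, False marks L-type; the sentinel is S-type.
    t = [False] * len(s)
    t[-1] = True
    for i in range(len(s) - 2, -1, -1):
        t[i] = s[i] < s[i + 1] or (s[i] == s[i + 1] and t[i + 1])
    return t

def is_lms(t, i):
    # Leftmost-S position: S-type with an L-type left neighbour.
    return i > 0 and t[i] and not t[i - 1]

def bucket_bounds(s, k, ends):
    # Start index (ends=False) or one-past-end index (ends=True) of each bucket.
    counts = [0] * k
    for c in s:
        counts[c] += 1
    bounds, total = [], 0
    for c in counts:
        total += c
        bounds.append(total if ends else total - c)
    return bounds

def place_lms(s, k, order):
    # Put LMS positions at the tails of their buckets, keeping the given order.
    sa = [-1] * len(s)
    tails = bucket_bounds(s, k, ends=True)
    for i in reversed(order):
        tails[s[i]] -= 1
        sa[tails[s[i]]] = i
    return sa

def induce(s, k, t, sa):
    heads = bucket_bounds(s, k, ends=False)
    for i in range(len(sa)):  # left-to-right pass: induce L-type suffixes
        j = sa[i] - 1
        if sa[i] > 0 and not t[j]:
            sa[heads[s[j]]] = j
            heads[s[j]] += 1
    tails = bucket_bounds(s, k, ends=True)
    for i in range(len(sa) - 1, -1, -1):  # right-to-left pass: induce S-type suffixes
        j = sa[i] - 1
        if sa[i] > 0 and t[j]:
            tails[s[j]] -= 1
            sa[tails[s[j]]] = j

def lms_substrings_equal(s, t, a, b):
    if a == len(s) - 1 or b == len(s) - 1:
        return False  # the sentinel's LMS substring is unique
    i = 0
    while True:
        if s[a + i] != s[b + i] or t[a + i] != t[b + i]:
            return False
        a_end, b_end = i > 0 and is_lms(t, a + i), i > 0 and is_lms(t, b + i)
        if a_end or b_end:
            return a_end and b_end
        i += 1

def sa_is(s, k):
    # s: list of ints in range(k) ending with a unique smallest sentinel 0.
    if len(s) == 1:
        return [0]
    t = classify(s)
    lms = [i for i in range(len(s)) if is_lms(t, i)]
    # 1. Induced-sort the LMS substrings (LMS seeds placed in arbitrary order).
    sa = place_lms(s, k, lms)
    induce(s, k, t, sa)
    # 2. Name LMS substrings by sorted rank; equal substrings share a name.
    sorted_lms = [i for i in sa if is_lms(t, i)]
    names = [-1] * len(s)
    name = names[sorted_lms[0]] = 0
    for prev, cur in zip(sorted_lms, sorted_lms[1:]):
        if not lms_substrings_equal(s, t, prev, cur):
            name += 1
        names[cur] = name
    reduced = [names[i] for i in lms]
    # 3. Sort the reduced string, recursing only if some names repeat.
    if name + 1 < len(reduced):
        reduced_sa = sa_is(reduced, name + 1)
    else:
        reduced_sa = [0] * len(reduced)
        for pos, c in enumerate(reduced):
            reduced_sa[c] = pos
    # 4. Seed the LMS suffixes in their true order and induce the rest.
    sa = place_lms(s, k, [lms[r] for r in reduced_sa])
    induce(s, k, t, sa)
    return sa

def suffix_array(text):
    # Assumes text has no NUL character: code point 0 is the sentinel.
    s = [ord(c) for c in text] + [0]
    return sa_is(s, max(s) + 1)[1:]
```

For example, `suffix_array("banana")` returns `[5, 3, 1, 0, 4, 2]`. A periodic input such as `"CACACACA"` has repeated LMS substrings, so it exercises the recursive step.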

Keywords: linear-time algorithm; spaced seeds; subset seeds; suffix array construction; text index.
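The abstract's second topic is suffix arrays ordered for spaced seeds and subset seeds. The brute-force sketch below makes that target order concrete. Each seed position assigns letters to groups, and letters in the same group compare as equal. As an assumption of this sketch, the seed pattern repeats cyclically along each suffix. This quadratic-time comparison sort serves only as a reference definition. It is not DisLex, which, per the abstract, lets a standard construction algorithm produce such arrays. The names `EXACT`, `ANY`, `TRANSITION` and `subset_seed_suffix_array` are hypothetical.

```python
# Hypothetical seed alphabets for DNA: each inner string is one group of
# letters that compare as equal at that seed position.
EXACT = ["A", "C", "G", "T"]  # "care" position: every letter distinct
ANY = ["ACGT"]                # "don't care" position of a spaced seed
TRANSITION = ["AG", "CT"]     # subset-seed position: purines vs pyrimidines


def position_classes(groups):
    """Map each letter to the rank of its group at one seed position."""
    return {letter: rank for rank, group in enumerate(groups) for letter in group}


def subset_seed_suffix_array(text, seed):
    """Brute-force suffix order under a cyclically repeated subset seed.

    Position p of each suffix is compared through seed[p % len(seed)].
    Letters outside the seed's groups raise KeyError. Suffixes whose keys
    tie keep their text order, because Python's sort is stable.
    """
    maps = [position_classes(groups) for groups in seed]
    period = len(maps)

    def key(i):
        return [maps[p % period][c] for p, c in enumerate(text[i:])]

    return sorted(range(len(text)), key=key)
```

With a plain `[EXACT]` seed, the result is the ordinary suffix array. With `[EXACT, ANY]`, however, `subset_seed_suffix_array("ACAG", [EXACT, ANY])` gives `[2, 0, 1, 3]`, whereas the ordinary order is `[0, 2, 1, 3]`. The suffix "AG" now precedes "ACAG" because their second letters fall on a don't-care position, so the comparison falls through to suffix length.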

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computational Biology / methods*
Databases, Nucleic Acid / statistics & numerical data
Humans
Pattern Recognition, Automated / statistics & numerical data
Software