RT Journal Article
SR Electronic
T1 On the Complexity of Sequence to Graph Alignment
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 522912
DO 10.1101/522912
A1 Chirag Jain
A1 Haowen Zhang
A1 Yu Gao
A1 Srinivas Aluru
YR 2019
UL http://biorxiv.org/content/early/2019/01/17/522912.abstract
AB Availability of extensive genetics data across multiple individuals and populations is driving the growing importance of graph based reference representations. Aligning sequences to graphs is a fundamental operation on several types of sequence graphs (variation graphs, assembly graphs, pan-genomes, etc.) and their biological applications. Though research on sequence to graph alignments is nascent, it can draw from related work on pattern matching in hypertext. In this paper, we study sequence to graph alignment problems under Hamming and edit distance models, and linear and affine gap penalty functions, for multiple variants of the problem that allow changes in query alone, graph alone, or in both. We prove that when changes are permitted in graphs either standalone or in conjunction with changes in the query, the sequence to graph alignment problem is -complete under both Hamming and edit distance models for alphabets of size ≥ 2. For the case where only changes to the sequence are permitted, we present an O(|V| + m|E|) time algorithm, where m denotes the query size, and V and E denote the vertex and edge sets of the graph, respectively. Our result is generalizable to both linear and affine gap penalty functions, and improves upon the run-time complexity of existing algorithms.