## Abstract

**Summary** Partial order alignment, which aligns a sequence to a directed acyclic graph, is now frequently used as a key component in long-read error correction and assembly. We present abPOA (**a**daptive **b**anded **P**artial **O**rder **A**lignment), a Single Instruction Multiple Data (SIMD) based C library for fast partial order alignment using adaptive banded dynamic programming. It can work as a stand-alone multiple sequence alignment and consensus calling tool or be easily integrated into any long-read error correction and assembly workflow. Compared to a state-of-the-art tool (SPOA), abPOA is up to 15 times faster with a comparable alignment accuracy.

**Availability and implementation** abPOA is implemented in C. A stand-alone tool and a C/Python software interface are freely available at https://github.com/yangao07/abPOA.

**Contact** ydwang{at}hit.edu.cn or XINGYI{at}email.chop.edu

## 1 Introduction

Partial order alignment (POA) was first introduced by Lee *et al.* (2002) to solve the multiple sequence alignment (MSA) problem. In POA, MSA is represented as a directed acyclic graph (DAG) and sequences are iteratively aligned to the DAG through dynamic programming (DP). Multiple consensus sequences are then generated by applying the heaviest bundling algorithm to the alignment graph (Lee, 2003). Recently, with the advent and growing popularity of long-read sequencing technologies on the Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) platforms, there is a renewed appreciation and interest of POA, and this algorithm is now broadly used for error correction and assembly of error-prone long reads (Loman *et al.*, 2015; Vaser *et al.*, 2017; Volden *et al.*, 2018; Gao *et al.*, 2019; Ruan and Li, 2019).

Although POA is much faster than classical MSA algorithms (Lassmann and Sonnhammer, 2002), the large datasets generated by state-of-the-art long-read sequencing platforms pose a major challenge. To address this challenge, Single Instruction Multiple Data (SIMD) implementation was used to accelerate the original POA algorithm (Vaser *et al.*, 2017). This SIMD version of POA, SPOA, takes advantage of the wider SIMD registers in modern processors that process multiple elements in parallel. SIMD vectors are used to store scores of multiple consecutive cells in each row of the DP matrix and processed using SIMD instructions, with parallel updating for all scores stored in each vector.

In addition to the SIMD parallelization, another acceleration strategy, “banded DP”, is also widely used in sequence-to-sequence alignment tools (Chao *et al.*, 1992). Specifically, in each row or column of the DP matrix, only cells inside a specific “band” need to be filled out. Recently, a graph version of this strategy was explored in GraphAligner (Rautiainen and Marschall, 2019), in which a band for each row of the DP matrix is dynamically defined based on the minimum score cell in that row. Other cells are considered to be inside the band only if the score is within a certain distance of that minimum score. Although GraphAligner is faster than other graph alignment tools, its strategy may not be suitable for MSA and consensus calling of long reads with high error rates. This is due to GraphAligner’s using of edit distance instead of general scoring function as the alignment metric, which may lead to incorrect alignments in regions with a large number of insertion or deletion errors that are common in PacBio and ONT sequencing data.

In this work, we have developed abPOA, an extended version of POA that performs adaptive banded DP with an SIMD implementation. abPOA supports flexible scoring schemes. It can work as a stand-alone MSA and consensus calling tool, or be easily integrated into any long-read error correction and assembly workflow.

## 2 Methods

abPOA adopts the same SIMD parallelization strategy as in SPOA, where the SIMD vectors are placed parallel to the linear sequence. In the DP matrix, each row corresponds to one node in the alignment graph and each column corresponds to one base of the linear sequence being aligned (Fig. 1A and Supplementary Fig. 1). Instead of filling out the entire DP matrix, abPOA adaptively defines a band for each row based on the scores in predecessor rows and the lengths of potential outgoing paths in the alignment graph, and only scores of cells inside this band are calculated.

In more detail, for each node in the alignment graph, abPOA first computes the length of the outgoing path with the largest number of supporting reads that starts from the current node to the end node. This length, *R*, is considered as the most likely number of additional bases to be included in the alignment path starting from the current node. abPOA iteratively calculates *R* for all nodes in a similar way to the heaviest bundling algorithm (Lee, 2003) (Supplementary Note). Note that *R* is calculated before each round of the sequence-to-graph alignment, based on the nodes and edge weights of the current alignment graph. Then, during the DP process, all rows of the DP matrix are sequentially processed following the partial order of the graph. As such, for each row, abPOA can collect the horizontal coordinates of maximum score cells in its predecessor rows, i.e. their positions in the linear sequence. The range (start and end positions) of possible maximum score cells in the current row, [*M*_{start}, *M*_{end}], can be derived as *M*_{start} = *P*_{left} + 1 and *M*_{end} = *P*_{right} + 1, where *P*_{left} and *P*_{right} are the positions of the leftmost and the rightmost maximum score cells in all predecessor rows. *M*_{start} and *M*_{end} of the first row are set as 0 since the start node has no predecessor.

With the above numbers calculated for each row, abPOA defines the start and end positions of the DP band in each row as *B*_{start} = max{0, min {*M*_{start}, *L* − *R*} − *w* and *B*_{end} = min{*L*, max{*M*_{end}, *L* − *R* + *w*}, where *L* is the length of the linear sequence and *w* is the number of extra bases added on both sides of the band, which is determined by two parameters *b* and *f* (default 10 and 0.01) as *b* + *f* × *L* (Fig. 1A and Supplementary Note). By taking both the scores in predecessor rows and poten-tial outgoing alignment paths into consideration, this adaptively defined DP band is expected to fully cover the optimal alignment path, even for divergent sequences and graphs. Next, abPOA maps the base-level boundary of the adaptive band onto SIMD vectors to only process vectors that contain bases inside the band. After iteratively aligning sequences to the graph and updating the graph (Lee *et al.*, 2002), abPOA generates a consensus sequence from the final alignment graph using the heaviest bundling algorithm (Lee, 2003).

## 3 Result

We evaluated abPOA using simulated long-read datasets along with SPOA (Vaser *et al.*, 2017), which to our knowledge is the only existing tool that uses SIMD to accelerate POA. Three read lengths (300 bp, 1000 bp, and 5000 bp) and four sequencing depths (3×, 10×, 30×, and 50×) were used to simulate 12 sets of sequences each with 100 clusters of sequences to be aligned, using NanoSim (Yang *et al.*, 2017) or PBSIM (Ono *et al.*, 2012) to incorporate error profiles of ONT or PacBio respectively (Supplementary Tables 1 and 2). Each cluster of sequences was aligned by abPOA or SPOA to generate a consensus sequence, and abPOA was run twice with adaptive banded DP enabled or disabled (see details of the evaluation procedure in Supplementary Note).

abPOA is 2.6–15.0 times faster than SPOA on NanoSim simulated sequence sets when adaptive banded DP is enabled (Fig. 1B and Supplementary Table 1). As expected, the speed improvement of abPOA over SPOA becomes particularly pronounced for longer sequences, as more SIMD vectors would be skipped outside of the adaptive band leading to a more significant efficiency gain. A similar trend can be observed from the PBSIM simulation (Supplementary Table 2). To evaluate alignment accuracy, we calculated the error rate of the generated consensus sequence (Supplementary Note). abPOA yields a comparably low error rate of consensus sequences when compared to SPOA (Supplementary Tables 1 and 2). Moreover, compared to abPOA without adaptive banding, abPOA with adaptive banding significantly reduces the run time i n all simulation settings without sacrificing the alignment accuracy.

In summary, our results demonstrate that abPOA can generate high-quality consensus sequences from error-prone long reads and offer significant speed improvement over existing tools. With the significant impact of POA on third-generation long-read sequencing data analysis, we expect that abPOA will be a useful and broadly applicable tool in long-read bioinformatics workflows.

## Funding

This work has been supported by the National Key Research and Development Program of China (Nos: 2018YFC0910504, 2017YFC1201201 and 2017YFC0907503).

## Conflict of Interest

Y.X. is a scientific cofounder of Panorama Medicine.

## Supplementary Note

### 1. Calculation of *R*

Before each round of the sequence-to-graph alignment, abPOA computes *R*, the length of the outgoing path with the largest number of supporting reads that starts from the current node to the end node. Note that neither the current node nor the end node is counted in the path. abPOA iteratively calculates *R* for all nodes in a similar way to the heaviest bundling algorithm in POA (Lee, 2003). Algorithm 1 gives the pseudo-code to calculate *R* through a graph traversal starting from the end node (line 5-6) to the start node (line 19-20). Here, the start node has no predecessor and the end node has no successor, and they both are auxiliary nodes that have no sequence bases (Fig. 1A and Supplementary Fig. 1). For each node, the outgoing edge with the heaviest weight and its corresponding successor node are picked out (line 15-17). Then, *R* of the current node is set as *R* of the chosen successor node increased by one (line 18). abPOA maintains a first-in-first-out queue and an array of all nodes’ out degrees to make sure every node is visited only after all of its successor nodes have been visited (line 23-24).

### 2. The number of extra bases added on both sides of the adaptive band

To further improve the alignment accuracy, abPOA allows the DP band in each row to be extended on both sides by *w*. Before each round of the sequence-to-graph alignment, *w*, the number of extra bases added on each side of the band is determined as *b* + *f* × *L*, where *b* and *f* are two parameters (default: 10 and 0.01), *L* is the length of the linear sequence being aligned to the graph. *w* is always rounded down to the nearest integer number.

### 3. Simulation procedure

To simulate long-read datasets with varying read lengths and sequencing depths, we first randomly extracted a sequence with a specific length from the GRCh38 human reference genome. Then NanoSim (Yang *et al.*, 2017) or PBSIM (Ono *et al.*, 2013) were used to simulate a specific number of reads from the extracted sequence. With three lengths (500 bp, 1000 bp, and 5000 bp) and four depths (3×, 10×, 30×, and 50×), a combination of 12 sequence sets were generated by each simulator. Within each simulation setting, the simulation was repeatedly run 100 times to generate 100 clusters of sequences and each cluster was aligned by abPOA or SPOA to generate a consensus sequence.

The run settings of NanoSim:

The run settings of PBSIM:

### 4. Evaluation procedure

Evaluation of abPOA and SPOA on the simulated long-read datasets was performed on a Linux system with Intel Core i5-6200U at 2.3 GHz and AVX2 instructions available. Two wrap-up programs (available at https://github.com/yangao07/abPOA) were written to make the two libraries allow each dataset that consists of 100 clusters of sequences as the input.

Both abPOA and SPOA were run in global alignment mode using a convex gap penalty scheme, i.e. two-piece gap penalty scheme. The convex penalty of a gap with length *g* is *min{O _{1}+g*×

*E*

_{1},

*O*

_{2}+

*g*×

*E*

_{2}}, where

*O*

_{1},

*E*

_{1},

*O*

_{2}, and

*E*

_{2}are the two pairs of gap opening and gap extension penalties. They were set as

*O*

_{1}=4,

*E*

_{1}=2,

*O*

_{2}=24, and

*E*

_{2}=1 in the evaluation runs for both abPOA and SPOA.

The run settings of SPOA:

The run settings of abPOA without adaptive banding:

The run settings of abPOA with adaptive banding:

To evaluate the alignment accuracy of abPOA and SPOA, we calculated the error rate of the generated consensus sequence. We aligned each consensus sequence to the original sequence using minimap2 (Li, 2018) with default settings. The error rate is calculated as the total number of mismatches, insertions, and deletions in the alignment divided by the length of the consensus sequence. We also computed the error rate of raw simulated sequences in the same way. The averaged error rates of raw simulated sequences and generated consensus sequences across each sequence set are shown in Supplementary Tables 1 and 2.

## Acknowledgements

The authors would like to thank Dr. Yuan Gao and Yadong Liu for assistance with testing abPOA and providing comments on the manuscript.

## Footnotes

↵† Co-first author