Co-linear Chaining with Overlaps and Gap Costs

Co-linear chaining has proven to be a powerful heuristic for finding near-optimal alignments of long DNA sequences (e.g., long reads or a genome assembly) to a reference. It is used as an intermediate step in several alignment tools that employ a seed-chain-extend strategy. Despite this popularity, efficient subquadratic-time algorithms for the general case where chains support anchor overlaps and gap costs are not currently known. We present algorithms to solve the co-linear chaining problem with anchor overlaps and gap costs in Õ(n) time, where n denotes the number of anchors. We also establish the first theoretical connection between co-linear chaining cost and edit distance. Specifically, we prove that for a fixed set of anchors under a carefully designed chaining cost function, the optimal 'anchored' edit distance equals the optimal co-linear chaining cost. Finally, we demonstrate experimentally that the optimal co-linear chaining cost under the proposed cost function can be computed orders of magnitude faster than edit distance, and achieves a correlation coefficient above 0.9 with edit distance for closely as well as distantly related sequences.


Introduction
Computing an optimal alignment between two sequences is one of the most fundamental problems in computational biology. Unfortunately, conditional lower bounds suggest that an algorithm for computing an optimal alignment, or edit distance, in strongly subquadratic time is unlikely [3,10]. This lower bound indicates a challenge for scaling the computation of edit distance to high-throughput sequencing data. Instead, heuristics are often used to obtain an approximate solution in less time and space. One such popular heuristic is co-linear chaining. This technique involves precomputing fragments between the two sequences that closely agree (in this work, exact matches called anchors), then determining which of these anchors should be kept within the alignment (see Fig. 1). Techniques along these lines are used in long-read mappers [6,11,14,15,23,24,26] and generic sequence aligners [2,5,13,18,22]. We will focus on the following problem (described formally in Section 2): given a set of n anchors, determine an optimal ordered subset (or chain) of these anchors.
Solutions with different time complexities exist for different variations of this problem. These depend on the cost function assigned to a chain and the types of chains permitted. Solutions include an algorithm running in O(n log n log log n) time for a simpler variant of the problem where anchors used in a solution must be non-overlapping [1]. More recently, Mäkinen and Sahlin gave an algorithm running in O(n log n) time where anchor overlaps are allowed, but gaps between anchors are not considered in the cost function [16]. None of the solutions introduced thus far provide a subquadratic-time algorithm for variations that use both overlap and gap costs. However, including overlaps and gaps in a cost function is a more realistic model for anchor chaining. For example, consider a simple scenario where minimizers [25] are used to identify anchors. Suppose the query and reference sequences are identical; then adjacent minimizer-anchors will likely overlap. Disallowing anchor overlaps during chaining will lead to a penalty cost associated with gaps between chained anchors despite the two strings being identical. Therefore, depending on the type of anchor, there may be no reason to assume that in an optimal alignment the anchors would be non-overlapping. At the same time, not penalizing long gaps between the anchors is unlikely to produce correct alignments. This is why both anchor overlaps and gap costs are supported during chaining in widely-used aligners, e.g., Minimap2 [12,14] and Nucmer4 [18]. This work's contributions are the following:
- We provide the first algorithm running in subquadratic, Õ(n) time for chaining with overlap and gap costs. Refinements based on the specific type of anchor and chain under consideration are also given. These refinements include an O(n log^2 n) time algorithm for the case where all anchors are of the same length, as is the case with k-mers.
- When n is not too large (less than the sequence lengths), we present an algorithm with O(n · OPT + n log n) average-case time, where OPT is the optimal solution value. This provides a simple algorithm that is efficient in practice.
- Using a carefully designed cost function, we mathematically relate the optimal chaining cost to a generalized version of edit distance, which we call anchored edit distance. This is equivalent to the usual edit distance with the modification that matches performed without the support of an anchor have unit cost. A more formal definition appears in Section 2. With our cost function, we prove that the optimal chaining cost is equal to the anchored edit distance.
- We empirically demonstrate that computing the optimal chaining cost is orders of magnitude faster than computing edit distance, especially in semi-global comparison mode. We also demonstrate a strong correlation between optimal chaining cost and edit distance. The correlation coefficients are favorable when compared to the suboptimal chaining methods implemented in Minimap2 and Nucmer4.

Fig. 1: (Left) Anchors representing a set of exact matches are shown as rectangles. The co-linear chaining problem is to find an optimal ordered subset of anchors subject to some cost function. (Right) A chain of overlapping anchors.

Concepts and Definitions
An anchor is a pair of intervals I = ([a, b], [c, d]) such that the substring S1[a, b] exactly matches S2[c, d]; we refer to its four coordinates as I.a, I.b, I.c, and I.d. We say that the character match S1[i] = S2[j] is supported by anchor I if I.a ≤ i ≤ I.b, I.c ≤ j ≤ I.d, and i − I.a = j − I.c. Maximal exact matches (MEMs), maximal unique matches (MUMs), or k-mer matches are some of the common ways to define anchors. Maximal unique matches [7] are a subset of maximal exact matches, having the added constraint that the pattern involved occurs only once in both strings. If all intervals across all anchors have the same length (e.g., using k-mers), we say that the fixed-length property holds. Our algorithms will make use of dynamic range minimum queries (RmQs). For a set of n d-dimensional points, each with an associated weight, a 'query' consists of an orthogonal d-dimensional range. The query response is the point in that range with the smallest weight. Using known techniques in computational geometry, a data structure can be built in O(n log^(d−1) n) time and space that can both answer queries and modify a point's weight in O(log^d n) time [4].
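To make the d = 1 case concrete, the sketch below implements a dynamic one-dimensional RmQ over a fixed set of positions with a segment tree. The class and method names are illustrative; this is not the multi-dimensional structure of [4], which nests such trees, one level per dimension.

```python
class SegTreeMin:
    """Dynamic 1-D range-minimum structure over positions 0..n-1.

    Both update (modify a point's weight) and query (minimum weight over a
    half-open range) run in O(log n) time, matching the d = 1 case above.
    """

    def __init__(self, n):
        self.n = n
        self.t = [float("inf")] * (2 * n)

    def update(self, i, w):
        # set the weight of position i to w, then repair the ancestors
        i += self.n
        self.t[i] = w
        while i > 1:
            i //= 2
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])

    def query(self, lo, hi):
        # minimum weight over positions in [lo, hi)
        res = float("inf")
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                res = min(res, self.t[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                res = min(res, self.t[hi])
            lo //= 2
            hi //= 2
        return res
```

In the chaining algorithms that follow, the stored weights are candidate chain costs and a query retrieves the best predecessor anchor within a coordinate range.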

Co-linear Chaining Problem with Overlap and Gap Costs
Given a set of n anchors A for strings S1 and S2, we assume that A already contains two end-point anchors A_left = ([0, 0], [0, 0]) and A_right = ([|S1| + 1, |S1| + 1], [|S2| + 1, |S2| + 1]). We define the strict precedence relationship ≺ between two anchors I′ := A[j] and I := A[i] as I′ ≺ I if and only if I′.a ≤ I.a, I′.b ≤ I.b, I′.c ≤ I.c, I′.d ≤ I.d, and strict inequality holds for at least one of the four inequalities. In other words, the interval belonging to I′ for S1 (resp. S2) should start before or at the starting position of the interval belonging to I for S1 (resp. S2) and should not extend past it. We also define the weak precedence relation ≺w as I′ ≺w I if and only if I′.a ≤ I.a, I′.c ≤ I.c, and strict inequality holds for at least one of the two inequalities, i.e., intervals belonging to I′ should start before or at the starting positions of intervals belonging to I, but now intervals belonging to I′ can extend past the intervals belonging to I. The aim of the problem is to find a totally ordered subset (a chain) of A that achieves the minimum cost under the cost function presented next. We specify whether we mean a chain under strict precedence or under weak precedence when necessary.

Cost function. For I′ ≺ I, the function connect(I′, I) is designed to indicate the cost of connecting anchor I′ to anchor I in a chain. The chaining problem asks for a chain of m ≤ n anchors A[i_1] ≺ A[i_2] ≺ · · · ≺ A[i_m], such that A[i_1] = A_left, A[i_m] = A_right, and the total cost Σ_{1 ≤ j < m} connect(A[i_j], A[i_{j+1}]) is minimized. We next define the function connect. In Section 4, we will see that this definition is well motivated by the relationship with anchored edit distance. For a pair of anchors I′, I such that I′ ≺ I:
- The gap in string S1 between anchors I′ and I is g1 = max(0, I.a − I′.b − 1). Similarly, the gap between the anchors in string S2 is g2 = max(0, I.c − I′.d − 1). Define the gap cost g(I′, I) = max(g1, g2).
- The overlap of I′ and I in string S1 is o1 = max(0, I′.b − I.a + 1), and similarly o2 = max(0, I′.d − I.c + 1) in S2. Define the overlap cost o(I′, I) = |o1 − o2|, and let connect(I′, I) = g(I′, I) + o(I′, I).
The same definitions are used for weak precedence, only using ≺w in the place of ≺.
Regardless of the definition of connect, the above problem can be trivially solved in O(n^2) time and O(n) space. First sort the anchors by the component A[·].a and let A′ be the sorted array. The chaining problem then has a direct dynamic programming solution obtained by filling an n-sized array C from left to right, such that C[i] reflects the cost of an optimal chain that ends at anchor A′[i]. The value C[i] is computed using the recursion C[i] = min{ C[j] + connect(A′[j], A′[i]) : j < i and A′[j] ≺ A′[i] }, where the base case associated with anchor A_left is C[1] = 0. The optimal chaining cost will be stored in C[n] after spending O(n^2) time. We will provide an O(n log^4 n) time algorithm for this problem.
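As a concrete sketch of this quadratic baseline (in Python, with the gap cost g(I′, I) = max(g1, g2) standing in for connect for simplicity; a complete connect would also account for overlaps, and the names are illustrative):

```python
from typing import List, Tuple

Anchor = Tuple[int, int, int, int]  # (a, b, c, d): intervals [a, b] in S1, [c, d] in S2

def gap_cost(p: Anchor, q: Anchor) -> int:
    # g(I', I) = max(g1, g2) from the cost function above
    g1 = max(0, q[0] - p[1] - 1)
    g2 = max(0, q[2] - p[3] - 1)
    return max(g1, g2)

def strictly_precedes(p: Anchor, q: Anchor) -> bool:
    # I' strictly precedes I: component-wise <=, strict in at least one component
    return p != q and all(x <= y for x, y in zip(p, q))

def chain_cost(anchors: List[Anchor], connect=gap_cost) -> float:
    # C[i] = min over predecessors j of C[j] + connect(A'[j], A'[i]);
    # assumes anchors include A_left (first after sorting) and A_right (last)
    A = sorted(anchors)            # sorts by A[.].a (ties broken lexicographically)
    C = [float("inf")] * len(A)
    C[0] = 0                       # base case: the chain starts at A_left
    for i in range(1, len(A)):
        for j in range(i):
            if strictly_precedes(A[j], A[i]):
                C[i] = min(C[i], C[j] + connect(A[j], A[i]))
    return C[-1]                   # optimal cost of a chain ending at A_right
```

For example, with end-point anchors for strings of lengths 7 and 6 plus anchors ([1,2],[1,2]) and ([5,6],[4,5]), the optimal chain pays max(2, 1) = 2 for the middle gap.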

Anchored Edit Distance
The edit distance problem is to identify the minimum number of operations (substitutions, insertions, or deletions) that must be applied to string S2 to transform it into S1. Edit operations can be equivalently represented as an alignment (a.k.a. edit transcript) that specifies the associated matches, mismatches and gaps while placing one string on top of the other. The anchored edit distance problem is as follows: given strings S1 and S2 and a set of n anchors A, compute the optimal edit distance subject to the condition that a match supported by an anchor has edit cost 0, and a match that is not supported by an anchor has edit cost 1.
The above problem is solvable in O(|S1||S2|) time and space. We can assume that the input does not contain redundant anchors; therefore, the count of anchors is ≤ |S1||S2|. Next, the standard dynamic programming recursion for solving the edit distance problem can be revised. Let D[i, j] denote the anchored edit distance between S1[1, i] and S2[1, j]. Then D[i, j] = min(D[i − 1, j] + 1, D[i, j − 1] + 1, D[i − 1, j − 1] + x), where x = 0 if S1[i] = S2[j] and the match is supported by some anchor, and x = 1 otherwise.
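The revised recursion can be sketched as a quadratic-time reference implementation (Python; names are illustrative, and anchors use 1-based inclusive coordinates):

```python
def anchored_edit_distance(S1: str, S2: str, anchors) -> int:
    # anchors: list of (a, b, c, d) with S1[a..b] matching S2[c..d] (1-based)
    def supported(i: int, j: int) -> bool:
        # the match S1[i] = S2[j] is supported if some anchor aligns i with j
        return any(a <= i <= b and c <= j <= d and i - a == j - c
                   for (a, b, c, d) in anchors)

    m, n = len(S1), len(S2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                # delete the first i characters of S1
    for j in range(n + 1):
        D[0][j] = j                # insert the first j characters of S2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # x = 0 only for an anchor-supported match; unsupported
            # matches and substitutions both cost 1
            x = 0 if (S1[i - 1] == S2[j - 1] and supported(i, j)) else 1
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + x)
    return D[m][n]
```

With no anchors, every matched position still pays one edit, so identical strings of length k have anchored edit distance k; with a single full-length anchor the distance drops to 0.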

Graph Representation of Alignment
It is useful to consider the following representation of an alignment of two strings S1 and S2. As illustrated in Figure 2, we have a set of |S1| top vertices and |S2| bottom vertices. There are two types of edges between the top and bottom vertices: (i) a solid edge from the ith top vertex to the jth bottom vertex, representing an anchor-supported character match between the ith character in S1 and the jth character in S2; (ii) a dashed edge from the ith top vertex to the jth bottom vertex, representing a character being substituted to form a match between S1[i] and S2[j], or a character match not supported by an anchor. All unmatched vertices are labeled with an 'x' to indicate that the corresponding character is deleted. An important observation is that no two edges cross.
In a solution to the anchored edit distance problem every solid edge must be 'supported' by an anchor. By 'supported' here we mean that the match between the corresponding characters in S1 and S2 is supported by an anchor. In Figure 2, these anchors are represented with rectangles above and below the vertices. We use M to denote a particular alignment. We also associate an edit cost with the alignment, denoted EDIT(M). This is equal to the number of vertices marked with x in M plus the number of dashed edges in M.


Chaining Algorithm

The proposed algorithm still uses the recursive formula given in Section 2.1. However, it uses range minimum query (RmQ) data structures to avoid having to check every preceding anchor. We achieve this by considering six cases concerning the optimal choice of the prior anchor. We use the best of the six distinct possibilities to determine the optimal C[i] value. This C[i] value is then used to update the RmQ data structures. For the strict precedence case, some of the six cases require up to four dimensions for the range minimum queries. When only weak precedence is required, we reduce this to at most three dimensions. When the fixed-length property holds (e.g., k-mers), we reduce this to two dimensions.
Algorithm for chains under strict precedence. The first step is to sort the set of anchors A using the key A[·].a. Let A′ be the sorted array. We will next use six RmQ data structures labeled T1a, T1b, T2a, T2b, T3a, T3b. These RmQ data structures are initialized with points for every anchor, where the points corresponding to A_left are given weight 0. We then process the anchors in sorted order and update the RmQ data structures after each iteration. On the ith iteration, for j < i, we let C[j] be the optimal co-linear chaining cost of any chain ending at A′[j]. For i > 1, RmQ queries are used to find the optimal j < i by considering six different cases concerning how the optimal previous anchor I′ relates to I = A′[i]:

1. Case: I′ disjoint from I.
   (a) Case: the gap in S1 is less than or equal to the gap in S2 (Fig. 3, left).
   (b) Case: the gap in S2 is less than the gap in S1.
2. Case: I′ overlaps I in exactly one string.
   (a) Case: I′ and I overlap only in S2 (Fig. 3, middle).
   (b) Case: I′ and I overlap only in S1.
3. Case: I′ overlaps I in both strings, with subcases (a) and (b) analogous to the above.

The query for each RmQ structure is an orthogonal range determined by the inequalities relating I.a, I.b, I.c, and I.d to the previous anchors covered by the case under consideration; each query returns a candidate cost C1a, . . . , C3b. Finally, let C[i] = min(C1a, C1b, C2a, C2b, C3a, C3b) and update the RmQ structures as shown in the pseudo-code in Appendix 1. In the pseudo-code, every RmQ structure T has the query method T.RmQ(), which takes as arguments an interval for each dimension. It also has the method T.update(), which takes a point and a weight and updates the point to have the new weight. The four-dimensional RmQ structures for Case 3.a require O(log^4 n) time per query and update, giving an overall time complexity of O(n log^4 n). In Appendix 2 we present the modifications for weak precedence and fixed-length anchors.
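To illustrate the query-then-update pattern in isolation, the following simplified sketch chains anchors in one dimension only: anchors pairwise disjoint in S1, with the link cost equal to the S1 gap. Then C[i] = (I.a − 1) + min{C[j] − A′[j].b : A′[j].b < I.a}, which a single one-dimensional RmQ can answer. All names are illustrative; this is not the six-case algorithm above.

```python
from bisect import bisect_left

class MinTree:
    # point-update / range-minimum tree (cf. the dynamic RmQ of Section 2)
    def __init__(self, n):
        self.n = n
        self.t = [float("inf")] * (2 * n)

    def update(self, i, w):
        i += self.n
        self.t[i] = min(self.t[i], w)
        while i > 1:
            i //= 2
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])

    def query(self, lo, hi):   # minimum over [lo, hi)
        res = float("inf")
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                res = min(res, self.t[lo]); lo += 1
            if hi & 1:
                hi -= 1; res = min(res, self.t[hi])
            lo //= 2; hi //= 2
        return res

def chain_gap_s1(anchors):
    # anchors: (a, b) intervals in S1 only, pairwise disjoint, including
    # A_left = (0, 0) and A_right = (|S1| + 1, |S1| + 1);
    # the cost of linking I' -> I is the gap I.a - I'.b - 1
    A = sorted(anchors)
    ends = sorted({b for _, b in A})
    rank = {v: k for k, v in enumerate(ends)}
    T = MinTree(len(ends))
    T.update(rank[A[0][1]], 0 - A[0][1])   # C[A_left] = 0, stored as C - b
    best = 0
    for (a, b) in A[1:]:
        k = bisect_left(ends, a)           # predecessors with end position < a
        best = T.query(0, k) + a - 1       # C[i] = (a - 1) + min(C[j] - b_j)
        T.update(rank[b], best - b)        # make anchor i available as predecessor
    return best                            # cost of the chain ending at A_right
```

Each anchor is handled with one O(log n) query and one O(log n) update; the full algorithm follows the same loop, but with up to four-dimensional queries, giving the O(n log^4 n) bound.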

Proof of Equivalence
Theorem 2. For a fixed set of anchors A, the following quantities are equal: the anchored edit distance, the optimal co-linear chaining cost under strict precedence, and the optimal co-linear chaining cost under weak precedence.
The optimal co-linear chaining cost is defined using the cost function described in Section 2.1. An implication of Theorems 1 and 2 is that if only the anchored edit distance is desired (and not an optimal strictly ordered anchor chain), there exists an O(n log^3 n) time algorithm for computing this value. Theorem 2 will follow from Lemmas 1 and 2.
Lemma 1. Anchored edit distance ≤ optimal co-linear chaining cost under weak precedence ≤ optimal co-linear chaining cost under strict precedence.
Proof. The second inequality follows from the observation that every set of anchors ordered under strict precedence is also ordered under weak precedence. We now focus on the inequality between anchored edit distance and co-linear chaining cost under weak precedence. Starting with an anchor chain under weak precedence, A[1], A[2], . . ., with associated co-linear chaining cost x, we provide an alignment with an anchored edit distance that is at most x. This alignment is obtained using a greedy algorithm that works from left to right, always taking the closest exact match when possible; when not possible, a character substitution or unsupported exact match; and if none of these are possible, a deletion. For example, when consecutive anchors I′ and I are disjoint with g1 ≤ g2, the greedy alignment covers the gap using g1 substitutions or unsupported exact matches and g2 − g1 deletions, which is g2 edits in total. Also, connect(I′, I) = max{g1, g2} = g2.
Continuing this process until A_right, all symbols in S1 and S2 become included in the alignment. We defer the details of Lemma 2's proof to Section 4.1.
Lemma 2. For a set of anchors A, optimal chaining cost under strict precedence ≤ anchored edit distance.
Proof. We start with an arbitrary alignment M supported by A. We will show in Section 4.1 how M can be converted into a chain of anchors under strict precedence whose chaining cost is at most EDIT(M).

Details of Lemma 2 Proof
We transform the alignment using the two algorithms described next.

Algorithm (i). Algorithm for removing incomparable anchors. Let I and I′ be two incomparable anchors under weak precedence (Fig. 6). The anchor that has the rightmost supported solid edge will be the anchor we keep. Suppose wlog it is I. Working from right to left, starting with that rightmost edge, for any edge e that is contained but not supported by I, we replace e with the rightmost of e′ and e′′. Note that at least one side of every edge supported by I′ is within an interval of I. Hence, all edges supported by I′ are eventually replaced. We then remove I′. This algorithm is repeated until a total ordering under weak precedence is possible.
Algorithm (ii). Algorithm for removing anchors with nested intervals. Consider two anchors I and I′ where wlog I′ has an interval nested in one of the intervals belonging to I. Let e_R be the rightmost edge supported by I. Working from right to left, we replace any edge e to the left of e_R that is contained but not supported by I with the rightmost of e′ and e′′. Next, working from left to right, we replace any edge e to the right of e_R that is contained but not supported by I with the leftmost of e′ and e′′. These procedures combined will replace all edges supported by I′ with edges supported by I. We repeat this until there are no two nested intervals amongst all remaining anchors. Finally, we remove all anchors that do not support any edge. We call an anchor chain where every anchor supports at least one edge minimal.

Lemma 3. Algorithms (i) and (ii) do not increase the edit cost of the alignment.

Proof. For Algorithm (i), suppose we are replacing an edge e not supported by the anchor I, the anchor we wish to keep. Suppose wlog that e′ is the rightmost of e′ and e′′, so we replace e with e′. Because the edge immediately to the right of e is also aligned with I, deleting S2[k] and matching S2[I.c + h − I.a] does not require modifying any additional edges. If e was a solid edge, the edit cost is unaltered, since the total number of deletions and matches is unaltered. If e was a dashed edge, replacing e with e′ converts a substitution or unsupported exact match at S2[k] to a deletion, and removes a deletion at S2[I.c + h − I.a], decreasing the edit cost by 1. The same arguments hold for Algorithm (ii) when we replace edges from right to left. In Algorithm (ii), when we process edges from left to right, since any edges to the left of the edge e being replaced are supported by I, replacing e with the leftmost of e′ and e′′ does not require modifying any additional edges. Again, if e is solid, the edit cost is unaltered, and if e is dashed, the edit cost is decreased by 1.
Lemma 4. The greedy algorithm described in the proof of Lemma 1 produces an optimal alignment for a 'minimal' anchor chain under strict precedence.
Proof. The proof is similar to that of Lemma 3 and is deferred to Appendix 3.

Lemma 5. For an anchor chain under strict precedence, the edit cost of the alignment produced by the greedy algorithm described in the proof of Lemma 1 is equal to the chaining cost.
Proof. This follows from induction on the number of anchors processed, using the same arguments used in the proof of Lemma 1. However, only I ′ .b = I.b needs to be considered in Cases 1 and 2 leading to equality in these cases.

Implementation
In multi-dimensional RmQs, the O(n log^(d−1) n) storage requirement and irregular memory accesses during a query can limit their efficacy in practice [4]. We can take advantage of two observations to design a more practical algorithm. First, if the sequences are highly similar, their edit distance will be relatively small; hence the anchored edit distance, denoted in this section as OPT, will be relatively small for MUM or MEM anchors. Second, if the sequences are dissimilar, then the number of MUM or MEM anchors, n, will likely be small. These observations allow us to design an alternative algorithm (Algorithm 1) that requires O(n) worst-case space and O(n · OPT + n log n) average-case time over all possible inputs where n ≤ max(|S1|, |S2|), i.e., the number of anchors is at most the longer sequence length (the proof is deferred to Appendix 3). This property always holds when the anchors are MUMs and is typically true for MEMs as well. This makes the algorithm presented here a practical alternative.
As before, let A be the initial (possibly unsorted) set of anchors, but with A_left = A[1] and A_right = A[n]. We assume wlog |S1| ≥ |S2|. We begin by sorting the anchor set A by the component A[·].a and making a guess B for the optimal solution value (Algorithm 1). The value B is used at every step to bound the range of A[·].a values that need to be examined. This bounds the number of anchors that need to be considered (on average). If C[n] is greater than our current guess B after processing all n anchors, we update our guess to B ← B2 · B, where B2 > 1 is a constant multiplier.
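A sketch of this guess-and-double strategy follows (Python; the gap cost stands in for connect, the initial guess B1 and multiplier B2 are illustrative constants, and the window is widened by the maximum anchor length so that no predecessor reachable within cost B is missed):

```python
from bisect import bisect_left

def gap_cost(p, q):
    g1 = max(0, q[0] - p[1] - 1)
    g2 = max(0, q[2] - p[3] - 1)
    return max(g1, g2)

def strictly_precedes(p, q):
    return p != q and all(x <= y for x, y in zip(p, q))

def chain_cost_doubling(anchors, connect=gap_cost, B1=16, B2=2):
    # anchors include A_left and A_right; B1 is the initial guess,
    # B2 the multiplier applied whenever the guess proves too small
    A = sorted(anchors)                        # sort by A[.].a
    starts = [x[0] for x in A]
    span = max(b - a + 1 for a, b, _, _ in A)  # longest anchor length
    B = B1
    while True:
        C = [float("inf")] * len(A)
        C[0] = 0
        for i in range(1, len(A)):
            # any link of cost <= B has S1-gap <= B, so only anchors
            # starting within B + span of A[i].a can precede A[i]
            lo = bisect_left(starts, A[i][0] - B - span)
            for j in range(lo, i):
                if C[j] <= B and strictly_precedes(A[j], A[i]):
                    C[i] = min(C[i], C[j] + connect(A[j], A[i]))
        if C[-1] <= B:
            return C[-1]   # every link of an optimal chain fits in the window
        B *= B2            # guess too small: enlarge and retry
```

When B reaches OPT, every link of an optimal chain has cost at most B, so the windowed scan reproduces the full dynamic program's answer.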
Extending the above pseudo-code to enable semi-global chaining, i.e., free anchor gaps on both ends of the reference sequence, is also simple. First, in each i-loop, the connection to anchor A_left must always be considered, and in the last iteration, when i = n, j must range from 1. Second, a revised cost function must be used when connecting to either A_left or A_right, where a gap penalty is applied only to the anchor gap over the query sequence. The experiments in the next section use an implementation of this algorithm.
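The revised end-connection cost might look as follows (a hypothetical helper, assuming the query corresponds to S2; only the query-side gap is charged at the ends):

```python
def connect_end(p, q):
    # link p -> q where p is A_left or q is A_right: the reference (S1)
    # gap is free, so only the gap over the query (S2) coordinates
    # (q.c - p.d - 1, clamped at zero) is penalized
    return max(0, q[2] - p[3] - 1)
```

Linking A_left = ([0, 0], [0, 0]) to an anchor starting at position 1 of the query therefore costs 0 regardless of where the anchor lands on the reference.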
Algorithm 1: O(OP T · n + n log n) average-case algorithm.
Experiments

In this section, we aim to show that: (i) the proposed algorithm, as well as existing chaining methods, achieves a significant speedup compared to computing exact edit distance using Edlib; and (ii) in contrast to existing chaining methods, our implementation consistently achieves a high Pearson correlation (> 0.90) with edit distance while requiring modest time and memory resources.
We implemented Algorithm 1 in C++, and refer to it as ChainX. The code is available at https://github.com/at-cg/ChainX. Inputs are a target string, query strings, the comparison mode (global or semi-global), the preferred anchor type, i.e., maximal unique matches (MUMs) or maximal exact matches (MEMs), and a minimum match length. We include a pre-processing step to index the target string using the same suffix array-based algorithm [31] used in Nucmer4 [18]. Chaining costs computed using ChainX for each query-target pair are provably optimal.
Existing co-linear chaining implementations. Co-linear chaining has been implemented previously as a stand-alone utility [2,22] and also used as a heuristic inside widely used sequence aligners [5,14,18]. Out of these, the Clasp (v1.1), Nucmer4 (v4.0.0rc1) and Minimap2 (v2.22-r1101) tools are available as open-source, and are used here for comparison purposes. Unlike our algorithm, where the optimization problem involves minimizing a cost function, these tools execute their respective chaining algorithms with a score maximization objective. Clasp, being a stand-alone chaining method, returns chaining scores in its output, whereas we modified Minimap2 and Nucmer4 to print the maximum chaining score for each query-target string pair and skip subsequent steps. To enable a fair comparison, all methods were run with a single thread and the same minimum anchor length of 20. Accordingly, ChainX, Clasp and Nucmer4 were run with MUMs of length ≥ 20, and Minimap2 was configured to use minimizer k-mers of length 20. For these tests, we used an Intel Xeon E5-2698 v3 processor with 32 cores and 128 GB RAM. All tools were required to match only the forward strand of each query string. ChainX and Clasp are both exact solvers of the co-linear chaining problem, but use different gap-cost functions. Clasp only permits non-overlapping anchors in a chain, and supports two cost functions, referred to as the sum-of-pair and linear gap cost functions in their paper [22]. We tested Clasp with both of its gap-cost functions, and refer to the two versions as Clasp-sop and Clasp-linear respectively. Both versions solve co-linear chaining using RmQ data structures, requiring O(n log^2 n) and O(n log n) time respectively. Both require a set of anchors as input; therefore, we supplied the same set of anchors, i.e., MUMs of length ≥ 20, as used by ChainX. Minimap2 and Nucmer4 use co-linear chaining as part of their seed-chain-extend pipelines.
Both Minimap2 and Nucmer4 support anchor overlaps in a chain, as well as penalize gaps using custom functions. However, both these tools employ heuristics (e.g., enforcing a maximum gap between adjacent chained anchors) for faster processing, which can result in suboptimal chaining scores.

Runtime and memory comparison. We downloaded the same set of query and target strings that were used for benchmarking in the Edlib paper [28]. These test strings, ranging from 10 kbp to 5000 kbp in length, allowed us to compare tools for end-to-end global sequence comparisons as well as semi-global comparisons at various similarity levels. To test end-to-end comparisons, the target string had been artificially mutated at various rates using mutatrix (https://github.com/ekg/mutatrix), whereas for the semi-global comparisons, a substring of the target string had been sampled and mutated. Table 1 presents the runtime and memory comparison of all tools. Columns of the table are organized to show tools in three categories: edit distance solver (Edlib); optimal co-linear chaining solvers (ChainX, Clasp-sop, Clasp-linear); and heuristic implementations (Nucmer4, Minimap2). We make the following observations. First, chaining methods (both optimal and heuristic-based) are significantly faster than Edlib in most cases, with up to three orders of magnitude speedup. Second, among optimal chaining methods, Clasp-sop's time and memory consumption increases quickly with the count of anchors, which is likely due to the irregular memory access and storage overhead of its algorithm that uses a 2d-RmQ data structure. Finally, we note that Minimap2 and Nucmer4 are often faster than the exact algorithms during global string comparisons due to their fast heuristics.
All tools (except Edlib) use an indexing step, such as building a k-mer hash table (Minimap2) or computing a suffix array (ChainX, Clasp-sop, Clasp-linear, Nucmer4). Indexing time was excluded from the reported results, and was found to be comparable. For instance, in the case of semi-global comparisons, ChainX, Nucmer4 and Minimap2 required 590 ms, 736 ms and 236 ms for index computation respectively.

Correlation with edit distance. We checked how well the chaining cost (or score) correlates with edit distance. We use the absolute value of the Pearson correlation coefficient for comparison. In this experiment, we simulated 100 query strings within three similarity ranges: 90−100%, 80−90% and 75−80%. Table 2 shows the correlation achieved by all the tools. Here we observe that ChainX and Clasp-sop are more consistent in terms of maintaining high correlation across all similarity ranges. Between the two, ChainX was shown to offer superior scalability in terms of runtime and memory usage (Table 1). Hence, ChainX can be useful in practice when good performance and accuracy are desired across a wide similarity range.

Effect of anchor type and minimum match length. The number of anchors given as input naturally affects the performance and output quality of a chaining algorithm. We tested the runtime and the correlation with edit distance achieved by ChainX while varying the anchor type (MUMs/MEMs) and the minimum match-length parameter l_min (Table 3). When MUMs are used as anchors, we observe good scalability, and lowering l_min from 20 to 10 improves the correlation, but the correlation saturates afterwards. This is because very short exact matches are unlikely to be unique and won't be selected as MUMs. However, when MEMs are used as anchors, the correlation continues to improve with a decreasing minimum length parameter, but the runtime grows exponentially. An excessive count of anchors improves the correlation, but anchor chaining then becomes computationally demanding.

Conclusions
This work provides new algorithms for co-linear chaining, a fundamental problem in bioinformatics. Variants of this technique have been regularly used in alignment tools for four decades [32]. We addressed an open problem pertaining to the general case of this problem, which allows anchor overlaps and penalizes gaps between adjacent chained anchors. The proposed algorithms for multiple versions of this problem are provably optimal and efficient, and can be incorporated in read mappers. We also discussed a new cost function for the co-linear chaining problem that enabled us to establish the first mathematical link between co-linear chaining and the edit distance problem. This result is a useful addition to a prior result [16] where a connection between the co-linear chaining problem and the longest common subsequence problem was made.
Proof. This follows from an exchange argument. Suppose there exists an optimal alignment M on the anchor chain B that is not the same as the alignment M_G produced by the greedy algorithm. Processing the edges from left to right, consider the first discrepancy: the leftmost edges e ∈ M and e_G ∈ M_G that differ. Let e_prev be the previous edge on which M and M_G coincided. We claim e can be replaced with e_G without increasing the edit cost.
- Case: e_G is solid and left of e. Then e can clearly be exchanged for e_G with no increase in edit cost.
- Case: e_G is solid and not left of e. We assume wlog k < k_G (Figure 7). Suppose e_G is supported by an anchor I. This contradicts our assumption that e_G is the first edge in M_G to the right of e_prev.
- Case: e_G is dashed. Here the dashed edge e_G is optimal, and swapping e with e_G will reduce the edit cost.

Proof. The n log n term comes from sorting the anchors. The total expected time is within a constant factor of B1 · n · (1 + B2 + B2^2 + · · · + B2^⌈log_B2(OPT)⌉) = O(n · OPT).