## Abstract

Single-cell RNA sequencing enables the construction of trajectories describing the dynamic changes in gene expression underlying biological processes such as cell differentiation and development. The comparison of single-cell trajectories under two distinct conditions can illuminate the differences and similarities between the two and can thus be a powerful tool. Recently developed methods for the comparison of trajectories rely on the concept of dynamic time warping (dtw), which was originally proposed for the comparison of two time series. Consequently, these methods are restricted to simple, linear trajectories. Here, we adopt and theoretically link arboreal matchings to dtw and propose an algorithm to compare complex trajectories that more realistically contain branching points that divert cells into different fates. We implement a suite of exact and heuristic algorithms suitable for the comparison of trajectories of different characteristics in our tool Trajan. Trajan automatically pairs similar biological processes between conditions and aligns them in a globally consistent manner. In an alignment of singlecell trajectories describing human muscle differentiation and myogenic reprogramming, Trajan identifies and aligns the core paths without prior information. From Trajan’s alignment, we are able to reproduce recently reported barriers to reprogramming. In a perturbation experiment, we demonstrate the benefits in terms of robustness and accuracy of our model which compares entire trajectories at once, as opposed to a pairwise application of dtw. Trajan is available at https://github.com/canzarlab/Trajan.

## 1 Introduction

Single-cell RNA sequencing (scRNA-seq) has allowed the detailed dissection of biological processes such as differentiation, development and cell reprogramming. By describing the trajectories along which cells transition to achieve specific cell fates, scRNA-seq can illuminate the dynamic changes in gene expression underlying these processes [19]. Much can be learned from the comparative analysis of single-cell trajectories. Comparing gene expression dynamics along trajectories from two conditions can aid in elucidating the key differences between them and the regulatory programs underpinning the process. For example, comparing the trajectories underlying a given differentiation process in two species would shed light onto the evolutionary differences between these organisms. Comparing the trajectory defining a normal developmental process to that affected by a particular mutation would yield insights into disease mechanisms. Recently, methods have been developed for this purpose, which make use of dynamic time warping (dtw).

Dynamic time warping is a class of algorithms for comparing two time series that advance at different speeds [22]. It was originally developed in the context of automatic speech recognition, but has gained increasing popularity more recently in the comparison of single cell trajectories [1, 6, 9]. Similar to a pairwise sequence alignment that allows for insertions and deletions, dtw finds a mapping (warping) between similar elements in the two sequences to overcome locally stretched and compressed sections. In single-cell trajectories, cells are ordered along pseudo-time and can be aligned based on the expression values of (a subset of) their genes to establish a common pseudotime axis along which expression kinetics become comparable between different conditions.

Dynamic time warping can only compare two time series at a time, and thus current methods for comparing single-cell trajectories are limited to linear trajectories or rely on picking the correct path from a complex trajectory. Even though not based on dtw, MATCHER [23] is similarly restricted to two linear trajectories, built from transcriptomic and epigenetic measurements. It is relevant to mention that complex cell trajectories are common in developmental processes and also arise in response to genetic perturbations [18]. In these cases, prior information such as a set of defined markers would be necessary to pick the most relevant path, but this information is often not available. Another potential caveat of dtw is that it ignores cells that lie on alternative paths and could potentially amplify the signal used to infer the mapping between trajectories.

We present Trajan, a novel method to compare and align complex trajectories with multiple branch points diverting cells into alternative fates (Fig. 1). Trajan automatically identifies the correspondence between biological processes in two trajectories and aligns all of them simultaneously, taking into account their overlap. Given that cells that diverted into different fates share a common ancestry, they cannot be treated as independent from each other. Their independent pairwise alignment (using dtw) could introduce inconsistencies with respect to the mapping of common progenitor cells. Akin to the extension of pairwise alignments to multiple sequence alignment, we seek the best alignment between all corresponding pairs of paths that agree on common progenitor cells. To this end, Trajan adopts arboreal matchings [5] to capture globally consistent similarities between trajectories. Arboreal matchings were originally proposed in the context of phylogentic trees and here we theoretically link them to dynamic time warping. We develop a suite of exact and heuristic algorithms that are suitable for the comparison of trajectories of different characteristics. When aligning single-cell trajectories describing human muscle differentiation and myogenic reprogramming, Trajan automatically identifies the core paths from which we are able to reproduce recently reported barriers to reprogramming. In a perturbation experiment, Trajan correctly maps identical cells in a global view of trajectories, as opposed to a pairwise application of dtw.

## 2 Methods

Dynamic time warping is the algorithmic workhorse underlying current methods that compare linear single-cell trajectories. In the next section we briefly review the concept of dynamic time warping and show that an attempt to generalize dtw to complex trajectories naturally leads to arboreal matchings between trees, which we have introduced previously in the context of phylogenetic trees [5]. Proofs of Theorems and Lemmas can be found in the Appendix, Section 6.1.

### 2.1 DTW versus arboreal matching

As in classical sequence alignment, dtw matches similar elements in two sequences while preserving their order. To account for different speeds at which the two sequences advance, however, each element of one sequence can be mapped to multiple elements in the other sequence (Fig. 2 left). More formally, given two time series , and a distance or similarity measure *d*(*x _{i}, y_{j}*) ≥ 0 between the time points

*x*and

_{i}*y*, a

_{j}*warping*is a sequence

*p*= (

*p*

_{1},…,

*p*) with

_{L}*p*= (

_{ℓ}*n*) ∈ [1 :

_{ℓ}, m_{ℓ}*n*] × [1 :

*m*] for

*ℓ*∈ [1 :

*L*] that satisfies the following three conditions: (i)

*Boundary*:

*p*

_{1}= (1,1) and

*p*= (

_{L}*n, m*). (ii)

*Monotonicity*:

*n*

_{1}≤

*n*

_{2}≤ ⋯ ≤

*n*and

_{L}*m*

_{1}≤

*m*

_{2}≤ ⋯ ≤

*m*. (iii)

_{L}*Step size*:

*p*

_{ℓ+1}−

*p*∈ {(1, 0), (0,1), (1,1)} for

_{ℓ}*ℓ*∈ [1 :

*L*− 1]. Note that the example warping in Figure 1 (left) contains no pair of crossing edges and thus preservers the order of the two sequences.

The classic dtw aims to find a warping p minimizing the total distance between mapped elements:

The optimal warping can be computed by a dynamic program that solves:

There are various extensions of the classic dtw described above that can be mainly classified as 1) restricting the range of the mapping to a certain window; 2) assigning different weights to different types of steps; and 3) using different step patterns, e.g. *p*_{ℓ+1} − *p _{ℓ}* ∈ {(1,1), (1,2), (2,1)} in (iii). In the following, we consider the widely used classic dtw which is also the default scheme for computing dtw [24]. Since state-of-the-art methods like Monocle 2 [20] and DPT [11] aim to construct smooth trajectories, the classic dtw provides the necessary flexibility for most single-cell alignment tasks.

Here, we propose a generalization of classic dtw from paths, i.e., linear trajectories, to trees, i.e., complex trajectories: We want to align each path in tree *T*_{1} to at most one path in *T*_{2} and vice versa and, similar to dtw, preserve the order of nodes along the paths, i.e., no crossing edges. In addition, we require all alignments to be consistent, that is, every node must be matched to the same node in all pairwise alignments it is part of. In [5], we have introduced *arboreal matchings* that formalize such a consistent path-by-path alignment of trees: An arboreal matching is a matching *M*, i.e., one-to-one correspondence between nodes in trees *T*_{1} and *T*_{2} such that for any (*u*_{1}, *v*_{1}), (*u*_{2}, *v*_{2}) ∈ *M*, *u*_{2} is a descendant of *u*_{1} iff *v*_{2} is a descendant of *v*_{1}.

In contrast to dtw, an arboreal matching *M* matches each node (cell) to at most one similar node (cell) in the other tree (trajectory). It is not required to cover all nodes between each pair of paths, but we can flexibly penalize nodes that remain unmatched by *M* in the objective function:
where the cost of leaving node *u*(*v*) unmatched is *d*(*u*, −) > 0 (*d*(−, *v*) > 0). In fact, the arboreal matching of minimum cost (2) between two paths *P* = (*x*_{1},…, *x _{n}*) and

*Q*= (

*y*

_{1},…,

*y*) can be solved by a very similar dynamic program as in dtw (1):

_{m}An example arboreal matching between two paths is shown in Figure 2 (right). Again, the noncrossing edges align the two time-series to reveal similarities and unmatched nodes indicate compressed or stretched sections. This makes arboreal matchings as flexible as dtw in the comparison of two trajectories. More specifically, we will show that by choosing an appropriate penalty for unmatched vertices, the optimal dtw and the optimal arboreal matching yield similar measures of similarity or distance of the compared trajectories. Denote by *d _{dtw}* and

*d*the optimal value of the classic dtw and the arboreal matching between two paths

_{M}*P*and

*Q*, respectively. The following theorem provides an upper bound on

*d*.

_{dtw}*Let* . *If* *and* , *then*
*where k is the minimum number of edges that need to be removed to transform the optimal warping to an arboreal matching*.

Next, we develop a lower bound theorem for the classic dtw. An edge (*x, y*) in the warping *p* is called *redundant* if both vertices *x* and *y* are covered by at least two edges in *p*.

*There exists an optimal warping of the classic dtw without redundant edges*.

Given an optimal warping *p**, we assign penalties to unmatched vertices such that *d _{M}* ≤

*d*. Let

_{dtw}*p** be an optimal warping without redundant edges, define

Then, we impose penalties

We define penalties *d*(*x*, −), *x* ∈ *P*, analogously. Since *p** has no redundant edges, *d*(−, *y*) is uniquely defined. Conversely, if there exist *x*_{1}, *x*_{2} ∈ *L*_{1}(*p**) such that (*x*_{1}, *y*) ∈ *p**, (*x*_{2}, *y*) ∈ *p**, the non-redundancy of *p** is violated. We have the following lower bound theorem.

*If d*(*x*, −), *d*(−, *y*) *are defined as in* (4), *then d _{M}* ≤

*d*.

_{dtw}In Section 3.1 we illustrate how closely the optimal arboreal matchings based on lower and upper bound penalty scheme follow the optimal dtw path.

### 2.2 Limitations of the naïve ILP formulation

Finding the matching minimizing (2) can be phrased as a maximum matching problem that explicitly forbids the two possible types of ancestry violations: Two edges can be crossing, or two nodes on the same root-to-leaf path are matched to nodes on different root-to-leaf paths (Figure 3). The former constraint is equally imposed by dtw, the latter is a consequence of the simultaneous comparison of multiple paths and prevents arbitrary jumps between biological processes in the comparison. In our proof-of-concept study [5], we describe feasible arboreal matchings between two rooted trees *T*_{1} = (*V, E*_{1}), *T*_{2} = (*V*_{2}, *E*_{2}), by the following simple ILP:
where indicator variables *x _{i,j}* denote the presence or absence of an edge (

*i,j*), weights

*w*(

*i,j*) :=

*d*(

*i*, −) +

*d*(−,

*j*) −

*d*(

*i, j*). Pairs of edges (

*i, j*) and (

*k, l*) are

*compatible*if it holds that

*k*is a descendant of

*i*in

*T*

_{1}iff

*l*is a descendant of

*j*in

*T*

_{2}. Set contains pairs of edges {(

*i, j*), (

*k, l*)} that are incompatible, i.e., they are either crossing or one-sided independent (Figure 3).

As our experiments in Section 3.3 show, this ILP formulation does not allow to practically align trajectories comprising as few as 100 single cells. In the following theorem, we identify its weak LP-relaxation as a theoretical explanation for this empirical performance, since the search space that needs to be explicitly explored by an ILP solver depends on the strength of the LP relaxation. Let OPT denote an optimal solution to the above ILP and let *w*(OPT) be its optimal score. Let |*V*_{1}| = *n*, |*V*_{2}| = *m*, and w.l.o.g we assume *n* ≤ *m*.

*The integrality gap of the linear programming relaxation of* (P) *is n* − *o*(1).

### 2.3 A branch-and-cut algorithm for arboreal matchings

In this section, we introduce a thoroughly engineered branch-and-cut algorithm that allows to practically compare complex single-cell trajectories. Its main ingredients are (i) cuts that trim the LP relaxation closer to the convex hull of feasible arboreal matchings (Section 2.3.1), (ii) polynomial-time algorithms that can find these cuts on demand (Section 2.3.2), (iii) a branch-and-bound scheme that makes use of modern CPU architectures (Section 2.3.3), and (iv) an in-house developed, non-commercial, non-linear solver that we use for all continuous optimization problems (Appendix, Section 6.3). In Section 6.5 in the appendix, we generalize arboreal matchings and our branch-and-cut algorithm to directed acyclic graphs.

#### 2.3.1 Valid clique constraints

The next theorem motivates the addition of valid inequalities to reduce the integrality gap of the LP relaxation. Let (PC) be the LP-relaxation of the ILP that we obtain by replacing the pairwise incompatibilities (7) in (P) by the more general clique inequalities
where is a set of cliques, i.e., sets of pairwise incompatible edges. Denote by *E*(*T*_{1}, *T*_{2}) the set of edges between two trees. For each *M* ⊂ *E*(*T*_{1}, *T*_{2}), let *r*(*M*) denote the maximum size of any feasible (unweighted) arboreal matching contained in *M*.

*If we can write any set M* ⊂ *E*(*T*_{1}, *T*_{2}) *as union of at most r*(*M*) *cliques, that is*, , *then the integrality gap of the linear program* (PC) *is at most* log *n* + 1.

Inspired by this theorem, we strengthen the LP relaxation by lifting pairwise incompatibility constraints (7) to maximal sets (cliques) of pairwise crossing edges and maximal sets (cliques) of edges that are pairwise semi-independent (Figure 3). As our experiments in Section 3.3 indicate, these lifted constraints result in stronger bounds that allow us to prune larger parts of the search space. Due to their exponential number, we add lifted constraints only on demand, that is, if they cut off the current optimal fractional solution. In the next section, we describe how to assess this demand in polynomial time.

#### 2.3.2 Polynomial-time separation algorithms

In the following, we consider trees *T*_{1} = (*V*_{1}, *E*_{1}) and *T*_{2} = (*V*_{2}, *E*_{2}) with roots *r*_{1} and *r*_{2}, sets of leaves and , and parent mappings *π*_{1} and *π*_{2}, respectively. For two vertices *p, q* ∈ *V _{i}*, denote by [

*p, q*] the unique path in

*T*between

_{i}*p*and

*q*.

##### Crossing edges clique constraints

A maximal set of pairwise crossing edges between two fixed root-to-leaf paths [*r*_{1}, *ℓ*_{1}], [*r*_{2}, *ℓ*_{2}] can be obtained by the following procedure. Starting from an edge between leaf *ℓ*_{1} in *T*_{1} and the root of *T*_{2}, in each step we either move up along [*r*_{1}, *ℓ*_{1}] and keep the node in *T*_{2} fixed, or we move down along [*r*_{2}, *ℓ*_{2}] and keep the node in *T*_{1} fixed. Analogously we can start from edge (*r*_{1}, *ℓ*_{2}). Figure 3 (left) shows one possible outcome of this procedure.

Given a fractional solution *x** to the current LP relaxation, the separation problem asks to find a hyperplane that cuts off (separates) *x** from the polytope without losing any feasible integral solution, i.e., arboreal matching. Since each maximal set (clique) of edges obtained by the procedure above are pairwise incompatible, the sum of their fractional values must not exceed 1:

Among the exponentially many crossing cliques we can identify one for which (9) is (most) violated efficiently by a dynamic program. For fixed paths *P*_{1} = [*r*_{1}, *ℓ*_{1}], *P*_{2} = [*r*_{2}, *ℓ*_{2}], let *D*[*u, v*] denote the maximum (with respect to *x**) clique between [*r*_{1}, *u*] and [*v, ℓ*_{2}]. It can be defined recursively as the better choice between moving up in *P*_{1} or down in *P*_{2}:
where *π*_{2}(*v*′) = *v*. The maximum *x**-weight clique *D*[*ℓ*_{1}, *r*_{2}] can then be computed by a dynamic program in time . We can generalize this dynamic program to two trees by considering all child nodes *v*′ in the recursion instead of the unique descendant along the path:

*Given a fractional solution x** *we can determine whether a crossing edge clique inequality* (9) *is violated in time* .

##### Semi-independent clique constraints

We strengthen the LP relaxation even further by lifting pairs of semi-independent edges to maximal sets (cliques) of pairwise semi-independent edges. Such a set consists of edges that are all incident to nodes on a common root-to-leaf path in one tree, and are incident to nodes in the second tree that all lie on distinct root-to-leaf paths, i.e., are independent (Figure 3 right). Again, edges in such a clique must satisfy

Formally, the separation of semi-independent clique constraints with respect to *T*_{1} and *T*_{2} is given by the following theorem.

*Given a fractional solution x** *we can determine whether a semi-independent clique inequality* (11) *is violated in time* .

Again, Theorem 6 allows us to cut off large parts of the polytope in polynomial time, without losing any feasible arboreal matching.

#### 2.3.3 Obtaining integral solutions

In the previous section we described a tighter relaxation of the original ILP formulation. While this relaxed LP improves upon previous approaches it still provides only fractional solutions in the worst case. Trajan implements four different strategies to obtain integral solutions, ranging from fast, but suboptimal, to more expensive, but optimal methods. With this we try to address the need for tailored trade-offs between accuracy and speed imposed by different single-cell sequencing technologies that assay a variable number of genes at varying resolution in hundreds to millions of cells. We describe heuristic approaches implemented in Trajan in Appendix Section 6.2.

##### Branch and bound

Trajan can compute an optimal arboreal matching by a classical branch and bound algorithm, whose running time can be exponential in the worst case. We have implemented several node selection strategies [7], including best first, depth first and a hybrid approach as well as various common variable selection schemes [3], including most and least fractional variables, and most constrained variables. The initial primal bound is obtained by the simple greedy approach described above or the fixed-parameter tractable algorithm described in Section 2.4. Taking advantage of modern CPU architectures, Trajan can run multiple instances of our solver in parallel, each one using a different node and variable selection strategy while sharing the current best primal bound in memory. In best first mode, Trajan can distribute open subproblems across a user-specified number of processors. Note that a tighter relaxation can speed up the running time of the branch and bound exploration since better dual bounds allow for a better pruning of some of the subtrees in the branch and bound computation graph. This can be seen in the experiments in Section 3.3.

### 2.4 FPT algorithm for small number of cell fates

For trajectories with a small number of cell fates *k* we employ a fixed-parameter tractable algorithm, parameterized by *k*. It guesses the correspondence between paths in the two trajectories and applies a dynamic program similar to [13] to align them optimally, in total time , where *k* is the smaller number of leaves among the two trees comprising *n* and *m* nodes, respectively. Even for large *k*, our branch and bound solver optionally runs the FPT algorithm for a small number of path permutations to derive a primal bound on the optimal solution.

## 3 Results

We have implemented the branch-and-cut algorithm described above and have bundled it with our non-linear solver (Section 6.3) in our novel trajectory alignment tool Trajan. In addition to a full branch and bound scheme, Trajan offers heuristic approaches to transform strong bounds into feasible arboreal matchings as well as an FPT algorithm for small number of cell fates at a dramatically reduced computational cost. Trajan adopts a strategy similar to [6] to prepare the output of Monocle 2 (or similar trajectory reconstruction methods) for a meaningful alignment, including the smoothing and scaling of expression curves.

### 3.1 Lower and upper bounds on dtw

Here, we illustrate the practical relevance of the upper bound (UB, Theorem 1) and lower bound (LB, Theorem 2) that the optimal arboreal matching between two paths can provide on the optimal dtw. We align two simple trajectories constructed from scRNA-seq data on dendritic cells stimulated under two conditions (LPS and PAM) collected at 4 time points after stimulation [21]. The two linear trajectories and the dissimilarity matrix were obtained from a recent study that introduced cellAlign [1], a method that aligns two simple trajectories based on dtw. The optimal solutions computed by cellAlign using dtw and by Trajan using the LB penalty scheme are equivalent (Figure 4). When using the UB penalty scheme, Trajan’s optimal path through the dissimilarity matrix roughly follows the dtw path and represents a solution with almost 2 times larger score.

### 3.2 Trajan reproduces barriers in myogenic reprogramming

Here, we re-analyzed two public single-cell datasets: human skeletal muscle myoblast (HSMM) differentiation and human fibroblasts undergoing MYOD-mediated myogenic reprogramming (hFib-MyoD). These datasets were previously analyzed in [6], where the authors set out to compare these related processes in order to identify molecular barriers that hinder the efficient reprogramming of fibroblasts to myotubes. The authors used known myoblast differentiation markers (CDK1, ENO3, MYOG) to identify the core path within the complex trajectory constructed from hFib-MyoD, and they aligned this path to the core path in normal muscle development (HSMM) using dtw. The authors pointed out that the combined trajectory constructed from cells in both conditions did not intermix cells and thus did not allow to assess critical commonalities and differences in expression dynamics. We repeated the single-cell data analysis described in [6] to obtain the corresponding trajectories from Monocle 2 (Figure 5).

We then sought to align these complex trajectories using our algorithm. We show that Trajan is able to align the core paths of each complex trajectory, without any previous knowledge or path picking, using the same distance measure (correlation) as in the original publication. The global dynamics alignment of HSMM and hFib-MyoD are shown in Figure 4. Interestingly, our approach not only aligns the core trajectories, but it also aligns the branches corresponding to failure of reprogramming, which are characterized in both processes by cells that exited the cell cycle, yet failed to proceed toward differentiation [6], [19].

After performing the trajectory alignment with Trajan, we constructed gene expression kinetics plots for a set of genes that were assessed in [6] to investigate whether our alignment was able to reproduce their reported findings regarding similarities and differences between these two processes. Indeed, we were able to reproduce their key findings: Proliferation marker CDK1 is downregulated both in HSMM and hFib-MyoD; Muscle transcriptional regulators (MEF2C, MYOG) are upregulated later and to a lesser extent in hFib-MyoD compared to HSMM; BMP4 is only expressed in hFib-MyoD and ID family proteins (ID1, ID3) which lie downstream of BMP signaling fail to be downregulated in hFib-MyoD; IGF pathway genes (IGF2, IGF1R) are expressed at higher levels in HSMM (Figure 6).

We evaluated Trajan using penalty schemes that assign the maximum and average weight of incident edges as well as the minimum cost implied by the lower bound Theorem 2 over all pairs of paths (*lb*). While the maximum scheme (*max*) is a direct generalization of the cost scheme applied by Theorem 1, the averaging scheme (*avg*) tries to capture the expected cost of leaving a vertex unmatched and is the default scheme applied by Trajan. All schemes correctly picked the correct core paths in the two trajectories and are robust under subsampling. (Appendix, Figure 8).

### 3.3 Accuracy of Trajan

Here, we compare the accuracy of Trajan in matching “correct” cells between complex trajectories to the path-wise alignment by dtw. To this end, we perturb the hFib-MyoD trajectory output by Monocle 2 by randomly subsampling 80% of the input cells and 80% of the genes used for ordering them along pseudo-time. We align isomorphic trajectories (trees) comprising a variable number of nodes (parameter *ncenter* in Monocle 2), measuring the difference between nodes by Euclidean distance. Since we know the true correspondence of nodes between different perturbed trees, we can count false positive and false negative alignments as a measure of accuracy. In Table 1 we report the number of false positive (FP) and false negative (FN) alignments of the classic dtw run on each true pair of paths, and Trajan using different penalty schemes (avg, max, lb). Trajan takes the entire trees as input, it is not given the correct path-to-path correspondence. Nevertheless, Trajan almost always finds the true correspondence between cells, compared to the path-wise dtw scheme, that introduced both FP and FN alignments.

Table 2 reports the running times of the naïve ILP using the commercial solver IBM ILOG CPLEX 12.7 and Trajan coupled with our in-house non-linear solver on a random subset of the instances introduced above. In addition to the full branch-and-cut implementation (Trajan-BnC), we ran Trajan switching to the FPT algorithm (Trajan-FPT). On a 2.30GHz Linux system using up to 15 threads, Trajan-BnC is at least 13 times faster than the naïve ILP using CPLEX, while Trajan-FPT is another 10 times faster then Trajan-BnC. The speedup of Trajan-FPT on this set of instances is not surprising, since these trees comprise only 3 different leaves (cell fates). CPLEX was not able to solve instances with more than 200 nodes since it exceeded the memory limit of 320 GB.

## 4 Conclusion

We have introduced Trajan, a novel method that allows for the first time the alignment of complex (non-linear) single-cell trajectories. Originally introduced to compare phylogenetic trees, in Trajan we adopt arboreal matchings to perform an unbiased alignment enabling the meaningful comparison of gene expression dynamics along a common pseudo-time scale. Trajan does not make any assumptions concerning the algorithm used to reconstruct the trajectory and can in principle be coupled with any available reconstruction method. In a future algorithm, an arboreal matching between cells might prove useful in guiding a joint learning of trajectories for two biological processes. Furthermore, our generalization to directed acyclic graphs (see Appendix) can be used to align data-driven ontologies and the manually curated Gene Ontology (GO) to assign genes to existing GO terms, but also to infer new terms and potentially confirm or correct hierarchical term-term relationships [8].

## 5 Acknowledgments

Sören Laue has been funded by Deutsche Forschungsgemeinschaft (DFG) under grant LA 2971/1-1. Mislav Blaževič was supported in part by BAYHOST. Francisca Rojas Ringeling was supported by the Bavarian Gender Equality Grant (BGF).

## 6 Appendix

### 6.1 Proofs of Theorems and Lemmas

(sketch) The first inequality is proven by induction on *i* + *j*.

Let *p** be the optimal warping and k the minimum number of edges that need to be removed to transform *p** to a feasible arboreal matching *M*. Since *M* has *k* unmatched vertices, we have

This implies that *d _{M}* ≤

*d*+

_{dtw}*kD*, which also completes the proof of the theorem.

Conversely, let *p** be an optimal warping such that (*x _{i}, y_{j}*) ∈

*p** and both

*x*and

_{i}*y*are covered by at least two edges in

_{j}*p**. From the coverage property of the warping we must have (

*x*

_{i}, y_{j−1}) ∈

*p** or (

*x*

_{i}, y_{j+1}) ∈

*p**. If (

*x*

_{i}, y_{j−1}) –

*p**, we get (

*x*

_{i+1},

*y*) ∈

_{j}*p** since

*y*is covered by at least two edges in

_{j}*p** and by the warping conditions. As a results,

*D*(

*i, j*) =

*D*(

*i, j*− 1) +

*d*(

*x*

_{i}, y_{j}) and

*D*(

*i*+ 1,

*j*) =

*D*(

*i, j*) +

*d*(

*x*

_{i+1},

*y*) =

_{j}*D*(

*i, j*− 1) +

*d*(

*x*) +

_{i}, y_{j}*d*(

*x*

_{i+1},

*y*). From (1), we obtain

_{j}*D*(

*i*+ 1,

*j*) ≤

*D*(

*i, j*− 1) +

*d*(

*x*

_{i+1},

*y*). This implies that

_{j}*d*(

*x*) ≤ 0. Since

_{i}, y_{j}*d*(

*x*) > 0 we must have

_{i}, y_{i}*d*(

*x*) =0. As a consequence, we can remove (

_{i}, y_{j}*x*) from

_{i}, y_{j}*p** without violating the warping conditions. The case (

*x*

_{i}, y_{j+1}) ∈

*p** is proven in an analogous manner.

Let *p** be a non-redundant optimal warping. For every vertex *x* ∈ *L*_{1}(*p**) and *y* ∈ *L*_{2}(*p**) we delete all incident edges but one, which results in an arboreal matching of the same cost as dtw *p**. Hence, it implies that *d _{M}* ≤

*d*.

_{dtw}Let , hence *K* is bounded above by *w*(OPT). Moreover, for any feasible solution *x* to the relaxation, we have

Therefore, the optimal value of the LP relaxation is at most *n* times *w*(OPT). Our bad instance consists of the two rooted trees shown in Figure 7, with *w*(red/blue edges) = 1 and *w*(·) = 0 otherwise. Any pair of nonzero weight edges are incompatible, so the maximum cost matching is 1. Let , and *x*(·) = 0 otherwise, where is a binary number such that iff *n* = *m*. Hence, *x* is a feasible solution with cost of (*n* − 1)^{2}/(*n* − 1) + 1 = *n* if *n* ≠ *m* and if *n* = *m*. Therefore, the optimal value of the LP relaxation at least *n* − *o*(1).

Let *x** be an optimal solution of the LP relaxation with clique constraints. We show that there exists a feasible arboreal matching with cost at least , where . Let *l* be the smallest integer such that for all *e* ∈ *E*(*T*_{1}, *T*_{2}), , where *n _{e}* ∈ ℕ. We will decompose

*lx** =

*x*

^{1}+

*x*, where

^{N}*x*is an incident vector (with ground set E) of some feasible arboreal matching. Next, we are going to show that

^{i}*N*≤

*l*(log

*n*+ 1). The procedure for finding

*x*is inductive. Let

^{i}*y*=

^{i}*lx** − (

*x*

^{1}+ ⋯ +

*x*) for

^{i}*i*= 1,2,… and

*y*

^{0}=

*lx**. Define

Let *x _{i}* be an incident vector of a maximum feasible arboreal matching in

*E*such that

_{i}*x*(

^{i}*E*) =

_{i}*r*(

*E*). By assumption, we can write , where is a clique. As the result, we have

_{i}Hence,

Inductively,

Hence, we have

After at most *k*(*k* ≤ *l* log *n*) steps we have . It is easy to see that at most *l* more steps are needed to reach integer *N* such that *lx** = *x*^{1} + ⋯ + *x*^{N}, where *x _{i}* is an incident vector of some feasible matching. Hence, we have

By an averaging argument, there exists an *x ^{i}* such that .

For *u* ∈ *V*_{1} and *v* ∈ *V*_{2}, let *D*[*u, v*] denote the weight of a maximum *x**-weight clique between [*r*_{1}, *u*] and [*v, ℓ _{i}*], for any leaf

*ℓ*in the subtree rooted at

_{i}*v*. Then and

For a root-to-leaf path [*r*_{1}, *ℓ _{j}*] in

*T*

_{1}we assign weights to nodes

*v*in

*T*

_{2}as follows:

Note that this can be done in . Then, a -maximum weight independent set of vertices in *T*_{2} gives a maximum semi-independent clique with respect to *P*_{1}. Let denote the weight of a maximum *x**-weight clique between [*r*_{1}, *ℓ _{j}*] and an independent set in the subtree of

*T*

_{2}rooted at

*v*. Then

The maximum *x**-weight clique between [*r*_{2}, *ℓ _{j}*] and an independent set in the subtree of

*T*

_{1}rooted at

*v*is defined analogously in time . Finally, the maximum weight semiindependent clique constraint can be computed as

### 6.2 Heuristic approaches implemented in Trajan

A simple *greedy approach* adds the best, i.e., lowest cost edge to a set of already selected edges that is not in conflict with any of them. This strategy can be applied to large scRNA-seq sample comprising hundreds of thousands of cells. Since it completely ignores any optimal fractional solution, it can return very suboptimal solutions.

In contrast, a *randomized rounding* scheme takes into account the global dependence of variables in the LP relaxation: A fractional variable in the optimal solution to the improved LP relaxation is rounded to 1 with probability *x _{i,j}*, such that the expected objective function value of the integral solution is the same as of the optimal fractional solution. Potentially introduced conflicts are subsequently resolved by greedily retaining the most valuable (with respect to

*w*(

*i, j*)) edges. The tighter the LP relaxation the better the integral solution will be.

Another approach is to add the following *non-linear constraints* to the LP to force the variables *x _{i,j}* to be either 0 or 1:

From a high level point of view these integrality constraints can be thought of as a rank-1 approach to the first level of the Lasserre hierarchy [14, 15]. When trying to solve this NP-hard problem using our non-linear solver (Section 6.3), we obtain integral, but suboptimal solutions. We expect this method to return slightly better solution than the randomized rounding scheme, since in contrast to the latter, it enforces integrality at the same time as it solves the relaxed, fractional problem.

### 6.3 Non-linear solver

We use our non-linear solver to solve all continuous optimization problems, unless stated otherwise. It implements the standard augmented Lagrangian approach [12, 17] in order to deal with nonlinear constraints. The augmented Lagrangian approach runs in iterations and maps the constrained optimization problem to a sequence of unconstrained optimization problems that can have bounds on the variables. These unconstrained optimization problems are then solved by the quasi-Newton solver (L-BFGS-B [16, 25]) that can also deal with bounds on the variables. A detailed description on the individual parts of this quasi-Newton solver can be found in [25].

It has been observed before [2] that when the problem is non-convex an augmented Lagrangian approach can be superior to interior point methods with respect to the solution quality. It often returns better local minimal solutions than an interior point solver. Note, that this is the situation when adding the non-linear integrality constraints (12).

### 6.4 Supplemental Figure

### 6.5 Arboreal DAG matchings

In this section, we generalize arboreal matchings between rooted trees to directed acyclic graphs (DAGs) *G*_{1} = (*V*_{1}, *E*_{1}) and *G*_{2} = (*V*_{2}, *E*_{2}). As for trees, we define pairs of edges (*i, j*) and (*k, l*) between *G*_{1} and *G*_{2} to be *compatible* in an arboreal matching if they preserve the ancestry relationship, that is, if it holds that *k* is a descendant of *i* in *T*_{1} if and only if *l* is a descendant of *j* in *T*_{2}. Adjusting the definition of set of compatible edges to the descendants relationship given by a DAG rather than a tree, ILP (P) defines the maximum weight arboreal matching between DAGs.

The lifted clique constraints (9) and (11) are analogously valid for DAGs, but their separation needs to be adapted. For crossing clique constraints, the dynamic program can be generalized to DAGs by considering all parents of node *u* (i.e., *π*_{1}(*u*) and *π*_{2}(*v*′) here stand for sets of nodes) instead of the unique parent in a tree.
and

The running time of this DP is and thus remains polynomial.

The semi-independent clique constraints need to be computed for each root-to-leaf path in *G*_{1} and *G*_{2}. For a given root-to-leaf path in *G*_{1} (or *G*_{2}), the independent set in *G*_{2} (or *G*_{1}) can be represented as a maximum weight antichain which can be reduced to a standard max-flow problem (see [10]), and hence can be computed efficiently. Since the number of paths in a DAG can be exponential, our approach to separate clique constraints is exponential in the worst case. In fact, we show below that we cannot hope for a polynomial time separation algorithm (assuming **P** ≠ **NP**). Next, we formally state the problem of identifying the most violated (maximum) semi-independent clique between two DAGs and show that it is NP-hard.

**Problem**. *Maximum Semi-independent Clique* (MSC)

*Given two DAGs G*_{1} = (*V*_{1}, *E*_{1}) *and G*_{2} = (*V*_{2}, *E*_{2}) *with weights α*: *V*_{1} × *V*_{2} → ℝ_{+}. *Find a maximum semi-independent clique with respect to the weights α, that is, find a root-to-leaf path P** *in G*_{1} *and an independent set S** *in G*_{2} *such that the total weight* *is maximized*.

Here the weights *α* correspond to the current fractional solution and our goal is to find the most violated semi-independent clique. We prove NP-hardness of MSC via a reduction from the *maximum label cover* problem [4]. Given a bipartite graph *G* = (*A, B, E*), with a partition of *A* and *B* into *k* disjoint sets *A*_{1},…,*A _{k}* and

*B*

_{1},…,

*B*, respectively, the maximum label cover problems is to find subsets of vertices

_{k}*A*′ ⊆

*A*and

*B*′ ⊆

*B*, such that, |

*A*Ȳ ∩

*A*| ≤ 1 and |

_{i}*B*′ ∩

*B*| ≤ 1, for

_{i}*i*= 1,…,

*k*, so as to maximize the number of edges

MSC *is NP-hard*.

*Proof*. Starting from an instance of the maximum label cover problem *G* = (*A, B, E*) and *k* disjoint sets *A*_{1},…, *A _{k}* and

*B*

_{1},…,

*B*, we construct a corresponding instance of MSC problem as follows. For every subset

_{k}*A*, we define a directed graph

_{i}*D*= (

_{i}*N*(

_{i}, A*D*)), where

_{i}*N*= {

_{i}*s*

_{i}, s_{i+1}} ∪

*A*,

_{i}*A*(

*D*) = {(

_{i}*s*), (

_{i}, a*a, s*

_{i+1})|

*a*∈

*A*|. The whole graph

_{i}*G*

_{1}= (

*V*

_{1},

*E*

_{1}) is constructed by concatenating the graphs

*D*according to the order of their indices. Figure 9 depicts such a construction. Let

_{i}*P*be a directed path whose vertex set is

_{i}*B*. The graph

_{i}*G*

_{2}= (

*V*

_{2},

*E*

_{2}) is obtained by connecting a vertex

*s*to one of the endpoints of

*P*= 1,…,

_{i}, i*k*(see Figure 10). The weights

*α*between

*V*

_{1}and

*V*

_{2}are defined as follows.

Here, each root-to-leaf path in *G*_{1} (from *s*_{1} to *s*_{k+1}) corresponds to a selection of exactly one element from set *A _{i}, i* = 1,…,

*k*, and an independent set in

*G*

_{2}corresponds to choosing at most one element from set

*B*= 1,…,

_{i}, i*k*. Since only the weight of edges between

*a*∈

*A*and

*b*∈

*B*with (

*a, b*) ∈

*E*is nonzero, the maximum semi-independent clique between

*G*

_{1}and

*G*

_{2}with respect to the weights

*α*is equivalent to the maximum label cover.