Preprocessing of general stenotic vascular flow data for topological data analysis

A new analysis and classification method of vascular disease based on topological data analysis (TDA) has been proposed in [1]. The proposed method utilizes the application of persistent homology to hemodynamic variables. Particularly, 2D homology is obtained from the velocity field of the flow projected onto the unit sphere, known as so-called the S2 projection. It was shown that such homology is closely related to the degree of vascular disease. The original method was developed based on the computational fluid dynamic (CFD) solutions of the straight stenotic vessels. In this paper, we develop a preprocessing method that enables the proposed TDA method to be applied to general stenotic vessels of irregular geometry. The velocity field is subject to a coordinate transformation correcting for orientation and curved geometry. The preprocessed data is projected onto S2 and the corresponding homology is calculated. We show that this preprocessing is necessary for the proposed TDA method to be successfully applied to general types of stenotic vessels. Validation was performed on a set of clinical data including reconstructed vascular geometry with corresponding diagnostic indices.

Cardiovascular disease is the leading cause of death worldwide. Accurate diagnosis and 2 prediction is crucial. The diagnosis methods today are based on the anatomical 3 approach by investigating the geometry of the diseased vessel using image analysis and 4 the functional approach by measuring the pressure drops across the stenosis. The gold 5 standard diagnostic method uses fractional flow reserve (FFR) values, which are 6 calculated by dividing a distal pressure measurement by a pressure reading proximal to alternative diagnosis methodology [5,6,7]. For these methods, clinicians need a 11 canonical index that can represent the collective knowledge of the diseased vessel. 12 Especially for the machine learning approach, one needs to extract features of the 13 vascular disease for the given patient's data. 14 In [1] a new method of providing a measure of vascular disease has been proposed 15 based on topological data analysis (TDA) [8]. In order to apply TDA to vascular flows 16 for the proposed method in [1], the velocity field was projected onto a unit sphere (S 2 ), 17 referred as the S 2 projection. It was shown that this simple projection method is highly 18 efficient and powerful for visualizing and quantifying the vascular disease, with which 19 the flow complexity is measured by observing the directional information of the velocity 20 fields. For example, the overall flow pattern is laminar for the healthy vessel and is 21 more turbulent when there is stenosis. On S 2 , the laminar flow is projected as a point 22 cloud centered around one pole on the sphere while the fully developed turbulent flow 23 provides a point cloud all over the sphere including two poles. Then the proposed TDA 24 method computes the persistent homology of the point cloud on S 2 . It was shown that 25 the two dimensional homology of S 2 projection of vascular flows is closely related to the 26 degree of disease and can be used along with the value of FFR for diagnosis. The 27 original development of the proposed method was based on the 3D spectral 28 computational fluid dynamic (CFD) data of the regular straight stenotic vasculature. 29 In this paper, we verify that the proposed TDA method can be applied effectively to 30 curved vessels with more general irregularity in a similar way to the case of the straight 31 vessel. In order to apply TDA to general vessel type, we develop a projection method 32 which preprocesses the hemodynamic data such that it becomes suitable for TDA. The 33 proposed projection algorithm is based on the singular value decomposition (SVD) and 34 principle component analysis (PCA). After the preprocessing, we decompose each vessel 35 into small segments and TDA is performed to the velocity fields on S 2 in all the (conjugate transpose) of X. 61 For the hemodynamic variables, let ρ = ρ(x, t) be the density, p = p(x, t) the 62 pressure, u = (u, v, w) T the velocity vector for the position vector x = (x 0 , x 1 , x 2 ) T ∈ Ω 63 and time t ∈ R + . Here Ω is the closed domain in R 3 .

64
Preprocessing 65 The preprocessing happens in several steps. First the orientation of the vessel is 66 regularized by PCA on the spatial coordinate point cloud. Then the vessel is segmented 67 along the first principal axis, followed by an iterative process of segmentation and 68 fitting of a cubic spline. This results in a curvilinear coordinate system that reflects the 69 curvature of the vessel, which can be used to project the velocity field and segment the 70 vessel. 71 The process assumes that the geometry of the vessel is contiguous, non-bifurcating, 72 and has high aspect ratio. 73 Let x k be the coordinate vector, x k = (x 0 k , x 1 k , x 2 k ) ∈ R 3 where the solutions to the 74 incompressible Navier-Stokes equations are calculated and X ∈ R n×3 be the set 75 composed of all x k , k = 1, 2, · · · , n, X k, * = (x 0 k , x 1 k , x 2 k ). x k constitute the point cloud 76 we use for the analysis. Here x 0 k , x 1 k , x 2 k denote the x-, y-and z-coordinate of each point 77 cloud. To each x k , the hemodynamic variables are assigned, e.g. the 3D velocity vector 78 v k = (v 0 k , v 1 k , v 2 k ) ∈ R 3 and the pressure p k ∈ R + . v k and p k are all functions of x k and t. 79 We define W ∈ R n×3 as the velocity field, composed of v k such that W k, . 80 In order to discover this principle axis, effectively regularizing orientation, we employ 81 PCA. The principle axis is the direction of highest aspect ratio, along which the blood x i k , i = 0, 1, 2.
Let T 1 be the affine transformation that translates X by µ as below The principle axis can then be found using the SVD.

88
In the following SVD plays a crucial role. Each point in space can be thought of as a 89 sample of a three dimensional random variable in our setting. After centering the data, 90 the matrix T 1 (X) · T 1 (X) T is then the covariance matrix of the sampled random 91 variable, which projects a linear combination to a vector of covariances. The 92 eigenvectors of this operator are the linear combinations which give a multiple of 93 themselves as covariances. These eigenvectors can be interpreted as the directions in the 94 space where the sample of the random variable has the most variation. These 95 eigenvectors are the right singular vectors of T 1 (X). This gives us a basis that we can 96 use to represent the data, the one corresponding to the largest singular value having the 97 largest covariance, and therefore the accounting for the more variation in the data than 98 any other possible choice of basis vector. Since the points are a point cloud defining the 99 geometry of a three dimensional object, this gives us the direction where the object is 100 longest, effectively orienting the geometry in the way that is ideal linear approximation 101 for us to chop it up in the direction of the flow field.

102
Let the following be the SVD of T 1 (X) where U ∈ R n×n and V ∈ R 3×3 are unitary matrices and Σ ∈ R n×3 is a rectangular diagonal matrix with non-negative real values on its diagonal. Since T 1 (X) is real, U and V are orthonormal, i.e. U H = U and V H = V . The principle axes/directions are the columns of V . Let T 2 be the linear transformation of X projected onto the right principle vectors V T 2 : X −→ X · V.
LetX be the following In practice, one can use a sparse sampling of X for the SVD as it also provides a 105 good approximation to V . This is the case particularly because we are dealing with only 106 3 dimensions while there is a large number of points, so the shape of the vessel is well 107 represented by a relatively sparse random samples. Figure 1 shows Now we want to partition the vessel into m parts along the principal axis. First we 112 find the maximum coordinate and minimum coordinate values in the first principle 113 direction, x − = min(X * ,0 ) and x + = max(X * ,0 ). We split the vessel into m 114 non-overlapping intervals by diving the total interval L = [x − , x + ] = m j=1X j . Let X j be 115 the j th segment corresponding toX j in the original coordinate system. Similarly W j is 116 the corresponding velocity data corresponding to X j .

117
Center line projection 118 The process above results in a set of initial conditions to a nonlinear parametric fit to 119 the vascular geometry, which will be used to project the velocity field. The next step, is 120 to begin the nonlinear fit by calculating the centroids of the segments X j : where n j is the number of elements in X j . Now with m centroids, we can fit a 122 parametric curve through them. The fitting can be done exactly in a collocation sense, 123 January 3, 2021 4/20 but we choose cubic splines with a small smoothing parameter to keep the derivative 124 continuous. We will denote this as P (t), where t ∈ [0, 1] is the parametric variable. We 125 first use this curve P to re-partition the vessel into a larger number of points M , this is 126 done by nearest neighbors to each of the M points on the curve.

127
The centroids are calculated again, and a new curve is fit to these centroids. This 128 can be done several times to get a better fit. The following Figure 2 shows the resulting 129 bins after several iterations, with a final value of M = 80:  Now that we have a good curve fit, we proceed to construct the projection operator 131 for each point in X. This is done by sampling both P and ∂P ∂t at a larger number of 132 points N p = 300, t i ∈ T ⊂ [0, 1]. This sampling is the one used to project the velocity 133 field. Then, like the above segmentation process, we solve the nearest neighbor problem 134 for each x ∈ X: t x allows us to address the curve P (t) for each point x. The normalized derivative is 136 then calculated: Now, we find the null space of this vector and call them ν 1 (x) and ν 2 (x). Then we 138 stack them into a projection matrix This is used to project the velocity vector coresponding to point x. This is done for all 140 v ∈ W . In this section, we summarize the preprocessing algorithm and provide the pseudo codes. 143 The algorithm is broken into several functions and several stages for clarity. The first 144 stage finds the longest axis along the geometry and partitions the data along this axis. 145 This is a linear approximation to seed the process. The centroids of each partition are 146 used as points to fit a parametric cubic spline. This cubic spline is then used to partition the geometry into more segments, again calculating the centroids, and fitting 148 another spline.

149
After several rounds of this, the cubic spline fit is used as the centerline, 150 approximating the first dimensional geometry of the vessel. This is used in a coordinate 151 transform on the velocity field. The result is a velocity field represented in terms of its 152 flow with respect to the geometry of the vessel. Figure 3 shows the velocity fields before 153 (left) and after (right) the curvilinear projection. Notice how the sign of the velocity 154 flips due to the curvature of the vessel in the left image (red area), while this is absent 155 in the right image.   Algorithm 1: Initial linear partitioning. This serves serves as the initial conditions for the cubic spline fit. . Data: Spatial coordinates stacked row-wise in matrix X, velocity vectors stacked row-wise as matrix W , m (∼ 4 − 15) Chosen to maintain high aspect ratio and contiguity of partitions Project onto the right singular vectors T 2 :X := T 1 (X) · V Partition along the principal axis into m equal length intervals; sorting X bỹ X * ,0 and binning into partitions, {X j } m j=1 Algorithm 2: Segmentation. Partitioning along nonlinear axis. . Data: Spatial coordinates stacked row-wise in matrix X, M (about 10 -100), partitions {X j } m j=1 from Algorithm 1 Calculate the centroids for each partitionx k = 1 Find the nearest point on the sample of P : Partition X in this manner, so that each point in the partition belongs to a point Algorithm 3: Projection of velocity field. Transform of velocity field onto curvilinear co-ordinates. . input : Spatial coordinates stacked row-wise in matrix X, Velocity vectors stacked row-wise as matrix W , N p (larger, 300+). Partitions from ∂t . Find two orthonormal complements to ν 0 (t) and stack into a matrix V (t) = (ν 0 (t), ν 1 (t), ν 2 (t)) (column-wise). end for each row x in X do Find the nearest neighbor on P (t): t x = arg min t∈T P (t) − x and its associated projection operator V (t x ).
Project v, the element of W , corresponding to x onto basis V : TDA to the transformed velocity field. As mentioned in Introduction, the main purpose 164 of this paper is to show the applicability of the TDA analysis method proposed in [1] to 165 clinical data of general type of vasculature and to show that the proposed method is 166 efficient in describing stenotic disease. First we must briefly explain the key element of 167 TDA that we apply to the flow field. For more details, we refer readers to [1,10].

168
Let X be the topological space. The topological space X that we consider in this 169 paper is the space constructed by the velocity field. Note that, however, it is possible to 170 construct X from more hemodynamic variables than the velocity field such as the 171 pressure variable. Although more general space is interesting, such cases are not 172 considered in this paper. In order to construct a meaningful topological structure given 173 the point cloud, we first explain singular homology. Homology is a topological invariant 174 that describes the holes of various dimensions of X .

175
The assumption is that there exists X and the point cloud is a set of sampling out of 176 X . Singular homology is on X rather than the point cloud. An n-simplex is a convex 177 set composed of n + 1 vertices (e.g. points in velocity field in our case). For example, 178 0-simplex is a point, 1-simplex is an edge, 2-simplex is a filled triangle and 3-simplex is 179 a filled tetrahedron. The n-simplex is the n-dimensional version of the triangle. The 180 standard n-simplex is the convex hull of the standard basis in R n+1 . Singular n-simplex 181 in X is a continuous map σ from the standard n-simplex to X . Let C n (X ) be the free 182 abelian group whose basis is the set of singular n-simplices in X . Then we can define the 183 boundary map, δ n from C n (X ) to C n−1 (X ) where C n−1 (X ) is constructed from C n (X ) 184 by removing the vertices of C n (X ) one at a time. Then the n-dimensional homology 185 group H n is defined by H n (X ) = Ker δ n /Im δ n+1 where Ker and Im are the kernel 186 and image groups, respectively. Roughly we interpret the rank of H n as the number of 187 holes residing in the n-dimension of X . Thus if we can find H n for n = 0, 1, 2, · · · , we 188 can obtain a rough idea of the structure of X in terms of holes in it. However, it is 189 difficult to find H n of X because X is arbitrary in general. Thus, instead of dealing with 190 the homology of X , we use rather the homology of the roughly imitated space of X .

191
To imitate the original space X algorithmically, we consider a simplicial complex K 192 which is a collection of simplices. The collection satisfies the conditions that K contains 193 all lower-dimensional simplices of σ if σ is a simplex of K and that the intersection of 194 two simplices in K is a simplex in K if the intersection exists. Then in the similar way, 195 the homology group of K, H n (K) is defined as H n (K) = Ker δ n /Im δ n+1 where δ n is 196 defined similarly as above. K is a collection of simplicies, so given the point cloud, we 197 basically connect simplices hierarchically. The number of generators of H n (K) is known 198 as the Betti number, β n . Roughly speaking, β 0 is the number of the connected components of the resulting K, β 1 is the number of the one-dimensional holes of K, e.g. 200 the loop or cycle structures of K, β 2 is the number of the two-dimensional holes of K, 201 and so on.

202
In this work, we use the Vietoris-Rips simplicial complex, for which we consider the 203 metric space (X , d) where d is a usual Euclidean metric in this paper. Then the 204 Vietoris-Rips complex, V (τ ) is built on the point cloud out of X having a k-simplex for 205 every collection of k + 1 vertices within in a distance τ of each other in the Euclidean 206 metric. Here, τ is known as the filtration parameter. Thus the Vietoris-Rips complex is 207 constructed with the scale of τ . Once constructed, we can compute H n of V (τ ).

208
However, it is difficult to know which value of τ can generate the most similar simplex 209 to X . Thus, instead of finding H n for a fixed value of τ , we construct V (τ ) for various 210 values of τ in ascending order, say, V (τ i ), i = 0, 1, 2, · · · and τ i < τ j for i < j. Then we 211 have various H n (V (τ i )) for various τ i . Clearly given N number of vertices, if τ = 0, 212 there are N vertices isolated and not-connected and obviously β 0 = N , β 1 = 0, β 2 = 0, 213 and so on.  The following diagram shows the barcodes for H 0 (left) and H 1 (right), respectively, 226 for the point cloud which is composed of the four vertices of the unit square. From the 227 left barcode we know that there are four vertices given (4 bars in y-axis) when τ = 0. 228 We also know that when τ = 1 all the four vertices are connected and there remains together, for which a gluing condition is needed. For the condition, the topology is 241 described with a metric with the parameter τ explained above. The distance between 242 two simplices is determined using the Euclidean metric d. For example, two 0-simplices 243 must be connected forming an edge if their Euclidean distance is less than or equal to 244 the value of τ . Here we note that it is hard to use the raw data of vascular flows for 245 TDA. In [1] it was shown that the raw data needs to be transformed into a point cloud 246 that is suitable for a meaningful TDA. We use the S 2 projection proposed in [1] also 247 described in the following, but we also note that this is not a unique transform. There 248 could be a better transform for the construction of the simplicial complex.

268
According to [1], the 2D persistence homology of the velocity fields projected on S 2 269 can be used to determine whether there is a stenosis. Therefore, by decomposing the 270 whole vessel into smaller segments and calculating the 2D persistences homology within 271 each one, we can get the identity of the segment where stenosis occurs. With the 272 centerline projection methodology described previously, the segmentation of vessels is 273 obtained by chopping up the 3D space of vessels into M small segments along with the 274 parametric curve fit. One can see that the size of each segment is not necessarily the 275 same since it depends on the slope of the centerline in the segment region. 276 We work with the maximum persistence in both H 1 and H 2 . This is the length of 277 the longest interval in the persistence diagram, which can be visually understood as the 278 longest line in the bar code. The maximum H 2 persistence is used to determine the 279 segment closest to the stenosis.

281
Here we validate the proposed method using clinical data, which is the primary result of 282 this paper. We show a statistically significant correlation between the FFR and the 283 topological persistence calculated in the manner described in the previous sections. To 284 be specific, after segmentation of vessels with preprocessing, we consider the maximum 285 H 1 and H 2 persistences in each segment of vessels, which are the maximal length of the 286 generated 1D and 2D barcodes of the given point cloud, i.e. the velocity field projected 287 on S 2 in this work. We find that there is highly weak correlation in the case where the 288 preprocessing steps are omitted. This is due to the confounding factor introduced by The results are given in Table 1. The column labeled Segment number is the number 295 of the segment in the vessel where the largest maximum H 2 persistence occurs, which is 296 presumably the location of the stenosis. The column labeled FFR is the FFR value, 297 which is estimated by dividing a distal pressure by proximal pressure, both of which are 298 measured after a probable lesion is located using X-ray angiography. And the column 299 labeled H 1 Persistence is the maximum H 1 persistence in the segment where the largest 300 maximum H 2 persistence occurs. And the column labeled H 2 Persistence is the value of 301 the largest maximum H 2 persistence, i.e., the maximum H 2 persistence in the 302 corresponding Segment. The first column is the vessel id.    Figure 7 while the H 2 persistences 325 for the vessels of CBN0018, CBN0045 and CBN2122 are significant as shown in Figure 326 8. We also observe that the H 1 persistences of the vessels of CBN0013, CBN0582 and 327 CBN1096 are smaller than those of the vessels of CBN0018, CBN0045 and CBN2122.

328
Note that H 1 persistences in Figure 9 are existent while those in Figure 7 are not. This 329 is due the fact that the H 1 in Figure 7 is only from a segment of the given vessel but 330 the H 1 in Figure 9 is calculated using the point cloud from the whole vessel.  In Figure 11 , we show the correlation between the FFR values and the maximum 336 H 1 persistences from linear regression with least squares method to the corresponding points in each image. 340 We can see that the p-values for the linear fit are 0.138 and 0.00248 for the 341 non-preprocessed and preprocessed data, respectively. This demonstrates that we do not 342 have a statistically significant correlation between FFR and our potential diagnostic 343 index without the preprocessing scheme, but have a potentially useful diagnostic index 344 with TDA if the proper preprocessing is applied. based on TDA has been proposed, employing topological features as a potential 351 diagnostic index. The proposed method was tested with the CFD solutions calculated in 352 the straight stenotic vessels without curvatures in [1]. In this paper, we developed a 353 preprocessing method that enables the TDA method to be applied to the curved 354 stenotic vessel. We applied the proposed method to clinically sourced data, showing a 355 statistically significant correlation to the gold standard diagnostic index, the FFR. It 356 was also shown that the preprocessing method was necessary to achieve a statistically 357 significant correlation. This validates the proposed method as a potentially powerful 358 diagnostic tool.

359
In future work we plan to refine the preprocessing method and make adaptations for 360 application to more general vascular geometries such as bifurcated vessels, multiple 361 stenoses. There is also potential for improvement of computational efficiency, parameter 362 selection and other methodologies in the preprocessing scheme. 363 S1 Appendix.